First of all, I shall explain the encoding systems for Chinese characters. There is practically only one character set and encoding system for the Simplified Chinese characters used in Mainland China and a few other places such as Singapore and Malaysia. However, there are a number of character sets and encoding systems for the Traditional Chinese characters used in Taiwan and a few other places. The most popular code, which is neither defined nor supported by the government, is called Big-5. The government on Taiwan defined the Chinese National Standard (CNS) codes for information interchange in 1986 and 1992.
In order to prepare a set of Chinese data processing utilities that can be used with any code, I designed an internal coding system based on the CNS-1992 code. Each character, whether in ASCII, Latin-1, or one of the various Chinese codes, is converted and stored in a four-byte structure, which is in fact an integer. The data processing programs are all written for this internal code. In this way I hope to spend my effort on the design of data processing rather than on code manipulation.
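To make the idea of "four bytes, in fact an integer" concrete, here is a minimal sketch in Python. The byte layout (a tag byte, then CNS plane, row, and column) and the names `TAG_ASCII`, `TAG_CNS`, `pack_ascii`, `pack_cns`, `unpack`, and `to_bytes` are assumptions made for illustration only; the text above does not specify the actual S-Code layout.

```python
import struct

# Hypothetical layout, NOT the actual S-Code definition:
#   byte 3 (high): character-set tag
#   byte 2: CNS plane, byte 1: row, byte 0: column (or the ASCII code)
TAG_ASCII = 0  # assumed tag value
TAG_CNS = 1    # assumed tag value


def pack_ascii(ch: str) -> int:
    """Store one ASCII character in the low byte of a 32-bit integer."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not an ASCII character")
    return (TAG_ASCII << 24) | code


def pack_cns(plane: int, row: int, col: int) -> int:
    """Store a CNS (plane, row, column) triple in one 32-bit integer."""
    for name, value in (("plane", plane), ("row", row), ("col", col)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} does not fit in one byte")
    return (TAG_CNS << 24) | (plane << 16) | (row << 8) | col


def unpack(value: int) -> tuple:
    """Split a packed integer back into (tag, plane, row, col)."""
    return (
        (value >> 24) & 0xFF,
        (value >> 16) & 0xFF,
        (value >> 8) & 0xFF,
        value & 0xFF,
    )


def to_bytes(values) -> bytes:
    """Serialize packed characters as big-endian 4-byte integers."""
    values = list(values)
    return struct.pack(f">{len(values)}I", *values)
```

With a fixed-width integer per character, a processing routine can index, compare, and count characters without caring whether the input was originally ASCII, Latin-1, or a multi-byte Chinese code.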
Indeed, any code could serve as the internal code. The reason I do not want to create a new code is obvious: let the experts do their job. The reasons I chose CNS are
This internal coding system is called S-Code. It was designed and implemented in late 1995. Although I have had some second thoughts since then, it works, so I do not want to change it any more. The implementation of S-Code consists of a suite of I/O and conversion programs. The application programs are grouped into the following four levels.
Many upper-level I/O functions need a code to specify the source or destination Chinese encoding system. The available systems and their corresponding integer codes are listed below.
| Integer | System |
|---|---|
| 0 | SCODE |
| 5 | BIG5 |
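As a rough illustration of how an I/O function might take one of these integers to select the source encoding, here is a sketch in Python. Only the two values 0 (SCODE) and 5 (BIG5) come from the list above. The function name `read_chars`, the big-endian 4-byte serialization for SCODE input, and the use of Unicode code points as a stand-in for the CNS-based internal value are all assumptions for the example, not the actual S-Code interface.

```python
import struct
from enum import IntEnum


class Encoding(IntEnum):
    """Integer codes for the encoding systems listed above."""
    SCODE = 0
    BIG5 = 5


def read_chars(data: bytes, source: int) -> list[int]:
    """Hypothetical reader: turn raw bytes into one integer per character.

    Stand-in behaviour: Big-5 input is mapped to Unicode code points via
    Python's built-in "big5" codec, not to the CNS-based internal code.
    """
    source = Encoding(source)  # raises ValueError for an unlisted code
    if source is Encoding.SCODE:
        # Assumed serialization: consecutive big-endian 4-byte integers.
        if len(data) % 4 != 0:
            raise ValueError("SCODE input must be a multiple of 4 bytes")
        return list(struct.unpack(f">{len(data) // 4}I", data))
    # Encoding.BIG5: decode the multi-byte text, one integer per character.
    return [ord(ch) for ch in data.decode("big5")]
```

A caller would pass the integer directly, for example `read_chars(raw, 5)` for Big-5 input or `read_chars(raw, Encoding.SCODE)` for data already in the internal form.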
Here are some examples, which are also useful utilities, written with S-Code.
Created: Dec 27, 1995
Last Revised: Jan 14, 1996
© Copyright 1995, 1996 Wei-Chang Shann