Unicode/ISO-10646 Unified "Ideographs"

The CJK character set is derived from a merger of a set proposed by Unicode (an extension of the EACC list) and a draft of the proposed Chinese standard GB 13000. From February 1990 it was developed by the CJK Joint Research Group (CJK-JRG). A set comprising 20,902 characters was issued in March 1992, and is variously called

It includes all hanzi/kanji/hanja from the following sets:

The successor of the CJK-JRG is the Ideographic Rapporteur Group (IRG) of ISO/IEC JTC1/SC2/WG2. The IRG is compiling lists of additional characters: HCS-B comprises about 8,000 characters, and China has asked for another 1,000 beyond these. Competition for the remaining space in UCS-2 is intense.
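
To put the space pressure in numbers: assuming the 1992 set occupies the contiguous range U+4E00 to U+9FA5 (the range is not stated in these notes), it uses exactly 20,902 of the 65,536 positions in a 16-bit code space, close to a third of UCS-2:

```python
# Assumption: the 1992 unified set occupies the contiguous block
# U+4E00..U+9FA5 (the CJK Unified Ideographs block of Unicode 1.0.1).
FIRST, LAST = 0x4E00, 0x9FA5
UCS2_SIZE = 0x10000  # 65,536 positions in a 16-bit code space

count = LAST - FIRST + 1    # inclusive range
share = count / UCS2_SIZE   # fraction of UCS-2 used by the unified set

print(count)            # 20902
print(f"{share:.1%}")   # 31.9%
```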

Han Unification

This refers to the unification of Chinese-derived characters used in China, Taiwan, Korea, Japan and Vietnam, without which a 16-bit character set would be unattainable. The idea was that two characters having the same abstract shape would be unified, regardless of meaning or font differences. Users of different languages will require different visual representations of the same codes, but this is regarded as a font/language issue to be handled by declarations external to the codeset. Of course, the distinction between a variant form and a font difference is sometimes subjective.
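
A small illustration using Python's built-in codecs, assuming their mapping tables stand in for the official national-to-Unicode mappings: U+76F4 ("straight") is commonly drawn differently in Chinese and Japanese typography, yet each national set's distinct byte value decodes to the same single code point. The `runs` list is a hypothetical tagged-text structure, showing the language travelling beside the text rather than inside the codeset.

```python
import unicodedata

# U+76F4 ("straight"): preferred glyphs differ between Chinese and
# Japanese typography, but unification gives it one code point.
CHAR = "\u76f4"

# Each national standard assigns its own byte values to the character.
national = {codec: CHAR.encode(codec)
            for codec in ("gb2312", "big5", "euc_jp", "euc_kr")}

for codec, raw in national.items():
    # Decoding any of these byte strings yields the same code point.
    print(f"{codec:7} {raw.hex()} -> U+{ord(raw.decode(codec)):04X}")

print(unicodedata.name(CHAR))  # CJK UNIFIED IDEOGRAPH-76F4

# Hypothetical tagged text: the language tag is stored outside the
# code points, and a renderer would choose a font from it.
runs = [("ja", CHAR), ("zh-Hans", CHAR)]
for lang, text in runs:
    print(lang, f"U+{ord(text):04X}")
```

The byte values differ between codecs, but nothing in the code points themselves records which glyph style is wanted; that information lives only in the tag.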

Considerations of character identity are overridden by the Source Set Separation Rule: distinct codepoints in any of the following source sets are mapped to distinct codepoints in the new set:

The importance of this is that if you convert text encoded in a national character set to Unicode and back again, you get the same text you started with. The result is a fairly conservative unification: for most radicals, a character written with the simplified form of the radical is encoded separately from the one written with the traditional form.
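
The round-trip property can be checked mechanically. The sketch below takes GB 2312 as the national set and relies on Python's `gb2312` codec, whose tables are an assumption standing in for the official source mapping; the helper name `round_trip_report` is hypothetical. It decodes every assigned hanzi code, re-encodes it, and confirms that no two national codes collide on one Unicode code point, which is what the separation rule guarantees.

```python
def round_trip_report(codec, lead_bytes, trail_bytes):
    """Decode every assigned double-byte code in the given ranges,
    re-encode it, and collect any code that does not come back unchanged."""
    mapping = {}    # national byte pair -> decoded Unicode string
    failures = []   # byte pairs that fail the round trip
    for lead in lead_bytes:
        for trail in trail_bytes:
            raw = bytes([lead, trail])
            try:
                text = raw.decode(codec)
            except UnicodeDecodeError:
                continue  # unassigned position in the national set
            mapping[raw] = text
            if text.encode(codec) != raw:
                failures.append(raw)
    return mapping, failures

# GB 2312 hanzi occupy rows 16..87: EUC lead bytes 0xB0..0xF7,
# trail bytes 0xA1..0xFE.
mapping, failures = round_trip_report("gb2312", range(0xB0, 0xF8), range(0xA1, 0xFF))

# Distinct national codes must map to distinct Unicode characters.
injective = len(set(mapping.values())) == len(mapping)
print(len(mapping), injective, len(failures))
```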

Ordering

The ordering of the CJK characters is based on the following dictionaries, in order of priority:

  1. Kangxi Zidian (7th ed.) Zhonghua Bookstore, Beijing, 1989.
  2. Dai Kanwa Ziten (revised ed.) Taisyuukan Syoten, Tokyo, 1986.
  3. Hanyu Da Zidian (1st ed.) Sichuan Cishu Publishing, Chengdu, 1986.
  4. Dae Jaweon (1st ed.) Samseong Publishing, Seoul, 1988.

From the Unicode book, volume 2, page 14:

When a character is found in the KangXi Zidian, it follows the KangXi Zidian order. When it is not found in the KangXi Zidian and is found in Dai Kanwa Ziten, it is given a position extrapolated from the KangXi position of the preceding character in Dai Kanwa Ziten. When it is not found in either KangXi or Dai Kanwa, Hanyu Da Zidian and Dae Jaweon dictionaries are consulted in a similar manner.
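
The priority fallback in the quoted passage can be sketched as follows. This is a simplified model, not the procedure actually used by the CJK-JRG: "extrapolated" is taken to mean "immediately after the nearest preceding character that already has a position", positions are tuples compared lexicographically, and the single letters in the usage example are placeholders for characters.

```python
def assign_positions(dictionaries):
    """Order characters by dictionary priority (a simplified model).

    `dictionaries` lists each dictionary's character order, highest
    priority first. A character placed just after position p gets
    p + (k,), which sorts after p and before the next entry at p's level.
    """
    pos = {}
    children = {}  # anchor position -> count of characters placed after it

    # The top-priority dictionary defines the base order.
    for i, ch in enumerate(dictionaries[0]):
        pos[ch] = (i,)

    for order in dictionaries[1:]:
        anchor = (-1,)  # characters before any known one go to the front
        for ch in order:
            if ch in pos:
                # Already placed by a higher-priority dictionary:
                # it becomes the anchor for what follows.
                anchor = pos[ch]
                continue
            k = children.get(anchor, 0) + 1
            children[anchor] = k
            pos[ch] = anchor + (k,)
    return sorted(pos, key=pos.get)

# Placeholder data: "kangxi" defines A, B, C; later dictionaries add X..W.
kangxi = ["A", "B", "C"]
dkw = ["A", "X", "B", "Y", "Z", "C"]
hanyu = ["X", "W", "C", "V"]
print("".join(assign_positions([kangxi, dkw, hanyu])))  # AXWBYZCV
```

Tuples avoid having to renumber existing positions when a lower-priority dictionary inserts characters between two already-placed ones.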

Hence the ordering is approximately by Kangxi radical, then by stroke count. However, a number of simplified radicals are treated separately and placed immediately after the corresponding traditional radical.
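
A sort key of (radical, simplified form?, residual strokes) reproduces this ordering for a few characters. The radical and stroke data below are entered by hand for illustration, in the spirit of the Unihan `kRSUnicode` field; the table name `RS_DATA` and the key function are hypothetical. For this sample, sorting by the key gives the same result as sorting by code point, with the silk radical's simplified form 纟 and its derivatives following all traditional 糸 characters and preceding radical 121.

```python
# Hand-entered sample: (Kangxi radical number, simplified radical form?,
# residual strokes beyond the radical).
RS_DATA = {
    "\u7d05": (120, False, 3),  # 紅: traditional silk radical + 3
    "\u7e9f": (120, True, 0),   # 纟: simplified silk radical itself
    "\u7ea2": (120, True, 3),   # 红: simplified silk radical + 3
    "\u7f36": (121, False, 0),  # 缶: radical 121
    "\u8aaa": (149, False, 7),  # 說: traditional speech radical + 7
    "\u8ba0": (149, True, 0),   # 讠: simplified speech radical itself
    "\u8bf4": (149, True, 7),   # 说: simplified speech radical + 7
    "\u8c37": (150, False, 0),  # 谷: radical 150
}

def rs_key(ch):
    radical, simplified, strokes = RS_DATA[ch]
    # False < True, so characters using the simplified radical form
    # sort after all traditional ones under the same radical number.
    return (radical, simplified, strokes)

chars = list(RS_DATA)
by_radical = sorted(chars, key=rs_key)
by_code_point = sorted(chars)
print(by_radical == by_code_point)
print(" ".join(f"U+{ord(c):04X}" for c in by_radical))
```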


Part of Notes on CJK Character Codes and Encodings.