The Cangjie method (倉頡輸入法, 仓颉输入法), originally spelt Changjei method and sometimes called Chongkit method (based on the Cantonese pronunciation), is a system for entering Chinese characters into a computer. Invented in 1976 by Chu Bong-Foo (朱邦復, 朱邦复 Zhu Bangfu), the method is named after Cangjie, the man traditionally credited with inventing the first writing system of China. Although the input method was initially based upon Traditional Chinese characters, it has since been revised so that it also works with the Simplified Chinese character set.
Unlike pinyin, Cangjie is based on the graphological aspect of the characters: each basic graphical unit is represented by a basic character component. There are 24 such components in all, each mapped to a particular letter key on a standard QWERTY keyboard. An additional "difficult character" function is mapped to the X key. The keys fall into four groups: the Philosophical Set (the letters 'A' to 'G', representing the elements), the Strokes Set (the letters 'H' to 'N', representing the brief and subtle strokes), the Body-Related Set (the letters 'O' to 'R', representing various parts of the human anatomy), and the Shapes Set (the letters 'S' to 'Y', representing complex and encompassing character forms).
The basic character components in Cangjie are usually called "radicals"; nevertheless, Cangjie decomposition is based neither on the traditional Kangxi radicals nor on standard stroke order; it is in fact a simple geometric decomposition.
Cangjie is one of the very few IMEs (input method editors) that can be found on most modern personal computers without the user having to download or install any additional software. This widespread availability must be credited to Mr. Chu, who has allowed the public to use his invention freely and without charge.
The keys and "radicals"
The basic character components in Cangjie are called "radicals" 字根 or "letters" 字母.
There are 24 radicals but 26 keys;
each of the 24 radicals (the basic shape 基本字形) is associated with one or more auxiliary shapes 輔助字形.
The names of the 24 radicals, as well as their associated auxiliary shapes, are mnemonic:
<table border=1 style="border-collapse: collapse">
<tr><td rowspan=7>Philosophical group
<td>A<td>日 sun
<tr><td>B<td>月 moon
<tr><td>C<td>金 Venus (metal)
<tr><td>D<td>木 Jupiter (wood)
<tr><td>E<td>水 Mercury (water)
<tr><td>F<td>火 Mars (fire)
<tr><td>G<td>土 Saturn (earth)
<tr><td rowspan=7>Stroke group
<td>H<td>竹 bamboo<td>the slant and short slant, the Kangxi radical 竹
<tr><td>I<td>戈 weapon<td>the dot
<tr><td>J<td>十 ten<td>the cross shape
<tr><td>K<td>大 big<td>the X shape
<tr><td>L<td>中 centre<td>the vertical stroke
<tr><td>M<td>一 one<td>the horizontal stroke
<tr><td>N<td>弓 bow<td>the crossbow and the hook
<tr><td rowspan=4>Body parts group
<td>O<td>人 person<td>the dismemberment stroke, the Kangxi radical 人
<tr><td>P<td>心 heart<td>the Kangxi radical 心
<tr><td>Q<td>手 hand<td>the Kangxi radical 手
<tr><td>R<td>口 mouth<td>the Kangxi radical 口
<tr><td rowspan=6>Character shapes group
<td>S<td>尸 corpse<td>three-sided enclosure with an opening on the side
<tr><td>T<td>廿 twenty<td>two vertical strokes connected by a horizontal stroke
<tr><td>U<td>山 mountain<td>three-sided enclosure with an opening on the top
<tr><td>V<td>女 woman<td>a hook to the right, a V shape
<tr><td>W<td>田 field<td>four-sided enclosure
<tr><td>Y<td>卜 fortune telling<td>the 卜 shape and rotated forms
<tr><td>Collision/Difficult key<td>X<td>重/難 collision/difficult<td>(1) disambiguation of Cangjie code decomposition collisions, (2) code for a "difficult-to-decompose" part
<tr><td>Special character key<td>Z<td><td>auxiliary code used for entering special characters (no meaning of its own)
</table>
The auxiliary shapes of each Cangjie radical have changed slightly between different versions of the Cangjie method; this is one reason why different versions of the Cangjie method are not completely compatible.
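As a rough illustration of the key table above, the following sketch maps each letter key to the name of its radical and spells out a letter code as radical names. The names `RADICALS` and `spell` are made up for this example. The A and B entries (日, 月) follow the codes given in the exception list later in this article, X is shown with its 難 label, and Z is left out because it has no meaning of its own.

```python
# Letter key -> radical name, following the key table in this article.
RADICALS = {
    "A": "日", "B": "月", "C": "金", "D": "木", "E": "水", "F": "火", "G": "土",
    "H": "竹", "I": "戈", "J": "十", "K": "大", "L": "中", "M": "一", "N": "弓",
    "O": "人", "P": "心", "Q": "手", "R": "口",
    "S": "尸", "T": "廿", "U": "山", "V": "女", "W": "田", "Y": "卜",
    "X": "難",  # collision/difficult key, not one of the 24 radicals
}

def spell(code: str) -> str:
    """Spell a letter code (e.g. "JWJ") as radical names separated by spaces."""
    return " ".join(RADICALS[letter] for letter in code.upper())

print(spell("yk"))   # 卜 大
print(spell("JWJ"))  # 十 田 十
```

A code containing Z raises `KeyError` in this sketch, since Z codes are not geometric decompositions.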
Figure: A typical keyboard layout for the Cangjie method
The basic rules
The typist must be familiar with several decomposition rules 拆字規則 that define how to analyse a character to arrive at a Cangjie code.
- Direction of decomposition: left to right, top to bottom, and outside to inside
- Geometrically connected forms: take 4 Cangjie codes, namely the first, second, third, and last codes
- Geometrically unconnected forms: identify the first geometrically connected subform according to the direction of decomposition rule, then take the first and last codes of the subform; next consider the rest of the original form, and take the first, second, and last code of this subform. (If the second subform is itself geometrically unconnected, the second code taken is the last code in the first geometrically-connected subform of this second subform.)
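The code-selection part of these rules can be sketched as follows. This is a simplified reading, not a full implementation: it assumes the caller has already decomposed every geometrically connected part into its complete sequence of codes and ordered the parts by the direction rule. The function names and the flat list representation are inventions of this example, and the full decompositions passed in below (such as `"YMMR"`) are assumed inputs used only for illustration.

```python
from typing import List, Union

# A form is either one connected form (a string of codes) or a list of
# connected parts, already ordered left to right / top to bottom / outside in.
Form = Union[str, List[str]]

def first_last(codes: str) -> str:
    """First and last codes of a connected part (fewer if the part is short)."""
    return codes if len(codes) <= 2 else codes[0] + codes[-1]

def first_second_last(codes: str) -> str:
    """First, second, and last codes of a connected part."""
    return codes if len(codes) <= 3 else codes[:2] + codes[-1]

def select_codes(form: Form) -> str:
    if isinstance(form, str):
        # Connected form: first, second, third, and last codes.
        return form if len(form) <= 4 else form[:3] + form[-1]
    if len(form) == 1:
        return select_codes(form[0])
    head, rest = form[0], form[1:]
    if len(rest) == 1:
        # The remainder is connected: first, second, and last codes.
        tail = first_second_last(rest[0])
    else:
        # The remainder is itself unconnected: first and last codes of its
        # first connected part, then the last code of the whole remainder.
        tail = first_last(rest[0]) + rest[-1][-1]
    return first_last(head) + tail

print(select_codes("JWJ"))                   # JWJ
print(select_codes(["YMMR", "HXH", "DI"]))   # YRHHI
```

The hard part of Cangjie, deciding how a glyph splits into parts and which shapes map to which codes, is deliberately left outside this sketch.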
The rules are subject to various principles:
- Conciseness (精簡): if two decompositions are possible, the shorter decomposition is correct
- Completeness (原整): if two decompositions of the same length are possible, the one that identifies a more complex form first is the correct decomposition
- Reflection of the form of the radical (字型特徵): the decomposition should reflect the shape of the radical, meaning (a) using the same code twice or more is to be avoided if possible, and (b) the shape of the character should not be "cut" at a corner in the form
- * Partial omission (部分省略): when the number of codes in a complete decomposition would exceed the permitted number of codes, the extra codes are ignored
- * Omission in enclosed forms (包含省略): when part of the character needs to be decomposed and the form is an enclosed form, only the shape of the enclosure is decomposed; the enclosed forms are omitted
The short list of exceptions
Some forms are always decomposed in the same way,
whether or not the rules say they should be decomposed this way.
Fortunately, the exceptions are few:
always 日 弓 (AN)
always 月 山 (BU)
always 竹 戈 (HI)
几 (small table)
always 竹 弓 (HN) (HU in original Cangjie)
always 卜 心 (YP)
always 卜 口 (YR)
always 人 土 (OG)
always 人 弓 (ON) (OU in original Cangjie). (The simplified character 气 is decomposed normally as 人 一 弓 OMN.)
畿 minus the 田
always 女 戈 (VI)
always 中 弓 (LN)
always 弓 中 (NL)
The list is a list of exceptions only in the sense that normal decomposition would give more than one radical in many of the forms listed.
Examples
- 車 (chē; vehicle)
- * This character is geometrically connected, consisting of one part with a vertical structure, so we take the first, second, and last Cangjie codes from top to bottom.
- * The Cangjie code is thus 十 田 十 (JWJ), corresponding to the basic shapes of the codes in this example.
- 謝 (xiè; to thank, to wither)
- * This character geometrically consists of unconnected parts arranged horizontally. For the initial decomposition, we treat it as two parts, 言 and 射.
- * The first part, 言, is geometrically unconnected from top to bottom; we take the first part (an auxiliary shape of 卜 Y) and the last part (口, the basic shape of 口 R) and arrive at 卜 口 (YR).
- * The second part is again geometrically unconnected, arranged horizontally. The two parts are 身 and 寸.
- ** For the first part of this second part, 身, we take the first and last codes. Both are slants and therefore H; the first and last codes are thus 竹 竹 (HH).
- ** For the second part of the original second part, 寸, we take only the last code. Because this form is geometrically unconnected and consists of two parts, the first part is the outer form while the second part is the dot in the middle. The dot is I, and therefore the last code is 戈 (I).
- * The Cangjie code is thus 卜 口 (YR) 竹 竹 (HH) 戈 (I), or 卜 口 竹 竹 戈 (YRHHI).
- 谢 (simplified version of 謝)
- * This example is identical to the one above, except that the first part is 讠; its first and last codes are 戈 (I) and 女 (V).
- * Repeating the same steps as in the example above, we get 戈 女 (IV) 竹 竹 (HH) 戈 (I), or 戈 女 竹 竹 戈 (IVHHI).
In the beginning, the Cangjie input method was not a way to produce a character in any character set.
It was, instead, an integrated system consisting of the Cangjie input rules and a Cangjie controller board.
The controller board contains character generator firmware, which dynamically generates Chinese characters from Cangjie codes when characters are output,
using the hi-res graphics mode of an Apple II computer.
In the preface of the Cangjie user's manual (see the references), Mr. Chu wrote in 1982:
- Translation<br>In terms of output: The output and input, in fact, form an integrated whole; there is no reason that they be dogmatically separated into two different facilities.… This is in fact necessary.…
In this early system, when the user types "yk " (for example) to get the Chinese character 文, the Cangjie code is not converted to any character encoding; the actual string "yk " is stored.
In a very real sense, the Cangjie code of each character (string of 1 to 5 lowercase letters plus a space) was the encoding of that particular character.
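To make the idea of "the code is the encoding" concrete, here is a minimal sketch that splits stored text of this kind back into individual Cangjie codes. It assumes only what is stated above: each character is stored as 1 to 5 lowercase letters followed by a space. The function name `split_codes` is hypothetical, and how the original system handled anything else (punctuation, Latin text) is not covered.

```python
import re
from typing import List

# One stored character: 1 to 5 lowercase letters followed by a single space.
CODE = re.compile(r"[a-z]{1,5} ")

def split_codes(stored: str) -> List[str]:
    """Split stored text such as "yk jwj " into its Cangjie codes."""
    codes = []
    pos = 0
    while pos < len(stored):
        match = CODE.match(stored, pos)
        if match is None:
            raise ValueError(f"not a Cangjie code at offset {pos}")
        codes.append(match.group().rstrip())
        pos = match.end()
    return codes

print(split_codes("yk jwj "))  # ['yk', 'jwj']
```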
A particularly interesting "feature" of this early system is that if you send random lowercase words to the character generator,
it will attempt to construct Chinese characters according to the Cangjie decomposition rules,
sometimes causing strange, unknown characters to appear.
This unusual feature, "automatic generation of characters", is actually described in the manual
and is responsible for producing more than 10,000 of the roughly 15,000 characters that the system can handle.
The name Cangjie, evocative of creation of new characters, was actually very apt for this early version of Cangjie.
The presence of the integrated character generator also explains the historical necessity of the "X" key for disambiguating decomposition collisions: because characters are "chosen" when the codes are output,
every character that can be displayed must in fact have one and only one Cangjie decomposition.
It would not make sense, nor would it be practical,
for the system to provide a choice of candidate characters when some random text file is displayed;
the user would not know which of the candidates are correct.
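The one-code-per-character requirement described above amounts to a uniqueness check on the code table. The sketch below finds codes shared by more than one character; such collisions are what the X key was introduced to resolve. The function name and the placeholder entries (`char1`, `char2`, `char3` and their codes) are hypothetical and do not come from any real Cangjie table.

```python
from collections import defaultdict
from typing import Dict, List

def find_collisions(codes: Dict[str, str]) -> Dict[str, List[str]]:
    """Return every code that is shared by more than one character."""
    by_code = defaultdict(list)
    for char, code in codes.items():
        by_code[code].append(char)
    return {code: chars for code, chars in by_code.items() if len(chars) > 1}

# Placeholder table: two characters decompose to the same code.
table = {"char1": "ab", "char2": "ab", "char3": "yk"}
print(find_collisions(table))  # {'ab': ['char1', 'char2']}
```

A table for which `find_collisions` returns an empty dictionary can be used in both directions, which is what a character generator driven by stored codes needs.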
Cangjie was designed to be an easy-to-use system to help promote the use of Chinese computing;
nevertheless, many users find Cangjie to be a difficult method.
Many of the perceived difficulties arise from poor instruction:
- In order to input using Cangjie, one must know not only the names of the radicals, but also all their auxiliary shapes (which might not appear to make sense, though Mr. Chu had intended all the auxiliary shapes to be related to the basic shapes and "easy to remember"); it is common to find tables of the Cangjie radicals with their auxiliary shapes taped onto the monitors of casual computer users.
- One must also be familiar with the decomposition rules; unfortunately, a lot of casual computer users are not even aware of the existence of decomposition rules but rather type by guessing. This makes Cangjie a very difficult method.
Enough practice, however, can overcome the above problems.
A typist with sufficient practice in Cangjie touch types, much like an English typist;
it is entirely possible for a touch typist to type at 25 words (Chinese characters) per minute or better in Cangjie,
yet have difficulty remembering the list of auxiliary shapes or even the decomposition rules.
Experienced Cangjie typists can reportedly attain a typing speed between 60 wpm and over 200 wpm.
Cangjie, however, also has some "real" problems:
- Cangjie is not error forgiving (不容錯): The decomposition of a character depends on a predefined set of "standard shapes" (標準字形); however, because Cangjie is used in many different countries, the standard shape of a certain character in Cangjie is not always the standard shape of the same character the user has learnt. Learning Cangjie would then entail not only learning Cangjie itself, but also the standard shape of some characters. The difference between 溫 and 温, or 黑 and 黒, for example, illustrates the frustration learners of Cangjie might have to go through. The Cangjie IME is also not expected to handle mistakes in decomposition other than tell the user (usually by beeping) that there is a mistake.
- Punctuation marks are not geometrically decomposed, but rather given random-looking codes that begin with ZX followed by a string of three letters related to the ordering of the characters in the Big5 code. Typing punctuation marks in Cangjie thus becomes a frustrating exercise in either memorization or pick-and-peck.
- The user cannot type a character which he or she has forgotten how to write. This, of course, is not a real problem with Cangjie but a problem with all non-phonetic input methods. (This is not to say that phonetic input methods are superior; in fact they suffer from the opposite problem, namely that the user cannot type a character which he or she does not know how to pronounce.)
Finally, Cangjie is not a silver bullet.
In some situations it cannot be used at all.
Situations where Cangjie cannot be used
- Because Cangjie uses all 26 keys on an English keyboard, it cannot be used to input Chinese on cell phones. For cell phones, the Q9 method is the current norm because it is designed specifically for use on numeric keypads.
Most modern implementations of Cangjie IMEs provide various convenience features:
- Some IMEs list all characters beginning with the code typed so far; for example, if you type A, the system shows all characters whose Cangjie code begins with A so that you can select the correct character if it is on the screen; if you type another A, the list is shortened to the characters whose code begins with AA. Examples of such implementations include the IME in Mac OS X, and SCIM.
- Some IMEs provide one or more wildcard keys, usually but not always * and/or ?, that allow the user to omit part(s) of the Cangjie code; the system then displays a list of matching characters for the user to choose from. Examples of such implementations include xcin, SCIM, and the IME in the Founder (University of Peking) typesetting systems.
- Some IMEs provide an "abbreviation" feature, where impossible Cangjie codes are interpreted as abbreviations of the Cangjie codes of more than one character, allowing more characters to be input with fewer keystrokes. An example of such an implementation is SCIM.
- Some IMEs provide an "association" (聯想 lianxiang) feature, where the system anticipates what the user is going to type next and provides a list of characters or even phrases associated with what the user has typed. An example of such an implementation is the Microsoft Cangjie IME.
- Some IMEs order the list of candidate characters by frequency of use (how many times the user has typed the same character). An example of such an implementation is the Cangjie IME in NJStar.
Apart from the wildcard key, many of the above features are very convenient for casual users but unsuitable for touch typists because they make the Cangjie IME unpredictable.
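The prefix-listing and wildcard features described in the list above can be approximated in a few lines. This is only a sketch and does not reflect how any of the named IMEs is implemented. The tiny `TABLE` is an illustrative assumption: the codes for 文, 車, 謝, and 谢 follow the examples in this article, while the entries for 日, 昌, 晶, and 明 are supplied just for the demonstration. The wildcard matching borrows Python's `fnmatch` semantics, where `*` stands for any run of letters and `?` for exactly one.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Tiny illustrative code table; a real IME ships a far larger one.
TABLE: Dict[str, str] = {
    "日": "A", "昌": "AA", "晶": "AAA", "明": "AB",
    "文": "YK", "車": "JWJ", "謝": "YRHHI", "谢": "IVHHI",
}

def by_prefix(typed: str) -> List[str]:
    """Candidates whose code begins with the letters typed so far."""
    typed = typed.upper()
    return [ch for ch, code in TABLE.items() if code.startswith(typed)]

def by_wildcard(pattern: str) -> List[str]:
    """Candidates whose whole code matches a pattern containing * or ?."""
    pattern = pattern.upper()
    return [ch for ch, code in TABLE.items() if fnmatchcase(code, pattern)]

print(by_prefix("a"))       # ['日', '昌', '晶', '明']
print(by_prefix("aa"))      # ['昌', '晶']
print(by_wildcard("y*i"))   # ['謝']
print(by_wildcard("j?j"))   # ['車']
```

Note that the candidate list changes with every keystroke, which is the unpredictability that touch typists tend to dislike.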
There are also various attempts to "simplify" Cangjie one way or another:
- Simplified Cangjie has the same radicals, auxiliary shapes, decomposition rules, and short list of exceptions as Cangjie, but only the first and last codes are used if more than 2 codes are required in Cangjie.
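The reduction from a full Cangjie code to its Simplified Cangjie form, as described above, is mechanical and can be sketched as a one-line function. The function name `simplified` is an assumption of this example, and the sample codes are the ones used elsewhere in this article.

```python
def simplified(code: str) -> str:
    """Keep only the first and last codes when the full code has more than 2."""
    return code if len(code) <= 2 else code[0] + code[-1]

print(simplified("YRHHI"))  # YI
print(simplified("JWJ"))    # JJ
print(simplified("YK"))     # YK
```

Because many full codes collapse onto the same two letters, an IME using this scheme has to offer a candidate list for the user to choose from.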
- Chinese input methods for computers
- Taipei: Chwa! Taiwan Inc. (全華科技圖書公司). <cite>倉頡中文資訊碼 : 倉頡字母、部首、注音三用檢字對照</cite> The Cangjie Chinese information code : with indexes keyed by Cangjie radicals, Kangxi radicals, and zhuyin. Publication number 023479. — This is the user manual of an early Cangjie system with a Cangjie controller card.
- * The second-last paragraph on the first page in the section entitled "The Cangjie radical-based Chinese input method" (倉頡字母中文輸入法) states that<blockquote id="auxiliary">Translation<br/>This is no problem; there are also auxiliary forms to complement the deficiencies of the radicals. The auxiliary forms are variations of the shape of the radicals, and therefore easy to remember.</blockquote>
- * The last paragraph on the fifth page in the same section states<blockquote id="number-of-characters">Translation<br/>The dictionary appended in the back of this book is based on the 4800 standard, commonly used characters proclaimed by the Ministry of Education. Adding to this the characters that are automatically generated, the number of characters is about 15,000 (using the Kangxi dictionary as a basis).</blockquote>
- Part of the information in this article comes from the equivalent Chinese-language Wikipedia article (倉頡輸入法).
- The decomposition rules come from the "Friend of Cangjie — Malaysia" web site at http://www.chinesecj.com/ The site also gives the typing speed of experienced typists.
- It might be difficult to find specific references to the "not error forgiving" property of Cangjie. The table at http://www.array.com.tw/keytool/compete.htm is one external reference that states this fact.
- http://www.arts.cuhk.edu.hk/Lexis/lexi-can/ The Chinese University of Hong Kong Research Centre for Humanities Computing Chinese Character Database: With Word-formations Phonologically Disambiguated According to the Cantonese Dialect: A Chinese character database covering the entire set of Big-5 Chinese characters (5401 Level 1 and 7652 Level 2 Hanzi) as well as 7 additional ETen Hanzi. Cangjie input codes are shown for each character in the database. Note: The Hong Kong Supplementary Character Set (HKSCS - 2001) is not included in this database.
This article is licensed under the
GNU Free Documentation License. It uses material from the
Wikipedia article "Cangjie method".
Last Modified: 2005-04-13