Chu nom Characters

Thomas Chan · Post by **Thomas Chan** » Tue Sep 24, 2002 2:13 pm

Mark wrote:
>
> How about one of us submits a Unicode proposal for characters
> that can be found in the http://140.111.1.40/ dictionary,
> other dialect dictionaries, Jurchen, Xixia, &c dictionaries
> (ok, maybe not Jurchen, Xixia, but still http://140.111.1.40/
> and dialect dictionaries) but not in the already existing
> Unicode standard?

Some "dialect characters" are just the inventions of the dictionary
compilers, and don't really exist elsewhere. Sometimes the compilers
will invent characters, or substitute what they think is the etymological
character, without regard to how people really write the word in
colloquial writings.

It's not a bad idea to submit, but they are starting to crack down on rarer
characters and variants, because the repertoire is already getting too
unwieldy to use, at over 70K characters and growing...

Jurchen and Tangut are really part of systems separate from Han
characters, btw--they are roadmapped to go in Plane 1, rather than
Plane 2.

> About a month ago, I almost completed a proposal for Dongba
> (Naxiji), but I had troubles with MS Word and how to make it
> work properly with the proposal... I have the file, I will
> upload it if anybody is willing to help... :p

Are you in contact with people who use the script?

> Oh, and WinXP and Win2K support Unicode's latest version, 32
> bits and all. You just have to change 2 registry values on
> Win2K, on WinXP everything is already all set to go.

Do you know of any fonts for them? e.g., there's really only one font
that covers the Plane 2 characters.

Thomas Chan
tc31@cornell.edu

James Campbell · Post by **James Campbell** » Tue Sep 24, 2002 5:40 pm

You can also check the contents at:

http://www.cns11643.gov.tw/web/index.jsp

Mark,

I recently discussed the shortcomings of Unicode with a colleague. Is writing not something that flows easily from your pen and is controlled by your hand? Each and every individual character that I produce by hand can either be newly created or already existent. There is no limitation to the ease with which I can create new characters.

Unicode is like a typewriter. It is limited in its scope. We do not have yet the technology of the "hand".

Creating characters does not have to be completely random or nonsense. I could just as easily create the character [亻因] and use it in a Minnan sentence and the reader would understand what I've written. OR, I should even be able to "create" 姻 in the same way, even though it already exists.

A futuristic solution to this problem is basically creating a program that understands the vectors of the shapes of letters and characters, encodes the directions of the strokes (I suppose postscript could be an example of this) and then decodes them when they are to be read.

For example, if I were typing some special Chinese characters like 亻因 above, and if using pinyin I could type 'yin' into my application and up jumps a choice of regular characters or created characters, I click on created characters where I get a choice of possible rhymes (諧聲字) such as 音, 因, 垔, 侌, etc. and then a separate list of radicals which can be compounded at will. So to 因 I could not only add 亻, but then 艹, and then even a 辶 if I really wanted to. Or how about that famous one hung up on signs all over the place but I've never seen on a computer: zhaocaijinbao. So, once I've created a character I could choose to have it listed together with my regular characters which will move to the front of the list the more I use it.
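One way to picture this kind of at-will compounding is with Unicode's Ideographic Description Characters (U+2FF0 to U+2FFB). The post does not mention them, and they only describe a character's layout; they do not make software draw the result. A minimal sketch, in which `compose` is a hypothetical helper and the layout operators chosen for 艹 and 辶 are assumptions about where those components would sit:

```python
# Ideographic Description Characters (real Unicode code points).
LEFT_RIGHT = "\u2FF0"           # ⿰ first part on the left, second on the right
ABOVE_BELOW = "\u2FF1"          # ⿱ first part on top, second below
SURROUND_LOWER_LEFT = "\u2FFA"  # ⿺ first part wraps the lower left

# Components named in the post.
PERSON = "\u4EBB"  # 亻
GRASS = "\u8279"   # 艹
WALK = "\u8FB6"    # 辶
YIN = "\u56E0"     # 因


def compose(operator: str, first: str, second: str) -> str:
    """Hypothetical helper: build a description sequence from two parts."""
    return operator + first + second


step1 = compose(LEFT_RIGHT, PERSON, YIN)         # 亻 added to 因
step2 = compose(ABOVE_BELOW, GRASS, step1)       # then 艹 on top
step3 = compose(SURROUND_LOWER_LEFT, WALK, step2)  # then 辶 around it

for sequence in (step1, step2, step3):
    print(sequence, len(sequence), "code points")
```

Each added component lengthens the sequence by two code points, which is the storage cost raised later in the thread.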

There should also be an advanced function for creating non-standard characters, all of which would handle your directions automatically, without having to go into a font editor or position placement tool. Your character would then be encoded by a standard mapping of the vectors and the lines, rather than a bitmap. With this feature in place, you only need to apply certain characteristics such as adding serifs or making it italic and so on without the creation of separate fonts.

With such a technology, the creation of millions of Chinese characters could be possible, just as it is by anybody's hand. There wouldn't be a need for that many; there would only be a need for as many as an individual would wish to use, and that's a freedom that our hands allow us, but computers so far do not. In practice, people would never naturally produce that many characters. We'd only need the ones we'd want to use for our dictionaries and dialects.

James

Mark · Post by **Mark** » Mon Sep 30, 2002 10:11 pm

James: I've often thought of such a solution, perhaps "combining" elements of hanzi in all allowed positions that one might combine on a "base hanzi" of nothing at all. Of course this might often produce awkward results, but it certainly is feasible if you only plan to use this feature every once in a while. However, if you plan to write a whole e-book of such hanzi, you will find that it takes up much, much more space than if each of the characters you create were part of the Unicode standard.

Also I have thought before of a solution for Hangeul encoding using combining jamo in different positions, and perhaps this could work but documents would take up more space! (or would they? I think they would) Plus, Hangeul has already been encoded in Unicode and many other encodings in a different way, so it's sort of useless now.

Thomas:

Indeed they are roadmapped to go to Plane 2, but they have not yet been encoded if I am correct.

Last time I checked, neither had the Dongba script.

And I *sort of* have contact with users of the Dongba script, depending on how you define users. If you mean people who use it on a daily basis, then that would be pretty hard to find unless it was somebody who used it for research etc., but it is not nearly as hard to find somebody speaking Naxi who can read and write the Dongba script.

And yes, I do know of 1 font that supports past Plane 1... Code2001 supports some of the new additions past 8bit.

Thomas Chan · Post by **Thomas Chan** » Mon Sep 30, 2002 11:11 pm

Mark wrote:
>
> James: I've often thought of such a solution, perhaps
> "combining" elements of hanzi in all allowed positions that
> one might combine on a "base hanzi" of nothing at all. Of
> course this might often produce awkward results, but it
> certainly is feasible if you only plan to use this feature
> every once in a while. However, if you plan to write a whole
> e-book of such hanzi, you will find that it takes up much,
> much more space than if each of the characters you create
> were part of the Unicode standard.
>
> Also I have thought before of a solution for Hangeul encoding
> using combining jamo in different positions, and perhaps this
> could work but documents would take up more space! (or would
> they? I think they would) Plus, Hangeul has already been
> encoded in Unicode and many other encodings in a different
> way, so it's sort of useless now.

Like for Han characters, that method of encoding hangul would take up
more space, because one is encoding each individual letter, and thus
every possible combination (as a sequence of "letters"), whereas with a
method that encodes the precomposed units, only the possible ones are
accepted and space is not wasted. For example, the number of components
necessary for Han characters is in the hundreds, and that doesn't take into
account positioning within the character. Let's pretend there are only 214
components--we know there are at least this many, as the number of Kangxi
radicals, but we know there are many more. That would require 8 bits
(2 to the 8th power = 256) to store just one component. Repeat that for
every other component, and then also reserve some bits for positioning.
Compare that to a restricted list of precomposed Han characters of about
8,000-10,000, a functional amount, cf., telegraph code which encodes under
10,000--this would require only 13 or 14 bits (2 to the 13th or 14th power).
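
The arithmetic above can be checked with a short sketch. The 214-component inventory and the 8,000-10,000 range come from the post; the number of components per character and the number of positioning bits are assumptions picked only for illustration.

```python
def bits_needed(n_values: int) -> int:
    """Smallest number of bits that can give n_values items distinct codes."""
    return max(1, (n_values - 1).bit_length())


# From the post: at least 214 components (the Kangxi radicals).
COMPONENT_INVENTORY = 214
component_bits = bits_needed(COMPONENT_INVENTORY)

# Assumptions for illustration only (not figures from the post).
ASSUMED_COMPONENTS_PER_CHAR = 3
ASSUMED_POSITION_BITS = 3
decomposed_bits = ASSUMED_COMPONENTS_PER_CHAR * (
    component_bits + ASSUMED_POSITION_BITS
)

# From the post: a functional precomposed list of about 8,000-10,000.
precomposed_bits_low = bits_needed(8_000)
precomposed_bits_high = bits_needed(10_000)

print("bits per component:", component_bits)
print("bits per decomposed character (assumed shape):", decomposed_bits)
print("bits per precomposed character:",
      precomposed_bits_low, "to", precomposed_bits_high)
```

Under these assumptions a decomposed character costs more than twice as many bits as a precomposed one, and the gap widens as the component inventory or the number of components per character grows.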

One con is that this method introduces the spectre of spelling mistakes,
like children writing the wrong radical, which is part of the reason why we have
50,000+ in the first place now, because the dictionary compilers have compiled
mistakes and preserved them for posterity. Flip through the Kangxi, and see all
the "X de ezi"--"erroneous form of X" entries. (On the other hand, this would
also create a cottage industry of "Chinese spellchecking" software.)

Another con is that processing speed goes down, because each character is of
variable length--moving six characters to the left will not be backtracking
twelve bytes or whatever, but a variable amount.
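
A rough sketch of why that is: with a fixed width, moving back is a subtraction, while with variable-length characters the text has to be scanned. The 2-byte width matches the "twelve bytes" example; the zero-terminated storage format is a made-up one for illustration, not any real encoding.

```python
WIDTH = 2   # fixed-width case: every character takes 2 bytes
END = 0x00  # hypothetical terminator byte closing each variable-length character


def back_fixed(offset: int, n_chars: int) -> int:
    """Fixed width: moving back n characters is plain arithmetic."""
    return max(0, offset - n_chars * WIDTH)


def back_variable(buf: bytes, offset: int, n_chars: int) -> int:
    """Variable length: walk backwards one character at a time.

    Assumes offset sits on a character boundary (start of buffer or just
    after a terminator).
    """
    pos = offset
    for _ in range(n_chars):
        if pos == 0:
            break
        pos -= 1  # step over the terminator of the previous character
        while pos > 0 and buf[pos - 1] != END:
            pos -= 1  # keep going until the terminator before it
    return pos


# Three characters made of 2, 1 and 3 component bytes respectively.
text = bytes([1, 2, END, 3, END, 4, 5, 6, END])

print(back_fixed(12, 6))                   # one subtraction
print(back_variable(text, len(text), 1))   # scans 4 bytes
print(back_variable(text, len(text), 2))   # scans 6 bytes
```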

But as James said for the flexibility of being able to create characters
naturally like people have always done, a decomposed encoding allows for
unusual hangul combinations. For one thing, the precomposed hangul syllables
in Unicode (damn their waste, as there are some that do not occur in the
language at all) do not include any combinations that occur in medieval hangul
(such as those using archaic letters, like half-sios [z].)
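
The trade-off can be seen with Python's standard `unicodedata` module (a present-day tool, not something the post refers to). The sketch decomposes one precomposed syllable into conjoining jamo, compares storage in UTF-16, and shows that a sequence starting with the archaic letter U+1140 (pansios, taken here to be the "half-sios" meant above) has no precomposed form to compose into.

```python
import unicodedata

# A modern syllable stored as one precomposed code point.
syllable = "\uD55C"  # 한
jamo = unicodedata.normalize("NFD", syllable)  # split into conjoining jamo

precomposed_bytes = len(syllable.encode("utf-16-be"))
decomposed_bytes = len(jamo.encode("utf-16-be"))

# The precomposed block covers every combination of 19 initials,
# 21 vowels and 28 finals (including "no final").
precomposed_count = 19 * 21 * 28

# An archaic initial plus a vowel: no precomposed syllable exists,
# so composition (NFC) leaves the two jamo as they are.
archaic = "\u1140\u1161"
archaic_after_nfc = unicodedata.normalize("NFC", archaic)

print([f"U+{ord(ch):04X}" for ch in jamo])
print(precomposed_bytes, "bytes precomposed vs", decomposed_bytes, "decomposed")
print(precomposed_count, "precomposed syllables")
print(archaic_after_nfc == archaic)
```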

About fonts--the decomposed method has been done before. It's called "johab",
and the fonts are very small, since you only need the individual letters in
the few various positions (upper left, bottom, etc). e.g., "g" in "ga" vs.
"tag". But for modern use, people have switched to the precomposed encodings...

> Indeed they are roadmapped to go to Plane 2, but they have
> not yet been encoded if I am correct.

Plane 1, but not encoded yet.

> Last time I checked, neither had the Dongba script.

It might be listed under Naxi or Nakhi--I don't remember.

> And I *sort of* have contact with users of the Dongba script,
> depending on how you define users. If you mean people who use it
> on a daily basis, then that would be pretty hard to find
> unless it was somebody who used it for research etc., but it
> is not nearly as hard to find somebody speaking Naxi who can
> read and write the Dongba script.

I'm not sure how the powers define contact with the user community. But it
seems they certainly want more than just people who know something about it,
but don't actually use it.

Contact and input from people who actually use (or study) a script is important;
right now, Khmer script is a mess because not enough efforts were made in the
past to involve people who write in it.

> And yes, I do know of 1 font that supports past Plane 1...
> Code2001 supports some of the new additions past 8bit.

But Kass' Code2001 only has Plane 1, not Plane 2. So all you get is Gothic,
Deseret, etc., but no Han characters. The only font I know of with Plane 2
characters is "Simsun (Founder Extended)".
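
Since the plane numbers keep getting mixed up in this thread: a code point's plane is just its value divided by 0x10000. A small sketch, using the first code points of the Gothic, Deseret and CJK Extension B blocks as samples:

```python
def plane(code_point: int) -> int:
    """Plane number of a Unicode code point (each plane holds 0x10000 values)."""
    return code_point >> 16


samples = {
    "CJK Unified Ideographs": 0x4E00,  # Plane 0 (BMP)
    "Gothic": 0x10330,                 # Plane 1
    "Deseret": 0x10400,                # Plane 1
    "CJK Extension B": 0x20000,        # Plane 2
}

planes = {name: plane(cp) for name, cp in samples.items()}

for name, number in planes.items():
    print(f"{name}: Plane {number}")
```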

Thomas Chan
tc31@cornell.edu

Thomas Chan · Post by **Thomas Chan** » Mon Sep 30, 2002 11:23 pm

James Campbell wrote:
>
> I recently discussed the shortcomings of Unicode with a
> colleague. Is writing not something that flows easily from
> your pen and is controlled by your hand? Each and every
> individual character that I produce by hand can either be
> newly created or already existent. There is no limitation
> to the ease with which I can create new characters.

I sympathize with not being able to write the characters I want, even if the
only reason is having to wait for attention to be brought to a character, for
it to be catalogued and encoded, for fonts to be made and distributed, etc.
This holds even for long-standing characters that have been attested for
decades, i.e., not frivolously created nonce characters. Many Cantonese
dialect characters are in this situation, since they have escaped the notice
of native dictionary compilers (or been deliberately omitted, if the compilers
were conservatives). Another flaw is that by the time such characters are
collected and encoded, they might have become historical curiosities as users
have moved on to other characters, especially for developing (dialect)
orthographies.

> Or how about that famous one
> hung up on signs all over the place but I've never seen on a
> computer: zhaocaijinbao.

Mojikyou has got a few ligatures like that (I don't know about that one
specifically), but it is still a system of cataloging and encoding precomposed
characters:
http://homepage2.nifty.com/Gat_Tin/kanji/sinji.htm

Thomas Chan
tc31@cornell.edu

Richard · Post by **Richard** » Tue Oct 01, 2002 1:34 am

Hi everyone. Currently, I'm also a member of the Mojikyo Institute. As far as I know, there are really LOTS of Chinese characters that can be found in the Mojikyo database but can't be seen yet on this site, nor in Unicode. Recently, I've also read some articles saying that "if all languages are content with Unicode, then Chinese characters will be in trouble." I don't really agree with this saying, since it poses a threat to the invention and existence of new Chinese characters. This website: http://140.111.1.40/ contains a huge mass of unencoded Chinese characters, and if we count them together with the number of encoded ones, the total can reach 100,000+. I know a lot of dialectal or rare variant Chinese characters. Some months ago, I bought a Chinese coin from the era of Xuan Tong (Emperor Pu-Yi). The character "xuan" has a different shape, unlike the other much more common ones. However, I doubt that this character is already in Unicode or on this website. Let me cite another instance: when I was young, I saw a Chinese character (min, same as the character for Min, Fukien Province) with 2 mountains on its top. As we all know, Fujian is very mountainous (I guess that's why the character has 2 mountains on top of it). Sorry to say, I forgot which program showed it on TV. Sadly, I couldn't find this character elsewhere, nor do I have sufficient documents or articles to use as evidence. These 2 are examples showing that there are still many "undocumented" characters occurring today. Has any of you seen these characters?

Thanks a lot.

Regards,

Richard

Thomas Chan · Post by **Thomas Chan** » Tue Oct 01, 2002 3:34 am

Richard wrote:
>
> Hi everyone. Currently, I'm also a member of the Mojikyo
> Institute. As far as I know, there are really LOTS of Chinese
> characters that can be found in the Mojikyo database but
> can't be seen yet on this site, nor in Unicode. Recently,
> I've also read some articles saying that "if all languages
> are content with Unicode, then Chinese characters will be in
> trouble." I don't really agree with this saying, since it
> poses a threat to the invention and existence of new Chinese
> characters. This website: http://140.111.1.40/ contains a
> huge mass of unencoded Chinese characters, and if we count
> them together with the number of encoded ones, the total can
> reach 100,000+.

One thing we have to be careful about is what we consider two different
characters, and what we consider to be two allographs of the same
character. Generally, Mojikyou is more liberal than Unicode in what it
considers to be two different characters, whereas Unicode might just
consider them to be within the tolerances of font variation. It's important
for data processing that we do not have too many very similar characters to
choose from. For example, there are two ways to write "a" and "g" in
Latin script. If we had to worry about which one people might have typed
in, then functions like searches would fail unless the two were equated
somehow. However, deciding when two characters are the same or
different is not always clear-cut, and individual opinions do differ.
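
One way such equating can be done in practice is Unicode normalization, sketched below with Python's standard `unicodedata` module. The post does not name a mechanism, so this is only an illustration: U+F900 is a CJK compatibility ideograph whose canonical equivalent is U+8C48 (豈), and `contains` is a hypothetical helper.

```python
import unicodedata


def fold(text: str) -> str:
    """Map canonically equivalent code points onto a single form (NFC)."""
    return unicodedata.normalize("NFC", text)


def contains(haystack: str, needle: str) -> bool:
    """Hypothetical search helper that equates canonical duplicates."""
    return fold(needle) in fold(haystack)


stored_text = "\uF900"  # compatibility ideograph, looks like 豈
query = "\u8C48"        # the unified ideograph 豈

naive_hit = query in stored_text            # plain code point comparison
folded_hit = contains(stored_text, query)   # comparison after normalization

print("naive search finds it:", naive_hit)
print("normalized search finds it:", folded_hit)
```

Normalization only covers pairs that the standard itself declares equivalent; variants that Unicode encodes as separate characters still need a mapping table maintained by whoever runs the search.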

> I know a lot of dialectal or rare variant
> Chinese characters.

Just out of curiosity, how do you manage to learn of such rare characters
without books like the _Kangxi Zidian_?

> Some months ago, I bought a Chinese coin
> from the era of Xuan Tong (Emperor Pu-Yi). The character
> "xuan" has a different shape, unlike the other much
> more common ones. However, I doubt that this character is
> already in Unicode or on this website.

Could you scan or draw a picture of that? Or find a picture online?
I'd be interested in seeing what it looks like. Or a reference to a coin
catalog like Krause's?

> Let me cite another
> instance: when I was young, I saw a Chinese character
> (min, same as the character for Min, Fukien Province) with 2
> mountains on its top. As we all know, Fujian is very
> mountainous (I guess that's why the character has 2 mountains
> on top of it). Sorry to say, I forgot which program
> showed it on TV. Sadly, I couldn't find this character
> elsewhere, nor do I have sufficient documents or articles
> to use as evidence. These 2 are examples showing that there
> are still many "undocumented" characters occurring today. Has
> any of you seen these characters?

I have not seen either of those two, although the latter seems like
pure decoration--'mountains' don't quite add any extra phonetic or
semantic information.

Have you seen that "kanji no shashin jiten" site? (I posted the url to
its "shinji" page today in this thread in response to James' post on
zhaocaijinbao.) It's got a lot of unusual characters, though some seem
to be borderline cases that rely on a zhuanshu or lishu form in order to
make a point for an independent character.

Thomas Chan
tc31@cornell.edu

Richard · Post by **Richard** » Tue Oct 01, 2002 2:40 pm

Dear Thomas,

Hi! I find info about these characters on TV. When I become addicted to one thing, I try searching for more and more info about the subject. I still remember that I saw the variant for "min" probably about 2 years ago on a certain TV station devoted to Chinese culture and language. Anyway, I would like to propose that these 2 characters I've mentioned be added by the consortium. How can that be done? Can you help me? What are the steps/processes involved in approving these rare characters that I've discovered? BTW, I also saw a rare variant of a certain character. I saw it on this site: http://140.111.1.40 . It is a variant of the character for Pin, a kingdom/state during the Warring States period. It has a big mountain bushou. As you can see, the mountain bushou has 2 spaces, one on its left and one on its right. 2 fen (cent) characters are placed in these 2 spaces. This character is a variant of the character for the Pin Chou kingdom, which existed earlier in Chinese history. I have another suggestion: why don't we establish a committee here on this site or on the Internet which would give some importance to newly discovered rare characters? That would help a lot and would be beneficial for all of us.
I discover rare Chinese characters by going to Chinatown and interviewing some very old Chinese people. Some of them are already in their 80s. The oldest person I got info from was 86 years old.
I really treasure Chinese characters and I really value them. I think I wouldn't survive for a day without them. It took me a lot of time before I could print this site's character dictionary. I spent a lot of money and time printing it for my own personal use. I guess it took about 8,000 to 10,000 pages to print! Some years ago, I tried to find a complete Chinese dictionary, but in vain. Nor did I find a Kangxi or a Hanyu Da Zidian. Out of desperation, I printed this site's CCDICT. Now I can read ancient Chinese texts and decipher them. I searched in a lot of places. I even tried to search every bookstore here in Chinatown. In that whole area, I only found 2 bookstores, carrying only small copies of the guo yu cidian. People there informed me that this is the only dictionary available, aside from the cihui. Running out of luck, I found this site and printed this site's character dictionary, which was my last resort.

Thanks a lot.

Regards,

Richard

Mark · Post by **Mark** » Mon Oct 07, 2002 8:19 pm

lol. you printed the dictionary here? rotflmao!

Richard · Post by **Richard** » Tue Oct 08, 2002 8:43 am

Dear Mark,

Yes. I did print the dictionary, since I am really in need of a complete chinese char. dictionary. The results are really perfect. I find a lot of info about rare and unusual chinese chars.

Richard
