Posted: Mon Feb 26, 2007 12:10 pm
by russellhltn
One place to start might be understanding the transliteration rules. I'm not talking about ICU itself, but about the Thai rules that we are trying to implement. I've looked at the site and can't quite figure out the rules. I can't teach a computer to do something I don't understand myself.

Start with Armenian?

Posted: Mon Feb 26, 2007 12:33 pm
by ConlinC
If you want to try your hand at a simpler writing system than Thai, you could work on Armenian, which has transliteration rules that may be more straightforward.

Here's a link to Armenian transliteration rules:
http://transliteration.eki.ee/pdf/Armenian.pdf
This page lists 5 different romanization standards. I believe ICU has implemented the ISO 9985 standard, but we need the American Library of Congress standard (ALA-LC 1997, the third column), since it produces fewer letters with diacritics (accents and other marks). We need the East Armenian version of the standard (the values *not* in parentheses). The footnotes provide information about letters that have context-sensitive transliterations.
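For anyone who wants to experiment before touching ICU rule syntax, here is a minimal Python sketch of the general shape: a base letter table plus a context-sensitive override. It is not the ALA-LC table. The `BASE` dictionary covers only a handful of letters, and the word-initial override in `WORD_INITIAL` is a hypothetical placeholder standing in for the footnotes in the PDF, so check every value against the PDF before relying on it. The names `BASE`, `WORD_INITIAL`, and `transliterate` are made up for this illustration.

```python
# Illustrative subset only: NOT the complete ALA-LC table.
BASE = {
    "ա": "a",
    "բ": "b",
    "գ": "g",
    "դ": "d",
    "ե": "e",
    "ն": "n",
    "ր": "r",
}

# HYPOTHETICAL context rule: a different value when the letter starts a word.
# This stands in for the footnoted, context-sensitive letters in the PDF;
# the value here is a placeholder, not taken from the standard.
WORD_INITIAL = {"ե": "ye"}


def transliterate(text: str) -> str:
    """Romanize text letter by letter, applying word-initial overrides."""
    out = []
    prev_is_letter = False
    for ch in text:
        if not prev_is_letter and ch in WORD_INITIAL:
            out.append(WORD_INITIAL[ch])
        else:
            # Characters missing from the table pass through unchanged.
            out.append(BASE.get(ch, ch))
        prev_is_letter = ch.isalpha()
    return "".join(out)


print(transliterate("բան"))
```

The sketch handles lowercase letters only. A real implementation would also need the capital letters and whatever the footnotes actually specify.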

Thanks!

Good first step

Posted: Wed Feb 28, 2007 9:54 am
by HaleDN
I'm glad to see that this thread got moved to a new forum so that it doesn't get lost among the other discussion posts. I wish there were an even better way to raise the visibility of this need with a larger audience, since I'm guessing that not many people have the skills necessary to help with this (unless someone wants to jump in and learn how to do it).

Script transliteration?

Posted: Wed Mar 21, 2007 1:57 pm
by The_Earl
Is there a reason you are writing a script transliterator? Can you simply create a UCM file as outlined here:
http://www.icu-project.org/userguide/co ... -data.html

I would think that ICU would automatically do the transliteration if you gave it the correct file.

Thanks
Barrie

Posted: Thu Mar 22, 2007 8:44 am
by ConlinC
I'm using a script transliterator because I need to convert the data from one alphabet to another.

The UCM file and the functionality you mentioned are for keeping the data in the same alphabet while converting it from one internal representation to another. For example, you might be familiar with PC codepages 437 and 850. Codepage 850 represents a capital A with an acute ( Á ) with the internal codepoint 181, while codepage 437 uses this same codepoint 181 to represent one of the line drawing characters ( ╡ ). So, if you move data between machines using these two codepages, you must do a conversion. That is what the UCM file is for.
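To make the codepage example concrete, here is a small sketch using Python's built-in `cp850` and `cp437` codecs rather than ICU and a UCM file. The helper name `convert` is made up for this illustration. It shows the same byte decoding to two different characters, and a conversion that keeps the character while changing its internal representation.

```python
raw = bytes([181])  # the single byte with value 181 (0xB5)

as_cp850 = raw.decode("cp850")  # read the byte as codepage 850
as_cp437 = raw.decode("cp437")  # read the same byte as codepage 437

print(as_cp850)  # Á
print(as_cp437)  # ╡


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from one character encoding to another."""
    return data.decode(src).encode(dst)


# Same character, same alphabet, different bytes.
print(convert(raw, "cp850", "utf-8"))  # b'\xc3\x81'
```

Note that nothing here changes the alphabet: the character stays Á throughout. Going from one script to another is the separate job a script transliterator does.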

Thanks for taking the time to look into this.

Chinese MLS

Posted: Fri Mar 23, 2007 8:39 pm
by leejj
In regard to Traditional Chinese, they are still trying to figure out which romanization system to use. Supposedly this is one of the holdups in getting MLS out to Taiwan. Do you have any details regarding this? I am currently a resident of Taiwan and would be willing to assist those facing these problems.

Josh

Traditional Chinese

Posted: Mon Mar 26, 2007 8:10 am
by ConlinC
Thanks, Josh. Yes, this project is a prerequisite for the rollout of MLS in Taiwan. MLS was able to roll out in Hong Kong because the ward clerks there are generally comfortable with romanized names and able to type both the Chinese and romanized name for each member. Since clerks in Taiwan are generally less comfortable with romanized names, we will be generating these names in an automated way through ICU's Pinyin transliteration functionality. This requires, however, some significant work on the back-end membership database that communicates with MLS (moving to a new environment where ICU is supported, revising a few key parts of the data structure, etc.). We are actively working on this.

Thank you for your offer of help. For now, I think we've resolved our Traditional Chinese issues and are working on more general issues that are involved with this change. If in the future we need assistance with Traditional Chinese, I'll certainly let you know. Thank you.
Regards,
Cindy

Posted: Sat Apr 14, 2007 9:32 am
by leejj
Here's another program you may be interested in for displaying multiple character sets and converting to NCRs and other code equivalents.

Babelpad

It's free and has many functions that I've found useful in displaying Chinese on different devices.

Josh

Thanks!

Posted: Mon Apr 16, 2007 7:59 am
by ConlinC
Hi Josh,
I downloaded Babelpad and it looks like it has some very helpful features. Thank you very much for the tip!
Cindy

Posted: Thu Sep 06, 2007 7:27 am
by tom777-p40
Cindy,

I am having trouble transliterating Japanese properly when the text contains both Kanji and Hiragana. ICU treats Chinese characters as Chinese and transliterates them into Pinyin even when they are part of a Japanese context. So I am interested in knowing how to create a custom "Kanji to Hiragana" transliterator. Would it be possible to share the steps you took to create your own custom transliterator with the ICU rules? Thanks in advance for your help.

Tom