Opportunity to help Church developers
Posted: Thu Feb 22, 2007 8:48 am
I am a Church employee and have enjoyed reading your ideas and thoughts in the forums. I just wanted to let you know about a need we have right now, in case you would like to volunteer on one of our challenging and interesting projects.
I'm a developer on a Church system that must take members' names recorded in any alphabet or writing system (such as Greek, Thai, Japanese, etc.) and produce a "Romanized" version of the name (i.e. the name written in the standard Latin A-Z alphabet used for English and other Western languages) for use throughout the Church.
We are using an open source library called International Components for Unicode (ICU) to accomplish this transliteration, and have found that for certain writing systems, the ICU library needs improvement.
Specifically, ICU's Thai-to-Latin transliteration produces names that are not pronounceable. For example, ICU transliterates the name "ออ่นเสมอ, อังคณา" as "Xx̀nsemx, Xạngkhā". There is a well-documented, standard set of rules called the "Royal Thai General System of Transcription" that can produce pronounceable romanizations of Thai names (it transliterates the example name above into "A-onsamoe, Angkhana"), but these rules have not been implemented in the ICU software.
It would be very helpful to us if someone would add the "Royal Thai General System of Transcription" standard rules to the transliterators available in the Java version of the ICU library. You don't need to speak or read Thai to do this. Yes, knowing Thai would make the task easier, but if you can visually match a Thai character in the Unicode documentation to its entry in the Royal Thai transcription rules, you can write the software.
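To give a feel for what "matching a Thai character to a rule" involves, here is a toy Java sketch. It is only an illustration under stated assumptions: the class name `RtgsSketch` and its tables are hypothetical and are not part of ICU, and only a handful of letters are covered. It treats the last consonant of a word as syllable-final (so บ becomes "p" rather than "b"), moves a leading vowel (เ แ โ ใ ไ) after the consonant it is written before, and drops tone marks, which RTGS does not write. With these rules it turns the given name อังคณา into "angkhana". Real RTGS also needs syllable segmentation, implicit vowels, consonant clusters and the silent-letter mark, so a surname like ออ่นเสมอ is beyond this sketch. In ICU itself, the same kind of mapping table would be written in the transform-rule syntax described in the ICU user guide and loaded with ICU4J's `Transliterator.createFromRules`.

```java
import java.util.Map;

// Toy sketch only: a tiny, hypothetical subset of RTGS mappings,
// not a complete or authoritative implementation of the standard.
class RtgsSketch {
    // Consonant values at the start of a syllable (a few letters only).
    static final Map<Character, String> INITIAL = Map.ofEntries(
        Map.entry('\u0E01', "k"),   // KO KAI
        Map.entry('\u0E04', "kh"),  // KHO KHWAI
        Map.entry('\u0E07', "ng"),  // NGO NGU
        Map.entry('\u0E13', "n"),   // NO NEN
        Map.entry('\u0E14', "d"),   // DO DEK
        Map.entry('\u0E19', "n"),   // NO NU
        Map.entry('\u0E1A', "b"),   // BO BAIMAI
        Map.entry('\u0E21', "m"),   // MO MA
        Map.entry('\u0E2A', "s"),   // SO SUA
        Map.entry('\u0E2D', ""));   // O ANG: silent vowel carrier

    // Values of the same letters when they close a syllable.
    static final Map<Character, String> FINAL = Map.ofEntries(
        Map.entry('\u0E01', "k"),
        Map.entry('\u0E04', "k"),
        Map.entry('\u0E07', "ng"),
        Map.entry('\u0E13', "n"),
        Map.entry('\u0E14', "t"),
        Map.entry('\u0E19', "n"),
        Map.entry('\u0E1A', "p"),
        Map.entry('\u0E21', "m"),
        Map.entry('\u0E2A', "t"));

    // Vowel signs written after (or above/below) their consonant.
    static final Map<Character, String> VOWEL = Map.of(
        '\u0E31', "a",   // MAI HAN-AKAT
        '\u0E32', "a",   // SARA AA
        '\u0E34', "i",   // SARA I
        '\u0E35', "i",   // SARA II
        '\u0E38', "u",   // SARA U
        '\u0E39', "u");  // SARA UU

    // Vowels written before their consonant but pronounced after it.
    static final Map<Character, String> LEADING = Map.of(
        '\u0E40', "e",   // SARA E
        '\u0E41', "ae",  // SARA AE
        '\u0E42', "o",   // SARA O
        '\u0E43', "ai",  // SARA AI MAIMUAN
        '\u0E44', "ai"); // SARA AI MAIMALAI

    static String romanize(String word) {
        StringBuilder out = new StringBuilder();
        int n = word.length();
        for (int i = 0; i < n; i++) {
            char c = word.charAt(i);
            if (LEADING.containsKey(c) && i + 1 < n && INITIAL.containsKey(word.charAt(i + 1))) {
                // Emit the consonant first, then the leading vowel, and skip the consonant.
                out.append(INITIAL.get(word.charAt(i + 1))).append(LEADING.get(c));
                i++;
            } else if (INITIAL.containsKey(c)) {
                // Simplification: only the last letter of a multi-letter word is treated as final.
                boolean wordFinal = (i == n - 1) && i > 0;
                out.append(wordFinal ? FINAL.getOrDefault(c, INITIAL.get(c)) : INITIAL.get(c));
            } else if (VOWEL.containsKey(c)) {
                out.append(VOWEL.get(c));
            } else if (c >= '\u0E47' && c <= '\u0E4B') {
                // MAITAIKHU and the four tone marks are not written in RTGS.
            } else {
                out.append(c); // anything not covered passes through unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The given name from the post: prints "angkhana".
        System.out.println(romanize("\u0E2D\u0E31\u0E07\u0E04\u0E13\u0E32"));
    }
}
```

The same tables also give "kap" for กับ, "ban" for บ้าน and "kai" for ไก่. Capitalization (the post's "Angkhana") is left to the caller.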
If you are interested, here are some links that might help get you started:
- Documentation of the Royal Thai standard is here:
http://en.wikipedia.org/wiki/Royal_Thai_General_System_of_Transcription
(can be slow to load, but will eventually come up)
and here:
http://www.arts.chula.ac.th/~ling/tts/principles_eng.pdf
- A website with a demo implementation of the standard is here:
http://www.arts.chula.ac.th/~ling/tts/
(we need the option labeled "Roman" on this site)
- General information about the ICU library is here:
http://www-306.ibm.com/software/globalization/icu/index.jsp
- The portion of the ICU user guide that discusses transliterations and how to implement your own is here:
http://icu.sourceforge.net/userguide/Transform.html
and here:
http://icu.sourceforge.net/userguide/TransformRule.html
- A demonstration of the ICU transliteration functionality is found here:
http://demo.icu-project.org/icu-bin/translit
- A chart of the Thai character assignments in Unicode is here:
http://www.unicode.org/charts/PDF/U0E00.pdf
- A Unicode editor that will allow you to enter the Thai characters by codepoint is here:
http://www.unipad.org/main/
- Sample Thai text for testing can be found here:
http://www.unicode.org/standard/translations/thai.html
Thanks in advance to anyone who can help.
Regards,
Cindy