Wednesday, June 4, 2008

Chinese Class - Traditional Support - From Beijing Chinese School.com > Learning Chinese > Chinese Computing and Technology > Adsotrans.com Forum

Traditional Support

trevelyan -

We've fixed the issues with automatic traditional character recognition that character pointed out
in another thread. The updated code (and database) is available for download. Any version from
v5.022 onward should work:

http://adsotrans.com/downloads/adso-v5.022.tar.gz

Have also edited our "advanced editing page" so that traditional characters can be edited. Right
now we will fail to parse traditional words if they do not exist in our database, even if the
simplified counterpart does. All about maintaining the integrity of the database.

Suggestions on how to improve the system for users/contributors who want to deal mostly with
traditional Chinese are welcome. Do we need separate editing and annotating pages? I'm not sure
but would like to make whatever changes are necessary to get the fanti crowd more involved.

More details on the Adso blog.


character -

Quote:

Right now we will fail to parse traditional words if they do not exist in our database, even if
the simplified counterpart does. All about maintaining the integrity of the database.

Automatic conversion seems dauntingly difficult: http://www.cjk.org/cjk/c2c/c2cbasis.htm

I guess the internet could be harnessed to see if traditional "matches" exist for simplified
phrases. The results could be reviewed before inclusion in Adso.
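
A rough sketch of that idea, for illustration only: expand a simplified phrase into every traditional spelling a per-character table allows, keep the ones that are actually attested, and flag anything ambiguous for manual review. The three-entry `S2T` table, the `attested` set (standing in for whatever the internet lookup returns), and both function names are assumptions made up for this example, not part of Adso.

```python
from itertools import product

# Tiny hand-written sample of simplified -> traditional mappings.
# Illustrative only; a real table would cover thousands of characters.
S2T = {
    "头": ["頭"],
    "发": ["發", "髮"],  # one simplified form, two traditional forms
    "现": ["現"],
}


def traditional_candidates(simplified):
    """Return every traditional spelling the per-character table allows."""
    # Characters missing from the table are passed through unchanged.
    options = [S2T.get(ch, [ch]) for ch in simplified]
    return ["".join(combo) for combo in product(*options)]


def review_entry(simplified, attested):
    """Keep only attested candidates; flag the phrase unless exactly one matched."""
    hits = [c for c in traditional_candidates(simplified) if c in attested]
    return {
        "phrase": simplified,
        "matches": hits,
        "needs_review": len(hits) != 1,
    }
```

For example, `traditional_candidates("头发")` yields both 頭發 and 頭髮, and only a check against attested usage (or a human reviewer) can tell that 頭髮 is the right one.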

trevelyan -

The academic team at ChinesePod is using some Adso-related tools to help with lesson preparation,
which is helping us flag some of the issues that still exist with duoyinci and pushing forward the
project.

Manual review is definitely critical. The best solution is really to find some people who are
interested in this sort of thing and are coming at text analysis from a fanti perspective. Then
religiously fixing the problems they complain about.

character -

Quote:

Then religiously fixing the problems they complain about.

Going entirely to Apache licensing would be favorite.

---------------

./adso -f file1.txt --code --extra-code "<REDUCE> AND <PRINT chinese><PRINT / ><PRINT chinese_utf8s><PRINT / ><PRINT chinese_utf8c><PRINT / ><PRINT english><PRINT / ><PRINT pinyin><PRINT / ><PRINT myclass><PRINT newline>" > file2.txt

This produces an empty file. Do I need to be using the non-latin database for this to work?

Until this is fixed, is there any chance of an enhanced vocab mode which includes the pinyin in
addition to everything else it outputs?

-----------------

./adso -f file1.txt -ie utf8 -is traditional -oe utf8 -os traditional --vocab > file2.txt

1) Wenlin says file2.txt has ~1200 UTF-8 format violations
2) Wenlin seems to be saying that the "U+3000 Ideographic space" in the input is processed into
"U+FFFD Replacement character" (which displays as a control character).
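
For anyone who wants to check this without Wenlin, here is a minimal sketch that counts undecodable byte sequences in a file, along with how many U+3000 and U+FFFD characters survive in the parts that do decode. The function name `utf8_report` is made up for this example, and its count of invalid sequences may not match Wenlin's tally exactly, since tools group bad bytes differently.

```python
def utf8_report(data):
    """Count invalid UTF-8 sequences and selected code points in raw bytes."""
    invalid = 0
    pieces = []
    pos = 0
    while pos < len(data):
        try:
            # Strict decode of everything that is left.
            pieces.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are offsets within the slice just decoded.
            pieces.append(data[pos:pos + err.start].decode("utf-8"))
            invalid += 1
            pos += err.end  # skip past the offending bytes
    text = "".join(pieces)
    return {
        "invalid_sequences": invalid,
        "ideographic_spaces": text.count("\u3000"),
        "replacement_chars": text.count("\ufffd"),
    }
```

Running it on both files, e.g. `utf8_report(open("file1.txt", "rb").read())` and the same for `file2.txt`, should show whether the ideographic spaces in the input turn into replacement characters or broken bytes in the output.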

trevelyan -

I'm generally happy to let people use the adso materials commercially provided they attribute the
materials and contribute back to the project. I don't think it's onerous to send an email asking
for permission.

On the traditional side, can you mail me the file you're using so that I can take a look at it
myself? My email address is david.lancashire at google.com. I think the command is working for me,
so I'd like to replicate things exactly. You are compiling from source, right?

trevelyan -

Thanks to pressure from Mark at toshuo.com, the annotation engine is now outputting popups in
traditional characters (when input is traditional characters). Will be working on hooking up the
editing functionality for the traditional stuff later this week and will post when that's done.
