Wednesday, June 4, 2008

Learn to speak Chinese - Downloadable dictionary file? - Page 2 - From Beijing Chinese School.com > Learning Chinese > Chinese Computing and Technology > Adsotrans.com Forum

Downloadable dictionary file?

wulong -

Here's a zipped CSV file that I extracted from the latest development database. Format is

simplified,traditional,pinyin,english

http://e.den.li/adso-csv.zip (2.3M)

Hope this helps.

PS. Here's the same data but in a single sqlite table:

http://e.den.li/adso.single.db.gz (6.8M)

Schema:

Code:

CREATE TABLE entries (
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      simp TEXT UNIQUE,
      trad TEXT UNIQUE,
      pinyin TEXT UNIQUE,
      english TEXT
    );

woliveri -

Thanks Wulong,

I have two problems.

1. Excel cannot open the entire file, so I pulled it into Access. Because the delimiters ( | )
don't seem to be consistent, I have pinyin together with characters in some rows, while others are
OK.

2. The other file, single table with all data, appears not to be a valid archive.

trevelyan -

My laptop got tanked by a QQ install last week, which has stopped Adso-related work until I can
get it fixed. I'll take a look at those corrupted tables when I'm back up and running.

I don't see why you can't dump in CEDICT format if you want. Part of the point of the database
release is that it should be relatively simple to reformat data. The easiest way to access most of
the data is to look at the table ("expanded_unified"). The SQL command "SELECT * from
expanded_unified" will get you most of what you need.

The hard way of doing things is to look up the first character in the table character_index
("GB2312") or index_utf8s (simplified). The pkey in those tables corresponds to the table number
containing all entries beginning with that character. If a character is listed in the index with a
pkey of 84, for instance, all words starting with that character will be found in table _84.
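
The two-step lookup described above can be sketched in Python. This is only an illustration: the table names index_utf8s and _84 come from the description, but the column names (utf8, chinese, pinyin, english) and the sample row are assumptions, and the real database may be laid out differently.

```python
import sqlite3

def words_starting_with(conn, first_char):
    """Return all rows from the per-character table for first_char."""
    # ASSUMPTION: index_utf8s has columns (pkey, utf8); real names may differ.
    row = conn.execute(
        "SELECT pkey FROM index_utf8s WHERE utf8 = ?", (first_char,)
    ).fetchone()
    if row is None:
        return []
    # Table names cannot be bound as SQL parameters, so pkey is coerced
    # to int before it is spliced into the statement.
    table = "_%d" % int(row[0])
    return conn.execute('SELECT * FROM "%s"' % table).fetchall()

# Tiny in-memory stand-in for the real database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE index_utf8s (pkey INTEGER PRIMARY KEY, utf8 TEXT)")
conn.execute("INSERT INTO index_utf8s VALUES (84, '中')")
conn.execute('CREATE TABLE "_84" (chinese TEXT, pinyin TEXT, english TEXT)')
conn.execute("INSERT INTO \"_84\" VALUES ('中国', 'zhong1 guo2', 'China')")

print(words_starting_with(conn, "中"))
```

Against the real file, the in-memory setup would be replaced by sqlite3.connect() on the downloaded database, with the column names adjusted to whatever the index table actually uses.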

woliveri -

Hi Trevelyan,

Thanks for the reply. It seems expanded_unified contains Chinese and Pinyin but no English.

This is the query:

Questions:
1. How do I get the English translation?

Thanks,

Bill

wulong -

Quote:

1. Excel cannot open the entire file, so I pulled it into Access. Because the delimiters ( | )
don't seem to be consistent, I have pinyin together with characters in some rows, while others are
OK.

That's a dump directly from sqlite. You might have to fix up a few rows to get it to work.

Quote:

2. The other file, single table with all data, appears not to be a valid archive.

It's a gzip file. You need to use WinZip or WinRAR if you're in Windows. If you're on Mac OS X, it
should be built in.
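
If no archive tool cooperates, Python's standard gzip module can do the same job. A minimal sketch, assuming Python is installed and the download sits in the current directory under the file name from the link above:

```python
import gzip
import shutil

def gunzip(src_path, dest_path):
    """Decompress a plain gzip file (not a tar or zip archive) to dest_path."""
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        # Stream in chunks so the whole file is never held in memory.
        shutil.copyfileobj(src, dest)

# Usage, assuming the file name from the download link:
# gunzip("adso.single.db.gz", "adso.single.db")
```

If this fails with an error saying the file is not gzipped, the download itself is probably damaged rather than the tool.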

Quote:

I don't see why you can't dump in CEDICT format if you want. Part of the point of the database
release is that it should be relatively simple to reformat data. The easiest way to access most of
the data is to look at the table ("expanded_unified"). The SQL command "SELECT * from
expanded_unified" will get you most of what you need.

I don't even see expanded_unified. Can you point me to the archive that has the database with this
table?

What I need is a simple list (simplified, traditional, pinyin, english) similar to what CEDICT
gives. The database I have doesn't make it easy to do this, which is why I had to resort to using
Ruby to pull everything together.

woliveri -

wulong,

Yes, I have Winzip but it fails to open the archive saying it's corrupt or other error.

I'm using SQLite Maestro to view the tables (see the above graphic in my previous post). It seems
like a nice application, but I still cannot export to a file without running into memory errors or
other issues.

http://www.sqlmaestro.com/products/sqlite/maestro/

wulong -

@woliveri

Hmm... haven't used Windows in a while, but I remember running into issues with WinZip and plain
gzip files. Here's a zip file for you: http://e.den.li/adso.single.zip

Hopefully this one works better.

If you've installed sqlite3, you can get a dump file from the command line:

Code:

C:\path>sqlite3 adso.single.db
sqlite> .mode csv
sqlite> .output adso.csv
sqlite> select * from entries;
sqlite> .quit

There will be a new file called adso.csv in the directory where you started sqlite3.
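
The same dump can also be done from a script, which avoids broken columns when a definition contains a comma. A rough sketch using Python's standard sqlite3 and csv modules; it assumes the entries table and column names from the schema posted earlier, and the function name is just for illustration:

```python
import csv
import sqlite3

def dump_entries(db_path, csv_path):
    """Write every row of the entries table to a quoted, UTF-8 CSV file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT simp, trad, pinyin, english FROM entries")
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            # csv.writer quotes any field containing a comma or a quote,
            # so definitions like "clean, tidy" stay in one column.
            csv.writer(f).writerows(rows)
    finally:
        conn.close()

# Usage, assuming the unpacked database is in the current directory:
# dump_entries("adso.single.db", "adso.csv")
```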

perjp -

Wulong, the schema for the sqlite database seems to be incorrect:

Code:

CREATE TABLE entries (
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      simp TEXT UNIQUE,
      trad TEXT UNIQUE,
      pinyin TEXT UNIQUE,
      english TEXT
    );

The UNIQUE constraints should not be there.
In the current table, there is only one entry with the pronunciation a1, which can't be correct.
The CSV file has the same problem.
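
A small experiment shows how a UNIQUE constraint on pinyin can drop entries. This is a sketch with two made-up sample rows; how the real table was loaded is not known, so the INSERT OR IGNORE here is an assumption used only to reproduce the symptom:

```python
import sqlite3

# Two illustrative entries that share the pinyin "a1".
rows = [
    ("啊", "啊", "a1", "ah"),
    ("阿", "阿", "a1", "prefix used before names"),
]

def count_loaded(schema):
    """Create the table from schema, load rows, return how many survived."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    # INSERT OR IGNORE silently skips rows that violate a constraint.
    conn.executemany(
        "INSERT OR IGNORE INTO entries (simp, trad, pinyin, english) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]

with_unique = """CREATE TABLE entries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    simp TEXT UNIQUE, trad TEXT UNIQUE, pinyin TEXT UNIQUE, english TEXT)"""
without_unique = """CREATE TABLE entries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    simp TEXT, trad TEXT, pinyin TEXT, english TEXT)"""

print(count_loaded(with_unique))     # 1
print(count_loaded(without_unique))  # 2
```

With the constraints in place, the second a1 row is discarded, which matches the single a1 entry seen in the table.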

I've tried several times to access the download site http://www.adsotate.com/downloads/ but I've
never been successful. Is there any other way of accessing the raw data file?

trevelyan -

There will be a new release in a matter of days: database plus software plus several months of
updates. In the meantime, an older version is still online at:

http://www.adsotrans.com/downloads/

The adsotate.com server had technical problems and is offline.

ABCinChina -

First of all, I would like to thank Trevelyan for putting together such a useful dictionary, which
helps me in reading online text as well as in my translation work.

Is it just me, or is the current database down? I need to create a PHP script that connects to the
database and reformats the data into the format I need. I've downloaded sqlite3 and SQLite Maestro and
need some tips on how to get them to dump in the needed format, since I can't seem to access this
script: http://adsotrans.com/downloads/v5/php_script.txt Can somebody please give me some pointers?

Here's what I'm trying to do. Ideally, I would like to import the full Traditional and Simplified
database into Kingsoft Powerword 2007 which accepts .txt files in ANSI format that looks like the
example below.

乾淨|[gan1 jing4]\r\nclean\r\ntidy\r\nneat\r\n <--format

Ends up looking like this...
乾淨 <-- lookup word
[gan1 jing4] <-- pinyin & definitions
clean
tidy
neat
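
The target format above could be produced with a small helper like the one below. It is only a sketch: the function name is made up, and it assumes the \r\n in the sample is literal text (a backslash followed by a letter) rather than a real line break, and that the definitions are already split into a list.

```python
def powerword_line(word, pinyin, definitions):
    """Build one entry: word|[pinyin]\\r\\ndef1\\r\\ndef2\\r\\n (literal backslashes)."""
    parts = ["%s|[%s]" % (word, pinyin)] + list(definitions)
    # ASSUMPTION: the separator is the four characters \ r \ n written out,
    # as shown in the sample, not an actual carriage return and line feed.
    sep = "\\r\\n"
    return sep.join(parts) + sep

print(powerword_line("乾淨", "gan1 jing4", ["clean", "tidy", "neat"]))
# 乾淨|[gan1 jing4]\r\nclean\r\ntidy\r\nneat\r\n
```

If Powerword turns out to want real line breaks, sep would become "\r\n" instead. The output file would also need to be written in whatever ANSI code page the program expects.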
