Contents of /misc/data/README
Revision 114 -
Mon Apr 21 01:56:41 2008 UTC
by mir
File size: 987 byte(s)
Added random Japanese data generator.
Random Japanese data generator

*What's this?

This program is a data generator for those who need Japanese
data for performance tests and similar purposes.

Data is generated from the *.csv files that are part of mecab-ipadic.

*How to use?
1. Compile datagen.c

gcc -o datagen datagen.c

2. Execute datagen in the directory that contains the *.csv files

./datagen 1000 2000

Argument #1 is the number of bytes in each generated Japanese sentence.
Argument #2 is the number of rows of generated data.

The example above generates 1000 bytes * 2000 rows = 2 MB of Japanese data in total.

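The size arithmetic above can be checked with a small C helper. This is only a sketch: expected_total_bytes is a hypothetical name and is not part of datagen.c, and it assumes that nothing beyond the sentence bytes (for example a row separator) is added to the output.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from datagen.c.
 * Returns the expected payload size in bytes for a run such as
 * "./datagen <bytes_per_row> <rows>", assuming no extra bytes
 * (row separators, headers) are written.
 */
unsigned long long expected_total_bytes(unsigned long long bytes_per_row,
                                        unsigned long long rows)
{
    return bytes_per_row * rows;
}
```

For example, expected_total_bytes(1000, 2000) gives 2000000 bytes, which is the 2 MB mentioned above.
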
*Does this program fit your needs?

That depends on whether the generated data must be valid Japanese.
Data is generated by choosing Japanese words at random, so the output is
not guaranteed to be grammatical Japanese. If you want to run performance
tests with N-gram, this program is a good fit.

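To illustrate what random word choice means here, the following is a minimal sketch in C. It is not the code of datagen.c: the function name fill_random_row, its parameters, and the in-memory word list are assumptions made for illustration. It also assumes that a row is filled up to, but never beyond, the requested byte count; the real program may handle the limit differently.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch, not taken from datagen.c.
 * Appends randomly chosen words to `out` while they still fit in
 * `target_bytes`. `out` must have room for target_bytes + 1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL.
 */
size_t fill_random_row(char *out, size_t target_bytes,
                       const char *const *words, size_t word_count)
{
    size_t used = 0;
    size_t misses = 0; /* consecutive picks that did not fit */

    if (word_count == 0) {
        out[0] = '\0';
        return 0;
    }

    /* Stop when the row is full, or after many consecutive picks
       that were too long for the remaining space. */
    while (used < target_bytes && misses < word_count * 4) {
        const char *w = words[(size_t)rand() % word_count];
        size_t len = strlen(w);

        if (len == 0 || used + len > target_bytes) {
            misses++;
            continue;
        }
        memcpy(out + used, w, len);
        used += len;
        misses = 0;
    }

    out[used] = '\0';
    return used;
}
```

Because words are picked independently of each other, the result is a sequence of real words with no grammatical structure, which is usually acceptable when only the volume and character distribution of the text matter.
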
*License
The dictionary files (*.csv) are part of mecab-ipadic, so their license
follows that of mecab-ipadic. Please see below.

http://mecab.sourceforge.net/

Tritonn Project holds the copyright on everything else, which is distributed under LGPL v2.