Speechocean: AI Data Resource and Data Service Provider
To date, Speechocean offers 10,273 hours of off-the-shelf Korean and Japanese speech recognition corpora, 32 hours of speech synthesis corpora, 107,664 lexicon entries, 400,038 sentences for NLP, and 1,066 OCR images, all available for licensing.
See the tables below for details:
Speech Recognition Corpus
Language | Speakers | Hours |
South Korean | 4,870 | 2,960 |
North Korean | 1,202 | 965 |
Japanese | 8,482 | 6,348 |
Speech Synthesis Corpus
Language | Gender | Hours |
South Korean | Male | 11 |
South Korean | Female | 13 |
Japanese | Male | 8 |
Lexicon
Language | Entries |
South Korean | 107,664 |
NLP
Language & Content | Sentences |
Japanese SMS Corpus with POS and NER | 200,011 |
Chinese-English-Japanese-Korean Parallel Corpus | 200,027 |
CV
Language & Content | Pieces |
Japanese OCR Images | 1,066 |
Speechocean has always been devoted to providing engineering data products and services to enterprises and scientific research institutions across the entire AI industry chain. Our business spans domains such as speech recognition, speech synthesis, computer vision, lexicon, and natural language processing, and we provide related services for data design, collection, transcription, and annotation.
If you have any further inquiries, please do not hesitate to contact us.
Email: marketing@speechocean.com