Korean and Japanese Databases Overview

2020.09.26

Speechocean: AI Data Resource and Data Service Provider

 

To date, Speechocean offers the following off-the-shelf Korean and Japanese data for licensing: 10,273 hours of speech recognition corpora, 32 hours of speech synthesis corpora, 107,664 lexicon entries, 400,038 sentences for NLP, and 1,066 OCR images.

 

Details are listed in the tables below:

 

Speech Recognition Corpus

Language        Speakers    Hours
South Korean    4,870       2,960
North Korean    1,202       965
Japanese        8,482       6,348

 

Speech Synthesis Corpus

Language        Gender    Hours
South Korean    Male      11
South Korean    Female    13
Japanese        Male      8

 

Lexicon

Language        Entries
South Korean    107,664

 

NLP

Language & Content                                 Sentences
Japanese SMS Corpus with POS and NER               200,011
Chinese-English-Japanese-Korean Parallel Corpus    200,027

 

CV

Language & Content     Pieces
Japanese OCR Images    1,066

 

Speechocean is dedicated to providing engineering data products and services to enterprises and scientific research institutions across the entire AI industry chain. Our business covers speech recognition, speech synthesis, computer vision, lexicons, and natural language processing, and we provide related services such as data design, collection, transcription, and annotation.

 

If you have any further inquiries, please do not hesitate to contact us.

Email: marketing@speechocean.com

