Tel.: +86-10-62660053
E-mail: contact@speechocean.com

Southeast Asian Databases Overview
2020.09.12

Speechocean: AI Data Resource and Data Service Provider

To date, Speechocean offers 13,125 hours of off-the-shelf Southeast Asian speech recognition corpora, 18 hours of speech synthesis corpora, and 5 Southeast Asian pronunciation lexica, all available for licensing.

Please see the tables below.

Speech Recognition Corpus
Language                      Speakers    Total Hours
Indonesian                       1,063          2,769
Malay                            1,726          2,075
Tagalog                            257            424
Thai                             1,216          3,463
Tamil                            1,433          1,019
Vietnamese                       1,446          1,595
Singapore English                  404            710
Filipino English                   207            326
Filipino American English          100            172
Indonesian English                 804            378
Vietnamese American English        100            194

Speech Synthesis Corpus
Language      Gender    Hours
Portuguese    Female    13.45
Portuguese    Male       4.21

Lexicon
Language      Entries
Thai          114,572
Malay         101,799
Vietnamese    104,088
Burmese       100,000
Khmer         101,895


Speechocean is dedicated to providing engineered data products and services to enterprises and research institutions across the entire AI industry chain. Our business spans speech recognition, speech synthesis, computer vision, lexicons, and natural language processing, and we offer services covering data design, collection, transcription, and annotation, among others.


If you have any further inquiries, please do not hesitate to contact us.

Email: marketing@speechocean.com