Speechocean: AI Data Resource and Data Service Provider
In recent years, the economies of Southeast Asian countries have developed rapidly. As a result, a growing number of AI technologies are being deployed in Southeast Asia, driving demand for more speech corpora.
In addition to providing a one-stop data solution, Speechocean offers 13,000 hours of off-the-shelf Southeast Asian speech corpora and 3 Southeast Asian pronunciation lexica, all available for licensing.
Please see the tables below:
ASR Corpus
Language | Speakers | Total Hours |
Thai | 1,216 | 3,463 |
Indonesian | 1,069 | 2,800 |
Malay | 1,726 | 2,075 |
Vietnamese | 1,070 | 1,264 |
Urdu | 583 | 1,148 |
Tagalog | 257 | 507 |
Singapore English | 404 | 710 |
Filipino English | 207 | 326 |
Filipino American English | 100 | 172 |
Vietnamese American English | 100 | 194 |
Pakistani American English | 100 | 199 |
Lexicon
Language | Entries |
Urdu | 101,211 |
Vietnamese | 104,088 |
Malay | 101,935 |
Speechocean has always been devoted to providing specialized engineering data products and services to enterprises and scientific research institutions across the entire AI industry chain. Our business spans domains such as speech recognition, speech synthesis, computer vision, lexicon, and natural language processing, and we provide related services for data design, collection, transcription, annotation, and more.
If you have any further inquiries, please do not hesitate to contact us.
Email: contact@speechocean.com