Tel.: +86-10-62660053

Email: contact@speechocean.com


Asian Audio Databases Overview

2020.10.30

Speechocean: AI Data Resource and Data Service Provider


To date, Speechocean's Asian speech datasets cover more than 20 Asian countries and more than 30 languages and dialects. In total, over 100,000 hours of audio and nearly 3,000,000 lexicon entries are available for licensing.

See the tables below for details:

 

Speech Recognition Corpus

Region (languages) | Total Hours
East Asia (Mandarin, Cantonese, Tibetan, Uighur, Chinese dialects, Japanese, South Korean, North Korean, etc.) | 75,000+
Southeast Asia (Vietnamese, Indonesian, Malay, Thai, Tagalog, Portuguese, etc.) | 12,000+
South Asia (Hindi, Tamil, Urdu, etc.) | 6,500+
West Asia (Arabic, Turkish, Greek, etc.) | 2,000+
Central Asia (Russian, Kazakh, etc.) | 2,500+

 

Speech Synthesis Corpus

Languages | Total Hours
Mandarin, Cantonese, Arabic, Turkish, Japanese, Korean | 400+

 

Lexicon

Region (languages) | Entries
East Asia (Mandarin, Cantonese, Sichuanese, Uighur, Korean, etc.) | 2,000,000+
Southeast Asia (Thai, Vietnamese, Malay, Khmer, Burmese, etc.) | 500,000+
South Asia (Hindi, Urdu, etc.) | 200,000+


Speechocean has always been devoted to providing engineering data products and services to enterprises and research institutions across the entire AI industry chain. Our business spans speech recognition, speech synthesis, computer vision, lexicons, and natural language processing, and we offer related services covering data design, collection, transcription, and annotation.


If you have any further inquiries, please do not hesitate to contact us.

Email: marketing@speechocean.com

