Chinese Dialects Speech Databases Overview
2020.08.30

Speechocean: AI Data Resource and Data Service Provider   


To date, Speechocean has released multiple off-the-shelf Chinese dialect databases that are available for licensing, including:

-- 30,000+ hours of speech recognition corpora

-- 18 hours of Cantonese speech synthesis corpora

-- 5 lexica with 700,000+ entries in total


 Please check the details below:   


Speech Recognition Corpus

Dialect                   Speakers    Hours
Sichuanese                1,002       2,151.6
Shanghainese              1,000       2,409
Cantonese                 6,134       7,600.5
Hokkien                   748         553
Tibetan                   1,011       638.7
Uyghur                    2,480       1,768.13
Kazakh                    406         576.7
Multi-accented Mandarin   12,047      23,250.2

 

Speech Synthesis Corpus

Dialect      Gender    Hours
Cantonese    Female    9.37
Cantonese    Male      9.56

 

Lexicon

Dialect         Entries
Sichuanese      100,346
Shanghainese    129,992
Cantonese       100,810
Hokkien         262,099
Tibetan         106,818


Speechocean is dedicated to providing engineering data products and services to enterprises and scientific research institutions across the entire AI industry chain. Our business covers domains such as speech recognition, speech synthesis, computer vision, lexica, and natural language processing, and we provide related services including data design, collection, transcription, and annotation.

 

If you have any further inquiries, please do not hesitate to contact us.

Email: marketing@speechocean.com

