Compared with open-source data, our data is engineered to a higher standard.
Extensive data collection experience in 70+ countries and regions; all data is recorded by local speakers.
High data security: all collected data is fully authorized by the speakers.
Sampling is balanced across gender, age, and accent, and speakers are selected in strict accordance with customer requirements.
Recording is conducted offline, so the speakers' real information can be verified more reliably.
Recording quality is monitored in real time; data that falls below the standard is re-recorded on site.
A worldwide team of linguistic experts assists in building the lexicon.
Strong data production capacity: we engaged 60,000+ people in 50,000+ hours of data recording in 2018, and annotated 140,000+ hours of data.
Recording software and processing tools developed in-house enable efficient, high-quality data delivery.
The precision rate of the database is at least 95%, and a Kaldi model verification service is available.