Data Collection

Advantages in Collection Service

Global Superior Collection Resources

Drawing on nearly two decades of accumulated global resources, the company can collect data on objects and scenarios in 190+ spoken and written languages and across different ethnic groups, supporting localization projects in close to a hundred countries.

Industrial scenarios: The company's collection resources and capabilities extend to dozens of industry segments in nearly a hundred countries worldwide, including intelligent driving, intelligent medical service, smart city, smart home, intelligent finance, intelligent education, intelligent application, and intelligent hardware, among others.

Service line: The company has professional hardware for computer vision applications, including depth cameras, infrared cameras, millimeter-wave radar, Xsens glove combinations, and 3D human body scanners, to support the collection of multiple data types such as 2D images and videos, infrared depth images, and 3D point clouds. It also has microphone arrays, recording pens, professional recording studios, and industrial recording equipment that support audio capture across different languages and scenario segments, as well as text and OCR data collection for close to a hundred languages.

Core Technology Guarantee

The company has an end-to-end technology platform that supports the management and implementation of standardized collection.

Algorithmic checks perform real-time quality inspection, so data quality is controlled at the source.

The company also holds more than a hundred core technologies, patents, and software copyrights, along with an independently developed integrated data processing platform, to assure efficiency and quality.

Authoritative Qualification Certification

The company applies strict controls to the secure production of data to ensure data security and compliance.

The production management process complies with mainstream global regulations and requirements.

The company holds certifications under authoritative standards including ISO/IEC 27701 and ISO/IEC 27001, with strict control exerted over each stage of data production.

Data Collection Service

Collection Service Capacity

Speech Recognition Data Collection

Data in 170 languages can be collected by using mobile phones, desktops, microphone arrays, recording pens, televisions, and other devices in specified background environments such as quiet environments, public places, large conferences, vehicle exhibitions, and inside vehicles.


Global languages collection

Children's speech collection

Elderly speech collection

Multi-channel far-field recording collection

Multi-speaker conversational speech collection

Conference collection

In-car speech collection

Emotional speech collection

Voiceprint re

Speech Synthesis Data Collection

The company supports speech synthesis data recording of native speakers in 170+ global languages, with specialized recording studios and commercial recording equipment to assure data quality.

Dialect data recording

Children's speech recording

Multi-style tone recording

Average tone recording

Emotional speech data collection

Audiobook tone recording

Host tone recording

Virtual idol tone recording

Song recording

Text Collection

The company has a global pool of professionals specialized in dozens of industries to support the collection and creation of corpora in 190+ languages across professional domains such as medical care and finance.

Sign language corpus

Financial QA corpus

Transfer transaction corpus

Medical QA corpus

Medical guide corpus

Life services corpus

Community QA corpus

Comment and scoring corpus

Man-machine interaction corpus

Social media corpus

Academic corpus

Parallel co

Image and Video Collection

Collection of 2D and 3D images and videos, including scenarios in different places around the world, handwritten notes in a hundred languages, people of different races and complexions, general objects, animals and plants, vehicles, and other objects.

3D visual data of sign language

Facial data

Human body gesture data

Children's image data

Relative face data

Sports scenario data

Road traffic data

General data

Finger joint operation data

Application Scenarios

Intelligent Driving

It covers the collection of image data of vehicles and pedestrians in different environments, cities, and road conditions. It supports 2D and 3D data collection and, for driver monitoring systems (DMS) and occupant monitoring systems (OMS), the collection of multi-channel speech data on driver and passenger behavior from different positions while vehicles are moving at different speeds.


Smart Home

It covers the collection of speech and visual interaction data in home scenarios, including speech, voiceprints, emotional speech, faces, body movements, gestures, and other data. It can be used in scenarios such as wake-word command control, face recognition for access control, voiceprint control, intelligent refrigerator control, and robot cleaner control.

Sports Life

It covers the collection of images of different sports, including basketball, football, hip-hop, and badminton, captured under different lighting conditions. The data can be used in smart coaching, smart judging, and other scenarios.

Smart Security

In public areas such as supermarkets, office buildings, construction sites, and stations, pedestrian behavior data is collected from a security perspective for technologies like biometric recognition and behavior monitoring. It is widely applied in urban road surveillance, vehicle and pedestrian flow monitoring, and public safety incident prevention.

Smart City

It covers the collection of speech, image, and text data in intelligent buildings, intelligent transportation, self-service public facilities, and other applications.

Internet Applications

It covers the collection of speech, text, relation, and image data required by search engines, speech interaction, special photo effects, and identification applications in smart devices.


Service Process

Demand evaluation

Collection scheme customization

Trial collection

Quality confirmation

Batch collection