In the priciest office block in Luoyang, a city in the central province of Henan, some two dozen people stare at blurry street photos on their computer screens, carefully drawing squares around vehicles and pedestrians.
“I’m labeling vehicles [and] people in all different colors. Purple for bikes, green for humans, baby blue for three-wheelers,” said employee Liu Yajing of the data labeling firm Intellect Growth Technology.
Her work is part of an autonomous vehicle project. The data labels will help train a driverless car to identify and avoid hitting other vehicles or humans. Liu opens the next fuzzy photo of the same street corner, taken from a different angle. She highlights a car, a bike and a pedestrian.
Data labeling is repetitive work, but it’s the starting point for most artificial intelligence applications. China’s State Council issued plans for the country to be a leader in AI by 2030, which include preferential policies and tax breaks for local firms.
How it started
Ding Yijun said his path to becoming one of the investors in the data labeling firm was sheer luck. “Some people in online chat groups said we could earn some extra money [data labeling],” Ding said.
The people identified themselves as contractors of the Chinese tech giant Baidu, a claim Ding and his friends suspected was a scam, since they were in a third-tier city like Luoyang. Still, they took a leap of faith and accepted some freelance projects.
“The business at first was based on trust. There were no contracts,” Ding said.
Once they finished the work, the clients paid on time. As their workload got larger, their clients suggested Ding and his friends register a company so that they could sign work contracts.
“[That is when] we found our clients were really big [Chinese] companies like Baidu and Alibaba,” Ding said.
They registered Intellect Growth Technology in 2019, just as investment in the sector tailed off, Ding said.
After a slow first half of 2020, the pandemic has unexpectedly accelerated the company’s growth. “Infrared sensing and facial recognition gates [have] popularized rapidly from the pandemic,” Ding said. “It has been a busy year.”
AI technology requires intensive human labor. According to Ding, the firm went from 30 employees last year to the current 150, plus another 50,000 part-time workers – including contract workers at one of the company’s warehouse sites. “We also cooperate with some schools and even prisons to do data labeling,” he said.
Prison labor is common in China, as part of the rehabilitation process called “reform through labor.”
The U.S. bans imports of products made with prison labor under federal statute 19 U.S.C. 1307, the same law the American customs agency cited earlier this year when it held up cotton and tomato products produced in China’s Xinjiang region, based on information that Uyghur minorities were forced into labor.
Ding’s firm does not export its services but he hopes to expand overseas because competition within China is fierce.
“The entry requirements into this industry are low. Companies compete by offering a lower price,” he said.
Low labor costs
Much of the data labeling sector became concentrated in lower-cost areas such as Henan province, where the disposable income per capita in 2020 was 24,800 yuan ($3,800), 20% lower than the national average in China.
Worker Hu Jinhua said he earns at least $11,000 annually, plus commission. “My salary is good. The average wage here in Luoyang is about half [of what I’m getting],” Hu said.
The 22-year-old worked for the firm in his university days and joined the staff on graduating a year ago. Hu said he can sort through thousands of photos a day. The more data he labels, the faster machines learn.
For now, most workers the firm hires have at least a high school education. That might change as data labeling evolves.
“The hardest job is labeling data from the medical industry. We even had a project to label coronavirus material,” Ding said. “We tried to find students from medical schools to do this kind of job.”
The better the company is at the job, the faster that type of work disappears. Ding’s staff used to label all types of automobiles in photos but, thanks to their work, machine algorithms can now do most of it. Only blurry photos require his staff to label by hand.
“Sometimes, those of us in the industry are worried that we are on the way to destroying ourselves,” Ding said.
The transformation might not happen that quickly, however. One of the projects his firm is working on for a client is to check transcription software. Worker Guo Rui listens to between 40 and 50 minutes of audio samples a day and compares them to the machine transcriptions. “The [transcripts] are not that accurate,” Guo said.
Luckily, her job is protected, for now.
Additional research by Charles Zhang