A.I. Is Learning From Humans. Many Humans.
BHUBANESWAR, India — Namita Pradhan sat at a desk in downtown Bhubaneswar, India, about 40 miles from the Bay of Bengal, staring at a video recorded in a hospital on the other side of the world.
The video showed the inside of someone’s colon. Ms. Pradhan was looking for polyps, small growths in the large intestine that could lead to cancer. When she found one — they look a bit like a slimy, angry pimple — she marked it with her computer mouse and keyboard, drawing a digital circle around the tiny bulge.
She was not trained as a doctor, but she was helping to teach an artificial intelligence system that could eventually do the work of a doctor.
Ms. Pradhan was one of dozens of young Indian women and men lined up at desks on the fourth floor of a small office building. They were trained to annotate all kinds of digital images, pinpointing everything from stop signs and pedestrians in street scenes to factories and oil tankers in satellite photos.
A.I., most people in the tech industry would tell you, is the future of their industry, and it is improving fast thanks to something called machine learning. But tech executives rarely discuss the labor-intensive process that goes into its creation. A.I. is learning from humans. Lots and lots of humans.
Before an A.I. system can learn, someone has to label the data supplied to it. Humans, for example, must pinpoint the polyps. The work is vital to the creation of artificial intelligence like self-driving cars, surveillance systems and automated health care.
Tech companies keep quiet about this work. And they face growing concerns from privacy activists over the large amounts of personal data they are storing and sharing with outside businesses.
Earlier this year, I negotiated a look behind the curtain that Silicon Valley’s wizards rarely grant. I made a meandering trip across India and stopped at a facility across the street from the Superdome in downtown New Orleans. In all, I visited five offices where people are doing the endlessly repetitive work needed to teach A.I. systems, all run by a company called iMerit.
There were intestine surveyors like Ms. Pradhan and specialists in telling a good cough from a bad cough. There were language specialists and street scene identifiers. What is a pedestrian? Is that a double yellow line or a dotted white line? One day, a robotic car will need to know the difference.
What I saw didn’t look very much like the future — or at least the automated one you might imagine. The offices could have been call centers or payment processing centers. One was a timeworn former apartment building in the middle of a low-income residential neighborhood in western Kolkata that teemed with pedestrians, auto rickshaws and street vendors.
In facilities like the one I visited in Bhubaneswar and in other cities in India, China, Nepal, the Philippines, East Africa and the United States, tens of thousands of office workers are punching a clock while they teach the machines.
Tens of thousands more workers, independent contractors usually working in their homes, also annotate data through crowdsourcing services like Amazon Mechanical Turk, which lets anyone distribute digital tasks to independent workers in the United States and other countries. The workers earn a few pennies for each label.
Based in India, iMerit labels data for many of the biggest names in the technology and automobile industries. It declined to name these clients publicly, citing confidentiality agreements. But it recently revealed that its more than 2,000 workers in nine offices around the world are contributing to an online data-labeling service from Amazon called SageMaker Ground Truth. Previously, it listed Microsoft as a client.