When Bias Turns Caterpillar Into A Pest
From AI labs to Aadhaar centres, bias looks the same in every loop.
A few weeks ago, I watched Humans in the Loop. I was neck-deep in edits, but I did not want to skip it, so I went to the last show in Delhi. The film isn’t about technology. It’s about the people who make technology work, and who quietly disappear inside it.
A few weeks later, I’m still thinking about it. Because the film captures the humans who actually make technology work, and they aren’t Musk or Zuckerberg.
Aranya Sahay’s film follows Nehma, an Oraon woman from Jharkhand who, after a divorce, takes a job as a data-labeler: drawing boxes on images, teaching an AI to “see.”
When she refuses to tag a caterpillar as a “pest” because, in her lived knowledge, it eats rotten plant parts, her supervisor marks her choice wrong. That tiny, human judgement is treated as an error.
The film lets that sting sit.
Image by Walter del Aguila
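For readers who haven’t seen annotation work up close, here is a rough sketch of what a single bounding-box label can look like as data, and how a reviewer’s “correction” simply overwrites the annotator’s judgement. The field names, values, and review step are hypothetical, not taken from the film or any real labelling tool.

```python
# Hypothetical shape of one bounding-box annotation record; the field names
# and values are invented for illustration, not from any real labelling tool.
annotation = {
    "image_id": "field_0427.jpg",
    "box": {"x": 312, "y": 198, "width": 64, "height": 40},  # pixel coordinates
    "label": "caterpillar (beneficial)",  # the annotator's judgement
    "annotator": "nehma",
}

def review(record, supervisor_label):
    """A supervisor's override replaces the annotator's label; the original
    judgement survives only if the pipeline bothers to keep a copy."""
    record["original_label"] = record["label"]
    record["label"] = supervisor_label
    record["status"] = "corrected"
    return record

reviewed = review(annotation, supervisor_label="pest")
print(reviewed["label"])           # "pest": what the model will learn
print(reviewed["original_label"])  # the knowledge the dataset quietly discards
```

Whatever the model later learns to “see”, it sees through whichever label won that argument.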
It’s a question we often ask ourselves when reporting on Artificial Intelligence: Whose knowledge counts when machines learn?
Sahay’s film is inspired by Karishma Mehrotra’s story ‘Human Touch’ on FiftyTwo, published in 2022.
Going back to the film (for those of you who haven’t seen it but want to): no spoilers ahead, because the film isn’t really a mystery. We all use “automated” tools every day: chatbots, photo tagging, navigation. The film makes you face that tiny detail we choose to forget: many of those tools are trained, in part, by hands like Nehma’s.
The data annotation industry is huge. Industry and analysts put India’s annotation potential in the headlines: a nascent market worth hundreds of millions (and projected to grow into billions), with tens of thousands of people already in the workforce and many more expected.
There’s a name for the phenomenon: ghost work. Mary L. Gray and Siddharth Suri coined it to describe the invisible human labour that props up supposedly “automated” systems. Ghost work is not mythical; it’s hundreds of thousands of real jobs, often low-paid and precarious, performed by people whose faces never show up in product demos.
Numbers make it concrete.
Industry notes and reporting place tens of thousands of annotation workers in India; one recent industry snapshot cites around 70,000 professionals engaged in data annotation work, concentrated in smaller cities and rural hubs — and many of them are women.
These workers feed datasets that big models then replicate and globalise.
The caterpillar in the film was the perfect metaphor for everything that is wrong with technology systems: they are exclusionary, biased, and do not care for the truth.
If you are wondering why I am still thinking of a caterpillar, it is because biased training data becomes biased decisions, which can amplify into disasters.
Face-recognition and image models perform worse on darker skin; ‘Gender Shades’, a study by two researchers from the MIT Media Lab and Microsoft Research, famously found error rates as high as ~35% for darker-skinned women in commercial gender classifiers, while lighter-skinned men were almost never misclassified. That isn’t an academic quibble; it’s a predictable outcome when the people and contexts that produce training data are narrow.
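To see what an audit like that is actually measuring, here is a minimal sketch, assuming nothing more than a list of predictions tagged by demographic group. The group names, labels, and counts below are made up; this is not the Gender Shades methodology, just the arithmetic of computing error rates per group instead of one overall number.

```python
# Toy illustration: per-group error rates from hypothetical predictions.
# Groups, labels, and counts are invented for illustration only.
from collections import defaultdict

# Each record: (demographic group, true label, model's predicted label)
records = [
    ("darker-skinned women", "female", "male"),
    ("darker-skinned women", "female", "female"),
    ("darker-skinned women", "female", "male"),
    ("lighter-skinned men", "male", "male"),
    ("lighter-skinned men", "male", "male"),
    ("lighter-skinned men", "male", "male"),
]

errors = defaultdict(int)
totals = defaultdict(int)
for group, truth, predicted in records:
    totals[group] += 1
    if predicted != truth:
        errors[group] += 1

for group, total in totals.items():
    rate = errors[group] / total
    print(f"{group}: {rate:.0%} error rate ({errors[group]}/{total})")
```

An overall accuracy figure can look respectable while one group’s error rate is many times another’s; disaggregating by group is what makes the gap visible.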
A few months back, my colleague Hera Rizwan did a story on the introduction of Facial Recognition Technology in the take-home-ration scheme.
Disha Verma, a human rights and technology researcher, told her then, “FRT often fails with elderly or dark-skinned individuals.”
You see how a biased tech system leads to real problems?
Bias also shows up in subtler, cultural ways. When you ask image generators for a “beautiful woman,” popular models disproportionately return young, slim, fair-skinned faces — a loop that normalises a narrow standard of beauty and erases other aesthetics.
Navigation systems trained on popular urban routes mislead drivers in less-mapped landscapes; in India, there have been several dangerous misdirections when mapping datasets fail to account for rural or seasonal roads. These aren’t bugs. They’re the arithmetic of whose data was used to train the system.
Two days after the film, I was in Jharkhand reporting on tech and welfare. The film’s quiet indignation found a louder echo there.
Digitisation that promises efficiency often becomes a maze that denies basic rights. Aadhaar mismatches, biometric failures, and uncompromising e-KYC requirements have left people without ration and rights.
The labour of fixing these “digital errors” — standing in queues, travelling to enrolment centres, reconciling names and dates — falls disproportionately on women because household work is still not considered to be work.
So we have two loops feeding each other.
An Oraon woman corrects a machine’s eye on a caterpillar. A mother in another Jharkhand village spends days correcting a machine’s record of her child’s birthdate.
Both must translate messy, lived reality into tidy, machine-friendly inputs. Both are punished when their lived truths don’t fit the formats someone else designed.
That reveals where the real bias lives: not only in datasets, but in design choices.
When labels are decided far from the place of impact, models harden into decisions that erase local knowledge. When identity systems treat messy lives (different names, worn fingerprints from manual labour, children’s undocumented births) as fraud or error. And when geography, last name, the number of phones you own, and who your friends are shape who is seen and who is denied.
We need to ask ourselves: What happens when the truths of people like Nehma — the small, careful, lived understandings of land and life — are constantly corrected by systems that can’t recognise them? What kind of intelligence survives when we keep erasing the human in the loop?
And, maybe most uncomfortably, when the next AI tool dazzles us with convenience, will we stop for a second to ask: whose invisible hands trained it, and whose mistakes it will refuse to see?
MESSAGE FROM OUR SPONSOR
Looking for unbiased, fact-based news? Join 1440 today.
Join over 4 million Americans who start their day with 1440 – your daily digest for unbiased, fact-centric news. From politics to sports, we cover it all by analyzing over 100 sources. Our concise, 5-minute read lands in your inbox each morning at no cost. Experience news without the noise; let 1440 help you make up your own mind. Sign up now and invite your friends and family to be part of the informed.
🔥 What Caught My Attention
- Justice Vs Myth: #JusticeForZubeenGarg has become the defining refrain of Assam’s mourning. It has echoed across rallies, vigils, and social media, where grief has blurred into rage, and rage into a storm of misinformation and conspiracy theories about how the 52-year-old artist died in Singapore. Read the Decode story.
- Warning From India: In this piece, Aman Sethi argues that Britain’s proposed national digital ID scheme (Brit Card) risks repeating many of the harms seen in India’s Aadhaar system: mass surveillance, exclusion of vulnerable populations, and data breaches.
- Spam And Shame: Inside a Facebook group that sprang up around the FDA’s sudden nod to leucovorin as a possible autism treatment, the Wired story describes how parents, marketers, and conspiracy theorists flooded the space, mixing medical advice, speculation, shaming, affiliate links and confusion.
- Incompatibility Alert: And if you care about the climate, Le Monde has an interesting piece arguing that AI’s energy demands (data centres, training models, etc.) are increasingly at odds with efforts to decarbonise. It warns against blind techno-optimism.
Got a story to share or something interesting from your social media feed? Drop me a line, and I might highlight it in my next newsletter.
See you in your inbox, every other Wednesday at 12 pm!
Was this email forwarded to you? Subscribe