AI Censorship: How Algorithms Shape What You Can Say
AI content moderation — the use of machine learning to detect and suppress speech — now governs billions of online interactions. Its errors, inconsistencies, and hidden biases constitute a new form of speech control that operates without transparency or accountability.
When most people think of censorship, they imagine a government official deciding to ban a book or prosecute a speaker. Modern AI censorship is nothing like that. It operates invisibly, at massive scale, through probabilistic classifications that no one fully controls.
Platforms use AI for multiple speech-affecting functions:
Content removal: AI classifiers scan posts for violations of platform policies — hate speech, misinformation, violence, spam. For a large fraction of these decisions, content is removed or flagged without any human ever reviewing it.
Shadowbanning: AI systems can suppress the distribution of content without removing it, demoting it in feeds, search results, and recommendations. Shadowbanned speakers may never learn their content has been suppressed; their posts still appear normally to them but reach far fewer people.
Recommendation: Ranking algorithms determine what users see in the first place. Choices about what to amplify and what to suppress are embedded directly in the recommendation system.
Account actions: AI systems make initial decisions about account suspensions, strikes, and bans.
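The functions above often share a single pipeline: one classifier score feeds removal, downranking, and account actions at different thresholds. The sketch below is a hypothetical illustration of that tiered structure; the thresholds, score semantics, and visibility multipliers are assumptions, not any platform's actual policy.

```python
# Hypothetical sketch of a tiered moderation pipeline: the same classifier
# score drives removal, downranking (shadowbanning), or full distribution.
# All thresholds and numbers here are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class Decision:
    action: str        # "remove", "downrank", or "allow"
    visibility: float  # multiplier applied to feed/search distribution


REMOVE_THRESHOLD = 0.95    # high-confidence violation: take the post down
DOWNRANK_THRESHOLD = 0.60  # gray zone: keep the post but suppress its reach


def moderate(violation_score: float) -> Decision:
    """Map a classifier's violation probability to an action.

    Note the structural issue the article describes: a post scoring 0.61
    is quietly downranked with no notice to its author, while one scoring
    0.59 is fully distributed. The boundary is invisible to the speaker.
    """
    if violation_score >= REMOVE_THRESHOLD:
        return Decision("remove", 0.0)
    if violation_score >= DOWNRANK_THRESHOLD:
        return Decision("downrank", 0.1)  # post stays up, reach cut ~90%
    return Decision("allow", 1.0)


print(moderate(0.97))  # Decision(action='remove', visibility=0.0)
print(moderate(0.70))  # Decision(action='downrank', visibility=0.1)
print(moderate(0.30))  # Decision(action='allow', visibility=1.0)
```

The gray zone between the two thresholds is where shadowbanning lives: the decision produces no notification, no appeal path, and no visible trace.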
The problems with AI censorship are structural:
Error rates: Even a 1% false-positive rate applied to billions of posts means tens of millions of wrongful suppression decisions. These errors fall disproportionately on speakers with non-mainstream speech patterns, speakers of non-dominant languages, and speakers discussing topics that were labeled problematic in the training data.
Opacity: AI moderation decisions are not fully explainable, even to the engineers who built the systems. Speakers who are wrongly suppressed often have no clear path to learning which policy they allegedly violated.
Feedback loops: AI systems trained on past moderation decisions can perpetuate and amplify historical biases in what has been classified as problematic speech.
Inconsistency: The same content may be treated differently depending on who posts it, when, in what language, and with what surrounding context — with no principled explanation for the variation.
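The error-rate problem above is a matter of simple arithmetic. The back-of-envelope sketch below makes it concrete; the daily volume, false-positive rate, and per-group rates are all illustrative assumptions, not measured figures.

```python
# Back-of-envelope sketch of the error-rate argument: a small
# false-positive rate, applied at platform scale, wrongly suppresses
# tens of millions of compliant posts. All numbers are assumptions.

daily_posts = 3_000_000_000   # assumed posts scanned per day
false_positive_rate = 0.01    # 1% of compliant posts wrongly flagged
violation_prevalence = 0.02   # assumed share of posts that truly violate

compliant_posts = daily_posts * (1 - violation_prevalence)
wrongly_suppressed = compliant_posts * false_positive_rate
print(f"{wrongly_suppressed:,.0f} wrongful suppressions per day")
# → 29,400,000 wrongful suppressions per day

# The disproportionate-impact point: a 1% *average* can mask uneven
# per-group rates. Suppose a dialect spoken by 10% of users is
# misclassified at 3%, while everyone else sees roughly 0.78% (so the
# overall average stays 1%). That minority absorbs 30% of all errors.
minority_share, minority_fpr = 0.10, 0.03
majority_fpr = (false_positive_rate - minority_share * minority_fpr) / (
    1 - minority_share
)
minority_errors = compliant_posts * minority_share * minority_fpr
print(f"minority share of errors: {minority_errors / wrongly_suppressed:.0%}")
# → minority share of errors: 30%
```

Aggregate accuracy statistics, in other words, can look reassuring while concentrating the harm on exactly the speakers least equipped to appeal.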