
Content Moderation
When content moderation systems become opaque arbiters of acceptable speech.
Episode 9: The Rules Behind the Rules
Sociable Systems
The Guidelines You Can Read
Every major platform publishes community guidelines. They're right there, linked in the footer, available for inspection. No hate speech. No harassment. No violent threats. No misinformation. No spam.
The language is clear. The categories seem reasonable. The document exists precisely so that users can understand the rules of participation.
Here's what the guidelines don't tell you:
How the algorithm ranks your content. What signals trigger reduced distribution. Why your post reached 47 people instead of 4,700. Whether you've been quietly demoted. What you'd need to change to reach your previous audience again.
The community guidelines are the visible rules. The ranking algorithm is the actual governance. And the ranking algorithm is opaque in ways that make credit scoring look transparent.
The Shadowban That Doesn't Exist
Platforms deny shadowbanning exists. This is a semantics game.
What they mean: "We don't have a button labeled 'shadowban' that makes your account invisible."
What users experience: Their content stops appearing in feeds. Their replies get buried. Their reach collapses. They can still post. Nobody sees it.
Platforms prefer terms like "reduced distribution" or "limited recommendation eligibility." These phrases describe the same functional outcome while avoiding the word that makes it sound punitive.
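To make the mechanics concrete, here is a minimal sketch of how "reduced distribution" could work inside a hypothetical ranking pipeline. Every name and number here is invented for illustration; no platform publishes its actual logic, which is the point.

```python
# Hypothetical feed-ranking step: "reduced distribution" as a silent multiplier.
# Nothing is deleted and no notification is sent; the post simply stops competing.

FEED_SIZE = 50

def rank_feed(candidate_posts, account_flags):
    """Return the top posts for a feed, applying hidden demotion factors."""
    scored = []
    for post in candidate_posts:
        score = post["engagement_score"]
        # Internal-only flag; the author is never told it exists or why it was set.
        if account_flags.get(post["author_id"]) == "limited_recommendation":
            score *= 0.02  # still "not banned", just rarely selected
        scored.append((score, post))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [post for _, post in scored[:FEED_SIZE]]
```

Nothing in this path emits a notification. The only evidence is the reach that never arrives.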
The distinction matters because it shapes what you can contest.
If you've been banned, you know. You receive a notification. You can (theoretically) appeal. There's a decision you can point to.
If you've been "reduced," you may not know at all. Your content still exists. Your account still functions. You just... fade. The algorithm stopped showing your work to people, and nobody told you why.
You cannot appeal a decision you cannot prove happened.
The Inference Problem
Say you're a creator. Your videos used to get 50,000 views. Now they get 800. Nothing obvious changed. Same format. Same quality. Same posting schedule.
What happened?
You don't know. You cannot know. The platform won't tell you.
Maybe you triggered a classifier. Maybe you used a word that got added to a suppression list. Maybe your audience's behavior patterns shifted in ways the algorithm interpreted as declining interest. Maybe your content got caught in a policy update you weren't informed about. Maybe nothing happened and this is just variance.
You're left to experiment. Change your thumbnails. Adjust your language. Post at different times. Read forums where other creators share theories about what the algorithm "wants."
This is cargo cult optimization. You're watching outputs and trying to reverse-engineer inputs from a system designed to resist exactly that. The platform knows the rules. You know your view count. The asymmetry is by design.
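A toy simulation, with invented numbers, shows why a sample size of one is fatal to inference. Two different stories about what happened produce view counts a creator cannot tell apart:

```python
import math
import random

random.seed(7)

def video_views(median, sigma=0.6):
    """One video's view count: log-normal noise around a median reach."""
    return random.lognormvariate(math.log(median), sigma)

# Scenario A: a classifier quietly demoted the account (a hidden 95% cut).
scenario_a = [video_views(50_000) * 0.05 for _ in range(8)]

# Scenario B: audience interest genuinely declined to a new, lower baseline.
scenario_b = [video_views(2_500) for _ in range(8)]

print("A (silent demotion):", [round(v) for v in scenario_a])
print("B (audience shift): ", [round(v) for v in scenario_b])
# Both series hover around a few thousand views. From the creator's seat,
# with no access to classifier state or audience data, the explanations
# are observationally identical.
```

The platform can tell the scenarios apart, because it can see the multiplier. The creator cannot.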
The Moderation Lottery
Content moderation at scale requires automation. This isn't controversial. No platform can employ enough humans to review every post in real time.
What's controversial is how the automation fails, and who absorbs the failure.
Automated systems make errors. They flag satire as sincerity. They miss context that humans would catch. They enforce rules inconsistently across languages and cultures. They penalize discussions about harmful content as though they were the harmful content itself.
These errors aren't random. They pattern. They hit some communities harder than others. They affect some topics more than others. They create a landscape where certain conversations become structurally risky to have.
And the affected users cannot see the pattern. Each person experiences their own individual moderation event. They don't see the aggregate. They don't know if they're alone or part of a systematic failure. They have a sample size of one.
The platform sees the pattern. The platform has the data. The platform knows which classifiers are misfiring and in what directions. This information is not shared.
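The asymmetry is easy to state in code. Computing the pattern requires the full moderation log, which only the platform holds. A rough sketch, assuming a hypothetical log with later human review attached as ground truth:

```python
from collections import defaultdict

def false_positive_rate_by_group(moderation_log):
    """Share of wrongly actioned posts per language group; needs the full log."""
    actioned = defaultdict(int)
    wrongly_actioned = defaultdict(int)
    for record in moderation_log:
        if record["actioned"]:
            actioned[record["language"]] += 1
            if not record["actually_violating"]:  # ground truth from later human review
                wrongly_actioned[record["language"]] += 1
    return {
        group: wrongly_actioned[group] / actioned[group]
        for group in actioned
    }

# Each affected user sees exactly one row of this log: their own.
# The platform sees the whole table, and therefore the pattern.
```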
The Appeal to Nowhere
When you do get a visible moderation action (post removed, account suspended, strike issued), you can usually appeal. The button exists. The form accepts input.
What happens next is less clear.
Your appeal goes... somewhere. Someone (or something) reviews it. A decision comes back, often in language nearly identical to the original notice. "We reviewed your appeal and determined that the original decision was correct."
What was the review process? Who conducted it? What evidence was considered? What standard of proof applied? Can you submit additional context? Can you speak to a human?
These questions don't have public answers.
The appeals process satisfies the formal requirement that a review mechanism exists. It does not constitute interrogation in any meaningful sense. You cannot see the reasoning. You cannot challenge the framework. You get an outcome, and the outcome is final unless you're famous enough to make noise on a competing platform.
This is explanation without interrogation. The ceremony of due process without its substance.
The Reality Engine
Here's where content moderation opacity gets epistemically strange.
The algorithm doesn't just decide what you're allowed to say. It decides what everyone else sees. It constructs the information environment in which public reality gets negotiated.
If certain topics get systematically suppressed (whether by policy or by classifier error), those topics become harder to discuss. If certain framings get amplified and others get buried, the amplified framings become "what people are saying." If certain sources get preferred in ranking, those sources become authoritative.
The platform is not just moderating content. It's editing reality.
And the editing is invisible. You see your feed. You assume it represents something like "what's happening" or "what people think." You don't see the shaping. You don't see what got removed, downranked, or never shown to you. You experience the output as a window when it's actually a painting.
Clarke's magic, operating at the level of shared truth.
The Legibility Problem
Content moderation has a unique opacity challenge: the rules change constantly.
Credit scoring models update periodically. Insurance pricing factors shift over time. But social platforms adjust their ranking algorithms continuously, sometimes multiple times per day. They respond to news events, gaming attempts, advertiser pressure, regulatory scrutiny.
A post that was fine yesterday might be suppressed tomorrow. Not because the policy changed officially, but because the classifiers got updated, the weights got adjusted, the context shifted.
You cannot stay current with rules that won't hold still.
The platform might publish a policy update. (They often don't.) But the policy document describes intent. The classifier implements reality. The gap between them is where content goes to die, and you'll never see the gap because the classifier's behavior is proprietary.
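The gap is mechanical, not mysterious. A written policy holds still while an internal threshold moves. A minimal illustration with made-up numbers:

```python
# The published policy reads the same on both days.
# What changed is an internal threshold nobody outside the platform can see.

RISK_SCORE = 0.62  # same post, same model output, both days

def suppressed(risk_score, threshold):
    """Demote the post when the classifier's risk score crosses the threshold."""
    return risk_score >= threshold

print("yesterday:", suppressed(RISK_SCORE, threshold=0.70))  # False: distributed normally
print("today:    ", suppressed(RISK_SCORE, threshold=0.60))  # True: quietly demoted
```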
When Speech Becomes Score
Platforms increasingly think about users the way credit bureaus think about borrowers.
Your account has a reputation score. (They don't call it that.) Your content history affects how your future content gets treated. Strikes accumulate. Trust levels rise and fall. The algorithm remembers.
This makes sense as a spam-fighting measure. Accounts with histories of policy violations probably warrant more scrutiny.
But it also means your past constrains your present in ways you cannot see. Maybe that joke that got flagged three years ago is still counting against you. Maybe your "trust score" got dinged by an automated error you never knew about. Maybe you're in a bucket labeled "borderline" and all your content gets extra friction.
You don't know. You cannot audit your own reputation. You experience the drag without seeing the weight.
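Here is one way such a hidden standing score could plausibly work. The structure, weights, and thresholds below are my assumptions, not any platform's documented behavior:

```python
from dataclasses import dataclass, field

@dataclass
class AccountStanding:
    """Hypothetical internal reputation state, never exposed to the account holder."""
    trust: float = 1.0
    history: list = field(default_factory=list)

    def record_strike(self, reason, weight=0.2):
        # A strike, including one issued by a misfiring classifier, lowers trust.
        self.trust = max(0.0, self.trust - weight)
        self.history.append(reason)

    def decay_toward_neutral(self, rate=0.01):
        # Old strikes fade, but slowly; a three-year-old flag can still drag.
        self.trust = min(1.0, self.trust + rate)

    def distribution_multiplier(self):
        # Below a hidden floor, every post ships with extra friction.
        return 1.0 if self.trust >= 0.7 else self.trust

account = AccountStanding()
account.record_strike("automated flag: possibly misclassified joke")
print(account.trust, account.distribution_multiplier())  # 0.8 1.0
account.record_strike("borderline policy match")
print(account.trust, account.distribution_multiplier())  # 0.6 0.6, and the quiet drag begins
```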
What Interrogation Would Require
Apply the Clarke test. If content moderation were genuinely interrogable, users would need:
Distribution transparency. How many people was this post shown to? How does that compare to my baseline? If it was suppressed, why?
Classifier disclosure. What automated systems evaluated my content? What signals triggered what classifications?
Policy specificity. Which specific rule did I allegedly violate? Not the category. The actual standard applied.
Evidence access. What did the reviewer (human or automated) actually see? Can I see what they saw?
Substantive appeal. Can I present context? Can I challenge the framework? Can I speak to someone with authority to change the decision?
Track record visibility. What is my account's standing? What factors are counting for or against me? Can I see my own history the way the platform sees it?
None of this is standard. Most platforms offer none of it. The ones that offer any of it do so partially, grudgingly, and in response to regulatory pressure rather than user demand.
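As a thought experiment only, here is roughly what a per-decision disclosure record might contain if the list above were real. Every field name and value below is invented:

```python
from dataclasses import dataclass

@dataclass
class ModerationDisclosure:
    """Hypothetical per-decision record a user could inspect and contest."""
    post_id: str
    impressions_delivered: int        # distribution transparency
    impressions_baseline: int         # compared against the account's own norm
    classifiers_triggered: list[str]  # classifier disclosure
    rule_cited: str                   # the specific standard, not the category
    evidence_shown_to_reviewer: str   # what the reviewer actually saw
    appeal_channel: str               # a route to someone who can change the outcome
    account_standing: float           # the score the platform already keeps

example = ModerationDisclosure(
    post_id="abc123",
    impressions_delivered=800,
    impressions_baseline=50_000,
    classifiers_triggered=["civility_model_v14"],
    rule_cited="Harassment policy: targeted insults (illustrative)",
    evidence_shown_to_reviewer="post text only; no thread context",
    appeal_channel="human reviewer with authority to reverse",
    account_standing=0.6,
)
```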
The Private Governance Problem
A generation ago, speech governance was primarily a matter of constitutional law. The government couldn't restrict your expression without due process. Private parties could set their own rules, but their reach was limited.
Now the reach isn't limited. A handful of platforms control the infrastructure through which most public communication flows. Their rules govern more speech than most governments ever dreamed of regulating.
And their rules are opaque. Not in the "secret government surveillance" sense. In the "proprietary business logic" sense. They're not hiding policy from constitutional scrutiny. They're hiding ranking algorithms from competitive pressure.
The effect is the same: governance without transparency. Authority without accountability. Power exercised at scale with no right of interrogation.
Clarke's threshold crossed at civilizational scale, defended as terms of service.
Tomorrow
Public-sector eligibility systems. Where algorithms decide who gets benefits, who gets investigated, and who gets cut off. The opacity moves from private platforms to government services.
Same question: Where does opacity end debate?
(Spoiler: the fraud detection model has opinions about your zip code.)
Catch up on the full series:
- Ep 8: [The Price of Being Known]
- Ep 7: [The Number That Speaks for You]
- Ep 6: [The Authority of the Unknowable]
- Ep 5: [The Calvin Convention]
- Ep 4: [The Watchdog Paradox]
- Ep 3: [The Accountability Gap]
- Ep 2: [The Liability Sponge]
- Ep 1: [We Didn't Outgrow Asimov]
Enjoyed this episode? Subscribe to receive daily insights on AI accountability.
Subscribe on LinkedIn