AI in Content Moderation:
Scale, Judgement and Accountability

Posted by Online Responsibility Network

Insights from Sam Freeman, Founding Forward Deployed Engineer at Cinder

Why content moderation has fundamentally changed

Content moderation has always been about enforcing boundaries: deciding what content can remain visible, what must be removed, and how platforms protect users while meeting legal obligations. Historically, this work was carried out almost entirely by human moderators, either reviewing content before it went live or responding to reports after publication.

For this article, we interviewed Sam Freeman, a Founding Forward Deployed Engineer at Cinder, which offers a range of solutions from content moderation to responsible AI and “provides mission-critical infrastructure to the world’s biggest companies to protect their brand, product, and users from junk, abuse, and manipulation”. Freeman notes that this model simply no longer works at modern scale. The volume, velocity, and diversity of user-generated content mean that “having a human look at everything” is no longer feasible, regardless of budget or staffing. As a result, AI has shifted from being a support tool to becoming a core structural component of moderation systems. Crucially, Sam is clear that this shift isn’t about removing humans from content moderation systems; it’s about redefining their role.

How AI moderation actually works in practice

One of Sam’s key observations is that effective moderation systems rarely rely on a single AI model. Instead, they’re designed as pipelines that enable content to flow through progressively more sophisticated and, in turn, more expensive checks. He describes this as a funnel. At the top, fast and targeted systems such as classifiers trained to detect nudity, known terrorist imagery, or previously identified illegal content filter out obvious cases. Only content that is ambiguous or context-dependent is passed to larger language models or, ultimately, human review.
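To make the funnel concrete, here is a minimal sketch of that tiered routing in Python. The check functions, thresholds, and field names are hypothetical stand-ins rather than a description of Cinder’s actual pipeline: cheap hash matching and narrow classifiers resolve the obvious cases, and only ambiguous content reaches an LLM or, finally, a human reviewer.

    # A minimal sketch of the tiered "funnel" described above. The check
    # functions and thresholds are hypothetical stand-ins, not any vendor's stack.

    from dataclasses import dataclass
    from typing import Optional


    @dataclass
    class Decision:
        action: str          # "remove", "allow", or "escalate"
        reason: str
        needs_human: bool = False


    def matches_known_hash(content: bytes) -> bool:
        # Cheap first tier: compare against hashes of previously identified
        # illegal content (stubbed out here).
        return False


    def nudity_classifier_score(content: bytes) -> float:
        # Fast, narrowly scoped classifier (stubbed); returns a confidence score.
        return 0.1


    def llm_policy_review(content: bytes) -> Optional[str]:
        # Expensive tier: only ambiguous content reaches a large language model.
        # Returns a violated policy name, or None if no clear violation is found.
        return None


    def moderate(content: bytes) -> Decision:
        # Tier 1: hash matching against known illegal content.
        if matches_known_hash(content):
            return Decision("remove", "known illegal content")

        # Tier 2: targeted classifiers handle the obvious cases cheaply.
        score = nudity_classifier_score(content)
        if score > 0.95:
            return Decision("remove", "nudity classifier, high confidence")
        if score < 0.05:
            return Decision("allow", "nudity classifier, high confidence")

        # Tier 3: ambiguous, context-dependent content goes to an LLM.
        violation = llm_policy_review(content)
        if violation:
            return Decision("remove", f"LLM flagged: {violation}")

        # Final tier: anything still unresolved is escalated to human review.
        return Decision("escalate", "ambiguous after automated tiers", needs_human=True)


    print(moderate(b"example post"))  # escalates, since the stubbed tiers are inconclusive
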

Sam cautions against the industry instinct to “throw everything into one LLM and hope it figures it out.” In his experience, this approach quickly hits a ceiling on both accuracy and cost. Combining multiple, more narrowly scoped systems is often faster, cheaper, and more reliable. This is particularly true when policies are complex or highly granular.

It’s at this final stage of human review that questions of delivery and responsibility arise. When content reaches this point, the work is often not handled directly by the platform itself. At scale, many companies rely on outsourced moderation operations, contracting third-party providers to supply trained reviewers and manage day-to-day moderation workflows. In practice, human judgement sits alongside, and is increasingly shaped by, the technical systems that route, prioritise, and constrain what reviewers see.

The hidden challenge: policies written for humans, not machines

A recurring theme in Sam’s advice is that policy quality matters as much as model quality. Most platform policies were written for human interpretation, not for machine execution. They often contain overlap, ambiguity, and edge cases that even trained moderators struggle to interpret consistently.

When these policies are fed directly into AI systems, inaccuracies are almost inevitable. Sam points out that platforms frequently misdiagnose this as a model failure when, in reality, the underlying issue is unclear policy design. The result is more work for human moderators, who are forced to correct decisions or relabel content to compensate. Clear, well-structured policies make AI moderation significantly more effective and reduce long-term operational burden.
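One way to picture the gap Sam describes is to contrast a policy written as prose with the same intent decomposed into discrete, machine-targetable rules. This is an illustrative sketch only; the category names, fields, and actions are invented for the example and are not any platform’s real policy.

    # Illustrative only: a prose policy versus the same intent restructured
    # into discrete rules that a classifier or LLM prompt can target.

    HUMAN_POLICY = """
    We do not allow hateful or harassing content, including content that
    attacks people, unless it is clearly satirical or newsworthy.
    """

    # The same intent, decomposed into non-overlapping rules with explicit
    # definitions, exceptions, and actions.
    MACHINE_POLICY = [
        {
            "id": "hate_protected_class",
            "definition": "Attacks a person or group based on a protected characteristic.",
            "exceptions": ["satire", "news_reporting"],
            "action": "remove",
        },
        {
            "id": "targeted_harassment",
            "definition": "Repeated unwanted contact directed at a specific individual.",
            "exceptions": [],
            "action": "remove",
        },
    ]
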

Adapting to policy and regulatory change

Sam draws an important distinction between internal policy changes and regulatory change. Internal updates, such as redefining categories of hate speech, can often be handled quickly using LLM-based systems, albeit with a short-term drop in accuracy while models relearn boundaries.

Regulatory change is more disruptive. Sam gives examples where compliance requires architectural changes rather than policy tweaks: introducing age assurance, enforcing jurisdiction-specific rules, or geo-blocking content. He also highlights how poorly aligned or outdated regulation can actively undermine safety outcomes.
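As a rough illustration of why such changes are architectural rather than policy tweaks, the sketch below shows a jurisdiction-aware visibility check. The rule table, country codes, and category names are assumptions made up for the example, not real regulatory mappings.

    # Hypothetical sketch: per-jurisdiction rules applied at serving time.
    # Rule table, country codes, and categories are invented for illustration.

    JURISDICTION_RULES = {
        "DE": {"blocked": {"nazi_symbols"}, "age_gated": set()},
        "GB": {"blocked": set(), "age_gated": {"pornography"}},
    }


    def visible_in(country: str, categories: set, viewer_age_verified: bool) -> bool:
        rules = JURISDICTION_RULES.get(country, {"blocked": set(), "age_gated": set()})
        if categories & rules["blocked"]:
            return False  # geo-blocked outright in this jurisdiction
        if categories & rules["age_gated"] and not viewer_age_verified:
            return False  # requires age assurance before it can be shown
        return True


    print(visible_in("GB", {"pornography"}, viewer_age_verified=False))  # False
    print(visible_in("GB", {"pornography"}, viewer_age_verified=True))   # True
    print(visible_in("FR", {"pornography"}, viewer_age_verified=False))  # True (no rule configured)
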

One striking example he discusses is the unintended impact of EU privacy rules that restricted scanning of private communications. When platforms complied literally, child safety reporting dropped sharply, not because harm decreased but because detection mechanisms were switched off. For Sam, this illustrates a wider problem: regulations that don’t reflect how modern systems actually operate can force platforms into choices that reduce safety rather than improve it.

Accountability still sits with the platform

A central point Sam makes is that using AI doesn’t dilute accountability. Regardless of how many vendors, tools, or outsourcing partners are involved, legal responsibility remains with the platform hosting the content. He explains how this is typically managed in practice through layered quality assurance: business process outsourcing (BPO) providers, typically third-party firms contracted to supply large-scale moderation work, auditing their own moderators; platforms auditing vendor outputs; and, in more mature systems, additional alignment checks to ensure consistency across reviewers. These mechanisms may help manage contractual risk, but they do not transfer regulatory liability.
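A minimal sketch of what one layer of that quality assurance might look like in code follows: sample a fraction of reviewer decisions and measure agreement against an expert-labelled “golden set”. The sampling rate, record shape, and metric are illustrative assumptions, not a description of any specific vendor’s QA process.

    # Illustrative QA layer: audit a sample of decisions against expert labels.

    import random


    def sample_for_audit(decisions, rate=0.05, seed=42):
        # Platforms and BPOs typically audit only a sample of decisions.
        rng = random.Random(seed)
        return [d for d in decisions if rng.random() < rate]


    def agreement_rate(audited, golden_labels):
        # Share of audited decisions that match the expert ("golden") label.
        matches = sum(1 for d in audited if golden_labels.get(d["content_id"]) == d["action"])
        return matches / len(audited) if audited else 1.0


    decisions = [{"content_id": i, "action": "remove" if i % 3 else "allow"} for i in range(1000)]
    golden = {i: "remove" if i % 3 else "allow" for i in range(1000)}
    audited = sample_for_audit(decisions)
    print(f"Sampled {len(audited)} decisions, agreement {agreement_rate(audited, golden):.1%}")
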

Sam also notes the growing influence of external actors such as payment providers and insurers, who increasingly require evidence of robust moderation and governance practices before underwriting risk. In his view, this is accelerating a shift away from informal assurances toward demonstrable, auditable control.

The changing role (and wellbeing) of human moderators

These architectural changes also have consequences for the people still doing moderation work. One of the most under-discussed effects of AI-led moderation systems is their impact on moderator wellbeing. Automation reduces the amount of harmful material that humans are exposed to, either by removing it before review or by limiting human involvement to specific, high-value cases.

Sam observes that the role of moderators is evolving from “looking at everything” to acting as specialists and quality controllers, reviewing samples of AI decisions, handling appeals, and feeding nuanced judgements back into the system as training data. This not only improves model performance over time but also creates a more sustainable and psychologically safer working environment. That’s not to say that psychological risk is eliminated altogether, since edge cases and appeals can be among the most disturbing material, but it does reduce indiscriminate exposure.
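That feedback loop, in which human judgements become training data, can be sketched very simply: wherever a reviewer overrides the AI’s decision, the human label is kept as ground truth for the next training run. The record shapes below are assumptions for illustration only.

    # Hypothetical sketch of the feedback loop: human overrides become labels.

    def collect_training_examples(ai_decisions, human_reviews):
        # Wherever a human reviewer reached a different conclusion from the
        # model, keep the human label as ground truth for retraining.
        examples = []
        for content_id, ai_action in ai_decisions.items():
            human_action = human_reviews.get(content_id)
            if human_action is not None and human_action != ai_action:
                examples.append({"content_id": content_id, "label": human_action})
        return examples


    ai = {"a1": "allow", "a2": "remove", "a3": "allow"}
    human = {"a1": "remove", "a3": "allow"}  # only sampled or appealed items get reviewed
    print(collect_training_examples(ai, human))  # [{'content_id': 'a1', 'label': 'remove'}]
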

Practical advice for small and medium platforms

Sam is particularly blunt about the position of smaller platforms. Many aren’t avoiding safety responsibilities deliberately; they simply don’t know what applies to them until enforcement arrives.

His advice is pragmatic. Start with a genuine risk assessment focused on where harm actually occurs, not just where fines might be issued. Use off-the-shelf moderation tools if necessary – imperfect coverage is better than none. Most importantly, write policies down, even if they evolve. Platforms already have implicit rules in mind; documenting them is the first step toward enforceable, automatable moderation.

Sam stresses that the greatest risk for SMEs is not bad faith, but ignorance. And ignorance is rarely treated kindly by regulators.

Beyond compliance

Throughout the discussion, Sam returns to the same conclusion: content moderation is no longer just a technical problem. It is a governance challenge that spans policy design, system architecture, human judgement, and regulatory interpretation.

Platforms that treat AI as a shortcut to compliance tend to optimise for growth and short-term cost savings. Those that invest in structured, transparent, and well-governed moderation systems are better equipped to adapt as expectations change. As Sam puts it, the real question is no longer whether AI is used in moderation, but whether it is used responsibly, with clear accountability and humans firmly in the loop.