Content moderation for UGC in the world of generative AI

Can content moderators unpick the reality of generative AI content? Possibly not.
27 June 2023

Content moderation – it’s not like it used to be.

• Content moderation for UGC is complex in the GenAI world.
• AI works at scale, humans for nuance.
• There is no technology good enough yet to tell AI deepfakes from human creation.

Content moderation for user-generated content (UGC) is getting more and more complex as the media in which content is published across the internet diversifies.

That means how, when, and along what lines the job of content moderation is done has shifted significantly since the days when the internet was mostly text – when adding UGC to the electronic world was a fairly considered exercise, rather than the work of seconds.

In Part 1 of this article, we spoke to Alex Popken, VP of Trust and Safety at WebPurify, a content moderation company that combines AI and human moderation in its mix, to explore a rising tide of harmful UGC, and how content moderation was working to combat it.

In Part 2, we explored the importance of – and the thorns around – the business of regulating UGC, which has become increasingly complicated in the pan-global world of the modern internet.

While we had Alex in the chair, we asked her about the rate of change, and the challenge of content moderation of harmful UGC in that rapidly evolving world – particularly in the world of generative AI.


You moderate content across the board, irrespective of what the platform is – company sites, social media platforms and so on. As the world of the internet has evolved, has it become more difficult to tell where the lines of harm are? Or are they still fairly straightforward across the media in which people are creating harmful UGC?


I think it’s very difficult to tell where the lines are now. We talked in Part 2 about where the line is between freedom of speech and censorship. Freedom of speech is obviously critically important, but not if it’s used to incite violence, or spew hatred, or otherwise make the internet a cesspool.

You have to strike that important balance where you allow counter-speech, but don’t allow content that can harm individuals and communities of people.

That’s why it’s really important to have clear and transparent guidelines, enforcement practices and moderation that is consistently applied. And it’s really important to be accountable and own up when you make a mistake.

It’s helpful when clients work with us to create a workflow for our moderators, because it’s one thing to have a policy that says “No hate speech,” but what does that mean in practice? And how can we make that a symptom-based decision tree that removes things like personal bias from the decisions being made?

So we really try to work with our clients to take this complex policy and make it as objective and straightforward as possible.
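The idea of turning a broad policy like “No hate speech” into an objective, symptom-based decision tree can be sketched in code. This is a minimal illustration only – every rule, term list, and label below is a placeholder assumption, not WebPurify’s actual workflow:

```python
# Hypothetical sketch of a symptom-based moderation decision tree.
# Each step is a yes/no check a moderator (or machine) can answer without
# personal judgment, so two reviewers reach the same verdict.
# All word lists and labels are illustrative placeholders.

SLUR_LIST = {"exampleslur1", "exampleslur2"}      # placeholder lexicon
PROTECTED = {"immigrants", "refugees"}            # placeholder groups
DEHUMANIZING = {"vermin", "parasites"}            # placeholder terms
CODED = {"88", "secret-code-term"}                # placeholder coded language

def moderate(text: str) -> str:
    """Walk a fixed sequence of objective checks and return a verdict."""
    words = set(text.lower().split())

    # Step 1: known slur or epithet present?
    if words & SLUR_LIST:
        return "reject: explicit hate speech"

    # Step 2: protected group targeted with a dehumanizing term?
    if (words & PROTECTED) and (words & DEHUMANIZING):
        return "reject: dehumanizing language"

    # Step 3: ambiguous coded language -> escalate rather than guess
    if words & CODED:
        return "escalate: human review"

    return "approve"
```

The point of the tree form is that each branch encodes a policy question, so individual bias has no room to enter the verdict.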

Content moderation around hate speech UGC.


And of course, hate speech has evolved. There’s the straightforward version – recognized hateful words and epithets – but increasingly there are more subtle ways of doing things. Words with secret meanings to only those “in the know.” How does WebPurify divide up the workload? Is it all human? AI-assisted? Algorithmic?


It’s a combination of both AI and human reviewers. AI can typically detect that which is obviously violative – or obviously benign. The most clear-cut examples of hate speech, for example, can usually be spotted by AI.

But as you mentioned, depending upon different languages or countries or groups, there are coded language contexts, things that are really difficult to detect with machines. And that’s where we layer on human review. So, again, the workflow is a combination: AI solves for scale, it can review vast amounts of content, and detect that which is obviously egregious or benign. But typically, a subset of content is sent for human review. And that is the stuff that’s more nuanced, that really requires that human eye.
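The AI-first, human-in-the-loop routing described here can be sketched as a simple thresholding scheme: the model auto-handles the obviously egregious and obviously benign ends of the spectrum, and only the ambiguous middle band is queued for human review. The scoring model and threshold values below are stand-in assumptions, not WebPurify’s real system:

```python
# Hypothetical sketch of AI/human workload routing by model confidence.
# The violation_score is assumed to come from an upstream classifier.

AUTO_REMOVE = 0.95   # above this: obviously violative, AI removes it
AUTO_ALLOW = 0.05    # below this: obviously benign, AI clears it

def route(violation_score: float) -> str:
    """Route one item given the model's violation probability (0.0-1.0)."""
    if violation_score >= AUTO_REMOVE:
        return "remove"            # AI handles the egregious cases at scale
    if violation_score <= AUTO_ALLOW:
        return "allow"             # AI clears the obviously benign bulk
    return "human_review"          # nuanced middle band needs a human eye

# On a large batch, most items route automatically; only a subset escalates.
scores = [0.99, 0.01, 0.50, 0.02, 0.97, 0.40]
human_queue = [s for s in scores if route(s) == "human_review"]
```

Tightening or widening the two thresholds trades off automation rate against how much nuance reaches human reviewers.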


Content moderation – not just text, like it used to be.

It can also be really important to make sure that you’re engaging your user base and getting user reports, because our humans aren’t perfect, either. And we don’t have every single native language solved for, we don’t have people in every single country moderating content. Getting that signal from your user base on that which may have evaded machine and human examination is really important.


A benign feedback loop? Things get missed because the humans don’t have linguistic nuances solved for in all languages, so you get feedback from users who are more on the ground level to show you what you missed, and flag it for the future.

That raises the question of free speech again, doesn’t it? Things that are classed as hate speech in one jurisdiction wouldn’t be in another, and other things that would be no problem in some places would be very touchy in others. It boggles the mind how content moderation of UGC can operate across borders and jurisdictions when we can’t entirely agree what hate speech is worldwide.


Yeah. When you’re operating on a global scale, it’s so hard. And typically, what platforms will do is have a set of global principles and standards, like “No hate speech, harassment, violence.”

But as you mentioned, you’re also dealing with countries that have local laws and regulations. And so that makes it extremely hard to please everyone, or to apply rules consistently, because maybe it’s not a one-size-fits-all approach.

It’s really important that as companies are crafting their guidelines, they’re partnering with external bodies and local civil society. That way, they’re considering user feedback on policies and how they’re implemented, and really understanding the unique cultural context – if we take nudity as an example, nudity is treated very differently in the US than in the Middle East. But from an enforcement standpoint, it can be extremely challenging to solve for all of these nuanced cases.


What does that do to your brain on a day-to-day basis? Content moderation for harmful UGC means seeing all the worst things on the internet, probably more regularly than most people ever have to, doesn’t it? Or, because of the AI’s ability to spot the obvious, is that no longer so much of an issue?


Yeah – we make sure that we are taking all the steps possible to make sure that we’re prioritizing moderator wellness. Some people think content moderation of harmful UGC means our moderators see the worst of the worst. And it can be – we absolutely do have teams that specialize in child sexual abuse material, which is definitely among the worst.

But we also have workflows that are about reviewing personalized products, and there’s a lot of benign content. So you’re kind of grappling with two things here.

For the folks who are exposed to sensitive content, we make sure that we’re implementing capabilities in our tools that minimize unnecessary exposure to harmful content. That can involve the blurring of graphic content, or grayscale, which has been proven to reduce the psychological stress of being exposed to graphic content.

We mandate breaks for our team and ensure they have access to 24/7 counseling – all of these things are critically important.


Regular breaks are important in content moderation.

And then on the flip side, if you’re consistently exposed to benign content, that can take its toll too, because it’s like looking for a needle in a haystack – it can get quite repetitive and monotonous.

So there are definitely tactics that we employ there, making sure that our offices are really fun, with folks recharging and taking breaks between sessions.

But we also definitely underscore to our moderators how critically important the work is that they’re doing, and that goes a long way in job satisfaction. For example, our team that reviews child sexual abuse material put 500 child sexual predators behind bars just last year. That’s incredibly meaningful.

Content moderation of generative AI UGC.


We talked earlier about how you use AI to solve for scale. What impact is generative AI having in terms of the scale of the challenge of content moderation for harmful UGC these days?


It’s definitely making moderation more complex, and it’s starting to pose novel, advanced challenges.


Such as?


While it helps moderate at scale, there’s some way to go with generative AI yet.


Such as copyright. It’s something that all the IP lawyers of the world are going to be struggling with: people using generative AI to create content in the style of a particular artist, when the artist has not given consent and gets none of the profit.

We’re seeing more convincing scams. So, you know, people using this technology for voice cloning or to create really convincing phishing emails, for example.

What really concerns me here is content authenticity – deepfakes and disinformation. And the reason that concerns me is that there’s no widely available technology today – that I’m aware of – that can effectively determine the authenticity of content, and whether it was generated by AI or created by humans.

Therein lies the challenge.

Therein lies the risk.

And that risk can be quite existential if people are exposed to subtle disinformation. That really erodes trust in platforms, in governments, and in society as a whole.

That is the most concerning thing I see in terms of moderation. OpenAI has AI to moderate its AI – AI layered on top of ChatGPT. That layer scans both the user input to the model and the output of the generative AI, which has been trained on vast amounts of internet data.

That is looking to impose guardrails or capture misuse. But again, it’s relatively straightforward to be able to detect a nude image or profanity. It’s a very different thing and a very complex thing to detect disinformation or subtle biases. So that’s the bucket of content that I’m probably most concerned with.
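The guardrail pattern described here – a moderation check wrapped around both the user’s prompt and the model’s completion – can be sketched as a simple wrapper. The `is_violative` classifier and `generate` call below are stubs standing in for a real moderation model and a real LLM; the blocklist is a placeholder:

```python
# Hypothetical sketch of input/output guardrails around a generative model.
# Both the prompt (before generation) and the completion (before the user
# sees it) pass through the same moderation check.

BLOCKED_TERMS = {"napalm recipe"}  # placeholder blocklist

def is_violative(text: str) -> bool:
    """Stub moderation classifier: flag text containing a blocked term."""
    return any(term in text.lower() for term in BLOCKED_TERMS)

def generate(prompt: str) -> str:
    """Stub standing in for the actual LLM call."""
    return f"model answer to: {prompt}"

def guarded_generate(prompt: str) -> str:
    # Guardrail 1: scan the user input before it reaches the model.
    if is_violative(prompt):
        return "[input blocked]"
    completion = generate(prompt)
    # Guardrail 2: scan the model's output before it reaches the user.
    if is_violative(completion):
        return "[output blocked]"
    return completion
```

As the interview notes, this kind of check catches the clear-cut cases (profanity, known harmful requests); subtle disinformation or bias is far harder for either guardrail to spot.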

The other thing I’ll say about content moderation is that humans are being used to test AI. So we are seeing job titles like prompt engineer – people whose job it is to ferret out weaknesses.

So they’re trying to intentionally trip up these generative AI models by asking them to do certain things, like a red team exercise where they’re trying to expose these gaps, so that the developers can make sure they’re plugging these gaps on the back end. But it’s challenging, and I don’t think moderation has been fully figured out here. I think we’re going to see challenges in this space for years to come.


Content moderation for UGC – it’s one of the boom industries of the next 50 years.
