pilotfish
n. A small fish which swims alongside a shark in a symbiotic relationship, devouring parasites in return for protection. (Wikipedia)

When we relaunched Hatebase at the end of last year, we deployed not only a wealth of new data attributes (e.g. targeted groups, plurals, transliterations) and a new API (now v4.1), but also a complete bottom-up rebuild of HateBrain, the natural language processing (NLP) engine at the heart of Hatebase, which is responsible for (as of this month) 738,000 regionalized, timestamped hate speech sightings.

We’ve launched an official changelog on GitHub to help users keep track of the many enhancements to HateBrain’s capabilities, but here’s a quick rundown of what you’ll find inside the box.

DIY

At the top of the list are the new HateBrain endpoints released in API v4.1: analyze and get_analysis. These endpoints enable Hatebase users to interact directly with HateBrain, retrieving an assessment of any user-submitted content.

Being able to analyze custom content means that moderators of ecosystems with large volumes of user content can now incorporate HateBrain into their own workflows, reducing the need for human moderation. For researchers, HateBrain is now available to help classify external datasets.

Here’s how to use the new API endpoints (a code sketch follows the list):

    • Register for a plan on Hatebase
    • Once approved, self-provision an API key
    • Review the API documentation and then submit a piece of content to the new analyze endpoint, which will return a request_id
    • Use the request_id with the get_analysis endpoint to receive an assessment of your submitted content
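
Here’s a minimal sketch of that flow in Python. The base URL, parameter names, and response fields below are assumptions for illustration only; the API documentation is the authority on exact paths, authentication, and schemas.

```python
import time

import requests  # pip install requests

# Assumed base URL, parameter names, and response fields; consult the API
# documentation for the exact paths, authentication flow, and schema.
BASE_URL = "https://api.hatebase.org/4-1"
TOKEN = "your-session-token"  # obtained with your self-provisioned API key

# Step 1: submit content to the analyze endpoint, which returns a request_id.
response = requests.post(f"{BASE_URL}/analyze", data={
    "token": TOKEN,
    "content": "text you want assessed",
})
response.raise_for_status()
request_id = response.json()["result"]["request_id"]

# Step 2: pass the request_id to get_analysis to retrieve the assessment.
# Analysis may be asynchronous, so poll briefly until a result is ready.
for _ in range(10):
    response = requests.post(f"{BASE_URL}/get_analysis", data={
        "token": TOKEN,
        "request_id": request_id,
    })
    response.raise_for_status()
    result = response.json()["result"]
    if result.get("status") != "pending":  # hypothetical status field
        print(result)  # probability plus a short explanation of rules applied
        break
    time.sleep(2)
```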

Weighted, probability-based analysis

Instead of simply returning true or false to indicate whether a given piece of text constitutes hate speech, HateBrain now provides a measure of probability: specifically, the likelihood that the content in question contains language used in a hate speech context. This percentage is derived from the application of myriad weighted rules. For instance, if a known term is preceded by an inflammatory adjective, the resulting probability will inch higher depending on the weight assigned to the “preceding adjective” rule. Likewise, if a piece of user content appears to be directed at an individual (e.g. “you’re a…” or “she’s a…”), the probability will rise according to the weight assigned to the “targeted individual” rule.
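
To make the mechanics concrete, here is a toy version of weighted-rule scoring in Python. The rule names echo the examples above, but the predicates and weights are invented for this sketch and bear no relation to HateBrain’s actual rule set.

```python
# Illustrative only: the rule predicates and weights below are invented and
# do not reflect HateBrain's actual rule set or weighting. "slur" stands in
# for an actual vocabulary term.
RULES = [
    # (rule name, predicate over lowercased text, weight added when it fires)
    ("known term present",  lambda t: "slur" in t,                           0.40),
    ("preceding adjective", lambda t: "filthy slur" in t,                    0.20),
    ("targeted individual", lambda t: t.startswith(("you're a", "she's a")), 0.25),
]

def assess(text: str) -> tuple[float, list[str]]:
    """Return (probability, explanation) for a piece of content."""
    text = text.lower()
    probability, explanation = 0.0, []
    for name, predicate, weight in RULES:
        if predicate(text):
            probability += weight
            explanation.append(name)
    return min(probability, 1.0), explanation

probability, explanation = assess("You're a filthy slur")
print(f"{probability:.0%} hate speech context; rules applied: {explanation}")
```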

Probability outputs delivered through the API also come with a brief explanation of how some of these rules were applied, although for reasons of length and complexity these explanations are necessarily terse.

Now with fewer homonyms

One of the challenges of prior builds of HateBrain was avoiding cross-language homonyms: (usually short) words which have one meaning in one language and a completely different meaning in another. For example, the English word “broad,” an archaic slur for a woman, is also the Breton word for “nation.” Until recently, portions of Hatebase’s vocabulary were restricted from automated analysis because of the high likelihood of these sorts of false positives.

This problem has been solved using real-time language detection (generally accurate for content exceeding 50 characters) and a form of recursive regional usage analysis whereby each valid sighting reinforces geographical regions where the context of a term has already been verified as relevant.
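
The gist of the gating logic can be sketched as follows, using the off-the-shelf langdetect library as a stand-in for whatever detector HateBrain actually employs; the vocabulary entries and threshold handling are illustrative.

```python
from langdetect import LangDetectException, detect  # pip install langdetect

# Illustrative vocabulary: each term is tied to the language in which it is
# actually a slur, so a Breton sentence never triggers the English entry.
VOCABULARY = {
    "broad": "en",  # archaic English slur, but Breton for "nation"
}

MIN_LENGTH = 50  # detection is generally reliable above ~50 characters

def relevant_terms(text: str) -> list[str]:
    """Return vocabulary terms that match in the text's detected language."""
    if len(text) < MIN_LENGTH:
        return []  # too short to detect the language reliably; skip analysis
    try:
        language = detect(text)
    except LangDetectException:
        return []  # undetectable content is likewise skipped
    return [term for term, lang in VOCABULARY.items()
            if lang == language and term in text.lower()]
```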

A side benefit of incorporating language detection is that HateBrain is now increasingly multilingual, reducing the artificially high proportion of English language sightings that resulted from the application of English linguistic rules to non-English languages (and non-Latin character sets).

Swimming with sharks

HateBrain leverages a lexicon of “corollary” language in addition to hate speech vocabulary. We refer to these significant n-grams as “pilotfish” after the smaller fish which swim symbiotically with sharks, devouring parasites in return for protection. The largest group of pilotfish in HateBrain are “intensifiers”: generic insults, slurs, and other units of malignant language which accompany genuine hate speech more frequently than they accompany innocuous homonyms, thereby increasing the probability that a given piece of content contains hate speech.

The next largest group of pilotfish in HateBrain are xenophobic references, which include everything from allusions to social parasitism (e.g. “handouts,” “welfare,” “on the dole”) to attempts at dehumanization (e.g. “cockroach,” “termite,” “animal”) to allusions to white supremacy (e.g. “sturmabteilung,” “RaHoWa,” “race mixing”).
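
As a sketch of how pilotfish feed into scoring, the following adds a weight for each corollary n-gram found in a piece of content; the entries and weights are invented, and the real lexicon is internal to Hatebase.

```python
# Invented entries and weights; the real pilotfish lexicon and its effect on
# scoring are internal to HateBrain.
PILOTFISH = {
    "filthy": 0.10,       # intensifier
    "on the dole": 0.08,  # xenophobic reference (social parasitism)
    "cockroach": 0.12,    # xenophobic reference (dehumanization)
}

def pilotfish_boost(text: str) -> float:
    """Sum the weights of every pilotfish n-gram present in the text."""
    text = text.lower()
    return sum(weight for ngram, weight in PILOTFISH.items() if ngram in text)

# A match nudges the overall probability upward rather than flagging outright.
print(pilotfish_boost("they're cockroaches living on the dole"))  # ~0.20
```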

Sometimes an eggplant is just an eggplant…

…and sometimes it’s a phallic allusion intended to sexually harass women in emails or chat sessions. Emojis are the third largest group of pilotfish in HateBrain, and include a wide variety of ethnic, nationalist and sexual double entendres that were never intended by the Unicode Consortium.


Leetspeak obfuscation

Like emojis, leetspeak (e.g. “l33tsp34k”) is sometimes repurposed to obfuscate hate speech in public conversation, so HateBrain now incorporates a rudimentary leet encoder/decoder to de-obfuscate common constructions in vocabulary and pilotfish.
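
A toy version of such a decoder might look like this; HateBrain’s actual substitution table isn’t published, so the mappings here are just common leet conventions.

```python
# Common leet substitutions, normalized back to letters before the text is
# matched against vocabulary and pilotfish. Mappings are illustrative.
LEET_MAP = str.maketrans({
    "0": "o", "1": "l", "3": "e", "4": "a",
    "5": "s", "7": "t", "8": "b", "@": "a", "$": "s",
})

def unleet(text: str) -> str:
    """Normalize leetspeak so obfuscated terms match the lexicon."""
    return text.lower().translate(LEET_MAP)

print(unleet("l33tsp34k"))  # -> "leetspeak"
```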

Future enhancements

We’re continuing to iterate on the HateBrain engine by adding rules and pilotfish, with a particular focus on increasing HateBrain’s multilingual capabilities. For the first time, the challenge of throughput has surpassed the challenge of reducing false positives: our database is ingesting tens of thousands of datapoints every day and converting approximately 10% of them into timestamped, geotagged sightings, so future enhancements are likely to focus as much on increasing capacity as on linguistic refinement.

Cover photo by Matt Helbig