Do you speak multiple languages and want to help build a more hate-resistant world? If so, we need your help improving our ability to monitor hate speech around the world. We are looking for native speakers of various languages (see below) to volunteer as little as one hour of their time to help us identify hate speech vocabulary.

The Sentinel Project launched Hatebase, our online hate speech monitoring software, in 2013, and it has since collected over 650,000 sightings of hate speech relating to more than 1,100 terms in 90 languages worldwide. The software’s unique purpose gained early media coverage, and its data-gathering accomplishments have attracted widespread usage. Today, more than 1,000 academics, NGOs, research organizations, government agencies, and even businesses use data from Hatebase to better understand and respond to the problem of online hate.

While we’re proud of the success that Hatebase has seen, we also acknowledge the limitations of both its technology and its dataset. That’s why we’re working hard to improve both, an effort that is needed now more than ever as nationalism, political polarization, and xenophobia increase worldwide. We’ll cover the technological improvements in a future post, but today we’re looking at how to improve the linguistic element of hate speech monitoring and, very importantly, how you can help us.

As mentioned above, Hatebase currently looks for keywords that indicate hate speech in 90 languages from around the world. However, over 60% of those terms are in English, even though only an estimated 20% of the world’s population speaks English (and only 5% as a first language). This imbalance isn’t by design; English just happens to be our working language, and it was the easiest language for gathering data when we started out. By comparison, Spanish has a similar number of first-language speakers but accounts for only about 8% of Hatebase’s vocabulary. Other major languages like French, Arabic, and Portuguese are similarly underrepresented.

We want to fix this imbalance and bring much broader multilingual coverage to Hatebase. This task is very challenging because language is full of nuance and local variation, and it constantly evolves. That’s why we need native speakers of various languages to step up and help us understand which words are used to attack people based on race, religion, ethnicity, gender, sexual orientation, disability, and socioeconomic class.

Our list of 25 priority languages is below, though we welcome submissions in any language. To contact us and help out, send an email to: contact@hatebase.org

Priority languages

  • Arabic
  • Burmese
  • Chinese (Cantonese)
  • Chinese (Mandarin)
  • Dutch
  • English
  • Farsi
  • French
  • German
  • Hindi
  • Indonesian
  • Italian
  • Polish
  • Portuguese
  • Romanian
  • Russian
  • Sinhalese
  • Spanish
  • Swahili
  • Swedish
  • Tagalog
  • Thai
  • Turkish
  • Urdu
  • Vietnamese