The Sentinel Project recently organised a workshop at the 16th annual UN Internet Governance Forum held in Katowice, Poland. Our session, titled “The Challenges of Online Harms: Can AI moderate Hate Speech?”, brought together human rights experts and computer scientists who research and develop AI-based hate speech detection systems, in an effort to formulate a rights-respecting approach to tackling hate. Our hope is that bridging the monumental gap between these communities will help to drive new initiatives and outlooks, ultimately leading to more effective and responsible ways of tackling online hate speech. We adopted a multi-stakeholder approach, reflecting the need for social, political, and computational voices to be heard in order to develop feasible and effective solutions.

The impact of hate speech on fragile states has grown sharply in recent years, fuelled by misinformation that creates an environment in which hate speech spreads rapidly across social media. There are concerns that this has contributed significantly to persecution, armed conflict, and mass atrocities, including genocide, in various countries.

A key challenge with online hate is finding and classifying it: the sheer volume of hate speech circulating online exceeds the capabilities of human moderators, creating a need for increasingly effective automation. The pervasiveness of online hate speech also presents an opportunity, since these large volumes of data could serve as indicators of spiraling instability in certain contexts, offering the possibility of early warning and intervention to stem real-world violence.

Artificial Intelligence (AI) is now seen as the primary method that tech companies use to find, categorize, and remove online abuse at scale. In practice, however, AI systems are beset with serious methodological, technical, and ethical challenges, such as (1) balancing freedom of speech with protecting users from harm, (2) protecting user privacy from the platforms deploying such technologies, (3) explaining the rationale behind their decisions, which the opaqueness of many AI algorithms renders invisible, and (4) mitigating the harms stemming from the social biases they encode.

Part 1: Categorising, understanding, and regulating hate speech using AI

Giovanni De Gregorio invited the audience not to view the questions around hate speech and AI purely from a technological standpoint, but also to think about the social dimension of the problem. Treating the automated moderation of hate speech as a purely technological problem misses important elements, because it is also a matter of context and language. Training datasets are scarce for some languages, especially those spoken in places like Eastern and Southern Africa, which limits automated content moderation there. Only small projects exist that turn limited amounts of local-language material into data that can teach AI a particular language. Considering the examples of Myanmar and Ethiopia, it is critical to understand not only what AI can detect but also the incentives that social media platforms have for developing more accountable AI systems, especially across different languages. This is not a problem that should be attributed to AI alone: the incentives guiding social media content moderation and the nuances of protecting free speech in context matter as well. The question is both technological and social at the same time.

Vincent Hofmann addressed online hate speech moderation from a legal perspective. He highlighted the importance of individuals’ fundamental rights and of their ability to legally appeal an automated decision made by AI when they are confronted with one. Users must be given an explanation of the final decision and of the appeal procedure in a manner that is sensible and easily understandable. The decisions made by private companies’ content moderation teams have a fundamental impact on individual rights and can directly influence local political debate and the associated freedoms. It is becoming increasingly difficult to moderate content without AI in the initial stages of moderation. The challenge arises in the later stages, when human moderators at private companies have a limited understanding of the cultural context. This frequently occurs in countries where English is not widely spoken and the moderation team is largely unfamiliar with local ways of working.

Lucien Castex spoke as a representative of the French National Consultative Commission on Human Rights (CNCDH). In early 2019, a bill was proposed to tackle online hate speech, and the commission examined it, conducting several interviews on the topic. The bill relied heavily on digital platforms (the private sector) to take down online content, and Castex highlighted a number of drawbacks to this approach. There was a significant risk of mass removal of content, especially content falling into a grey area. The proposed legislation required content to be pulled down within twenty-four hours, which does not provide enough time to evaluate it adequately. Coupled with the threat of large fines being levied on platform providers, this created a strong incentive for over-censorship, which is harmful to freedom of speech and expression online. Castex also observed a massive use of AI in content removal, relying on non-transparent algorithms, and indicated a clear need to understand the cultural and linguistic context to make automated tools more effective. Excessive takedowns of online content also reinforce dominant positions, which poses a risk to small actors. Because of these concerns for freedom of speech and privacy in content removal, most of the key provisions of the bill were ultimately dropped, and an observatory was established to study online hate speech in greater detail. As a follow-up to its work on hate speech, the CNCDH is currently conducting extensive research on AI and its implications for human rights, with publication expected in March 2022.

Part 2: Tackling conflicts and ethical challenges in the Global South and the Middle East

Neema Iyer spoke about the online abuse that women face during elections. Her organization, Pollicy, conducted a hate speech study titled Amplified Abuse that looked at online abuse against women politicians during the 2021 Ugandan general election. Mapping out the local context of hate and improving machine learning techniques requires a lot of high-quality data, which is currently scarce. The team used the Hatebase repository and conducted a few lexicon-building workshops to gather inputs. Pollicy’s work has shown that the abuse directed towards women is often gendered and sexualised, with women being targeted based on their personal lives while men are targeted based on their political positions. Existing biases also affect African women and people of colour in other parts of the world. Women tend to have their accounts restricted for certain periods of time if they openly discuss queer issues or racism, since their content gets flagged as hate speech. Iyer also stressed the need for private sector companies to fund indigenous researchers from the Global South and to compensate them appropriately for their time.

Rotem Medzini presented the co-regulatory model developed during his research on antisemitism at The Israel Democracy Institute (IDI). The model is divided into two parts. The first part contains common criteria for identifying hate speech and balancing it against freedom of expression. The criteria are laid out on scales so that technology companies that provide online social platforms can more easily define a uniform policy on moderating hate speech. Each continuum supports a choice between two poles: on the left side, more lenient options involve less intervention to restrict freedom of expression; on the right side, stricter options lead to the deletion of more content. The second part of the model is a procedural guide on how to implement the policies within the organization. IDI also provides managers with steps for implementing the criteria in their organization and online platform and then deciding on content that violates these policies.

Raashi Saxena presented the Hatebase initiative and highlighted the importance of understanding how the online world affects the offline world, especially in the context of violence and mass atrocities. Hate speech is not a new phenomenon. Past genocides required large-scale coordination on multiple fronts, including significant institutional infrastructure and financial resources. Today, however, anyone with an internet connection and a smartphone can potentially reach a large audience, even beyond national borders, to spread propaganda. It is exceedingly difficult to pinpoint the original source of the information, and the sheer and growing volume of information online makes it difficult to moderate without automation. There is no universally accepted definition of online hate speech, which gives rise to several social, cultural and ethical conundrums. Human moderators are poorly compensated, and the task takes a massive toll on their mental well-being. Saxena also highlighted the linguistic nuances that influence the identification of online hate speech: the dialects and slang spoken in one country are typically vastly different from those in other countries, even when they share a common language. She also stressed the need for human involvement in training the technology so that it can aid decision-making in ambiguous cases of hate speech.

The Citizen Linguist Lab is an opportunity for anyone from across the world to contribute to Hatebase’s lexicon of keywords based on nationality, ethnicity, religion, gender, sexual orientation, disability, and class, in order to identify and monitor potential incidents of hate speech and to provide the necessary social and cultural nuance. One does not have to be a professional to contribute. Alongside a global network of people and organizations working on related issues such as openness, information sharing, collaboration, counter-messaging, informed policy making, and education, Citizen Linguist Lab contributors will support communities in making better decisions. Hatebase encourages counter-messaging as a better solution to hate speech than censorship. We advocate for the right to hold and express opinions, no matter how disagreeable, as one of the distinguishing characteristics of a free and open society.

Key takeaways and ways forward 

  • AI struggles to understand context and language. Companies invest very little money in human moderators and focus more on developing AI systems.
  • It is imperative to understand how moderation operates and then to enforce transparency obligations so that this understanding becomes and remains public knowledge.
  • Free access to data is essential for understanding content moderation. A key point is providing access for research in a way that also safeguards the privacy of users’ personal data.
  • Moderation of images can be difficult, as images may not appear obviously hateful without important context. An example of this is the sending of pictures of machetes to women candidates during the Ugandan general election.
  • Policy should be developed by a wide variety of stakeholders in order to create standard practices.
  • Having good linguistic coverage is a challenge and local context is important. Words that are considered offensive in one culture or dialect may be inoffensive in the same language as it is spoken in a different region.
  • Mapping out the local context of hate and improving machine learning techniques requires large amounts of high-quality data.

Practically speaking, the enforcement of policies to restrict hate speech comes down to service providers, especially social media companies. There is a need for them to adopt hate speech policies that adequately balance freedom of expression with the need to curtail content that can contribute to violence. At the same time, there is a need for such companies to develop different notification schemes and responses for national contact points, trusted reporters, and users. The full recording of our workshop can be viewed here.