Combating Toxic Memes with Artificial Intelligence


19 May 2022

Along with the increased consumption of streamed movies, Zoom meetings and home-based learning, memes have been gaining steady popularity across social media platforms over the course of the pandemic.

Usually intended to be funny or satirical, memes are presented as an image with accompanying text. Noting their growing prevalence, TIME published a collection of the best memes of 2020, reflecting the similar lived realities that people in different corners of the world were sharing.

However, as much as memes have helped many to cope with the stress of the pandemic, they have also gained notoriety for spreading falsehoods and hate speech and for enabling cyberbullying.

Weaponisation of Memes – Humour, Satire or Something Darker?

Cloaked behind the thin veil of anonymity or “humour”, meme creators are further enabled and empowered by the seamless hyperconnectivity of social media platforms such as Facebook, Twitter and Instagram.

At the height of the COVID-19 pandemic, memes were known to be powerful vehicles for spreading conspiracy theories, discouraging the use of vaccines and pushing out misinformation. Hate memes targeting Asians incited racist online attacks which eventually led to real-world violence.

Social media platforms have attempted to regulate the spread of toxic content, including memes, by employing thousands of content moderators to identify and flag hateful content. However, many of these content moderators suffered from post-traumatic stress disorder (PTSD) after being exposed for hours every day to content ranging from violence to graphic child abuse.

Facebook was held accountable for the toll this work took on current and former content moderators who developed mental health issues on the job, and agreed to pay them $52 million in compensation.

Artificial Intelligence as the Silver Bullet?

Meanwhile, social media platforms have been turning to artificial intelligence (AI) to automatically detect harmful content while trying to significantly reduce the otherwise laborious and evidently harmful process of manually sifting through toxic content.

However, detecting hateful content, and memes in particular, has turned out to be very challenging even for AI. Deciding whether a meme is satirical or hateful requires a holistic understanding of both modalities, the image and the text, as well as the context in which they are presented. While humans can intrinsically understand the combined meaning of captions and images in memes, machines have a hard time performing such a complex task.

For example, given a mean meme such as the one shown in Figure 1, multimodal machine learning algorithms should be able to extract “Love the way you smell today” from the text and make out the image of the skunk. The algorithms should then combine the textual and visual information to decide whether the meme should be considered mean.

Humans can instinctively identify this as a mean meme because we know that skunks are commonly associated with a foul smell, and can therefore deduce that the meme is being sarcastic. However, machines may not have the contextual knowledge or common sense needed to assess the meme accurately.


Fig 1. Example of a mean meme that was part of the Hateful Memes Dataset from Facebook’s Hateful Memes Challenge.
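The skunk example can be made concrete with a toy sketch. It is not how any production system or challenge entry works: the caption and image label are assumed to have been extracted already, and the `COMMON_SENSE` and `PRAISE_CUES` tables, the `Meme` class and the `is_mean` function are all hypothetical names invented for illustration. The point is only that neither the text nor the image is mean on its own; the signal appears when the two are combined with background knowledge.

```python
from dataclasses import dataclass

# Hypothetical common-sense facts: image concept -> commonly associated attributes.
COMMON_SENSE = {
    "skunk": {"smell": "bad"},
    "rose": {"smell": "good"},
}

# Hypothetical lexicon: caption phrase -> attribute that the phrase praises.
PRAISE_CUES = {"love the way you smell": "smell"}


@dataclass
class Meme:
    caption: str      # text assumed to be already extracted from the meme
    image_label: str  # object assumed to be already recognised in the image


def praised_attribute(caption: str):
    """Return the attribute the caption praises, or None if no cue matches."""
    lowered = caption.lower()
    for phrase, attribute in PRAISE_CUES.items():
        if phrase in lowered:
            return attribute
    return None


def is_mean(meme: Meme) -> bool:
    """Flag a meme when the caption praises an attribute that the pictured
    object is commonly known to be bad at (sarcasm by contrast)."""
    attribute = praised_attribute(meme.caption)
    if attribute is None:
        return False
    facts = COMMON_SENSE.get(meme.image_label, {})
    return facts.get(attribute) == "bad"


skunk_meme = Meme("Love the way you smell today", "skunk")
rose_meme = Meme("Love the way you smell today", "rose")
print(is_mean(skunk_meme))  # True: praise for smell paired with a skunk
print(is_mean(rose_meme))   # False: same caption, but the image fits the praise
```

The same caption flips from harmless to mean depending on the image, which is why a text-only or image-only classifier would miss it. A hand-written table like this cannot scale to real memes, and that gap is exactly the missing "common sense" the article describes.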

As one can imagine, it is challenging to teach common sense to a human, let alone to a machine. For example, a machine would most likely be able to recognise an image of a cat after viewing thousands of different cat images. However, it is hard to provide a machine with a large enough set of raw data to recognise context, because context is more abstract and is usually derived from complex reasoning.

Working to address this limitation, social media platforms have been collaborating closely with the research and academic community to ensure the swift translation of research findings into industry practice as well as the scaling up of AI operations.

Facebook organised a Hateful Memes Challenge, which crowdsourced multimodal vision-and-language machine learning solutions to detect hateful memes. The goal was to design AI solutions that could replace content moderators, or take some of the load off them, in regulating toxic content. The one-year contest attracted over 3,000 participants and 276 submitted machine learning solutions from around the world.

Why Is Artificial Intelligence Not Intelligent Enough?

Despite these concerted efforts, the existing AI solutions, cutting-edge as they are, are simply not good enough to accurately moderate mean or toxic memes. The best performing AI algorithm, DisMultiHate, has an estimated accuracy of 75% and still falls short of the 84% accuracy that humans achieve in detecting toxic memes. The winning solution from Facebook’s Hateful Memes Challenge came close, with an accuracy score of 73%.

Developed by the Social AI Studio at the Singapore University of Technology and Design (SUTD), DisMultiHate is also a multimodal deep learning algorithm designed to identify hateful memes. It goes a step further by enabling the discernment of the target or victim in such memes, which in turn corresponds to a higher likelihood of the meme being hateful.

Additionally, the DisMultiHate algorithm uses computer vision and natural language processing techniques to extract the multimodality features from memes. The multimodality features are subsequently projected onto a latent vector space – an abstract multi-dimensional space – to disentangle the latent vector representation of the target in the memes.

Just as it is helpful for us to take a step back to better process our thoughts when trying to make sense of a complex issue, this latent vector space allows machines to better decipher seemingly inconsequential, abstract variations and draw meaningful interpretations from the projection.
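As a rough illustration of what "projecting onto a latent vector space" means, the sketch below concatenates a text feature vector and an image feature vector and maps them to a smaller vector with a linear projection. This is only a minimal sketch under stated assumptions, not DisMultiHate's architecture: the feature values are made up, the weights are random rather than learned, and the helper names `make_weights` and `project` are hypothetical. The disentanglement of the target described above is not shown.

```python
import random


def make_weights(latent_dim, input_dim, seed=0):
    """Build a latent_dim x input_dim matrix of random weights.
    A trained model would learn these values; random ones are a stand-in."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(input_dim)]
            for _ in range(latent_dim)]


def project(features, weights):
    """Linear projection: one dot product per latent dimension."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


# Made-up feature vectors standing in for the outputs of a text encoder
# and an image encoder.
text_features = [0.2, 0.7, 0.1, 0.9]
image_features = [0.5, 0.3, 0.8]

# Fuse the two modalities by concatenation, then project to 2 dimensions.
fused = text_features + image_features
weights = make_weights(latent_dim=2, input_dim=len(fused))
latent = project(fused, weights)

print(len(fused), "->", len(latent))  # 7 -> 2
```

Each latent dimension mixes information from both the caption and the image, so patterns that only appear when the two are read together can, in principle, be captured by a single coordinate.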

But even with this combination of advanced features, existing AI solutions fail to match human performance for two main reasons.

Firstly, most current AI solutions are not able to understand the cultural context of these toxic memes. For example, they may not understand the Singlish used in Singapore-based memes or the references to local events.

Secondly, memes evolve fast. AI solutions, which depend on data and particularly on past examples of toxic memes, find it hard to adapt to or predict new toxic memes created by innovative Internet users who churn out thousands of memes daily. One example is the anti-vaccine memes created during the COVID-19 pandemic, which had not been observed before.

Smart Machines for Smart Nations

However, research on distilling the intricate nuances of human language is also constantly evolving. One way forward is to understand and imitate how humans keep up to date with changes in knowledge, culture and values.

SUTD’s Social AI Studio is exploring the option of “training” multimodal hateful meme detection algorithms to understand Southeast Asian contexts through news articles from Southeast Asian countries’ mainstream news media outlets, as well as Wikipedia in various Southeast Asian native languages.

The goal is to eventually enable the algorithms to understand various Southeast Asian languages and equip them with adequate contextual knowledge to better detect hateful memes that are based in the region.

While AI researchers continue to design new, sophisticated algorithms that can outrun and outsmart the propagation of toxic memes, it is becoming increasingly important for netizens to practise social responsibility by calling out offensive memes.

Netizens can easily report inappropriate content that violates conduct policies via feedback channels on social media platforms. In doing so, they not only contribute the crucial feedback and complex datasets that these AI models require, but also help to swiftly dismantle online spaces that breed contempt.

About the Author


Roy Ka-Wei Lee is an Assistant Professor at the Information Systems Technology and Design (ISTD) Pillar and the Design and Artificial Intelligence (DAI) programme at the Singapore University of Technology and Design (SUTD). He leads the Social AI Studio at SUTD, which aims to design next-generation socially-enabled AI systems.