Combating Toxicity in Online Games

AI holds great promise for stemming toxic behavior from within online games, but the complications can be daunting. By tackling the three pillars of Culture, Forensic Technology, and Governance Standards, a game publisher can use AI to help its community managers create safe environments that also balance free speech.

Incidents of Griefing (otherwise known as cyber-bullying in games) on the Rise

Incidents of cyber-bullying in general are more than double what they were in 2007. 87% of young people have seen cyber-bullying online. Roughly 4 in 10 Americans have personally experienced online harassment, and 62% consider it a major problem. Many want technology firms to do more, but they are divided on how to balance free speech and safety issues online. One of the most prolific mediums for this harassment is online games.

Community Manager tactics

Game publishers have repeatedly tried to address the problem of toxicity but face numerous challenges common to Internet-based forums, especially those permitting anonymity. They have struggled to decide on broader strategic issues, such as how to balance free speech with ensuring a safe and less hostile environment, an issue shared by old-guard social networks like Facebook and Twitter. 

Community managers deploy various strategies, with varying levels of success, to mitigate toxicity. Strategies that have worked to some degree include rewarding good in-game behavior, making it easy to avoid players with established patterns of toxic behavior, establishing protocols to ensure that no one is above the rules, and crowd-sourced reporting from the community.

The Promise of AI and the reality of Bias

Artificial Intelligence could be a key tool in the repertoire of community managers to combat griefing.  Below is a short list of ways that community managers might use AI to curate communities that are more hospitable to more gamers.

1) Auto-gift good behavior using in-game rewards

2) Auto-block a player for consistent bad behavior in a novel way (à la Black Mirror's "White Christmas" episode)

3) Flag a community manager when toxic behavior is occurring for further analysis

4) In-game alerts to griefers that warn them about the specific ways in which their behavior is unacceptable, with escalating consequences ranging from a text warning, to short-term account freezes, to figure and voice blocks (per above), all the way to account suspension (a sketch of such an escalation policy follows this list)

5) Enhanced tools, powered by AI, for communities to self-police

6) The ghost-block: allow the user to continue to post, but block the visibility of their posts to other users for a period of time or until a benchmark of acceptable content or behavior has been met.
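As a hedged illustration of item 4, the sketch below shows how escalating consequences might be tied to a player's recent offense history. The names (Offense, pick_consequence) and all thresholds are hypothetical and not tied to any particular platform or game engine.

```python
# Illustrative escalation policy for repeat toxic behavior.
# All names and thresholds are hypothetical examples.
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Offense:
    player_id: str
    timestamp: datetime
    severity: float  # 0.0 (borderline) to 1.0 (severe), e.g. from a toxicity model


def pick_consequence(history: list[Offense], window_days: int = 30) -> str:
    """Map a player's recent offense history to an escalating consequence."""
    cutoff = datetime.utcnow() - timedelta(days=window_days)
    recent = [o for o in history if o.timestamp >= cutoff]
    score = sum(o.severity for o in recent)

    if score == 0:
        return "none"
    if score < 1.0:
        return "text_warning"            # item 4: specific, targeted warning
    if score < 3.0:
        return "short_account_freeze"    # temporary freeze
    if score < 5.0:
        return "figure_and_voice_block"  # item 2: hide avatar/voice from others
    return "account_suspension"
```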

That said, it is hard to have a conversation about toxicity without raising concerns of bias against one side or another. There are ways to combat bias, in both human and AI systems, so that the desired balance between a safe environment and free speech is achieved. Bias is inherent in humans and, of course, in the AI systems that we fallible humans produce, from our personal politics and perspectives on the world to the methodology used to build and train the AI systems, which can impart bias at the lowest levels of the system.

The key to avoiding bias is the formation of clear guardrails of behavior and response, and complete transparency in the decision-making process that enforces the rules at those guardrails. AI frightens people because the decision-making process is often not transparent: what factors went into the decision, how were they weighted, and do the decisions track similarly across disparate circumstances and populations? These weights also vary across game properties. What would be considered highly toxic in a game for 9-year-olds may be considered 'light banter' in an adult first-person-shooter game.
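To make the guardrail and transparency points concrete, here is a minimal sketch assuming a per-title policy table. The titles, thresholds, and field names are invented for illustration; the point is that the same toxicity score can be interpreted differently by audience while the factors behind each decision remain visible and auditable.

```python
# Hypothetical per-title moderation thresholds: the same classifier score is
# interpreted differently depending on the game's audience. Values are
# illustrative only.
TITLE_POLICIES = {
    "kids_kart_racer":   {"flag_threshold": 0.30, "block_threshold": 0.60},
    "adult_fps_shooter": {"flag_threshold": 0.70, "block_threshold": 0.90},
}


def moderation_decision(title: str, toxicity_score: float) -> dict:
    """Return a decision plus the factors that produced it, for transparency."""
    policy = TITLE_POLICIES[title]
    if toxicity_score >= policy["block_threshold"]:
        action = "block"
    elif toxicity_score >= policy["flag_threshold"]:
        action = "flag_for_review"
    else:
        action = "allow"
    # Surface the inputs and thresholds so the decision path is auditable.
    return {"action": action, "score": toxicity_score, "policy": policy}
```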


The three pillars to combat bias

Rather than assuming this challenge can be solved with technology alone, we believe there are three major areas of investment and resources that can mitigate the risk of bias when addressing toxicity:

  1. Culture

  2. Forensic technology

  3. Standards and Governance

In this article we will examine how each of these pillars can be used both to combat toxicity and to mitigate the risk of bias, striking a balance between free speech and protection of the game's culture and community in an online gaming environment.

Pillar 1: Culture

Corporate or studio culture is the foundation that determines how large a bias problem you may have. The teams developing the games, environments, and forums, and the teams curating the data used to train the AI and developing the AI algorithms, should be both diverse and inclusive; the more diverse and inclusive these teams are, the better. You want as many voices and points of view represented in this group as possible. Often, within a singular and isolated group (a team of developers, a single-game delivery team, or a gaming organization as a whole), systemic bias can persist in the very framework of the organization and platform. We have seen this often in recent history across technology platforms where diverse populations interact with a tool created by an insular one. Often the only way to fully tackle the issue is to bring in outside help and expertise.

Establish a Red Team vs. Blue Team paradigm. These are dedicated teams that poke holes in assumptions associated with the data and think through the unintended consequences of various approaches. This does not simply mean doubling your manual review process: the Red Team can leverage ML/AI to generate challenge language that tests the boundaries of the Blue Team in a double-blind manner. The Blue Team judges language presented to it in a review queue consisting of randomly selected "real" hits from the production toxicity detector, generated "new" potentially toxic comments from the Red Team's AI, and human-engineered test and edge cases. The results of this review queue are then fed back into the training dataset for the production toxicity detector.
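A minimal sketch of that blinded review queue follows. The function names and data shapes are assumptions for illustration, not a reference implementation.

```python
# Minimal sketch of a blinded review queue for the Red Team / Blue Team loop.
import random


def build_review_queue(production_hits, red_team_generated, edge_cases, size=100):
    """Mix real detector hits, Red-Team AI challenges, and hand-built edge
    cases into one shuffled queue so Blue-Team reviewers cannot tell the
    source of any item (double-blind)."""
    pool = (
        [{"text": t, "source": "production"} for t in production_hits]
        + [{"text": t, "source": "red_team_ai"} for t in red_team_generated]
        + [{"text": t, "source": "edge_case"} for t in edge_cases]
    )
    random.shuffle(pool)
    return pool[:size]


def fold_back_into_training(reviewed_items, training_set):
    """Append human verdicts to the production detector's training data."""
    for item in reviewed_items:
        training_set.append({"text": item["text"], "label": item["human_label"]})
    return training_set
```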

Maintain a strong feedback loop. As AI is planned, developed, and deployed into the toxicity mitigation process, ensure that gamers have a channel to give you feedback, so that you can decide whether to tweak the data used to train the AI or tweak the algorithm itself.

Ensure that you incorporate ethical considerations into the Design Thinking process. As you think about the intended audiences for your game and run your empathy maps, incorporating predictive models of how various psychological profiles will react to different toxicity mitigation approaches will be key.

Another key component of culture is considering the effect the AI will have on your community managers' behavior. For example, consider how to continually motivate people to train (and correct) the model, and make those contributions transparent to management so that it understands who is providing value. You will also need to correct the perception that when a community manager disagrees with the AI, the manager is automatically wrong. When a community manager takes the time to correct a flawed model, they may inadvertently be flagged as less productive than someone who simply accepts the model's output, errors and all. These issues must be carefully considered when planning an AI implementation.

Snowflakes, Karens, and pop-culture references

One of the keys to reducing or mitigating bias, or to tuning AI behavior in general, is understanding the trade-offs being made. Say a new term suddenly acquires a strong new meaning (e.g., "snowflake," "Karen") when it was previously entirely benign. You see the term growing in use and contemplate removing everything associated with it from the dataset. By removing the term, however, you may have biased the data in an unintended way: with the snowflake example, the trained model will not use the term negatively, but it may also be undertrained and underperform on matters related to snow; for Karen, it might ignore, perform poorly with, or even discriminate against people named Karen. AI that is intrinsically interpretable (and the related forensic technology) can help you dissect the trade-offs and understand how to readjust the model to remove only the undesired data and behavior. For example, you might train the model with more snow-related discussion that includes the word "snowflake" in a positive sense, and train it to detect bullying when the word is used in other contexts.
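A small sketch of that rebalancing idea, with entirely invented example rows and labels (0 benign, 1 toxic): rather than deleting every row containing the term, keep those rows and add counter-examples that cover both senses.

```python
# Hedged sketch: rebalancing training data so a newly loaded term ("snowflake")
# keeps its benign senses. Examples and labels are invented for illustration.
benign_additions = [
    {"text": "The snowflake decals on the winter map look great", "label": 0},
    {"text": "Each snowflake texture is procedurally generated",  "label": 0},
]

toxic_additions = [
    {"text": "Go cry somewhere else, snowflake",                  "label": 1},
]


def keep_and_augment(training_set):
    """Instead of deleting every row containing the term, keep them and add
    counter-examples covering both senses so the model learns the context."""
    return training_set + benign_additions + toxic_additions
```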

Pillar 2: Forensic Technology Tools

Today, many organizations use forensic technology to mine AI training sets for bias. These forensic technologies generate labels that detail whether there is an inherent bias in the dataset against any specific group of people (by race, gender, age, etc.). Because language changes quickly and memes spread rapidly, forensic technology tools must be dynamic and must understand the feedback loops. In practice, this means that whatever AI is used to mitigate toxicity must remain an augmentation of what community managers are already doing; it can never be a complete replacement for human oversight. Having the community manager provide feedback to the AI tool over time will help it become better at recognizing hate speech in the proper context.

Hate speech is a tougher challenge

Tackling online toxicity is tougher than working with tabular data because cyberbullying has many different manifestations and there are many different ways in which bias can creep in. There are not necessarily protected attributes like race or gender involved. Unfortunately for machine learning, no representative, benchmark, "gold standard" corpus of hate speech exists, a trait common to many real-world problems. Language is nuanced, and subtle changes in word order or punctuation can be the difference between a benign comment and a toxic one.

Natural Language Processing (NLP) has evolved heavily in the last two decades. Pattern-based feature extraction yielded to complex grammar models, which in turn gave way to mathematical models for language extraction and classification by the late 2000s. This shift in approach was largely due to the complexity and the high cost of training and execution time of the grammar-based models. This inflection point marked an important shift in the adoption and accessibility of NLP, but it also added to its mystique, as inferences were no longer easily understood rule-based correlations. Today there are many options for NLP, ranging from the simplistic to the complex in terms of capability, training difficulty, and implementation speed. In most approaches, a large unlabeled sample dataset provides the initial training for the algorithm, and transfer learning is then applied to the result, mapping the structures identified in the unlabeled dataset onto the much smaller final training dataset.
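As a rough sketch of that pretrain-then-transfer workflow, the snippet below fine-tunes a pretrained language model on a tiny, invented labeled sample using the Hugging Face transformers and datasets libraries; the tooling and the toy data are assumptions made here, not something prescribed by the article.

```python
# Minimal sketch of transfer learning for toxicity classification: a model
# pretrained on a large unlabeled corpus is fine-tuned on a small labeled set.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Tiny illustrative sample; a real labeled set needs many thousands of rows.
labeled = Dataset.from_dict({
    "text":  ["gg everyone, nice match", "uninstall the game, you are trash"],
    "label": [0, 1],
})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

train_set = labeled.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="toxicity-model", num_train_epochs=3),
    train_dataset=train_set,
)
trainer.train()
```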

Companies tackling this challenge must have a primary training dataset drawn from a diverse source of language and context for the kind of samples they are ultimately trying to identify. In the case of toxicity this is difficult, as most large public datasets used in the pretraining of common models, such as Wikipedia, are actively curated to remove toxic language. It is this mechanism that leads identity terms (e.g., "gay") to be flagged as toxic by many transfer-trained models: the large raw training data contained "gay" only in the "good" context of identity, and transfer learning from the smaller negative training set then pulled the mathematical vector associated with the word and weighted it heavily as likely toxic. NLP technology selection and training methodology are therefore of the utmost importance. Choosing incorrectly will lead the team to endlessly chase perceived "edge" cases, as the algorithm-reported accuracy will be high but the real-world results may be poor. Occasionally this even leads to abandonment of the AI model as confidence in the process is lost.

An alternative approach to the NLP problem is to fuse grammar-based rules into an algebraic model, removing a degree of freedom from the equation by adding the constraint that text fed to and evaluated by the system will adhere to a predefined grammar. This has the benefit of not requiring a large initial dataset to train an arbitrary mathematical grammar model: the grammar model is defined in syntax without training. This approach allows the system to react to previously unseen conditions with a greater degree of success. IBM Research pioneered this technology over the past 20 years; it evolved into a declarative information extraction model called System T, which later became part of the underpinnings of IBM Watson.

An important part of building trust in AI/ML is to lower the curtain around the decision-making process. For any weighted decision that comes out of an ML engine, a direct decision path should be traceable from the result back through the logic to the input parameters. By extension, the training dataset could then be queried to find the training examples that contributed the most weight to that decision, so they could be re-verified for correctness, especially in situations where "correct" is a moving target as culture and perception shift. A practical way to probe for bias in these datasets is to take a model that has already been trained (like Jigsaw's model) and break down the prediction results by topic or keyword to see whether there are accuracy differences. A good forensic technology tool can also suggest how the algorithm should be tweaked in its approach to fairness, and can even test your algorithm by generating dummy datasets.
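One simple form of that keyword breakdown might look like the sketch below; the row fields ("text", "predicted", "actual") are assumptions for illustration.

```python
# Sketch of the "break down predictions by keyword" audit: compare model
# accuracy on comments containing a given term against the overall rate.
def accuracy(rows):
    return sum(r["predicted"] == r["actual"] for r in rows) / len(rows)


def keyword_slice_report(scored_rows, keywords):
    """scored_rows: dicts with 'text', 'predicted', and 'actual' fields."""
    report = {"overall": accuracy(scored_rows)}
    for kw in keywords:
        subset = [r for r in scored_rows if kw in r["text"].lower()]
        if subset:
            # Large gaps versus the overall rate flag possible bias.
            report[kw] = accuracy(subset)
    return report
```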

Notable Open Source Efforts in Forensic Tech

There are several notable open source efforts to curate hate speech models.

The Online Hate Index (OHI) research seeks to improve society's understanding of hate speech across platforms.  The OHI is developing a nuanced measurement methodology which decomposes hate speech into constituent components that are easier for humans to rate than a single omnibus question (i.e. "is this comment hate speech?"). 

Jigsaw Unintended Bias in Toxicity Classification

In this competition, coders are challenged to build a model that recognizes toxicity and minimizes unintended bias with respect to mentions of identities. They are using a dataset labeled for identity mentions and optimizing a metric designed to measure unintended bias. By developing strategies to reduce unintended bias in machine learning models, participants help the Conversation AI team build models that work well for a wide range of conversations.
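The competition's bias metric is built around per-identity ("subgroup") AUC. Below is a hedged sketch of the core idea, using invented field names rather than the competition's exact schema.

```python
# Sketch of subgroup AUC: score the model separately on comments that mention
# a given identity; a low value relative to the overall AUC reveals
# unintended bias. Field names are illustrative assumptions.
from sklearn.metrics import roc_auc_score


def subgroup_auc(rows, identity):
    """rows: dicts with 'pred' (model score), 'toxic' (0/1), and identity flags.
    Assumes the identity subset contains both toxic and non-toxic examples."""
    subset = [r for r in rows if r.get(identity)]
    y_true = [r["toxic"] for r in subset]
    y_pred = [r["pred"] for r in subset]
    return roc_auc_score(y_true, y_pred)
```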

Pillar 3: Governance Standards

There should be governance of the overall platform, and specifically governance of the AI, including transparent reporting mechanisms such as FactSheets. This third pillar that organizations use to combat bias in AI is informed by a robust AI Ethics Board. These are standards that the organization promises the market it will abide by should it ever procure or deploy an AI that directly affects its employees or its clientele. Removing access to a game due to perceived bad behavior would certainly be a use case for this board to consider. The board should be diverse and should not be led by the CTO, the CIO, or the head of any product in the organization, and employees should be able to submit complaints anonymously.

Governance is immensely important, as toxicity, and the public perception of what is toxic will evolve over time.  Organizations must stay ahead of the shifting perception, which occasionally may run counter to the core culture that propelled their platform into popularity.  At its core, a solid governance organization will provide the stability of path and guidance of what is right for the organization and the values that it stands for.  As AI/ML engines become central to many of these organizations, ensuring that they constantly reflect the priorities and standards of the organization becomes both challenging and paramount. Without transparency of their training path and decision making process, the entire engine could be rendered invalid after any major shift in policy or platform, requiring a complete re-baseline of the system.

A single pane of glass

Toxicity can extend into every facet of a gaming environment. Wherever non-curated content can be introduced, there will be a vector for toxicity to invade a community. Live player audio, video sharing, and image decorations of in-game items all can be, and recently have been, vectors for toxic content. At the scale of the gaming community, and given the breadth of possible vectors, the only tenable solution to platform-wide toxicity is to build AI into the core of the game engine. A single pane of glass can then be used to monitor and remediate toxicity across the whole platform, while also ensuring consistent standards, transparency of decision making, and the ability to detect previously unseen toxicity through a combination of the ML engine and red-team-based moderator testing.
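As an illustration of what such a single pane of glass could look like at the code level, here is a hypothetical sketch of one moderation service handling every content channel through one scoring model and one audit log. All names and thresholds are invented for illustration.

```python
# Illustrative "single pane of glass": every content vector (chat, voice
# transcript, image caption) flows through one moderation service with one
# policy and one audit trail.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class ContentEvent:
    player_id: str
    channel: str      # "chat", "voice_transcript", "image_caption", ...
    payload: str


@dataclass
class ModerationService:
    audit_log: List[dict] = field(default_factory=list)

    def score(self, event: ContentEvent) -> float:
        # Placeholder for the shared AI model applied to every channel.
        return 0.0

    def handle(self, event: ContentEvent) -> str:
        score = self.score(event)
        action = "block" if score >= 0.9 else "flag" if score >= 0.6 else "allow"
        # One audit trail covers all vectors, keeping standards consistent
        # and the decision path transparent.
        self.audit_log.append({
            "time": datetime.utcnow().isoformat(),
            "channel": event.channel,
            "player": event.player_id,
            "score": score,
            "action": action,
        })
        return action
```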

Conclusion

In conclusion, AI holds great promise for stemming toxic behavior from within online games, but the complications can certainly be daunting. Bringing the three pillars together into an executable plan, within a reasonable timeline and budget, is a mountainous challenge, and it is one that stymies many organizations from implementing effective anti-toxicity and anti-bias programs on their platforms. Individual workstreams are needed around the three pillars of Culture, Forensic Technology, and Governance Standards. Bringing in outside expertise to assist with the formulation of the principles, governance, and technology can reduce delivery risk and provide an objective perspective on any organizational bias that may exist within the company as a whole. AI can be leveraged to reduce the manual workload and increase the likelihood of catching new types of toxic comments before they become a national dialogue. Game publishers owe it to themselves to begin work on these areas now, as creating safe environments that also balance free speech will only continue to rise as a top issue while the gaming community expands into mainstream prominence.


Author Bios:

Phaedra Boinodiris

Phaedra Boinodiris FRSA has focused on inclusion in technology since 1999.  She is the former CEO of WomenGamers.com, having started the first scholarship program in the US for women to pursue degrees in game design and development. She is currently pursuing her PhD in AI and Ethics from UCD in collaboration with NYU.

Garrett Rowe

Garrett Rowe has led the architecture, design, and integration of foundational data and AI/ML technologies into some of the largest organizations in the world. Over the last 20 years he has held leadership positions in Naval Aviation, Emergency Management/Counterterrorism, IT Consulting, Engineering, and Sales organizations, bringing together diverse teams to solve challenging cross-functional problems.


Relevant Links:

https://www.pewresearch.org/internet/2017/07/11/online-harassment-2017/

Gamergate: https://www.washingtonpost.com/news/the-intersect/wp/2014/10/14/the-only-guide-to-gamergate-you-will-ever-need-to-read/

https://www.washingtonpost.com/technology/2019/02/26/racism-misogyny-death-threats-why-cant-booming-video-game-industry-curb-toxicity/

https://www.wired.com/story/videogames-anti-toxicity-valorant-launch/

https://www.polygon.com/2012/10/17/3515178/the-league-of-legends-team-of-scientists-trying-to-cure-toxic

https://venturebeat.com/2020/03/11/riot-games-launches-player-dynamics-to-help-improve-multiplayer-experiences/

https://www.gamasutra.com/blogs/SarahRobinsonYu/20190502/341880/Mitigating_Toxic_Behaviour_4_Lessons_from_Overwatch.php

https://www.ubisoft.com/en-us/game/rainbow-six/siege/news-updates/4h4MEtQwhxnOtNrH6KGMi2

https://developer.ibm.com/technologies/artificial-intelligence/models/max-toxic-comment-classifier/

https://www.broadbandsearch.net/blog/cyber-bullying-statistics

https://thegradient.pub/state-of-ml-frameworks-2019-pytorch-dominates-research-tensorflow-dominates-industry/

https://dominoweb.draco.res.ibm.com/reports/rj10523.pdf

https://researcher.watson.ibm.com/researcher/view_group_subpage.php?id=5577

https://www.oreilly.com/library/view/natural-language-processing/9781491978221/ch01.html