Artificial intelligence is hardly confined by international borders, as businesses, universities, and governments tap a global pool of ideas, algorithms, and talent. Yet the AI programs that result from this global gold rush can still reflect deep cultural divides.

New research shows how government censorship affects AI algorithms—and can influence the applications built with those algorithms.

Margaret Roberts, a political science professor at UC San Diego, and Eddie Yang, a PhD student there, examined AI language algorithms trained on two sources: the Chinese-language version of Wikipedia, which is blocked within China; and Baidu Baike, a similar site operated by China’s dominant search engine, Baidu, that is subject to government censorship. Baidu did not respond to a request for comment.

The researchers were curious whether censorship of certain words and phrases could be learned by AI algorithms and find its way into software that uses those algorithms. This might influence the language used by a chatbot or voice assistant, the phrasing chosen by a translation program, or the suggestions offered by autocomplete tools.

The type of language algorithm they used learns by analyzing how words appear together in large quantities of text. It represents each word as a point in a geometric space, with related words positioned as neighbors; the closer two words sit in that space, the more similar their meanings.

A translation program might infer the meaning of an unknown word by looking at these relationships in two different languages, for example.
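The geometry described above can be made concrete with a small sketch. The vectors below are invented for illustration; a real model such as word2vec or GloVe would learn hundreds of dimensions from co-occurrence statistics in the training text. Similarity between words is typically measured as the cosine of the angle between their vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings; real embeddings are learned, not hand-set.
embeddings = {
    "democracy": [0.9, 0.1, 0.3],
    "stability": [0.8, 0.2, 0.4],
    "chaos":     [-0.7, 0.9, 0.1],
}

# Words that appear in similar contexts in the training text end up nearby.
print(cosine_similarity(embeddings["democracy"], embeddings["stability"]))  # close to 1
print(cosine_similarity(embeddings["democracy"], embeddings["chaos"]))      # negative
```

In a model trained on censored text, the learned positions would differ, which is exactly the kind of divergence the researchers measured.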

The UCSD researchers found key differences in the resulting algorithms that they say seem to reflect the information censored in China. For example, the algorithm trained on Chinese Wikipedia represented “democracy” closer to positive words, such as “stability,” while the one trained on Baidu Baike represented “democracy” closer to “chaos.”

Roberts and Yang then used the algorithms to build two programs to assess the sentiment—the positive versus negative meaning—of news headlines. They found that the one trained on Chinese Wikipedia assigned more positive scores to headlines that mentioned terms including “election,” “freedom,” and “democracy,” while the one trained on Baidu Baike assigned more positive scores to headlines featuring “surveillance,” “social control,” and “CCP.” The study will be presented at the 2021 Conference on Fairness, Accountability, and Transparency (FAccT) in March.
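A toy sketch can show how a sentiment score built on top of embeddings inherits their geometry. This is not the paper's actual method; it is a minimal illustration where a headline's score is its words' similarity to a positive seed word minus similarity to a negative seed, and all vectors are invented.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def headline_score(words, emb, pos_seed="good", neg_seed="bad"):
    """Average similarity to a positive seed minus similarity to a negative seed."""
    scores = [cosine(emb[w], emb[pos_seed]) - cosine(emb[w], emb[neg_seed])
              for w in words if w in emb]
    return sum(scores) / len(scores)

# Two hypothetical embedding spaces that place "democracy" differently,
# standing in for models trained on differently curated text.
space_a = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "democracy": [0.9, 0.2]}
space_b = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "democracy": [-0.8, 0.3]}

print(headline_score(["democracy"], space_a))  # positive score
print(headline_score(["democracy"], space_b))  # negative score
```

The same headline flips from positive to negative purely because the underlying embedding moved, which mirrors the divergence the study reports between the Wikipedia-trained and Baike-trained sentiment programs.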

In recent years, researchers have highlighted how race and gender biases can lurk in many artificial intelligence systems. Algorithms trained on text scraped from the web or old books, for instance, will learn to replicate the biases displayed by the human authors of that text. In 2018, researchers at Google demonstrated cultural biases in image recognition algorithms, which may, for example, recognize only Western wedding scenes.


Roberts notes that the differences seen in their study may not be due entirely to government censorship. Some may be the result of self-censorship or simply cultural differences between those writing the encyclopedia articles. But she says it is important to recognize that government policy can cause other forms of bias to lurk in AI systems. “We see this as a starting point for trying to understand how government-shaped training data appears within machine learning,” Roberts says.

Roberts says researchers and policymakers need to consider how governments in the future might influence how AI systems are trained in order to make censorship more effective or export particular values.

Graeme Hirst, a professor at the University of Toronto who specializes in computational linguistics and natural language processing, has a few qualms with the study methodology. Without carefully studying the differences between Chinese Wikipedia and Baidu Baike, Hirst says, it is hard to ascribe variations in the algorithms to censorship. It is also possible that Chinese Wikipedia contains anti-Chinese or overtly pro-democracy content, he says. Hirst adds that it is unclear how the sentiment analysis was done and whether bias may have been introduced there.

Others see it as a welcome contribution to the field.

“In a certain sense, this is not surprising,” says Suresh Venkatasubramanian, a professor at the University of Utah who studies AI ethics and cofounded the FAccT conference.

Venkatasubramanian points out that AI algorithms trained on Western news articles might contain their own anti-China biases. “But I think it’s still important to do the work to show it happening,” he says. “Then you can start asking how it shows up, how do you measure it, what does it look like and so on.”

