In 2020, researchers at the Center on Terrorism, Extremism and Counterterrorism at the Middlebury Institute of International Studies found that GPT-3, the underlying technology for ChatGPT, had “impressively deep knowledge of extremist communities” and could be prompted to produce polemics in the style of mass shooters, fake forum threads discussing Nazism, a defense of QAnon and even multilingual extremist texts.

OpenAI uses machines and humans to monitor content that is fed into and produced by ChatGPT, a spokesman said. The company relies on both its human A.I. trainers and feedback from users to identify and filter out toxic training data while teaching ChatGPT to produce better-informed responses.

OpenAI’s policies prohibit use of its technology to promote dishonesty, deceive or manipulate users or attempt to influence politics; the company offers a free moderation tool to handle content that promotes hate, self-harm, violence or sex. But at the moment, the tool offers limited support for languages other than English and does not identify political material, spam, deception or malware. ChatGPT cautions users that it “may occasionally produce harmful instructions or biased content.”

Last week, OpenAI announced a separate tool to help discern when text was written by a human as opposed to artificial intelligence, partly to identify automated misinformation campaigns. The company warned that its tool was not fully reliable — accurately identifying A.I. text only 26 percent of the time (while incorrectly labeling human-written text 9 percent of the time) — and could be evaded. The tool also struggled with texts that had fewer than 1,000 characters or were written in languages other than English.

Arvind Narayanan, a computer science professor at Princeton, wrote on Twitter in December that he had asked ChatGPT some basic questions about information security that he had posed to students in an exam. The chatbot responded with answers that sounded plausible but were actually nonsense, he wrote.

“The danger is that you can’t tell when it’s wrong unless you already know the answer,” he wrote. “It was so unsettling I had to look at my reference solutions to make sure I wasn’t losing my mind.”

Mitigation tactics exist — media literacy campaigns, “radioactive” data that identifies the work of generative models, government restrictions, tighter controls on users, even proof-of-personhood requirements by social media platforms — but many are problematic in their own ways. The researchers concluded that there “is no silver bullet that will singularly dismantle the threat.”

Source: Nytimes.com
