GOOGLE has taken heat for developing an artificial intelligence program that could be dangerous to humans if applied on a large scale.

A Google dataset for training AI was loaded with bad, uncorrected data, and the potential for harm is clear, reports argue.

Google has spent $4billion acquiring AI startups


Google’s GoEmotions dataset used 58,000 Reddit comments to prepare an artificial intelligence program for gauging human emotion.

The program directed English-speaking humans to tag each Reddit comment with a label from one or more of 27 emotion categories.

The emotions are meant to cover the full scale of human sensibility, from admiration to excitement to disgust and beyond – the 28th label was “neutral”.
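The labeling scheme described above can be sketched in a few lines. This is an illustrative sketch only, not Google's actual pipeline: the `tag` helper is hypothetical, and the label list is the published GoEmotions taxonomy of 27 emotions plus "neutral", with each comment allowed to carry one or more labels.

```python
# The 27 emotion categories from the GoEmotions taxonomy.
EMOTIONS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise",
]
# The 28th label is "neutral".
LABELS = EMOTIONS + ["neutral"]

def tag(comment: str, labels: list[str]) -> dict:
    """Attach one or more emotion labels to a comment (hypothetical helper)."""
    unknown = [lab for lab in labels if lab not in LABELS]
    if unknown:
        raise ValueError(f"unknown labels: {unknown}")
    return {"text": comment, "labels": labels}

example = tag("That was hilarious, thank you!", ["amusement", "gratitude"])
```

Note that because comments can take multiple labels, the scheme is multi-label rather than a single 28-way classification.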

The applications for data labeling tech range from content moderation to resume reviews and beyond – anything that requires a large-scale analysis of human-generated text could benefit from an AI-powered emotional analysis.


If only it could be set up for success.

SurgeAI reviewed the work and found that Google’s human labelers misidentified about 30% of comments out of a sample of 1,000.

The harsh condemnation of Google’s work took issue with two key aspects of the methodology: the lack of context and the difficulty of American English for non-American speakers.

Google’s human labelers were given “no additional metadata” about each Reddit comment – but a post’s meaning can change entirely depending on what thread it’s posted in.

SurgeAI gives an example of a failure caused by a lack of context:

“‘We SERIOUSLY NEED to have Jail Time based on a person’s race’ means one thing in a subreddit about law, and something completely different in a subreddit about fantasy worldbuilding.”


Also, Google’s labelers were indeed English speakers, but they were based in India rather than the United States.

Almost half the web traffic on Reddit comes from users based in the United States – the site is highly Americanized, full of slang and cultural references that could be hard for a foreign English speaker to understand and accurately label.

SurgeAI compels readers to imagine the difficulty of explaining the common Reddit throwaway comment “orange man bad” to a non-American with no knowledge of US politics.

Ultimately, the dataset is simply riddled with bad data and would yield a bad program, TheNextWeb argues.

“This particular dataset is specifically built for decision-making related to human outcomes,” Tristan Greene writes for the site.

“Every single time the AI makes a decision that either rewards or punishes any human, it causes demonstrable harm to other humans.”


Google’s GoEmotions dataset is so innately flawed that Greene calls for dismantling it and starting over.

Of all the major tech players jostling around Silicon Valley, Google has invested the most in AI with almost $4billion spent on acquiring AI-based startups.

This post first appeared on Thesun.co.uk
