GOOGLE has taken heat for developing an artificial intelligence program that could be dangerous to humans if applied on a large scale.

A Google dataset for training AI was loaded with bad, uncorrected data and the potential for harm is clear, reports argue.

Google has spent almost $4billion acquiring AI startups

Google’s GoEmotions dataset used 58,000 Reddit comments to prepare an artificial intelligence program for gauging human emotion.

The program directed English-speaking humans to tag each Reddit comment with a label from one or more of 27 emotion categories.

The emotions are meant to cover the full scale of human sensibility, from admiration to excitement to disgust and beyond – the 28th label was “neutral”.
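The labeling scheme described above can be sketched as follows. This is an illustrative mock-up, not real dataset entries: the example comments are invented, and only four of the 28 labels mentioned in the article are shown.

```python
# Sketch of GoEmotions-style multi-label annotation: each comment gets
# one or more of 27 emotion labels, or "neutral" (28 labels in total).
# Comments below are invented for illustration.

EMOTIONS = {"admiration", "excitement", "disgust", "neutral"}  # subset of the 28 labels

annotations = [
    {"text": "This is the best thing I've seen all week!",
     "labels": ["admiration", "excitement"]},   # multi-label: two emotions at once
    {"text": "That smell was absolutely foul.",
     "labels": ["disgust"]},
    {"text": "The meeting is at 3pm.",
     "labels": ["neutral"]},                    # no emotion detected
]

for entry in annotations:
    # Every assigned label must come from the fixed taxonomy
    assert set(entry["labels"]) <= EMOTIONS
    print(f'{entry["text"]!r} -> {entry["labels"]}')
```

The key design point the article hinges on is visible here: because a labeler only sees the `text` field, with no metadata about where the comment was posted, the choice of labels rests entirely on the labeler's reading of the bare sentence.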

The applications for data labeling tech range from content moderation to resume reviews and beyond – anything that requires a large-scale analysis of human-generated text could benefit from an AI-powered emotional analysis.

If only it could be set up for success.

SurgeAI reviewed the work and found that Google’s human labelers misidentified about 30% of comments out of a sample of 1,000.

The harsh condemnation of Google’s work took issue with two key aspects of the methodology: context and the complexity of English for non-American speakers.

Google’s human labelers were given “no additional metadata” about each Reddit comment – but a post’s meaning can change entirely depending on which thread it’s posted in.

SurgeAI gives an example of a failure caused by a lack of context:

“‘We SERIOUSLY NEED to have Jail Time based on a person’s race’ means one thing in a subreddit about law, and something completely different in a subreddit about fantasy worldbuilding.”

Also, Google’s labelers were indeed English speakers, but from India and not the United States.

Almost half the web traffic on Reddit comes from users based in the United States – the site is highly Americanized with slang and cultural references that could be hard for a foreign English speaker to understand and accurately label.

SurgeAI compels readers to imagine the difficulty of explaining the common Reddit throwaway comment “orange man bad” to a non-American with no knowledge of US politics.

Ultimately, the dataset is simply riddled with bad data and would yield a bad program, TheNextWeb argues.

“This particular dataset is specifically built for decision-making related to human outcomes,” Tristan Greene writes for the site.

“Every single time the AI makes a decision that either rewards or punishes any human, it causes demonstrable harm to other humans.”

Google’s GoEmotions dataset is so innately flawed that Greene calls for dismantling it and starting over.

Of all the major tech players jostling around Silicon Valley, Google has invested the most in AI with almost $4billion spent on acquiring AI-based startups.

This post first appeared on Thesun.co.uk
