Harvard University assistant professor Himabindu Lakkaraju studies the role trust plays in human decision-making in professional settings. She’s working with nearly 200 doctors at hospitals in Massachusetts to understand how trust in AI can change the way doctors diagnose patients.
For common illnesses like the flu, AI isn’t very helpful, since human professionals can recognize them easily. But Lakkaraju found that AI can help doctors diagnose hard-to-identify illnesses like autoimmune diseases. In her latest work, Lakkaraju and coworkers gave doctors records for roughly 2,000 patients, along with predictions from an AI system, then asked them to predict whether each patient would have a stroke within six months. The researchers varied the information supplied about the AI system, including its accuracy, a confidence interval, and an explanation of how the system works. They found doctors’ predictions were most accurate when they were given the most information about the AI system.
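To make that setup concrete, here is a minimal sketch, assuming a simple factorial design, of how the disclosure conditions described above might be represented. The class and helper names are hypothetical, not the researchers’ actual code.

```python
# Hypothetical sketch of the disclosure conditions described in the article:
# each doctor sees patient records plus some subset of metadata about the
# AI system (its accuracy, a confidence interval, an explanation).
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class AIDisclosure:
    show_accuracy: bool     # e.g., the system's overall accuracy figure
    show_confidence: bool   # e.g., a confidence interval per prediction
    show_explanation: bool  # e.g., which factors drove the prediction

# Enumerate all 8 conditions, from "prediction only" to "all information."
conditions = [AIDisclosure(*flags) for flags in product([False, True], repeat=3)]

def doctor_accuracy(predictions: list[bool], outcomes: list[bool]) -> float:
    """Fraction of a doctor's stroke predictions that matched real outcomes."""
    return sum(p == o for p, o in zip(predictions, outcomes)) / len(outcomes)
```

In these terms, the finding is that doctor accuracy was highest in the condition where all three kinds of information were shown.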
Lakkaraju says she’s happy to see that NIST is trying to quantify trust, but she says the agency should also consider the role explanations play in human trust of AI systems. In her experiment, doctors’ accuracy at predicting strokes dropped when they were given an explanation without data to inform the decision, implying that an explanation alone can lead people to trust AI too much.
“Explanations can bring about unusually high trust even when it is not warranted, which is a recipe for problems,” she says. “But once you start putting numbers on how good the explanation is, then people’s trust slowly calibrates.”
Other nations are also trying to confront the question of trust in AI. The US is among 40 countries that have signed on to AI principles that emphasize trustworthiness. A document signed by about a dozen European countries says trustworthiness and innovation go hand in hand and can be considered “two sides of the same coin.”
NIST and the OECD, a group of 38 countries with advanced economies, are working on tools to designate AI systems as high or low risk. The Canadian government created an algorithmic impact assessment process in 2019 for businesses and government agencies. There, AI falls into one of four categories, ranging from no impact on people’s lives or the rights of communities to very high risk that perpetuates harm to individuals and communities. Rating an algorithm takes about 30 minutes. The Canadian approach requires that developers notify users for all but the lowest-risk systems.
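A hedged sketch of that tiered logic, in Python: the level names and the helper below are illustrative assumptions about the scheme the article describes, not the Canadian government’s actual tool.

```python
# Illustrative model of a four-tier impact scale in which only the lowest
# tier is exempt from notifying users, per the article's description.
from enum import IntEnum

class ImpactLevel(IntEnum):
    LITTLE_TO_NONE = 1  # no impact on people's lives or community rights
    MODERATE = 2
    HIGH = 3
    VERY_HIGH = 4       # very high risk; can perpetuate harm

def notification_required(level: ImpactLevel) -> bool:
    """Only the lowest-risk tier skips user notification (hypothetical rule)."""
    return level > ImpactLevel.LITTLE_TO_NONE
```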
European Union lawmakers are considering AI regulations that could help define global standards for which kinds of AI are considered low or high risk and how to regulate the technology. Like Europe’s landmark GDPR privacy law, the EU AI strategy could lead the world’s largest companies that deploy artificial intelligence to change their practices worldwide.
The regulation calls for the creation of a public registry of high-risk forms of AI in use, in a database managed by the European Commission. Examples of AI deemed high risk in the document include AI used for education, employment, or as safety components for utilities like electricity, gas, and water. The draft will likely be amended before passage, but it calls for a ban on AI for social scoring of citizens by governments and on real-time facial recognition.
The EU report also encourages allowing businesses and researchers to experiment in areas called “sandboxes,” designed to make sure the legal framework is “innovation-friendly, future-proof, and resilient to disruption.” Earlier this month, the Biden administration introduced the National Artificial Intelligence Research Resource Task Force, aimed at sharing government data for research on issues like health care and autonomous driving. Its final plans would require approval from Congress.
For now, the AI user trust score is being developed for AI practitioners. Over time, though, the scores could empower individuals to avoid untrustworthy AI and nudge the marketplace toward deploying robust, tested, trusted systems. Of course, that’s only if people know AI is being used at all.