The Movement to Hold AI Accountable Gains More Steam

A New York City law requires algorithms used in hiring to be “audited” for bias. It’s the first in the US—and part of a larger push toward regulation.

Algorithms play a growing role in our lives, even as their flaws are becoming more apparent: A Michigan man wrongly accused of fraud had to file for bankruptcy; automated screening tools disproportionately harm people of color who want to buy a home or rent an apartment; Black Facebook users were subjected to more abuse than white users. Other automated systems have improperly rated teachers, graded students, and flagged people with dark skin more often for cheating on tests.

Now, efforts are underway to better understand how AI works and hold users accountable. New York’s City Council last month adopted a law requiring audits of algorithms used by employers in hiring or promotion. The law, the first of its kind in the nation, requires employers to bring in outsiders to assess whether an algorithm exhibits bias based on sex, race, or ethnicity. Employers also must tell job applicants who live in New York when artificial intelligence plays a role in deciding who gets hired or promoted.
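
The law, as described here, does not prescribe a single method for these audits, but one common output-level check compares selection rates across demographic groups; in employment, the EEOC’s “four-fifths rule,” which flags any group selected at less than 80 percent of the most-favored group’s rate, is often used as a reference point. The sketch below is a minimal illustration of that kind of check, with made-up data and hypothetical group labels; it is not the procedure the city law actually mandates.

```python
from collections import defaultdict

# Hypothetical applicant records: (demographic group, was the candidate selected?)
applicants = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

# Selection rate per group: selected / total.
totals, selected = defaultdict(int), defaultdict(int)
for group, was_selected in applicants:
    totals[group] += 1
    selected[group] += was_selected
rates = {g: selected[g] / totals[g] for g in totals}

# Impact ratio: each group's rate relative to the most-favored group.
# A ratio below 0.8 is the conventional four-fifths-rule warning threshold.
best = max(rates.values())
for group, rate in sorted(rates.items()):
    ratio = rate / best
    flag = "  <-- potential adverse impact" if ratio < 0.8 else ""
    print(f"{group}: selection rate {rate:.2f}, impact ratio {ratio:.2f}{flag}")
```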

In Washington, DC, members of Congress are drafting a bill that would require businesses to evaluate automated decisionmaking systems used in areas such as health care, housing, employment, or education, and report the findings to the Federal Trade Commission; three of the FTC’s five members support stronger regulation of algorithms. An AI Bill of Rights proposed last month by the White House calls for disclosing when AI makes decisions that impact a person’s civil rights, and it says AI systems should be “carefully audited” for accuracy and bias, among other things.

Elsewhere, European Union lawmakers are considering legislation requiring inspection of AI deemed high-risk and creating a public registry of high-risk systems. Countries including China, Canada, Germany, and the UK have also taken steps to regulate AI in recent years.

Julia Stoyanovich, an associate professor at New York University who served on the New York City Automated Decision Systems Task Force, says she and students recently examined a hiring tool and found it assigned people different personality scores based on the software program with which they created their résumé. Other studies have found that hiring algorithms favor applicants based on where they went to school, their accent, whether they wear glasses, or whether there’s a bookshelf in the background.

Stoyanovich supports the disclosure requirement in the New York City law, but she says the auditing requirement is flawed because it applies only to discrimination based on sex, race, or ethnicity. She says the algorithm that scored people based on the software used to create their résumé would pass muster under the law because it didn’t discriminate on those grounds.

“Some of these tools are truly nonsensical,” she says. “These are things we really should know as members of the public and just as people. All of us are going to apply for jobs at some point.”

Some proponents of greater scrutiny favor mandatory audits of algorithms similar to the audits of companies' financials. Others prefer “impact assessments” akin to environmental impact reports. Both groups agree that the field desperately needs standards for how such reviews should be conducted and what they should include. Without standards, businesses could engage in “ethics washing” by arranging for favorable audits. Proponents say the reviews won’t solve all problems associated with algorithms, but they would help hold the makers and users of AI legally accountable.

A forthcoming report by the Algorithmic Justice League (AJL), a private nonprofit, recommends requiring disclosure when an AI model is used and creating a public repository of incidents where AI caused harm. The repository could help auditors spot potential problems with algorithms, and help regulators investigate or fine repeat offenders. AJL founder Joy Buolamwini coauthored an influential 2018 audit that found facial-recognition algorithms work best on white men and worst on women with dark skin.

The report says it’s crucial that auditors be independent and results be publicly reviewable. Without those safeguards, “there’s no accountability mechanism at all,” says AJL head of research Sasha Costanza-Chock. “If they want to, they can just bury it; if a problem is found, there’s no guarantee that it’s addressed. It’s toothless, it’s secretive, and the auditors have no leverage.”

Deb Raji is a fellow at the AJL who evaluates audits, and she participated in the 2018 audit of facial-recognition algorithms. She cautions that Big Tech companies appear to be taking a more adversarial approach to outside auditors, sometimes threatening lawsuits on privacy or anti-hacking grounds. In August, Facebook blocked NYU academics from monitoring political ad spending and thwarted a German researcher’s efforts to investigate the Instagram algorithm.

Raji calls for creating an audit oversight board within a federal agency to do things like enforce standards or mediate disputes between auditors and companies. Such a board could be fashioned after the Financial Accounting Standards Board or the Food and Drug Administration’s standards for evaluating medical devices.

Standards for audits and auditors matter because growing calls to regulate AI have spawned a number of auditing startups, some founded by critics of AI and others that may be more favorable to the companies they audit. In 2019, a coalition of AI researchers from 30 organizations recommended outside audits, and regulation that creates a marketplace for auditors, as part of building AI with verifiable results that people can trust.

Cathy O’Neil started a company, O'Neil Risk Consulting & Algorithmic Auditing (Orcaa), in part to assess AI that’s invisible or inaccessible to the public. For example, Orcaa works with the attorneys general of four US states to evaluate financial or consumer product algorithms. But O’Neil says she loses potential customers because companies want to maintain plausible deniability and don’t want to know if or how their AI harms people.

Earlier this year Orcaa performed an audit of an algorithm used by HireVue to analyze people’s faces during job interviews. A press release from HireVue claimed the audit found no accuracy or bias issues, but the audit made no attempt to assess the system’s code, training data, or performance across different groups of people. Critics said HireVue’s characterization of the audit was misleading and disingenuous. Shortly before the audit’s release, HireVue said it would stop using the AI in video job interviews.

O’Neil thinks audits can be useful, but she says in some respects it’s too early to take the approach prescribed by the AJL, in part because there are no standards for audits and we don’t fully understand the ways in which AI harms people. Instead, O’Neil favors another approach: algorithmic impact assessments.

While an audit may evaluate the output of an AI model to see if, for example, it treats men differently than women, an impact assessment may focus more on how an algorithm was designed, who could be harmed, and who’s responsible if things go wrong. In Canada, businesses must assess the risk to individuals and communities of deploying an algorithm; in the US, assessments are being developed to decide when AI is low- or high-risk and to quantify how much people trust AI.
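
An impact assessment, in other words, is largely structured documentation rather than a statistical test. The sketch below shows one hypothetical way such a record might be organized; the fields are illustrative assumptions, not Canada’s actual questionnaire or any template proposed in the US.

```python
from dataclasses import dataclass, field

@dataclass
class AlgorithmicImpactAssessment:
    """Hypothetical record of design decisions, potential harms, and accountability."""
    system_name: str
    purpose: str                      # what decision the system automates
    design_choices: list[str]         # subjective choices made while building the model
    affected_groups: list[str]        # who could be harmed if the system errs
    potential_harms: list[str]        # e.g., qualified applicants screened out
    responsible_party: str            # who answers when things go wrong
    risk_level: str = "unassessed"    # e.g., "low" or "high"
    mitigations: list[str] = field(default_factory=list)

# Example entry for a hypothetical resume-screening system.
assessment = AlgorithmicImpactAssessment(
    system_name="resume-screener",
    purpose="rank job applicants for recruiter review",
    design_choices=["trained on records of past hires", "uses parsed resume text only"],
    affected_groups=["applicants from historically underrepresented groups"],
    potential_harms=["qualified candidates rejected without human review"],
    responsible_party="hiring-platform vendor and the employer deploying it",
    risk_level="high",
    mitigations=["annual output audit", "human review of automated rejections"],
)
print(assessment.system_name, "risk level:", assessment.risk_level)
```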

The idea of measuring impact and potential harm dates to the 1970s and the National Environmental Policy Act, which led to the creation of environmental impact statements. Those reports take into account factors from pollution to the potential discovery of ancient artifacts; similarly, impact assessments for algorithms would consider a broad range of factors.

UCLA law professor Andrew Selbst was one of the first to suggest impact assessments for algorithms. The AI Now Institute, several of whose key players now advise the FTC, endorsed a similar approach by federal agencies in 2018.

In a paper forthcoming in the Harvard Journal of Law & Technology, Selbst champions documentation because we don’t yet fully understand how AI harms people. Research into algorithmic harm is only a few years old, and very little is known about AI’s impact on groups such as people who identify as queer. Documentation from impact assessments, he says, will be necessary for people interested in filing lawsuits.

“We need to know how the many subjective decisions that go into building a model lead to the observed results, and why those decisions were thought justified at the time, just to have a chance at disentangling everything when something goes wrong,” the paper reads. “Algorithmic impact assessments cannot solve all algorithmic harms, but they can put the field and regulators in better positions to avoid the harms in the first place and to act on them once we know more.”

A revamped version of the Algorithmic Accountability Act, first introduced in 2019, is now being discussed in Congress. According to a draft version of the legislation reviewed by WIRED, the bill would require businesses that use automated decisionmaking systems in areas such as health care, housing, employment, or education to carry out impact assessments and regularly report results to the FTC. A spokesperson for Senator Ron Wyden (D-Oregon), a cosponsor of the bill, says it calls on the FTC to create a public repository of automated decisionmaking systems and aims to establish an assessment process to enable future regulation by Congress or agencies like the FTC. The draft asks the FTC to decide what should be included in impact assessments and summary reports.

Fiona Scott Morton is a professor at the Yale School of Management who served as chief economist in the US Department of Justice during the Obama administration. She believes tools such as audits or assessments could change how courts and judges view companies that build AI: it’s easy for a company to claim that harm caused by its AI was an accident, but much harder to refute the documentation produced by an audit or impact assessment. Still, Scott Morton thinks it’s unlikely Congress will require audits of algorithms; she expects change to come instead from a Biden administration executive order or directives by federal agencies.

Over the past year, people with experience documenting how AI can cause harm have highlighted the steps they believe are necessary for audits and impact assessments to succeed, and the ways they can fail. Some draw lessons from early efforts to regulate AI around the world and from past efforts to protect people or the environment from dangerous technology.

In August, the Center for Long-Term Cybersecurity at UC Berkeley suggested that an AI risk-assessment tool being developed by the federal government include factors such as a system’s carbon footprint and its potential to exacerbate inequality; the center also urged the government to take a stronger approach to AI than it did to cybersecurity. The AJL likewise sees lessons in cybersecurity practices. A forthcoming report coauthored by Raji calls for businesses to create processes for handling instances of AI harm akin to the way IT security workers treat bugs and security patches. Some of the AJL’s recommendations, such as offering bias bounties, publicly reporting major incidents, and building internal systems for escalating harm incidents, are drawn from cybersecurity.

In a report published earlier this year, researchers at Cornell University and Microsoft Research suggested that AI auditors learn from how sociologists worked with communities in the 1940s and 1950s to document discrimination in housing and hiring.

The authors suggest that algorithm auditors look for more collaborative ways to involve communities and society in assessing AI systems. People with no machine-learning experience have identified problems with AI before: last year, users helped uncover bias against people with dark skin in algorithms used by Twitter and Zoom. Those discoveries led Zoom to tweak its algorithm and Twitter to stop using its AI for cropping photos.

Another report, released in June by the AI on the Ground team at Data & Society, argues that community activists, critical scholars, policymakers, and technologists working in the public interest should be involved in assessing algorithms. The report says what counts as an impact often reflects the wants and needs of people in power. Done wrong, the authors say, impact assessments can replicate existing power structures while letting businesses and governments appear accountable, instead of giving ordinary people a way to act when things go wrong.

Back in New York, Stoyanovich says she hopes the disclosure provision in the new city law starts a movement toward meaningful empowerment of individuals, especially in cases where a person’s livelihood or freedom is at stake. She advocates for public input in audits of algorithms.

“I really believe that this cannot be a space where all the decisions and fixing comes from a handful of expert entities,” she says. “There needs to be a public movement here. Unless the public applies pressure, we won't be able to regulate this in any way that’s meaningful, and business interests will always prevail.”

Updated, 12-2-21, 2pm ET: An earlier version of this article incorrectly said Julia Stoyanovich serves on New York's Automated Decision Systems Task Force, and that the hiring tool she and her students reviewed gauged applicants based on the font used in their résumé. 

