A conversation with computational linguist Emily M. Bender about the ways artificial intelligence can go wrong.
(Inside Science) — Automatic speech recognition is an important technology for many people. They ask Alexa to play them music or Siri to call their mother. Sometimes the technology doesn’t understand the users or provide the answers they want. With some technologies, that’s because artificial intelligence just isn’t as adaptable and responsive as an actual human. With others, there can be unintended biases within either the data used to train the technology or the software’s interpretation of the data. And sometimes, the weaknesses of the technology aren’t immediately obvious.
So when computational linguist Emily M. Bender from the University of Washington in Seattle spoke with Inside Science’s Chris Gorski earlier this month at a meeting of the American Association for the Advancement of Science in Seattle, there was a lot to talk about. The conversation below, which has been edited and condensed for clarity and brevity, began with introductions and then quickly moved into a pretty meta place. That’s where the text below begins.
[Image: Emily M. Bender]
Emily M. Bender: Once you turned on the recorder, both you and I changed a little bit how we’re speaking — we sort of said, “Now we’re doing the interview.” … I might be doing it more than you and you might just be accommodating me, but these are sort of sociolinguistic facts.
Right. You’re going to talk to me differently than you’re going to talk to your friends, or even than when you’re talking to me about the subway or your kids, because [an interview] is something different. We do all that without thinking about it.
If the sample [used to train the technology] isn’t representative of the broad population, then it’s not going to work as well for people who are not the ones who are represented. And what tends to happen is the people who speak the language variety that has been anointed the standard are the ones who are best represented. … So if you design something that works only for people who have the privilege of being raised to speak the standard variety [of a language], and then deploy it in the world without thinking carefully about that, you could end up just exacerbating current inequities in society, because life gets that much harder for someone for whom that is not their own dialect. … It’s not that that variety of English is any harder for machines; it’s just that it’s not the one the machines have been trained on.
My nightmare is that someone’s going to try to embed automatic speech recognition in the 911 response system. Everybody in the community needs to have access to 911, and that access needs to be equitable. To my knowledge, no one’s done this; it’s just something that I worry about. But if you put a computer in the way there, are you hampering people’s access to emergency first responders?
I just heard about a trans man who can no longer access his bank account, which is in another country, because, as part of his transition, he’s taken hormones that have changed his voice. The bank’s computer system is basically saying you’re not the same person anymore.
I’m not a lawyer, so I can’t really tell whether the old frameworks are just inadequate or we simply haven’t learned how to apply them. But what’s new is something about scale, big data. The amount of information that can be gathered and processed — that’s operating at a scale where you cannot go through and do quality control. The whole point is that it’s so big you need a computer, so you can’t go in and say, “All right, what kind of garbage data should we avoid using?”
An example of this is voice-based interview screening, where you have a computer either listening in on people talking to each other or actually talking to the job candidate. If that job candidate is coming in with a language variety that the computer is not prepared for, then the computer is going to give completely spurious answers. And depending on how it’s calibrated, it could just be, “We didn’t understand you, so you don’t get to come work here” — further exacerbating marginalization. If you don’t have transparency and accountability around that, then it’s going to get worse and worse. … If the laws are protecting against harms, does it matter how the harms are being carried out? Or can we still use the same legal framework to try to prevent or at least provide consequences for those harms? Hopefully that was somewhat coherent.
Oh, you have some automated transcription?
Many of the mistakes that speech tech makes are of the type that you’re talking about, where it just completely goes off the rails and you can tell that it got it wrong, because there will be some funny sequence of words that shows up. But there are places where it makes mistakes that are really important to the meaning and harder to notice. Machine translation is notoriously bad with negation [Editor’s note: The automated transcription program indicated that the word was not “negation,” but “litigation”]. It’s the difference between “I did go to the subway” and “I didn’t go to the subway.” Short little sound in there. They are very close acoustically, and I’m sure you’ve had the experience of misunderstanding someone or being like, “Did they just say can or can’t?”