From the horse's mouth
Back in November 2016, I wrote about VoCo, described as "Photoshop for speech", which could synthesise anyone's voice if it was given a twenty-minute sample from which to learn the pronunciations and cadences used by that speaker. This week, a Canadian start-up named Lyrebird has announced a product which can clone a voice using just a one-minute sample.
Lyrebird's founders, three university students from Montreal, have produced a system which uses artificial intelligence techniques to synthesise passable voices from a remarkably small data sample. Lyrebird says that its algorithms can also add emotion to the speech, so that the speaker can sound, for example, angry, upset, sympathetic, or stressed out.
Whilst the voices generated would not fool us if we were watching a TV programme, for example, and can sometimes sound robotic, they probably would fool us if we thought we were eavesdropping on a noisy mobile phone conversation, especially with the addition of contextual clues such as a reference to a person's name. The speech could easily be made more convincing if the system were trained with longer voice samples, or if the resulting sound files were massaged to smooth out the audio artefacts and incidental background sounds were added to make them seem more authentic.
This is a sample conversation produced by Lyrebird which, they claim, was synthesised after using a single one-minute sample from each speaker. Do you recognise the voices?
soundcloud.com/user-535691776/dialog
What possible uses are there for cloned speech? Lyrebird suggests it could be used to read audio books, to provide voice interfaces for people with disabilities, and for use in video games, for example, but that is true of synthesised speech as well. There is no compelling argument why cloned speech of an unsuspecting speaker would in any way be more useful than the speech engines we already have as found in products such as Alexa.
Yes, we can all think of reasons why people might want to use this software: to change the truth of what a politician said or the tone with which they said it, to create fake celebrity endorsements for products, for an online retailer to use your voice to fabricate evidence of a telephone sale for a product you never ordered, to deceive people into giving confidential information to identity thieves, to fake voice verification on your bank account. But can you think of any legitimate reason, anything which doesn't involve dishonesty and deception?
Lyrebird has an ethics section on its website, consisting of just three short paragraphs, in which it acknowledges that the technology raises important societal issues and could be used for fraud, identity theft, and the manipulation of political messages, but concludes that it is releasing this software for our own good:
"By releasing our technology publicly and making it available to anyone, we want to ensure that there will be no such risks. We hope that everyone will soon be aware that such technology exists and that copying the voice of someone else is possible. More generally, we want to raise attention about the lack of evidence that audio recordings may represent in the near future."
26th April 2017
This article comes from the SKILLZONE email newsletter, published monthly since January 2008, and covering topics related to technology and the internet. All articles and artwork in the SKILLZONE newsletter are original content.