[PD] speech recognition and ethics

David Medine dmedine at ucsd.edu
Sat Feb 7 21:45:45 CET 2015


Just a quick disclaimer about the extern: it's little more than a Pd 
wrapper for the Sphinx hello world example. The build environment works, 
though (and was a real pain to get right on Linux because of some 
function name conflicts between Sphinx and Pd), so it's a good 
jumping-off point for developing something more powerful.

Originally I wanted to make a C application that could do automatic 
training so that people could do voice commands with high accuracy, but 
this is a big project and I got distracted from it.

One note: voice recognizers are typically optimized to decode most 
voices correctly most of the time, but one could certainly train Sphinx 
to decode a particular voice correctly almost all of the time. This is 
another great advantage of Sphinx: flexibility.

This extern doesn't build on Windows, by the way, sorry.


On 02/07/2015 11:55 AM, Jonathan Wilkes wrote:
> Thanks, I didn't know there was a Sphinx external.  It also looks like 
> the Sphinx website got a face-lift-- hopefully the software is also 
> more approachable than the last time I looked.
>
> -Jonathan
>
>
>
> On Saturday, February 7, 2015 2:16 PM, david medine <dmedine at ucsd.edu> 
> wrote:
>
>
> One of the bad things about Google is that it is essentially a giant 
> billboard. Having said that, I am going to advertise a couple of things.
>
> If you want a speech recognition API that doesn't rely on a tax-exempt 
> corporation that has more money than the nation of Russia, builds its 
> products in unsafe overseas sweatshops, charges you $99/year to 
> develop software for the device you already paid for, eagerly aids the 
> federal government in unconstitutional spying, or is in the process of 
> assimilating all of human culture, you might want to check out CMU's 
> speech recognition toolkit, Sphinx.
> http://cmusphinx.sourceforge.net/
>
> Another advantage of Sphinx is that it doesn't rely on internet access 
> to decode speech. And, someone even wrote a simple Pd extern with Sphinx.
> https://github.com/dmedine/recog_tilde
>
> And yes, it is quite difficult to train Sphinx. Building a dictionary 
> is copious work, and Google and Apple have done it 1000 times better than 
> anyone else because they have mountains of data and cash and luxury 
> model machine learning algorithms. . . but no one ever said DIY was easy.
>
> On 2/7/15 9:55 AM, Spencer Russell wrote:
> I saw a really interesting talk last year by Johan Schalkwyk, the head 
> of the Google speech recognition group. One of the points he made was 
> that while Google's algorithms are important, they got a lot more 
> leverage from the sheer amount of data they have access to. It allows 
> them to get away with much simpler algorithms. I think that's one of 
> the biggest problems with trying to compete with Google and Apple on 
> speech recognition, because OSS developers just don't have access to a 
> huge corpus of data.
>
> Even though a lot of that data is unlabeled (they don't know what the 
> actual words are that correspond to the audio), they have a huge 
> amount of interaction data, so they can for instance look at whether 
> the user tried multiple times with a particular phrase or whether the 
> user accepted a given transcription.
>
> It seems like if we want an open-source speech recognition package we 
> should focus on finding ways to get an accessible shared corpus. 
> Unless there was some tricky licensing I think that corpus would also 
> benefit the big guys though, so their corpus would remain a proper 
> superset of what's available to OSS developers.
>
> On Sat, Feb 7, 2015, at 11:39 AM, Jonathan Wilkes via Pd-list wrote:
>> Hi list,
>>
>> Here's a fun thought-experiment: suppose you're doing a port of Pd, 
>> and the graphics toolkit you're using will include functionality to 
>> hook in to Google's speech recognition API.  Such an API could make 
>> the software accessible to people who would otherwise find it very 
>> hard to write Pd patches.
>>
>> However, the API works by shipping off your audio data to Google's 
>> servers, doing the computation on their machines, and sending you 
>> back the results.
>>
>> Do you use the API in your port, or not?
>>
>> I'm decidedly not going to use that API, for what I think are obvious 
>> security, privacy, and philosophical reasons.  But I'm curious just 
>> how obvious the security and privacy implications are to others 
>> here.  How many people would use a speech-patching mechanism that 
>> sends all your speech to Google?
>>
>> I'm also increasingly worried by the apparent gap between the 
>> usability of Google and Apple's products, and the seemingly glacial 
>> pace at which _usable_ free software speech recognition is being 
>> developed.  My position won't change, but I'm afraid it's becoming 
>> more symbolic than practical as these insecure tools become a natural 
>> part of most people's lives.
>>
>> -Jonathan
>> _________________________________________________
>> Pd-list at lists.iem.at mailing list
>> UNSUBSCRIBE and account-management -> 
>> http://lists.puredata.info/listinfo/pd-list
