[PD] speech recognition and ethics

Jonathan Wilkes jancsika at yahoo.com
Sat Feb 7 20:29:21 CET 2015


Could context and domain-specific applications also simplify the algorithms to a great degree?

-Jonathan 

     On Saturday, February 7, 2015 1:06 PM, Spencer Russell <spencer at ssfrr.com> wrote:
   

 I saw a really interesting talk last year by Johan Schalkwyk, the head of the Google speech recognition group. One of the points he made was that while Google's algorithms are important, they got a lot more leverage from the sheer amount of data they have access to. It allows them to get away with much simpler algorithms. I think that's one of the biggest problems with trying to compete with Google and Apple on speech recognition, because OSS developers just don't have access to a huge corpus of data. 
 Even though a lot of that data is unlabeled (they don't know what the actual words are that correspond to the audio), they have a huge amount of interaction data, so they can, for instance, look at whether the user tried multiple times with a particular phrase or whether the user accepted a given transcription.
 It seems like if we want an open-source speech recognition package we should focus on finding ways to get an accessible shared corpus. Unless there were some tricky licensing, though, I think that corpus would also benefit the big guys, so their corpus would remain a proper superset of what's available to OSS developers.

On Sat, Feb 7, 2015, at 11:39 AM, Jonathan Wilkes via Pd-list wrote:

Hi list,
 Here's a fun thought experiment: suppose you're doing a port of Pd, and the graphics toolkit you're using includes functionality to hook into Google's speech recognition API.  Such an API could make the software accessible to people who would otherwise find it very hard to write Pd patches.
 However, the API works by shipping off your audio data to Google's servers, doing the computation on their machines, and sending you back the results.
 Do you use the API in your port, or not?
 I'm decidedly not going to use that API, for what I think are obvious security, privacy, and philosophical reasons.  But I'm curious just how obvious the security and privacy implications are to others here.  How many people would use a speech-patching mechanism that sends all your speech to Google?
 I'm also increasingly worried by the apparent gap between the usability of Google's and Apple's products and the seemingly glacial pace at which _usable_ free software speech recognition is being developed.  My position won't change, but I'm afraid it's becoming more symbolic than practical as these insecure tools become a natural part of most people's lives.
 -Jonathan
_______________________________________________
Pd-list at lists.iem.at mailing list
UNSUBSCRIBE and account-management -> http://lists.puredata.info/listinfo/pd-list

 


   