<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    There is still the challenge of access to computational power, unless

    we build a SETI@home-style speech recognition crawler, which in and of

    itself has similar ethical implications.<br>

    <br>

    <div class="moz-cite-prefix">On 2/7/2015 12:55 PM, Spencer Russell

      wrote:<br>

    </div>

    <blockquote

cite="mid:1423331706.1597882.224374225.2BD9D12C@webmail.messagingengine.com"

      type="cite">


      <div>I saw a really interesting talk last year by Johan

        Schalkwyk, the head of the Google speech

        recognition group. One of the points he made was that while

        Google's algorithms are important, they got a lot more leverage

        from the sheer amount of data they have access to. It allows

        them to get away with much simpler algorithms. I think that's

        one of the biggest problems with trying to compete with Google

        and Apple on speech recognition, because OSS developers just

        don't have access to a huge corpus of data. <br>

      </div>

      <div> </div>

      <div>Even though a lot of that data is unlabeled (they don't know

        which words correspond to the audio), they have a huge amount of

        interaction data, so they can, for instance, look at whether the

        user tried multiple times with a particular phrase or whether

        the user accepted a given transcription.<br>

      </div>

      <div> </div>

      <div>It seems that if we want an open-source speech recognition

        package, we should focus on finding ways to build an accessible

        shared corpus. Barring some tricky licensing, though, I think

        that corpus would also benefit the big guys, so their corpus

        would remain a proper superset of what's available to OSS

        developers.</div>

      <div> </div>

      <div> </div>

      <div>On Sat, Feb 7, 2015, at 11:39 AM, Jonathan Wilkes via Pd-list

        wrote:<br>

      </div>

      <blockquote type="cite">

        <div style="color:#000; background-color:#fff;

          font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial,

          Lucida Grande, sans-serif;font-size:16px">

          <div dir="ltr">Hi list,<br>

          </div>

          <div dir="ltr"> </div>

          <div dir="ltr">Here's a fun thought-experiment: suppose you're

            doing a port of Pd, and the graphics toolkit you're using

            will include functionality to hook in to Google's speech

            recognition API.  Such an API could make the software

            accessible to people who would otherwise find it very hard

            to write Pd patches.<br>

          </div>

          <div dir="ltr"> </div>

          <div dir="ltr">However, the API works by shipping off your

            audio data to Google's servers, doing the computation on

            their machines, and sending you back the results.<br>

          </div>

          <div dir="ltr"> </div>

          <div dir="ltr">Do you use the API in your port, or not?<br>

          </div>

          <div dir="ltr"> </div>

          <div dir="ltr">I'm decidedly not going to use that API, for

            what I think are obvious security, privacy, and

            philosophical reasons.  But I'm curious just how obvious the

            security and privacy implications are to others here.  How

            many people would use a speech-patching mechanism that sends

            all your speech to Google?<br>

          </div>

          <div dir="ltr"> </div>

          <div dir="ltr">I'm also increasingly worried by the apparent

            gap between the usability of Google and Apple's products,

            and the seemingly glacial pace at which _usable_ free

            software speech recognition is being developed.  My position

            won't change, but I'm afraid it's becoming more symbolic

            than practical as these insecure tools become a natural part

            of most people's lives.<br>

          </div>

          <div> </div>

          <div dir="ltr">-Jonathan<br>

          </div>

        </div>

        <div><u>_______________________________________________</u><br>

        </div>

        <div><a moz-do-not-send="true"

            href="mailto:Pd-list@lists.iem.at">Pd-list@lists.iem.at</a>

          mailing list<br>

        </div>

        <div>UNSUBSCRIBE and account-management -> <a

            moz-do-not-send="true"

            href="http://lists.puredata.info/listinfo/pd-list">http://lists.puredata.info/listinfo/pd-list</a><br>

        </div>

      </blockquote>

      <div> </div>


    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Ivica Ico Bukvic, D.M.A.

Associate Professor

Computer Music

ICAT Senior Fellow

DISIS, L2Ork

Virginia Tech

School of Performing Arts – 0141

Blacksburg, VA 24061

(540) 231-6139

<a class="moz-txt-link-abbreviated" href="mailto:ico@vt.edu">ico@vt.edu</a>

<a class="moz-txt-link-abbreviated" href="http://www.performingarts.vt.edu">www.performingarts.vt.edu</a>

disis.music.vt.edu

l2ork.music.vt.edu</pre>

  </body>

</html>