<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Just a quick disclaimer about the extern. It's little more than a Pd

    wrapper for the Sphinx hello world example. The build environment

    works, though (and was a real pain to get right on Linux because of

    some function name conflicts between Sphinx and Pd), so it's a good

    jumping-off point for developing something more powerful.<br>

    <br>

    Originally I wanted to make a C application that could do automatic

    training so that people could issue voice commands with high accuracy,

    but this is a big project and I got distracted from it.<br>

    <br>

    One note about this: voice recognizers are typically optimized to

    decode most voices correctly most of the time, but one could

    certainly train Sphinx to decode a particular voice correctly

    almost all of the time. This is another great advantage of Sphinx:

    flexibility.<br>

    <br>

    This extern doesn't build on Windows, by the way, sorry.<br>

    <br>

    <br>

    <div class="moz-cite-prefix">On 02/07/2015 11:55 AM, Jonathan Wilkes

      wrote:<br>

    </div>

    <blockquote

      cite="mid:463199607.1355197.1423338931049.JavaMail.yahoo@mail.yahoo.com"

      type="cite">


      <div style="color:#000; background-color:#fff;

        font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial,

        Lucida Grande, sans-serif;font-size:16px">

        <div id="yui_3_16_0_1_1423336016384_42688" dir="ltr"><span

            id="yui_3_16_0_1_1423336016384_42689">Thanks, I didn't know

            there was a Sphinx external.  It also looks like the Sphinx

            website got a face-lift; hopefully the software is also

            more approachable than the last time I looked.</span></div>

        <div id="yui_3_16_0_1_1423336016384_47769" dir="ltr"><br>

          <span id="yui_3_16_0_1_1423336016384_42689"></span></div>

        <div id="yui_3_16_0_1_1423336016384_47770" dir="ltr"><span

            id="yui_3_16_0_1_1423336016384_42689">-Jonathan<br>

          </span></div>

        <div id="yui_3_16_0_1_1423336016384_42687" dir="ltr"><br>

          <span></span></div>

        <div id="yui_3_16_0_1_1423336016384_42686" dir="ltr"><span></span></div>

        <div class="qtdSeparateBR"><br>

          <br>

        </div>

        <div style="display: block;" class="yahoo_quoted">

          <div style="font-family: HelveticaNeue, Helvetica Neue,

            Helvetica, Arial, Lucida Grande, sans-serif; font-size:

            16px;">

            <div style="font-family: HelveticaNeue, Helvetica Neue,

              Helvetica, Arial, Lucida Grande, sans-serif; font-size:

              16px;">

              <div dir="ltr"> <font size="2" face="Arial"> On Saturday,

                  February 7, 2015 2:16 PM, david medine

                  <a class="moz-txt-link-rfc2396E" href="mailto:dmedine@ucsd.edu">&lt;dmedine@ucsd.edu&gt;</a> wrote:<br>

                </font> </div>

              <br>

              <br>

              <div class="y_msg_container">

                <div id="yiv1173939123">

                  <div>

                    <div class="yiv1173939123moz-cite-prefix">One of the

                      bad things about Google is that it is essentially

                      a giant billboard. Having said that, I am going to

                      advertise a couple of things.<br clear="none">

                      <br clear="none">

                      If you want a speech recognition API that doesn't

                      rely on a tax-exempt corporation that has more

                      money than the nation of Russia, builds its

                      products in unsafe overseas sweatshops, charges

                      you $99/year to develop software for the device

                      you already paid for, eagerly aids the federal

                      government in unconstitutional spying, or is in

                      the process of assimilating all of human culture,

                      you might want to check out CMU's speech recognition

                      toolkit, Sphinx. <br clear="none">

                      <a moz-do-not-send="true" href=""

                        class="removed-link" rel="nofollow" shape="rect"

                        target="_blank">http://cmusphinx.sourceforge.net/</a><br

                        clear="none">

                      <br clear="none">

                      Another advantage of Sphinx is that it doesn't

                      rely on internet access to decode speech. And

                      someone even wrote a simple Pd extern using

                      Sphinx.  <br clear="none">

                      <a moz-do-not-send="true" href=""

                        class="removed-link" rel="nofollow" shape="rect"

                        target="_blank">https://github.com/dmedine/recog_tilde</a><br

                        clear="none">

                      <br clear="none">

                      And yes, it is quite difficult to train Sphinx.

                      Building a dictionary is a lot of work, and Google

                      and Apple have done it 1000 times better than anyone

                      else because they have mountains of data and cash

                      and luxury-model machine learning algorithms...

                      but no one ever said DIY was easy. <br

                        clear="none">

                      <br clear="none">

                      On 2/7/15 9:55 AM, Spencer Russell wrote:<br

                        clear="none">

                    </div>

                    <blockquote type="cite"> </blockquote>

                  </div>


                  <div>

                    <div>I saw a really interesting talk last year by <span

                        class="yiv1173939123highlight"

                        style="background-color:rgb(255, 255, 255);"><span

                          class="yiv1173939123colour"

                          style="color:rgb(31, 31, 31);">Johan

                          Schalkwyk, </span></span>the head of the

                      Google speech recognition group. One of the points

                      he made was that while Google's algorithms are

                      important, they got a lot more leverage from the

                      sheer amount of data they have access to, which

                      allows them to get away with much simpler

                      algorithms. I think that's one of the biggest

                      problems with trying to compete with Google and

                      Apple on speech recognition, because OSS

                      developers just don't have access to a huge corpus

                      of data. <br clear="none">

                    </div>

                    <div> </div>

                    <div>Even though a lot of that data is unlabeled

                      (they don't know which words actually correspond

                      to the audio), they have a huge amount of

                      interaction data, so they can, for instance, look

                      at whether the user retried a particular phrase

                      several times or accepted a given transcription.<br clear="none">

                    </div>

                    <div> </div>

                    <div>It seems like if we want an open-source speech

                      recognition package, we should focus on finding

                      ways to get an accessible shared corpus. Unless

                      there were some tricky licensing, though, I think

                      that corpus would also benefit the big guys, so

                      their corpus would remain a proper superset of

                      what's available to OSS developers.</div>

                    <div> </div>

                    <div> </div>

                    <div class="yiv1173939123yqt3359243372"

                      id="yiv1173939123yqt97733">

                      <div>On Sat, Feb 7, 2015, at 11:39 AM, Jonathan

                        Wilkes via Pd-list wrote:<br clear="none">

                      </div>

                      <blockquote type="cite">

                        <div

                          style="color:#000;background-color:#fff;font-family:HelveticaNeue,

                          Helvetica Neue, Helvetica, Arial, Lucida

                          Grande, sans-serif;font-size:16px;">

                          <div dir="ltr">Hi list,<br clear="none">

                          </div>

                          <div dir="ltr"> </div>

                          <div dir="ltr">Here's a fun

                            thought experiment: suppose you're doing a

                            port of Pd, and the graphics toolkit you're

                            using will include functionality to hook into

                            Google's speech recognition API.  Such an

                            API could make the software accessible to

                            people who would otherwise find it very hard

                            to write Pd patches.<br clear="none">

                          </div>

                          <div dir="ltr"> </div>

                          <div dir="ltr">However, the API works by

                            shipping off your audio data to Google's

                            servers, doing the computation on their

                            machines, and sending you back the results.<br

                              clear="none">

                          </div>

                          <div dir="ltr"> </div>

                          <div dir="ltr">Do you use the API in your

                            port, or not?<br clear="none">

                          </div>

                          <div dir="ltr"> </div>

                          <div dir="ltr">I'm decidedly not going to use

                            that API, for what I think are obvious

                            security, privacy, and philosophical

                            reasons.  But I'm curious just how obvious

                            the security and privacy implications are to

                            others here.  How many people would use a

                            speech-patching mechanism that sends all

                            their speech to Google?<br clear="none">

                          </div>

                          <div dir="ltr"> </div>

                          <div dir="ltr">I'm also increasingly worried

                            by the apparent gap between the usability of

                            Google's and Apple's products and the

                            seemingly glacial pace at which _usable_

                            free software speech recognition is being

                            developed.  My position won't change, but

                            I'm afraid it's becoming more symbolic than

                            practical as these insecure tools become a

                            natural part of most people's lives.<br

                              clear="none">

                          </div>

                          <div> </div>

                          <div dir="ltr">-Jonathan<br clear="none">

                          </div>

                        </div>

                        <div><u>_______________________________________________</u><br

                            clear="none">

                        </div>

                        <div><a moz-do-not-send="true" href=""

                            class="removed-link" rel="nofollow"

                            shape="rect"

                            ymailto="mailto:Pd-list@lists.iem.at"

                            target="_blank">Pd-list@lists.iem.at</a>

                          mailing list<br clear="none">

                        </div>

                        <div>UNSUBSCRIBE and account-management -> <a

                            moz-do-not-send="true" href=""

                            class="removed-link" rel="nofollow"

                            shape="rect" target="_blank">http://lists.puredata.info/listinfo/pd-list</a><br

                            clear="none">

                        </div>

                      </blockquote>

                      <div> </div>

                    </div>


                  </div>

                </div>


                <br>

                <br>

              </div>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

  </body>

</html>