<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Just a quick disclaimer about the extern: it's little more than a Pd
wrapper for the Sphinx hello world example. The build environment
works, though (and was a real pain to get right on Linux because of
some function name conflicts between Sphinx and Pd), so it's a good
jumping-off point for developing something more powerful.<br>
<br>
Originally I wanted to write a C application that could do automatic
training, so that people could use voice commands with high accuracy,
but that is a big project and I got distracted from it.<br>
<br>
One note about this: voice recognizers are typically optimized to
decode most voices correctly most of the time, but one could
certainly train one to decode a particular voice correctly
almost all of the time. This is another great advantage of Sphinx:
flexibility.<br>
<br>
This extern doesn't build on Windows, by the way, sorry.<br>
<br>
<br>
<div class="moz-cite-prefix">On 02/07/2015 11:55 AM, Jonathan Wilkes
wrote:<br>
</div>
<blockquote
cite="mid:463199607.1355197.1423338931049.JavaMail.yahoo@mail.yahoo.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<div style="color:#000; background-color:#fff;
font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial,
Lucida Grande, sans-serif;font-size:16px">
<div id="yui_3_16_0_1_1423336016384_42688" dir="ltr"><span
id="yui_3_16_0_1_1423336016384_42689">Thanks, I didn't know
there was a Sphinx external. It also looks like the Sphinx
website got a face-lift; hopefully the software is also
more approachable than it was the last time I looked.</span></div>
<div id="yui_3_16_0_1_1423336016384_47769" dir="ltr"><br>
<span id="yui_3_16_0_1_1423336016384_42689"></span></div>
<div id="yui_3_16_0_1_1423336016384_47770" dir="ltr"><span
id="yui_3_16_0_1_1423336016384_42689">-Jonathan<br>
</span></div>
<div id="yui_3_16_0_1_1423336016384_42687" dir="ltr"><br>
<span></span></div>
<div id="yui_3_16_0_1_1423336016384_42686" dir="ltr"><span></span></div>
<div class="qtdSeparateBR"><br>
<br>
</div>
<div style="display: block;" class="yahoo_quoted">
<div style="font-family: HelveticaNeue, Helvetica Neue,
Helvetica, Arial, Lucida Grande, sans-serif; font-size:
16px;">
<div style="font-family: HelveticaNeue, Helvetica Neue,
Helvetica, Arial, Lucida Grande, sans-serif; font-size:
16px;">
<div dir="ltr"> <font size="2" face="Arial"> On Saturday,
February 7, 2015 2:16 PM, david medine
<a class="moz-txt-link-rfc2396E" href="mailto:dmedine@ucsd.edu">&lt;dmedine@ucsd.edu&gt;</a> wrote:<br>
</font> </div>
<br>
<br>
<div class="y_msg_container">
<div id="yiv1173939123">
<div>
<div class="yiv1173939123moz-cite-prefix">One of the
bad things about Google is that it is essentially
a giant billboard. Having said that, I am going to
advertise a couple of things.<br clear="none">
<br clear="none">
If you want a speech recognition API that doesn't
rely on a tax-exempt corporation that has more
money than the nation of Russia, builds its
products in unsafe overseas sweatshops, charges
you $99/year to develop software for the device
you already paid for, eagerly aids the federal
government in unconstitutional spying, or is in
the process of assimilating all of human culture,
you might want to check out CMU's speech recognition
toolkit, Sphinx. <br clear="none">
<a moz-do-not-send="true" href=""
class="removed-link" rel="nofollow" shape="rect"
target="_blank">http://cmusphinx.sourceforge.net/</a><br
clear="none">
<br clear="none">
Another advantage of Sphinx is that it doesn't
rely on internet access to decode speech. And,
someone even wrote a simple Pd extern with
Sphinx. <br clear="none">
<a moz-do-not-send="true" href=""
class="removed-link" rel="nofollow" shape="rect"
target="_blank">https://github.com/dmedine/recog_tilde</a><br
clear="none">
<br clear="none">
And yes, it is quite difficult to train Sphinx.
Building a dictionary is a copious amount of work, and Google
and Apple have done it 1000 times better than anyone
else because they have mountains of data and cash
and luxury-model machine learning algorithms...
but no one ever said DIY was easy. <br
clear="none">
<br clear="none">
On 2/7/15 9:55 AM, Spencer Russell wrote:<br
clear="none">
</div>
<blockquote type="cite"> </blockquote>
</div>
<div>
<div>I saw a really interesting talk last year by <span
class="yiv1173939123highlight"
style="background-color:rgb(255, 255, 255);"><span
class="yiv1173939123colour"
style="color:rgb(31, 31, 31);">Johan
Schalkwyk, </span></span>the head of the
Google speech recognition group. One of the points
he made was that while Google's algorithms are
important, they got a lot more leverage from the
sheer amount of data they have access to. It
allows them to get away with much simpler
algorithms. I think that's one of the biggest
problems with trying to compete with Google and
Apple on speech recognition, because OSS
developers just don't have access to a huge corpus
of data. <br clear="none">
</div>
<div> </div>
<div>Even though a lot of that data is unlabeled
(they don't know the actual words that
correspond to the audio), they have a huge amount
of interaction data, so they can, for instance, look
at whether the user tried multiple times with a
particular phrase or whether the user accepted a
given transcription.<br clear="none">
</div>
<div> </div>
<div>It seems like if we want an open-source speech
recognition package, we should focus on finding
ways to get an accessible shared corpus. Unless
there were some tricky licensing, though, I think that
corpus would also benefit the big guys, so
their corpus would remain a proper superset of
what's available to OSS developers.</div>
<div> </div>
<div> </div>
<div class="yiv1173939123yqt3359243372"
id="yiv1173939123yqt97733">
<div>On Sat, Feb 7, 2015, at 11:39 AM, Jonathan
Wilkes via Pd-list wrote:<br clear="none">
</div>
<blockquote type="cite">
<div
style="color:#000;background-color:#fff;font-family:HelveticaNeue,
Helvetica Neue, Helvetica, Arial, Lucida
Grande, sans-serif;font-size:16px;">
<div dir="ltr">Hi list,<br clear="none">
</div>
<div dir="ltr"> </div>
<div dir="ltr">Here's a fun
thought-experiment: suppose you're doing a
port of Pd, and the graphics toolkit you're
using will include functionality to hook in
to Google's speech recognition API. Such an
API could make the software accessible to
people who would otherwise find it very hard
to write Pd patches.<br clear="none">
</div>
<div dir="ltr"> </div>
<div dir="ltr">However, the API works by
shipping off your audio data to Google's
servers, doing the computation on their
machines, and sending you back the results.<br
clear="none">
</div>
<div dir="ltr"> </div>
<div dir="ltr">Do you use the API in your
port, or not?<br clear="none">
</div>
<div dir="ltr"> </div>
<div dir="ltr">I'm decidedly not going to use
that API, for what I think are obvious
security, privacy, and philosophical
reasons. But I'm curious just how obvious
the security and privacy implications are to
others here. How many people would use a
speech-patching mechanism that sends all
your speech to Google?<br clear="none">
</div>
<div dir="ltr"> </div>
<div dir="ltr">I'm also increasingly worried
by the apparent gap between the usability of
Google's and Apple's products and the
seemingly glacial pace at which _usable_
free software speech recognition is being
developed. My position won't change, but
I'm afraid it's becoming more symbolic than
practical as these insecure tools become a
natural part of most people's lives.<br
clear="none">
</div>
<div> </div>
<div dir="ltr">-Jonathan<br clear="none">
</div>
</div>
<div><u>_______________________________________________</u><br
clear="none">
</div>
<div><a moz-do-not-send="true" href=""
class="removed-link" rel="nofollow"
shape="rect"
ymailto="mailto:Pd-list@lists.iem.at"
target="_blank">Pd-list@lists.iem.at</a>
mailing list<br clear="none">
</div>
<div>UNSUBSCRIBE and account-management -> <a
moz-do-not-send="true" href=""
class="removed-link" rel="nofollow"
shape="rect" target="_blank">http://lists.puredata.info/listinfo/pd-list</a><br
clear="none">
</div>
</blockquote>
<div> </div>
</div>
<br clear="none">
</div>
</div>
<br>
<br>
<br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>