First of all, I would take it from another angle:<div><br></div><div>&lt;this is one possible way, out of zillions&gt;</div><div>Detect whether it is speech or not, using a speech recognizer. If the recognizer has a recognition rate of X%, your detector inherits that percentage. You then depend heavily on the recognizer: some recognizers, like the default Windows one, always try to match the input to some string, which makes them a bit of garbage in academic terms. What you need is a strong open recognizer that can tell you how similar (in %) the sentence is to a target sentence in a database. <br>


<br></div><div>Why do I suggest this angle?</div><div>- Because I don&#39;t want to have to think about &quot;what is music&quot;. Speech is a language: it is defined and it is easily structured. Music? Noise is music, drone is music, ambient can be non-rhythmical, and what about a cappella singing? Is that music? And so on, with all the philosophical issues that come with it. Furthermore, if you need more help, explaining the context may help us help you, because if you only care about a certain kind of &quot;music&quot; it can be easier. ALSO: if you have access to the audio data, you can always extract (filter) the music. </div>


<div><br></div><div>&lt;/this is one possible way, out of zillions&gt;</div><div><br></div><div>best,</div><div>pedro</div><div><br></div><div><br><div class="gmail_quote">


On Mon, Feb 7, 2011 at 5:43 PM, patrick <span dir="ltr">&lt;<a href="mailto:puredata@11h11.com">puredata@11h11.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


would it be possible to detect if the incoming audio is music or speech? i guess it&#39;s very hard, but i was thinking about some methods:<br>

<br>

using some kind of frequency detection<br>

using bonk (if the tempo is stable = music)<br>

env~ (most music is compressed nowadays)<br>

training a voice (using neural network?!?)<br>

<br>

<br>