[PD] detect if adc~ is music or speech

Tedb0t lists at liminastudio.com
Tue Feb 8 05:22:21 CET 2011


Yeah.  Trying to define music is essentially pointless.  Defining speech would be easier.

However, since you mentioned neural nets, you could try training a net on speech and music (there's an ANN object out there that I've tested) and see what happens.  That would be a fun experiment, no idea how well it would work...  you'd need a pretty huge training set for it to be even remotely accurate (I would think).
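
For what it's worth, here is a minimal sketch of that experiment in Python rather than Pd. It is not the ANN object mentioned above: it trains a single logistic unit (about the smallest "net" possible) on per-clip feature vectors. The function names, the two features and the toy numbers are all made up for illustration; a real test would need features extracted from a large labelled set of recordings.

```python
import math
import random


def train_logistic(samples, labels, epochs=200, lr=0.5, seed=0):
    """Train one logistic unit with plain stochastic gradient descent."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = bias + sum(w * x for w, x in zip(weights, samples[i]))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of "music"
            err = p - labels[i]              # gradient of the log loss w.r.t. z
            weights = [w - lr * err * x for w, x in zip(weights, samples[i])]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, features):
    """Return 1 for "music", 0 for "speech"."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if z > 0 else 0


# Toy, hand-made feature vectors (NOT measured data). The two columns are
# hypothetical features: [pause ratio, tempo regularity].
SPEECH = [[0.6, 0.2], [0.5, 0.3], [0.7, 0.1]]
MUSIC = [[0.1, 0.8], [0.2, 0.9], [0.15, 0.7]]

weights, bias = train_logistic(SPEECH + MUSIC, [0] * len(SPEECH) + [1] * len(MUSIC))
print(predict(weights, bias, [0.65, 0.15]), predict(weights, bias, [0.1, 0.85]))
```

With only six toy points this says nothing about real-world accuracy; it just shows the shape of the train/predict loop.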

——t3db0t

On Feb 7, 2011, at 1:38 PM, Pedro Lopes wrote:

> First of all, I would take it from another angle:
> 
> <this is one possible way, out of zillions>
> Use a speech recognizer to detect if it is speech or not. Thus if the speech recognizer has an X% recognition rate, you inherit that percentage. Now you depend heavily on the recognizer: some recognizers, like the default Windows one, always try to match the input to some string, so they are a bit garbage in academic terms. What you need is a strong open recognizer that can tell you how similar (in %) the sentence is to a target sentence in a database.
> 
> Why do I suggest this angle?
> - 'Cause I don't wanna think about "what is music". Speech is a language: it is defined, it is easily structured. Music? Noise is music, drone is music, ambient can be non-rhythmical, and what about a cappella singing? Is that music? And all those inherited philosophical issues. Furthermore, if you need more help, explaining the context may aid us, because if you only care about certain "music" it can be easier. ALSO: if you have access to the audio data, you can always extract (filter) the music.
> 
> </this is one possible way, out of zillions>
> 
> best,
> pedro
> 
> 
> On Mon, Feb 7, 2011 at 5:43 PM, patrick <puredata at 11h11.com> wrote:
> would it be possible to detect if the incoming audio is music or speech? i guess it's very hard, but i was thinking about some methods:
> 
> using some kind of frequency detection
> using bonk (if the tempo is stable = music)
> env~ (most music is compressed nowadays)
> training a voice (using neural network?!?)
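
The bonk~ and env~ ideas in the list above can be prototyped offline before patching them in Pd. A rough Python sketch, assuming you already have onset times in seconds (e.g. logged from bonk~) and a block of samples. Both function names are hypothetical, and any threshold for calling something "music" would have to be tuned on real recordings:

```python
import math


def tempo_stability(onset_times):
    """Coefficient of variation of inter-onset intervals (lower = steadier)."""
    intervals = [b - a for a, b in zip(onset_times, onset_times[1:])]
    if len(intervals) < 2:
        return float("inf")  # too few onsets to say anything
    mean = sum(intervals) / len(intervals)
    if mean <= 0:
        return float("inf")
    variance = sum((x - mean) ** 2 for x in intervals) / len(intervals)
    return math.sqrt(variance) / mean


def crest_factor_db(samples):
    """Peak-to-RMS ratio in dB; heavy compression tends to lower it."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return 0.0  # silent block: no meaningful crest factor
    return 20.0 * math.log10(peak / rms)


steady = [0.0, 0.5, 1.0, 1.5, 2.0]      # perfectly regular onsets
irregular = [0.0, 0.2, 0.9, 1.0, 1.8]   # speech-like, uneven onsets
print(tempo_stability(steady), tempo_stability(irregular))
```

Neither number is a classifier on its own: plenty of music has no steady pulse, and broadcast speech is often compressed too, so these are better treated as inputs to a later decision stage.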
> 
> 
> From the author of aubio:
> Use a few low-level features, such as the energy of low and high frequency bands and the spectral spread. In a second step, these approaches are often refined using machine learning techniques such as Bayesian networks or support vector machines.
> 
> See for instance these papers:
> http://cobweb.ecn.purdue.edu/~malcolm/interval/1996-085/
> http://www.aclweb.org/anthology/O/O08/O08-1015.pdf
> http://www.hindawi.com/journals/asp/2009/628570.html
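
As a rough illustration of the low-level features mentioned above, this Python sketch computes the fraction of energy below and above a split frequency, plus the spectral centroid and spread, for one frame. The 1000 Hz split, the power weighting and the function names are assumptions for illustration, not aubio's API. The naive DFT is only meant for short frames.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def low_level_features(frame, sample_rate, split_hz=1000.0):
    """Band energy fractions, spectral centroid and spread of one frame."""
    power = [m * m for m in magnitude_spectrum(frame)]
    freqs = [k * sample_rate / len(frame) for k in range(len(power))]
    total = sum(power)
    if total == 0:
        return {"low_energy": 0.0, "high_energy": 0.0,
                "centroid_hz": 0.0, "spread_hz": 0.0}
    low = sum(p for f, p in zip(freqs, power) if f < split_hz)
    centroid = sum(f * p for f, p in zip(freqs, power)) / total
    # Spread = power-weighted standard deviation around the centroid.
    spread = math.sqrt(
        sum((f - centroid) ** 2 * p for f, p in zip(freqs, power)) / total
    )
    return {
        "low_energy": low / total,
        "high_energy": (total - low) / total,
        "centroid_hz": centroid,
        "spread_hz": spread,
    }


# A 500 Hz sine at 8 kHz falls exactly on bin 4 of a 64-sample frame.
frame = [math.sin(2 * math.pi * 500 * i / 8000) for i in range(64)]
print(low_level_features(frame, 8000))
```

Computed frame by frame, values like these would form the feature vector that the second, machine-learning step gets trained on.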
> 
> i would like to achieve > 90% accuracy if possible. any suggestions are welcome!
> 
> _______________________________________________
> Pd-list at iem.at mailing list
> UNSUBSCRIBE and account-management -> http://lists.puredata.info/listinfo/pd-list
> 
> 
> 
> -- 
> Pedro Lopes (MSc)
> contact: pedro.lopes at ist.utl.pt
> website: http://web.ist.utl.pt/Pedro.Lopes / http://pedrolopesresearch.wordpress.com/ | http://twitter.com/plopesresearch
