[PD] speaker recognition with pd ?

William Brent william.brent at gmail.com
Tue Sep 27 06:00:27 CEST 2011

> 2. The Mel-Frequency Cepstral Coefficient (MFCC) of the FFT (Fast Fourier
> Transform) of a waveform is a good timbral identifier. William Brent's
> TimbreID objects are good instantaneous timbre identifiers using this
> principle, but to build up a sophisticated model of a human voice
> (robust enough for speaker ID) you need to work out how to build a
> database. For an instantaneous MFCC identifier using an internal database,
> check out Michael Casey's "soundspotter" PD external.

Aside from the different analysis objects like [mfcc~], there is an
object in the timbreID library that makes it easy to build a training
database and make comparisons on the fly.  But as Ed and others are
saying, the problem is how to interpret the stored data.  I have never
dug into the voice recognition problem, but my understanding is also
that the magic is in the transitions from one sound to the next.
timbreID will help you get all the data you need if you can go the
Markov model route.  On the other hand, if I were going to take a stab
at a simplified system based on isolated sounds, I'd guess that
features of pure vowels would be more helpful in distinguishing
between different speakers than features of "sss" sounds or other
consonants.
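
To make the "training database plus comparisons on the fly" idea
concrete, here is a rough sketch in Python of a nearest-neighbour
lookup over stored feature vectors.  It is only an illustration of the
general approach, not timbreID's implementation or API: the class
name, the method names, and the three-element vectors are all made up
for the example.  In practice the vectors would be something like MFCC
frames taken from sustained vowels.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TrainingDatabase:
    """Hypothetical store of labelled feature vectors (not timbreID's API)."""

    def __init__(self):
        self.entries = []  # list of (label, feature_vector) pairs

    def train(self, label, features):
        """Add one labelled feature vector to the database."""
        self.entries.append((label, list(features)))

    def nearest(self, features):
        """Return (label, distance) of the closest stored vector."""
        if not self.entries:
            raise ValueError("database is empty")
        label, stored = min(self.entries,
                            key=lambda entry: distance(entry[1], features))
        return label, distance(stored, features)


# Made-up vectors standing in for features of vowel sounds.
db = TrainingDatabase()
db.train("speaker_a", [1.0, 0.2, -0.3])
db.train("speaker_a", [0.9, 0.25, -0.35])
db.train("speaker_b", [0.1, 0.8, 0.4])
db.train("speaker_b", [0.15, 0.75, 0.5])

# Compare a new, unlabelled vector against everything stored so far.
label, dist = db.nearest([0.95, 0.22, -0.31])
print(label, round(dist, 3))
```

A lookup like this only compares isolated frames, so it ignores the
transitions entirely; it corresponds to the simplified system, not to
the Markov model route.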

William Brent

“Great minds flock together”
Conflations: conversational idiom for the 21st century

