[PD] speaker recognition with pd ?

Charles Henry czhenry at gmail.com
Thu Sep 22 22:13:12 CEST 2011


On Thu, Sep 22, 2011 at 12:42 PM,  <gnd at itchybit.org> wrote:

> The task would be to identify from a live-talk the voice of the current
> speaker amongst several. Training before is also possible .. i guess this
> could be done for sure by utilizing a simple neural network trained on a
> FFT decomposition of the voices..  so there must be some software out for
> sure...

My guess is that an FFT + neural network would be really bad at this.
Seriously, that sounds like a doomed project.  All of these things
would have to be huge:
1.  fft size (for resolution)
2.  network size (based on the fft size)
3.  training set (lots of variance in the speaker is possible)
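
To put rough numbers on the first two items, here is a back-of-envelope
sketch.  Every figure in it is my own assumption for illustration (44.1 kHz
sample rate, 10 Hz target resolution, a hidden layer a quarter the size of
the input); none of it is measured, and fft_nn_size is just a made-up
helper name.

```python
import math

def fft_nn_size(sample_rate=44100, resolution_hz=10.0, hidden_ratio=0.25):
    """Rough size estimate; all default values are illustrative assumptions."""
    # Smallest power-of-two FFT size whose bin spacing is <= resolution_hz.
    n_fft = 2 ** math.ceil(math.log2(sample_rate / resolution_hz))
    n_bins = n_fft // 2 + 1              # magnitude bins for a real-valued input
    n_hidden = int(n_bins * hidden_ratio)
    n_weights = n_bins * n_hidden        # input-to-hidden weights only
    return n_fft, n_bins, n_hidden, n_weights

print(fft_nn_size())
```

With those assumed numbers you get an 8192-point FFT, 4097 bins and about
4.2 million weights in the first layer alone, before counting the training
data needed to fit them.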

How about autocovariance and dot-product?

Ahead of time, create an array containing the normalized autocovariance
(i.e. the autocorrelation) of the speaker's voice.
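
A minimal sketch of that template step, in plain Python rather than Pd.
The function name, the number of lags and the test tone standing in for a
recording are all assumptions of mine, and the direct sum is slow for long
recordings (an FFT-based version would be the usual speedup).

```python
import math

def normalized_autocovariance(x, max_lag):
    """Autocovariance of x at lags 0..max_lag-1, scaled so that lag 0 is 1.

    Assumes max_lag <= len(x).
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]     # remove the mean: covariance, not raw correlation
    acov = [sum(centered[i] * centered[i + lag] for i in range(n - lag))
            for lag in range(max_lag)]
    if acov[0] == 0.0:                   # silent or constant input: nothing to normalize
        return [0.0] * max_lag
    return [a / acov[0] for a in acov]

# Stand-in for a recording of the speaker: a tone with a 100-sample period.
tone = [math.sin(2 * math.pi * i / 100) for i in range(1000)]
template = normalized_autocovariance(tone, 200)
```

For a periodic input like this the template dips negative at half a period
and comes back up near a full period, which is the kind of pitch-related
structure the comparison would be relying on.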

Compute a running autocovariance of the incoming sound.  Decompose it
into the part that matches the speaker's autocovariance and the part
that doesn't (via dot-product, or projection operators), then compare
the two.
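
Here is one way the running comparison could be sketched, again in plain
Python.  The names (autocov, match_score, running_scores), the frame and
hop sizes, and the choice of an energy ratio as the comparison are my
assumptions; it is an illustration of the idea, not a tested recognizer.

```python
import math

def autocov(frame, max_lag):
    """Normalized autocovariance of one frame, lags 0..max_lag-1."""
    n = len(frame)
    mean = sum(frame) / n
    c = [v - mean for v in frame]
    acov = [sum(c[i] * c[i + lag] for i in range(n - lag)) for lag in range(max_lag)]
    return [a / acov[0] for a in acov] if acov[0] > 0 else [0.0] * max_lag

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def match_score(frame_acov, template):
    """Share of the frame's autocovariance energy lying along the template (0..1).

    Assumes the template is not all zeros.
    """
    norm = math.sqrt(dot(template, template))
    unit = [t / norm for t in template]
    coeff = dot(frame_acov, unit)
    matching = [coeff * u for u in unit]                       # projection onto the template
    residual = [f - m for f, m in zip(frame_acov, matching)]   # the part that does not match
    e_match, e_resid = dot(matching, matching), dot(residual, residual)
    total = e_match + e_resid
    return e_match / total if total > 0 else 0.0

def running_scores(signal, template, frame_len, hop):
    """Score successive frames of the signal against the speaker template."""
    max_lag = len(template)
    return [match_score(autocov(signal[i:i + frame_len], max_lag), template)
            for i in range(0, len(signal) - frame_len + 1, hop)]
```

The score is the squared cosine of the angle between the two
autocovariance vectors, so it ignores overall level.  You would still have
to pick a threshold by ear or by experiment.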

That would be ~less~ expensive and time consuming than neural
networks, but I wouldn't give it much chance of success either.  It
would probably match quite a few different people equally well.

Chuck


