[PD] once again a timbreID question
william.brent at gmail.com
Wed Oct 5 16:10:00 CEST 2022
Yes I pulled the plug on my webpage a while back and made a GitHub repo for
timbreID: https://github.com/wbrent/timbreIDLib.git. The README has a link
to the latest version of the examples package. Note that the library is now
called timbreIDLib to distinguish the library itself from the audio feature
database object within it, [timbreID].
The version of the examples package linked in the mirror above is pretty
old - the latest version has additional examples for audio segmentation and
key estimation. It also has some significant updates to the timbre space
patch, including a grain sequencing function that's pretty fun to play
with. You can see a quick demo video of that here:
Simon, for your specific project, you'll have lots of options for
extracting audio features from the 5-second audio clips. You can store the
feature vector of each clip in [timbreID], and it's no problem that there
will be thousands of instances. Finding the best match in the existing
database relative to a new clip's feature vector will still be very quick,
and there's a relatively new feature for [timbreID] that lets you get the K
best matches in order of similarity (not just the single best match). So
you can definitely get a list of the best matching files.
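For anyone who wants the idea outside of Pd: the k-best-matches lookup can be sketched in a few lines of Python. Everything here is hypothetical illustration, not timbreID's internals; the feature vectors are made up, and plain Euclidean distance stands in for whatever similarity metric the object is configured to use.

```python
import numpy as np

def k_best_matches(database, query, k=3):
    """Return indices of the k stored vectors closest to the query,
    ordered from most to least similar (smallest distance first)."""
    db = np.asarray(database, dtype=float)
    dists = np.linalg.norm(db - np.asarray(query, dtype=float), axis=1)
    return list(np.argsort(dists)[:k])

# Hypothetical 4-dimensional feature vectors for three stored clips:
database = [[0.0, 1.0, 0.0, 1.0],
            [0.9, 0.1, 0.8, 0.2],
            [0.1, 0.9, 0.1, 0.9]]
query = [0.1, 1.0, 0.0, 0.8]
print(k_best_matches(database, query, k=2))  # → [2, 0]
```

Searching thousands of entries this way is effectively instantaneous, which is why the database size in your installation isn't a concern.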
The hardest part is coming up with the ideal set of features to extract and
a strategy for dealing with the way they change over time. A basic starting
point would be to extract multi-frame Bark spectra or BFCCs. The
07-timbre-ordering/order-perc.pd example extracts multi-frame features from
pre-recorded audio. A key object is [featureAccum], which can concatenate
incoming single-frame feature vectors to produce a long multi-frame vector
(it can also sum or average them). In your case, since all clips are
precisely 5 seconds long, the multi-frame BFCC vectors will all be the same
length, so a similarity calculation is possible without any further work.
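The concatenation idea can be sketched in Python, with made-up per-frame vectors standing in for real BFCC frames (this mimics [featureAccum]'s concatenate mode, not its actual code):

```python
import numpy as np

def accumulate_frames(frames):
    """Concatenate per-frame feature vectors into one long
    multi-frame vector (like [featureAccum]'s concatenate mode)."""
    return np.concatenate(frames)

# Hypothetical per-frame features: 3 frames x 2 coefficients per clip.
clip_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
clip_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

va, vb = accumulate_frames(clip_a), accumulate_frames(clip_b)
assert len(va) == len(vb)        # equal clip lengths -> equal vector lengths
print(np.linalg.norm(va - vb))   # → 1.0
```

Because every clip is exactly 5 seconds, every concatenated vector has the same length, so the distance is always well-defined.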
But...under that model, for two sounds to be "similar," the way their
features change over time has to align very tightly. So you might have one
participant shake some maracas into the microphone, and another participant
shake the exact same maracas at a different tempo, and you'll get a low
similarity between the two recordings because the spectro-temporal pattern
is so different. That might be ok in some applications, but it might not
be in others, since on an intuitive level the sounds are obviously very similar.
There are lots of strategies you could try to make the patch better at
recognizing the exact kind of "similarity" that you're looking for, and
the more you know in advance about the kind of audio you'll be comparing,
the more you can customize your feature vector. We can keep chatting
off-list about options if you like. I hope that helps to clarify some
things in the meantime!
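One concrete strategy for the tempo problem described above, hinted at by [featureAccum]'s sum/average modes: summarize the frames instead of concatenating them. A Python sketch with hypothetical frame data, showing that frame averages ignore temporal ordering while concatenated vectors do not:

```python
import numpy as np

# Two "maraca" clips: same spectral content, shaken at different tempos.
# Each row is a hypothetical per-frame spectral feature vector.
fast = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
slow = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

# Concatenated (time-aligned) vectors look quite different:
concat_dist = np.linalg.norm(fast.ravel() - slow.ravel())

# Frame averages (like [featureAccum]'s averaging mode) discard ordering:
avg_dist = np.linalg.norm(fast.mean(axis=0) - slow.mean(axis=0))

print(concat_dist, avg_dist)  # averaging gives distance 0 despite the tempo shift
```

The trade-off is that averaging throws away temporal structure entirely, so a middle ground (e.g. averaging over a few coarse segments of the clip) may suit your material better.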
On Sun, Oct 2, 2022 at 4:37 PM Peter P. <peterparker at fastmail.com> wrote:
> * Simon Iten <itensimon at gmail.com> [2022-10-02 22:22]:
> > hi william and list,
> > for a museum-art-installation i will do a kind of audio "social-network"
> > basically a visitor can record a 5 second snippet via microphone or
> bluetooth and PD saves this snippet as a sample.
> > out of this ever growing sample-space 8 readsf~ objects will randomly
> play these samples in various densities to 8 speakers.
> > so far so easy. (i have implemented that part already)
> > to get a more social network kind of atmosphere it would be great if
> newly recorded snippets would increase the likelihood of similar material
> on the outputs (as in twitter/facebook/instagram blabla, where you find
> yourself in your bubble)
> > i still don’t really get how to work with [timbreID] to accomplish this.
> > maybe someone on the list has an example of this?
> > the process would be:
> > -a new sample is recorded (always 5 seconds) -> some [timbreID] analysis
> happens to create a feature-list -> samples with a similar feature-list
> should be played back next (a list of files that are similar would be great)
> Well, each snippet's feature vector would be added as a new database
> entry to the leftmost inlet of [timbreID], no? Then you'd ask for the
> nearest entry at its second inlet.
> > the number of samples can easily grow to thousands of files, since the
> installation will run for quite some time. each sample is only 441kb though
> (mono 5 seconds file, according to OSX)
> > the examples of timbreID i look at either look at a fixed soundfile and
> slice it to extract features over time, or slice incoming audio based on
> onset detection for example.
> > i would just want a feature-list created for each new 5 second clip i
> Yes, your example is not so different from what you see in the
> help-patches of that great library.
> Have you checked out William's examples, which I can't locate in
> original on his web page (seems down) but which have been mirrored here?
“Great minds flock together”
Conflations: conversational idiom for the 21st century