[PD] suggestions for spectral "weight" analysis

William Brent william.brent at gmail.com
Wed Jan 22 14:36:49 CET 2014

Hi - try the dB threshold setting to leave out quiet grains. With
normalization on, quiet grains can give you bad results because low-level
noise ends up being boosted, making the grain appear "bright". Another problem is
that a fixed window size will always get you audio slices that have mixed
content - with a window size of 93ms, it's pure luck if you get a speech
grain that only contains "aaa". It'll likely have a little "sss" or
something else in it too. So the solution there is to try to parse the
audio based on content, not size. And using single value features (like
brightness), BFCCs, or Bark spectrum, you're fine to compare grains of
various sizes - the length of features like BFCCs/Bark spectrum depends on
the size of the Bark filterbank, not the window size.
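
A minimal sketch of those two points, in Python rather than Pd: gate grains by level, then compute a single-value brightness for grains of different lengths. It assumes brightness is the ratio of spectral magnitude above a boundary frequency to total magnitude, which may not match timbreID's implementation exactly. The function names, the -40 dB threshold, and the 8 kHz sample rate are made up for the illustration.

```python
import cmath
import math

def rms_db(grain):
    """Level of a grain in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(x * x for x in grain) / len(grain))
    return 20.0 * math.log10(max(rms, 1e-12))

def brightness(grain, sample_rate, boundary_hz=2500.0):
    """Magnitude above boundary_hz divided by total magnitude (0..1)."""
    n = len(grain)
    total = 0.0
    high = 0.0
    for k in range(n // 2 + 1):
        # Naive DFT bin: slow, but fine for short grains in a sketch.
        acc = sum(grain[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mag = abs(acc)
        total += mag
        if k * sample_rate / n >= boundary_hz:
            high += mag
    return high / total if total > 0.0 else 0.0

def sine(freq, n, sample_rate, amp=0.5):
    """Test signal: n samples of a sine wave."""
    return [amp * math.sin(2 * math.pi * freq * t / sample_rate)
            for t in range(n)]

SR = 8000             # hypothetical sample rate, kept low for speed
THRESHOLD_DB = -40.0  # hypothetical gate level

grains = {
    "low_short": sine(500, 128, SR),
    "high_long": sine(3000, 256, SR),
    "quiet_noise_floor": sine(3000, 128, SR, amp=0.0005),
}

# Drop quiet grains before analysis so boosted noise is never scored.
kept = {name: g for name, g in grains.items() if rms_db(g) >= THRESHOLD_DB}

# One number per grain, whatever its length, so grains stay comparable.
scores = {name: brightness(g, SR) for name, g in kept.items()}
```

The quiet grain never reaches the brightness stage, and the two remaining grains get one score each even though their lengths differ.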

About phasing...the worst case is that you end up overlaying the same grain
with itself, but offset a bit. Don't know if that's an issue in your case.
There is the "stutter_protect" option in timbreID to avoid repeated grains
for concatenative synth, but I think you're just doing ordering? Anyway,
it's a major problem with this technique, because the goal is to spit out
sequences of timbrally similar grains. If things are working properly, that
will always produce sequences of grains that have similar waveforms, and if
you overlap similar waveforms, you get phasing/filtering! To be honest,
I've only ever done two things: 1) add low levels of good reverb to smooth
it out. 2) add controls for randomizing the grain size within reasonable
bounds (like 90% - 110% of the window size). That at least varies the
otherwise steady windowing artifacts, which can turn into a very audible
flutter if your output overlap is high.
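
As a rough illustration of option 2, here is a Python sketch that draws each grain length from 90% - 110% of the window size. The 4096-sample window (about 93ms, assuming a 44.1kHz sample rate) and the function name are only for the example.

```python
import random

def jittered_grain_size(window_size, spread=0.10, rng=random):
    """Return a grain length in samples within +/- spread of window_size."""
    factor = rng.uniform(1.0 - spread, 1.0 + spread)
    return max(1, round(window_size * factor))

rng = random.Random(1)  # seeded so that a run can be repeated
window_size = 4096      # about 93ms at 44.1kHz (assumed)
sizes = [jittered_grain_size(window_size, rng=rng) for _ in range(8)]
```

Each output grain then gets its own length, so the window edges stop landing at a perfectly regular rate.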

One other thing to be aware of: if you do the analysis with a high overlap
factor, you're more likely to overlap identical content. So an overlap of 1
is best from that perspective, but then you lose time resolution and end up
with fewer grains. Or, taking things back to the first point above: if you
have good segmentation based on content and not a fixed window size, you're
less likely to have extremely similar waveforms overlap and cause phasing.
With a reasonably small audio source (like your Lucier content), you could
do this manually with labels in Audacity. Takes a lot of patience but I bet
it would make a noticeable difference.
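
If you go the manual route, the labels need to become grain boundaries. A small Python sketch, assuming the tab-separated start/end/label text layout (times in seconds) of an exported Audacity label track; the function name and the example labels are hypothetical.

```python
def parse_audacity_labels(text, sample_rate):
    """Turn exported label text into (start_sample, end_sample, label) tuples."""
    segments = []
    for line in text.splitlines():
        # Skip blank lines and the backslash-prefixed lines that carry
        # frequency ranges for spectral selections.
        if not line.strip() or line.startswith("\\"):
            continue
        fields = line.split("\t")
        start_s, end_s = float(fields[0]), float(fields[1])
        label = fields[2] if len(fields) > 2 else ""
        segments.append((round(start_s * sample_rate),
                         round(end_s * sample_rate),
                         label))
    return segments

example = "0.000000\t0.250000\taaa\n0.250000\t0.310000\tsss\n"
segments = parse_audacity_labels(example, 44100)
```

Each tuple gives the sample range for one content-based grain, which can be analyzed in place of a fixed-size window.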

On Sun, Jan 19, 2014 at 2:07 PM, João Pais <jmmmpais at googlemail.com> wrote:

> There are separate versions of each analysis object: one for real time,
> and one for NRT reading straight out of tables. You'll see separate help
> files for [barkSpec~] and [barkSpec], for instance. So an [until] loop
> scanning your pre-recorded audio will be the fastest way for you to work on
> this. That's what's used in the 06/order.pd example. Just look in the [pd
> analysis] sub patch and you can change the feature from barkSpec to
> whatever you like (or whatever combination of features, weighted however).
> I'd recommend putting your audio into the timbre-space patch and plotting
> by different features there. That way, you can see how the
> vowels/consonants fall on different axes when using certain features.
> That'll give you some intuition on picking the best feature or combo of
> features.
> Last - ordering by timbre is always going to be fuzzy unless you can find
> a one-dimensional feature that reflects the timbre aspect you're after.
> Ordering by multi-dimensional features, you might make a big jump along one
> dimension for one step in your ordering, and then a big jump along a
> different dimension for the next step. You never know how much one
> particular feature is contributing to the choice of the next step in the
> ordering. In terms of keeping it relatively intuitive to work with, fewer
> dimensions is better. For speech, I'd recommend trying [specBrightness]
> only, with a boundary frequency of about 2.5kHz. That'll separate the
> high-frequency consonants from the more formanty low-mid vowels. You should
> get a decent continuum with just that one feature.
>
> Hi,
> I don't have much time to be working on this, so I ended up adapting your
> timbre-space patch, and using the Brightness (with 2.5 kHz) in both x and y
> dimensions. This plots a straight line from vowels to sibilants, although
> the result isn't 100% straight. E.g. some sounds (or silence) that belong
> to an already existing group appear later inside other groups. But in
> general it works.
> A provisional result can be heard in
> https://soundcloud.com/experimental-music/i-am-splitting-in-a-room-v2 -
> it's part of Nicolas Collins' seminar on experimental music here in Berlin.
> As soon as I can I'll try to finish my analysis of your timbre-space
> patch, and improve the results. Or, if possible, even redo the patch myself.
> Another detail, do you have any suggestion on how to use your granulator
> and not get the typical phasing effects? I changed the envelope to a vline~
> with a [0 0 0, 1 50 0, 1 50 50, 0 50 100( message. It helps, but only
> because there are almost no continuous sounds.
> Best,
> João

William Brent

“Great minds flock together”
Conflations: conversational idiom for the 21st century
