[PD] pd & text->speech

Wed Jun 12 20:50:49 CEST 2002

Greetings,

 > ----- Original Message -----
 > From: "Michal Seta" <mis at creazone.com>
 > To: <pd-list at iem.kug.ac.at>
 > Sent: Wednesday, June 12, 2002 7:45 AM
 > Subject: [PD] pd & text->speech
 >
 >
 > > Hello.
 > >
 > > First, I admit that I have not yet played with any of the text->speech
 > > (on linux) software, so I don't know which one is what.  However, I'd be
 > > interested in using something like that with(in) pd.  Has anyone done
 > > anything like that?  Any ideas?  If there's anything I miss from Max,
 > > it's the ispeak object :)

other possibilities (linux) i've looked at:

  mbrola (http://tcts.fpms.ac.be/synthesis)
    + free for non-commercial and non-military use
      (in a prior email message, mbrola author Thierry Dutoit
       indicated to me that use of mbrola in a musical
       performance would not violate the 'no commercial use'
       clause)
    + diphone synthesis; sounds quite good
    + supports english, german, and others
    - no library, so you have to use "piperead~" from
      ext13 for output
    - mbrola has to know phone length and frequency envelope
      before any sound is produced
    - i tested using netsend/netreceive and a perl
      script to wrap mbrola -- gets pretty unwieldy
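
a minimal sketch of the "phone length and frequency envelope" point above,
in python rather than perl.  it assumes the usual mbrola ".pho" input
layout: one phone per line, its duration in ms, then optional
(position %, pitch Hz) pairs.  the helper names and the phone sequence
are made up for illustration; valid phone symbols depend on the voice
database in use.

```python
def pho_line(phone, dur_ms, pitch_points=()):
    """Format one .pho line: phone, duration (ms), then (position %, pitch Hz) pairs."""
    fields = [phone, str(int(round(dur_ms)))]
    for pos_pct, hz in pitch_points:
        if not 0 <= pos_pct <= 100:
            raise ValueError("pitch position must be a percentage of the phone")
        fields += [str(int(round(pos_pct))), str(int(round(hz)))]
    return " ".join(fields)


def pho_text(phones):
    """Join a whole utterance; everything must be known before synthesis starts."""
    return "\n".join(pho_line(*p) for p in phones) + "\n"


# hypothetical phone sequence ("_" is silence); symbols depend on the voice
utterance = [
    ("_", 100),
    ("h", 60),
    ("@", 80, [(0, 120), (100, 130)]),
    ("l", 70),
    ("@U", 200, [(0, 130), (80, 100)]),
    ("_", 100),
]

if __name__ == "__main__":
    print(pho_text(utterance), end="")
```

the resulting text would then be handed to the mbrola binary by the
wrapper script, which is why durations and pitch targets cannot be
changed once a phone has been sent.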

  festival (http://www.cstr.ed.ac.uk/projects/festival.html)
    + GPL
    + based on siod, maybe eventual integration into pd
      would be possible via (a modified version of) Larry's
      "pd-scheme"
    + abstracts over various synthesis methods: mbrola
      output also possible
    - i'm still using piperead~, netsend/netreceive, and
      perl to wrap it
    - so far, i've only been able to figure out how to
      get festival to do "pure" text-to-speech: i can't figure
      out how to influence prosodic parameters like frequency
      or timing, although the mechanisms are certainly there
    - it eats cpu time
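
for the netsend/netreceive side of such a wrapper, here is a rough
sketch (python, not the perl actually used) of splitting what pd's
netsend puts on the socket: whitespace-separated atoms terminated by a
semicolon.  the function names are hypothetical, and escaped semicolons
are deliberately not handled.

```python
def split_fudi(buffer):
    """Split buffered text into complete messages (lists of atoms) plus the leftover."""
    *complete, rest = buffer.split(";")
    messages = [m.split() for m in complete if m.strip()]
    return messages, rest.lstrip()


def feed(chunks):
    """Accumulate socket reads, which may cut a message anywhere."""
    pending = ""
    out = []
    for chunk in chunks:
        messages, pending = split_fudi(pending + chunk)
        out.extend(messages)
    return out


if __name__ == "__main__":
    # e.g. [send say hello world( -> [netsend] arriving in three reads
    for msg in feed(["say hel", "lo world;\nsay", " again;\n"]):
        print(msg)
```

each parsed message could then be passed on to whichever synthesizer
the wrapper drives; that part is left out here.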

... if you're interested in automated generation of syntactically
correct nonsense (or pure text-to-speech), you might want
to check out my "SayWhat" package, which has a PD mode and
some basic example patches, at:

     http://www.ling.uni-potsdam.de/~moocow/projects/saywhat

... the PD support is currently just netreceive + piperead~,
but hopefully that will change soon...

On 12 June 2002 at 07:44:34, sme wrote:
 > hi
 > i'm not sure if a single object can handle a complex process like speech
 > synthesis.
 > i rather have the idea of a modular system (which still could be realized in
 > pd) with a kind of physical model of the anatomical speech-producing parts of
 > the body, a rather complicated interface-speech to control it, and a way
 > to interpret/translate text into it.
 > sÜme.

i agree this would be better.  i started trying to adapt Nick
Ing-Simmons' "rsynth" Klatt-style synthesizer for eventual integration
into pd (via SayWhat) -- the current state of affairs is available at:

     http://www.ling.uni-potsdam.de/~moocow/projects/spsyn

... but that project has a longish way to go before the kind of
fine-grained control that i would like is available.  also, according
to Nick, there are still some unclear copyright issues with the
code from the original "rsynth-2.0":

On Wed, 22 May 2002 at 23:37:24, Nick Ing-Simmons wrote:
 > > It, like most of the rsynth stuff, is stalled due to
 > > ownership issues of the Klatt synthesis code. The man has died
 > > so cannot give permission ...

it would probably also be possible to build a "pure" pd
Klatt-style speech synthesizer from scratch around Yves'
"formant~" object, but i'm still waiting for the library
here to dig up their copy of the Klatt article for me ;-)
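
as a rough illustration of the building block such a synthesizer is
made of, here is a sketch of the two-pole digital resonator as it is
usually given for Klatt-style formant synthesis:
y[n] = A*x[n] + B*y[n-1] + C*y[n-2], with the coefficients set from a
formant frequency and bandwidth.  this is plain python with made-up
function names; it says nothing about how "formant~" works internally.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, sample_rate):
    """Coefficients A, B, C for one formant resonator (unity gain at DC)."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(samples, freq_hz, bw_hz, sample_rate):
    """Run samples through y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


if __name__ == "__main__":
    # illustrative values only: a 100 Hz impulse train through one formant
    rate = 10000
    source = [1.0 if n % 100 == 0 else 0.0 for n in range(1000)]
    voiced = resonate(source, freq_hz=500.0, bw_hz=60.0, sample_rate=rate)
    print(max(voiced), min(voiced))
```

a cascade of a few such resonators, one per formant, fed by a voicing
or noise source, is the general shape of the design; the control
parameters and their timing are the hard part.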

is anyone else working on tts / speech synthesis for pd?  if so,
maybe we could combine our efforts?

marmosets,
	Bryan