[PD] text to sound

András Murányi muranyia at gmail.com
Tue Jun 12 01:45:37 CEST 2012


On Mon, Jun 11, 2012 at 10:29 PM, Andrew Faraday <jbturgid at hotmail.com> wrote:

> I've got an open source project that uses Ruby to parse strings and send
> commands via TCP to Pure Data; it started with some of my earliest
> non-Pd coding. It's not currently set up to read text files, but it'd be a
> fairly simple mod, so you're welcome to learn Ruby and submit a patch.
>
> PDFs are a much more complicated file format; I don't know how you'd go
> about extracting the text content from them to feed the text-to-music
> algorithm.
> [...]
>

PDF is indeed complicated, but extracting text can be as simple as a
regular expression. As far as I understand, everything between
parentheses "(" and ")" is text. More rigorously, it is everything between
parentheses that sits between the "BT" and "ET" markers, inside an object
delimited by "obj" and "endobj", but I think searching for the parentheses
alone is enough. The escape character is the backslash, and it has only a
handful of uses:

Sequence | Meaning
---------+-----------------------------------
\n       | LINE FEED (0Ah) (LF)
\r       | CARRIAGE RETURN (0Dh) (CR)
\t       | HORIZONTAL TAB (09h) (HT)
\b       | BACKSPACE (08h) (BS)
\f       | FORM FEED (0Ch) (FF)
\(       | LEFT PARENTHESIS (28h)
\)       | RIGHT PARENTHESIS (29h)
\\       | REVERSE SOLIDUS (5Ch) (Backslash)
\ddd     | Character code ddd (octal)
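
To make the idea concrete, here is a minimal Python sketch of the
parentheses search plus the escape table above. The names LITERAL,
unescape and extract_text are my own and purely illustrative. It assumes
the text is uncompressed, that strings contain no unescaped nested
parentheses, and that the bytes can be read as Latin-1, so treat it as a
starting point rather than a real PDF parser.

```python
import re

# A literal string: "(" ... ")", where a backslash escapes the next byte.
# Assumption: no unescaped nested parentheses inside the string.
LITERAL = re.compile(rb"\((?:\\.|[^\\()])*\)", re.DOTALL)

# A backslash followed by 1-3 octal digits, or by any single byte.
ESCAPE = re.compile(rb"\\([0-7]{1,3}|.)", re.DOTALL)

SIMPLE_ESCAPES = {
    b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f",
    b"(": b"(", b")": b")", b"\\": b"\\",
}


def unescape(raw: bytes) -> bytes:
    """Resolve the backslash sequences from the table above."""
    def repl(match):
        code = match.group(1)
        if code[:1] in b"01234567":
            # Octal character code; keep only the low 8 bits.
            return bytes([int(code, 8) & 0xFF])
        # Unknown escapes: drop the backslash and keep the byte.
        return SIMPLE_ESCAPES.get(code, code)
    return ESCAPE.sub(repl, raw)


def extract_text(data: bytes) -> list:
    """Return every parenthesised string found in data, unescaped."""
    return [
        unescape(m.group(0)[1:-1]).decode("latin-1")
        for m in LITERAL.finditer(data)
    ]


if __name__ == "__main__":
    sample = rb"BT /F1 12 Tf (Hello \(Pd\)) Tj ET"
    print(extract_text(sample))
```

Working on bytes rather than str keeps the octal codes simple, since each
one maps to exactly one byte.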

Unfortunately, apart from security settings that may block text extraction,
the content may also be compressed, in which case the parentheses won't be
visible in the raw file. I don't know the details of how that works.
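
For the compressed case, a hedged sketch: as far as I know the most common
filter is FlateDecode, which is plain zlib, so one can try to inflate
whatever sits between "stream" and "endstream" and then search the result
for parentheses. The name inflate_streams is hypothetical. The sketch
ignores the /Length entry and every other filter, and it would be fooled
if the compressed bytes happened to contain "endstream".

```python
import re
import zlib

# Everything between the "stream" keyword and the next "endstream".
# The lookbehind stops "endstream" itself from matching as "stream".
STREAM = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(data: bytes) -> list:
    """Return the zlib-inflated body of every stream that inflates cleanly."""
    inflated = []
    for match in STREAM.finditer(data):
        try:
            # decompressobj tolerates the end-of-line bytes that usually
            # sit between the compressed data and "endstream".
            inflated.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            # Not zlib data (uncompressed or another filter): skip it.
            continue
    return inflated


if __name__ == "__main__":
    body = zlib.compress(b"BT (Hi) Tj ET")
    pdf = b"1 0 obj\n<< /Filter /FlateDecode >>\nstream\n" + body + b"\nendstream\nendobj\n"
    print(inflate_streams(pdf))
```

Each inflated chunk can then go through the same parentheses search as an
uncompressed file.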

András