[PD] text to sound
András Murányi
muranyia at gmail.com
Tue Jun 12 01:45:37 CEST 2012
On Mon, Jun 11, 2012 at 10:29 PM, Andrew Faraday <jbturgid at hotmail.com> wrote:
> I've got an open source project using ruby to parse strings and send
> commands via TCP to pure data. Which started with some of my earliest
> non-pd coding. It's not currently set up to read text files, but it'd be a
> fairly simple mod, so you're welcome to learn ruby and submit a patch.
>
> PDF's are a much more complicated file format, I don't know how you'd go
> about extracting the text content from them to feed the text-to-music
> algorithm.
> [...]
>
PDF is indeed complicated, but extracting text can be as simple as a
(simple) regular expression. As far as I understand, everything between
parentheses "(" and ")" is text. More rigorously, it is everything between
parentheses that sits between the strings "BT" and "ET", which in turn sit
between the strings "obj" and "endobj", but I think it's enough to search
for the parentheses alone. The escape character is the backslash, and it
has only a handful of uses:
Sequence | Meaning
---------+-----------------------------------
\n       | LINE FEED (0Ah) (LF)
\r       | CARRIAGE RETURN (0Dh) (CR)
\t       | HORIZONTAL TAB (09h) (HT)
\b       | BACKSPACE (08h) (BS)
\f       | FORM FEED (0Ch) (FF)
\(       | LEFT PARENTHESIS (28h)
\)       | RIGHT PARENTHESIS (29h)
\\       | REVERSE SOLIDUS (5Ch) (Backslash)
\ddd     | Character code ddd (octal)
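The parentheses-plus-escapes idea above can be sketched in a few lines of Python. This is only a rough illustration, not a real PDF parser: it assumes the page content is uncompressed, that it has already been read into a string, and that the strings use a plain single-byte encoding. The names LITERAL, ESCAPES, unescape and extract_strings are made up for this example, and unescaped nested parentheses inside a string are not handled.

```python
import re

# One literal string: "(" ... ")", where a backslash escapes the next character.
# Simplification (assumption): unescaped nested parentheses are not supported.
LITERAL = re.compile(r"\((?:\\.|[^\\()])*\)", re.DOTALL)

# A backslash followed by 1-3 octal digits, or by any single character.
ESCAPE = re.compile(r"\\([0-7]{1,3}|.)", re.DOTALL)

# The single-character escapes from the table above.
ESCAPES = {
    "n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
    "(": "(", ")": ")", "\\": "\\",
}


def unescape(body):
    """Resolve backslash escapes inside one string body."""
    def repl(match):
        token = match.group(1)
        if token[0] in "01234567":
            # \ddd: character code given in octal, kept within one byte
            return chr(int(token, 8) & 0xFF)
        # Known escapes map via the table; for anything else the
        # backslash is dropped and the character itself is kept.
        return ESCAPES.get(token, token)
    return ESCAPE.sub(repl, body)


def extract_strings(content):
    """Return every parenthesised string found in content, unescaped."""
    return [unescape(m.group(0)[1:-1]) for m in LITERAL.finditer(content)]


if __name__ == "__main__":
    sample = r"BT /F1 12 Tf (Hello \(Pd\)) Tj (\101BC) Tj ET"
    for text in extract_strings(sample):
        print(text)
```

Restricting the search to the spans between "BT" and "ET" would be a natural next step if the bare parentheses search picks up too much noise.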
Unfortunately, apart from security settings that may block text extraction,
the content may also be compressed, and I don't know how that works.
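A possible starting point for the compressed case, offered as a guess rather than a tested recipe: streams marked with "/Filter /FlateDecode" hold zlib (deflate) data, so Python's zlib module may be able to inflate them. The sketch below assumes the whole file is in memory as bytes, ignores every other filter and any encryption, and simply tries each "stream" ... "endstream" span. The names STREAM and inflate_streams are invented for this example.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords.
# Assumption: the keyword "endstream" does not occur inside the data itself.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the inflated bytes of every stream that zlib can decode."""
    results = []
    for match in STREAM.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj() tolerates the line break that sits between
            # the end of the data and the "endstream" keyword.
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not zlib data (uncompressed, or some other filter): skip it.
            continue
    return results


if __name__ == "__main__":
    packed = zlib.compress(b"BT (Hello) Tj ET")
    demo = (b"1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
            + packed + b"\nendstream\nendobj\n")
    print(inflate_streams(demo))
```

The inflated bytes can then be searched for parenthesised strings in the same way as uncompressed content.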
András