<br><br><div class="gmail_quote">On Mon, Jun 11, 2012 at 10:29 PM, Andrew Faraday <span dir="ltr"><<a href="mailto:jbturgid@hotmail.com" target="_blank">jbturgid@hotmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><div dir="ltr">
I've got an open source project that uses Ruby to parse strings and send commands via TCP to Pure Data; it started with some of my earliest non-pd coding. It's not currently set up to read text files, but that would be a fairly simple mod, so you're welcome to learn Ruby and submit a patch. <div>
<span style="font-size:10pt"><br></span></div><div><span style="font-size:10pt">PDFs are a much more complicated file format, and I don't know how you'd go about extracting the text content from them to feed the text-to-music algorithm.</span><div>
[...]<br></div></div></div></div></blockquote><div> <br>PDF is indeed complicated, but extracting the text can be as simple as a regular expression. As far as I understand, everything between parentheses "(" and ")" is text. More rigorously, text is whatever sits in parentheses between the strings "BT" and "ET", which in turn sit between the strings "obj" and "endobj", but I think searching for the parentheses alone is enough. The escape character is the backslash, and it has only a handful of uses:<br>
<pre>Sequence | Meaning
---------------------------------------------
\n | LINE FEED (0Ah) (LF)
\r | CARRIAGE RETURN (0Dh) (CR)
\t | HORIZONTAL TAB (09h) (HT)
\b | BACKSPACE (08h) (BS)
\f | FORM FEED (0Ch) (FF)
\( | LEFT PARENTHESIS (28h)
\) | RIGHT PARENTHESIS (29h)
\\ | REVERSE SOLIDUS (5Ch) (Backslash)
\<i>ddd</i> | Character code <i>ddd</i> (octal)</pre>Unfortunately, apart from security settings that may block text extraction, the text may also be compressed, and I don't know how that works.<br><br>András<br>
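<br>A rough Python sketch of the parentheses approach described above, assuming the page content is uncompressed and unencrypted. The names LITERAL, ESCAPE, SIMPLE, unescape and extract_strings are made up for illustration, and the regular expression ignores unescaped nested parentheses, hex strings and font encodings, so treat it as a starting point rather than a real extractor:<br>

```python
import re

# One literal string: "(", then any mix of backslash escape pairs and
# ordinary characters, then ")". Assumption: no unescaped nested parentheses.
LITERAL = re.compile(rb"\((?:\\.|[^\\()])*\)", re.DOTALL)

# A backslash followed by one to three octal digits, or by any single byte.
ESCAPE = re.compile(rb"\\(?:([0-7]{1,3})|(.))", re.DOTALL)

# Single-character escapes from the table above.
SIMPLE = {
    b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f",
    b"(": b"(", b")": b")", b"\\": b"\\",
}


def unescape(raw):
    """Decode the backslash escapes inside one literal string (bytes in, bytes out)."""
    def repl(match):
        octal, char = match.group(1), match.group(2)
        if octal is not None:
            # Octal character code; keep only the low eight bits.
            return bytes([int(octal, 8) & 0xFF])
        # Known escape, or an unknown one where the backslash is simply dropped.
        return SIMPLE.get(char, char)
    return ESCAPE.sub(repl, raw)


def extract_strings(stream):
    """Return every parenthesised string found in the given bytes, unescaped."""
    return [unescape(m.group(0)[1:-1]) for m in LITERAL.finditer(stream)]


if __name__ == "__main__":
    sample = rb"BT /F1 12 Tf (Hello \(PDF\) world) Tj (caf\351) Tj ET"
    print(extract_strings(sample))
    # prints [b'Hello (PDF) world', b'caf\xe9']
```

The result is raw bytes on purpose: which character each byte stands for depends on the font encoding in use, which this sketch does not look at.<br>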
</div></div>