<br><br><div class="gmail_quote">On Mon, Jun 11, 2012 at 10:29 PM, Andrew Faraday <span dir="ltr">&lt;<a href="mailto:jbturgid@hotmail.com" target="_blank">jbturgid@hotmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div><div dir="ltr">

I&#39;ve got an open source project using ruby to parse strings and send commands via TCP to pure data. Which started with some of my earliest non-pd coding. It&#39;s not currently set up to read text files, but it&#39;d be a fairly simple mod, so you&#39;re welcome to learn ruby and submit a patch. <div>


<span style="font-size:10pt"><br></span></div><div><span style="font-size:10pt">PDF&#39;s are a much more complicated file format, I don&#39;t know how you&#39;d go about extracting the text content from them to feed the text-to-music algorithm.</span><div>


[...]<br></div></div></div></div></blockquote><div> <br>PDF is indeed complicated, but extracting text can be as simple as a (simple) regular expression. As far as I understand, basically everything between parentheses &quot;(&quot; and &quot;)&quot; is text (more rigorously, text is everything between parentheses that sits between the operators &quot;BT&quot; and &quot;ET&quot;, which in turn sit between the keywords &quot;obj&quot; and &quot;endobj&quot;, but I think it&#39;s enough to search for only the parentheses). The escape character is the backslash, and it has only a few uses:<br>


<pre>Sequence | Meaning
---------------------------------------------
\n       | LINE FEED (0Ah) (LF)
\r       | CARRIAGE RETURN (0Dh) (CR)
\t       | HORIZONTAL TAB (09h) (HT)
\b       | BACKSPACE (08h) (BS)
\f       | FORM FEED (0Ch) (FF)
\(       | LEFT PARENTHESIS (28h)
\)       | RIGHT PARENTHESIS (29h)
\\       | REVERSE SOLIDUS (5Ch) (Backslash)
\<i>ddd</i>     | Character code <i>ddd</i> (octal)</pre>Unfortunately, apart from security settings that may block text extraction, the text may also be compressed (in which case the parentheses won&#39;t show up in the raw file), and I don&#39;t know how that works.<br><br>András<br>
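<br>A minimal Python sketch of the parentheses approach described above, offered as an illustration rather than a definitive extractor. Assumptions that are not in the message: the names extract_strings, unescape and inflate_streams are hypothetical; strings are decoded as Latin-1 (real PDFs may use font-specific encodings); nested unescaped parentheses and hexadecimal strings are ignored; and compression is assumed to be the common Flate (zlib) filter, so streams are inflated before searching. Encrypted files are not handled.<br>

```python
import re
import zlib

# One literal string "( ... )". Escaped characters are skipped over as pairs,
# so "\(" and "\)" do not end the match. Assumption: no nested bare parentheses.
LITERAL = re.compile(rb"\(((?:\\.|[^\\()])*)\)", re.DOTALL)

# A backslash followed by 1-3 octal digits, or by any single character.
ESCAPE = re.compile(rb"\\(?:([0-7]{1,3})|(.))", re.DOTALL)

# The single-character escapes from the table above.
SIMPLE = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b",
          b"f": b"\f", b"(": b"(", b")": b")", b"\\": b"\\"}

# Text objects are delimited by the BT and ET operators.
TEXT_OBJECT = re.compile(rb"\bBT\b(.*?)\bET\b", re.DOTALL)

# Raw stream data sits between the "stream" and "endstream" keywords.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def unescape(raw):
    """Replace backslash escapes in the body of a PDF literal string."""
    def repl(match):
        octal, other = match.group(1), match.group(2)
        if octal is not None:
            return bytes([int(octal, 8) % 256])  # keep only the low byte
        return SIMPLE.get(other, other)  # unknown escape: drop the backslash
    return ESCAPE.sub(repl, raw)


def inflate_streams(pdf):
    """Yield each stream body, inflated when it is zlib-compressed."""
    for match in STREAM.finditer(pdf):
        data = match.group(1)
        try:
            # decompressobj ignores the end-of-line bytes before "endstream"
            yield zlib.decompressobj().decompress(data)
        except zlib.error:
            yield data  # not Flate data: pass it through unchanged


def extract_strings(content):
    """Return the literal strings found inside BT ... ET text objects."""
    found = []
    for block in TEXT_OBJECT.finditer(content):
        for match in LITERAL.finditer(block.group(1)):
            found.append(unescape(match.group(1)).decode("latin-1"))
    return found
```

To try it, read a file in binary mode, pass the bytes to inflate_streams, and call extract_strings on each result. The BT/ET restriction can be dropped by running LITERAL over the whole stream, which matches the simpler "parentheses only" search at the cost of also picking up strings that are not page text.<br>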


</div></div>