[PD] http, html and textfiles
fbar at footils.org
Mon Apr 28 19:13:04 CEST 2008
wolfgang schwarzenbrunner wrote:
> i am working on a little project in which websites are going to be
> parsed. well. i thought this might be a nice thing using the regex
> object from zexy... the only problem i am facing right now is that i
> have no idea how i could get a html file on my harddisk using pd
> (something like a http browsing object)...
> any suggestions?
Yep: Don't use Pd for text processing.
Pd is good at many things, but it's not good at parsing and modifying
larger amounts of text. AFAIK there is still no garbage collection for
unused symbols (Pd's "strings"), and it's overcomplicated to deal with
certain characters (backslashes, spaces, commas, ...) that Pd should
not interpret.
What I would recommend is to do your text processing in a different
language. Many (scripting) languages that are great with text can be
used inside of Pd: Lua, Python, Java, Scheme, etc. Most of these also
include, or can easily be extended with, nice web browsing tools (cURL,
sockets, system("wget"), ...). That way you can do both the browsing and
all the processing in one place, and then only need to hand the results
over to Pd in a format that Pd handles more elegantly than it handles
large amounts of text.
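
As a rough illustration of that split, here is a minimal Python sketch. It
assumes the script runs outside of Pd and that a patch with a
[netreceive 3000] object is listening on localhost; the port number, the
"title" selector and all function names are made up for this example. It
fetches a page, pulls out the <title> text, strips the characters Pd would
try to interpret, and sends the rest as one semicolon-terminated message.

```python
import re
import socket
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(html):
    """Return the page title with surrounding whitespace removed."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def to_fudi(selector, text):
    """Build one Pd message: selector, then plain words, then ';'.

    Anything outside letters, digits, '_', '.' and '-' is replaced by a
    space, so commas, semicolons and backslashes never reach Pd.
    """
    words = re.sub(r"[^A-Za-z0-9_.-]+", " ", text).split()
    return " ".join([selector] + words) + ";\n"


def fetch(url):
    """Download a page and decode it as UTF-8, replacing bad bytes."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def send_to_pd(message, host="127.0.0.1", port=3000):
    """Send one message over TCP to a listening [netreceive] (assumed port)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(message.encode("ascii"))


def main(url):
    """Fetch, parse and forward: all text handling stays on the Python side."""
    send_to_pd(to_fudi("title", extract_title(fetch(url))))
```

Calling main() with a URL while the patch is open should make the message
appear at the outlet of [netreceive], where a [route title] can pick it up.
The same parsing could just as well live in an embedded scripting external;
the point is only that Pd receives a short, already cleaned list.
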
Of course it depends a bit on how complex your project is, so you may
get away with pure Pd as well, but IMO it's a better use of Pd to
hand the text processing off to a language better suited to it.