[PD] search engine with xapian backend

Jonathan Wilkes jancsika at yahoo.com
Mon Sep 30 21:59:59 CEST 2013


Here's a quick demo of some nice changes:
https://puredata.info/Members/jancsika/search-plugin-with-xapian.webm/view

Sorry about the size of the file-- I can remove some of the old demo builds if it's
a problem.

Updates:
* all metadata fields are searchable using Xapian's field:value syntax.  So author:puckette and
even outlet_0:pointer can be used by themselves or with free text to refine a search
* Want to return all patches that contain an instance of sigmund~?  Search for object:sigmund~.
This matches the exact text, without stemming-- e.g., object:clip~ will give different results than object:clip
* hand-crafted some descriptive text for all PDF manuals in Pd svn.  Includes the Gem manual and others.
* formatted escaped commas correctly
* added a firefox-style find menu bound to <ctrl-f>
* reduced index-build time and database size (both cut roughly in half)
* simplified doc search to exclude duplicates (for example, from having extra and extra/Gem in
the path)
* prettified the "info" icon
* use html <title>s for description in search results
* parse Gem docs for description and keywords
* allow canceling the index build
* use libdir libname/object prefix only for libdir results
* put name of libdir in description of all readmes and license.txt files
* reorganized and simplified the homepage topics
* reorganized code and removed some global variables (still ugly, but not
as ugly as it used to be)
* saved document data to the database as FUDI messages. (Easy to parse
if someone wants to make a [docsearch] object...)
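
To illustrate why FUDI-formatted document data is easy to parse, here is
a minimal sketch in Python.  It is not the plugin's code.  The function
name parse_fudi and the field names in the usage lines (title, author,
description) are made up for the example, and the sketch simplifies
FUDI by treating only the semicolon as a separator:

```python
def parse_fudi(data):
    """Split a FUDI string into messages, each a list of atom strings.

    Simplifying assumptions (not taken from the plugin):
    - a message ends at an unescaped semicolon
    - atoms are separated by unescaped whitespace
    - a backslash makes the next character literal (e.g. "\\," or "\\;")
    - unescaped commas are kept as ordinary characters
    - text after the last semicolon is dropped as an incomplete message
    """
    messages = []
    atoms = []      # atoms of the message being read
    current = []    # characters of the atom being read
    escaped = False
    for ch in data:
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ";":
            if current:
                atoms.append("".join(current))
                current = []
            if atoms:
                messages.append(atoms)
                atoms = []
        elif ch.isspace():
            if current:
                atoms.append("".join(current))
                current = []
        else:
            current.append(ch)
    return messages


# Hypothetical record; the real database layout may differ.
record = "title sigmund~;\nauthor Miller Puckette;"
for selector, *args in parse_fudi(record):
    print(selector, args)
```

The first atom of each message acts as the selector, so a [docsearch]
object or any other reader could dispatch on it the same way Pd does.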

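The field:value searches in the update list can be pictured with a
small Python sketch.  This is not Xapian's query parser and not the
plugin's code.  It only shows how a query such as
"author:puckette outlet_0:pointer fiddle" breaks down into field
filters plus free text, and why an exact, unstemmed match makes
object:clip~ differ from object:clip.  The names split_query and
has_object are hypothetical:

```python
def split_query(query):
    """Separate field:value terms from free-text terms.

    Returns (fields, free_text), where fields maps a field name to the
    list of values requested for it.  A token counts as a field term
    only if it has text on both sides of its first colon.
    """
    fields = {}
    free_text = []
    for token in query.split():
        prefix, sep, value = token.partition(":")
        if sep and prefix and value:
            fields.setdefault(prefix, []).append(value)
        else:
            free_text.append(token)
    return fields, free_text


def has_object(patch_objects, wanted):
    """Exact comparison, no stemming: "clip" does not match "clip~"."""
    return wanted in patch_objects


fields, free_text = split_query("author:puckette outlet_0:pointer fiddle")
print(fields)      # field filters
print(free_text)   # remaining free-text terms
```

In Xapian itself, prefixed fields are normally registered with the
QueryParser rather than split by hand; the sketch only mirrors the
syntax a user types.
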
Next I'm going to work on integrating it into Pd-l2ork, and maybe
break out the combobox into toggle buttons.

Of course, if anyone is an information retrieval specialist, feel free
to make suggestions.  I'm using a bunch of old docs that aren't
updated with description info, which is why so many of them have
the ugly description note.  Most of the new docs have pd meta info.
Also, I'm mixing some Pd vanilla and l2ork paths, which is why some
docs show up twice.  (You can see the full path in the status bar at
the bottom.)

Best,
Jonathan


