[PD] Using Xapian for the Pd Search Plugin

Jonathan Wilkes jancsika at yahoo.com
Thu Jan 24 22:59:42 CET 2013


I've looked a bit at the Xapian API.  Here's my preliminary route to changing
the search plugin to use Xapian.

***Build the Index***
* Read the file for each doc.
* regsub out all the "#X foo number number" stuff, since it won't help the search.
* Optional: prefix all object names with a Xapian prefix so that the user can search for instances of objects
if they want.  Additionally, include the object names unprefixed so they count toward a score when the user
isn't searching just for objects.
* Prefix all the pd META stuff so that users can search by category, author, etc., and also include it
unprefixed so that it again counts toward a general score when not searching for a particular field.
* Include the following as the document data: base directory, filename, pd META key/value pairs.  I include
the pd META stuff in the doc data since we want to display some of it (keywords, maybe other stuff in the
future) in the search results.
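
To make the steps above concrete, here is a rough sketch of the term generation in plain Python (not Tcl, and with no Xapian calls, so it only shows which terms would be handed to the indexer).  Everything named here is an assumption for illustration: the "XO" object prefix, the "A" author prefix, the META_PREFIXES table, and the idea that META fields appear as comments starting with an upper-case key like KEYWORDS or AUTHOR.  For simplicity it treats any comment that starts with such a key as META, rather than checking that it lives in the META subpatch.

```python
import re

# Matches the leading '#X kind x y ' part of a record, e.g. '#X obj 10 20 '.
# Assumption: the two coordinates are plain (possibly negative) integers.
HEADER = re.compile(r"#X\s+(\w+)\s+-?\d+\s+-?\d+\s*")

# Hypothetical mapping from META keys to term prefixes.
META_PREFIXES = {"KEYWORDS": "K", "AUTHOR": "A"}


def terms_for_patch(patch_text, object_prefix="XO"):
    """Return the list of index terms for the text of one patch file."""
    terms = []
    # Records end at a semicolon that is not escaped with a backslash.
    for record in re.split(r"(?<!\\);", patch_text):
        record = " ".join(record.split())  # collapse newlines and extra spaces
        match = HEADER.match(record)
        if not match:
            continue  # skips '#N canvas ...' lines and empty records
        kind = match.group(1)
        words = record[match.end():].split()
        if kind == "obj" and words:
            # Prefixed object name for object-only searches...
            terms.append(object_prefix + words[0].lower())
            # ...plus the unprefixed words so they count toward a general score.
            terms.extend(w.lower() for w in words)
        elif kind == "text" and words:
            prefix = META_PREFIXES.get(words[0])
            if prefix:
                words = words[1:]  # drop the META key itself
                terms.extend(prefix + w.lower() for w in words)
            terms.extend(w.lower() for w in words)
        # Other kinds (connect, restore, ...) carry nothing worth indexing.
    return terms
```

Calling terms_for_patch on the contents of a help patch gives a flat list of terms; the document data (base directory, filename, META key/value pairs) would be stored separately alongside them.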

Then it's trivial to check for database existence, and only build it if it's not there.  (Maybe just have the last link
on the homepage be "Rebuild Index".)

Now we have an index, so:

*** Search ***
Search.  Depending on speed, I might just keep it the way it is, showing ALL results instead of the Google way of 10 per page or whatever.

*** Search by Category ***
This will be nicer than it is currently: instead of cryptic regexp text showing up in the search bar, it will just be
the prefixed keyword, like "Kbandlimited" or "Ksignal".  That's easy enough to grasp that I don't think we'll need
some special syntax for category searches; newbies can just depend on the home page links.  Plus, if they want to
search for several categories at once, they can quickly figure out it's just a matter of putting a "K" in front of
each category, and they are way less likely to generate a Tcl error than they would be screwing around inside a
regexp.  (I could even make a mouse binding, like <ctrl-click>, that adds a category to the search bar without
triggering a search, so they can use that to gang several together.)
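
The <ctrl-click> idea boils down to appending a prefixed term to whatever is already in the search bar.  A minimal sketch in plain Python (the real plugin would do this in Tcl; add_category is a made-up helper name, and lower-casing the category is my assumption about how the terms get indexed):

```python
def add_category(search_text, category, prefix="K"):
    """Append a prefixed category term to the search text, without duplicates."""
    term = prefix + category.strip().lower()
    words = search_text.split()
    if term not in words:
        words.append(term)
    return " ".join(words)


# Ganging two categories together, one click at a time.
bar = add_category("", "bandlimited")
bar = add_category(bar, "signal")
print(bar)
```

The search itself is only triggered when the user hits enter, so several categories can be collected first.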

Also, if I understand the tclxapian interface correctly, I can just hand off a Tcl string to Xapian, so the
search plugin can get out of Tcl "quoting hell".  (Thus, much less chance of generating errors because of
malformed lists.)

Any comments, suggestions?

-Jonathan



