[PD] Using Xapian for the Pd Search Plugin

Hans-Christoph Steiner hans at at.or.at
Sat Jan 26 03:39:26 CET 2013



On 01/24/2013 04:59 PM, Jonathan Wilkes wrote:
> I've looked a bit at the Xapian API.  Here's my preliminary route to changing
> the search plugin to use Xapian.
> 
> ***Build the Index***
> * Read the file for each doc.  Use regsub to strip out all the
> "#X foo number number" stuff, since it won't help the search.
> * Optional: prefix all object names with a Xapian prefix so that the user
> can search for instances of objects if they want.  Additionally, include
> the object names unprefixed so they count toward a score when the user
> isn't searching just for objects.

This sounds quite interesting.  What do you mean by searching for instances of objects?


> * Prefix all the pd META stuff so that users can search by category, author, etc., and also include it unprefixed
> so that again it counts toward a general score when not searching for a particular field.
> * Include the following as the document data: base directory, filename, pd META key/value pairs.  I include the
> pd META stuff in the doc data since we want to display some of it (keywords, maybe other stuff in the future) in
> the search results.
> 
> Then it's trivial to check for database existence, and only build it if it's not there.  (Maybe just have the last link
> on the homepage be "Rebuild Index".)

That all sounds good.
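
A minimal sketch of the term-generation step described in the quoted list above. It is written in Python purely for illustration (the plugin itself is Tcl), it does not touch Xapian at all, and it does not handle the pd META fields. The names terms_for_patch and OBJECT_PREFIX and the "XOBJ" prefix value are made up for this example; nothing in the thread fixes them.

```python
import re

# Matches the leading "#X <type> <x> <y>" layout part of a Pd patch statement.
LAYOUT_RE = re.compile(r"^#X (\w+) -?\d+ -?\d+ ?")

# Hypothetical prefix for object names; an assumption, not from the thread.
OBJECT_PREFIX = "XOBJ"


def terms_for_patch(text):
    """Return a list of index terms for the contents of one Pd patch.

    Simplification: statements are split on every ";", so escaped
    semicolons inside comments are not treated specially.
    """
    terms = []
    for statement in text.split(";"):
        statement = " ".join(statement.split())  # collapse whitespace/newlines
        match = LAYOUT_RE.match(statement)
        if not match:
            continue  # e.g. "#N canvas ..." lines carry nothing searchable
        kind = match.group(1)
        words = [w.lower() for w in statement[match.end():].split()]
        if kind == "obj" and words:
            # Object name goes in twice: prefixed (for object-only searches)
            # and unprefixed (so it still counts toward a general score).
            terms.append(OBJECT_PREFIX + words[0])
            terms.extend(words)
        elif kind == "text":
            terms.extend(words)  # comment text is indexed as plain words
        # Everything else (connect, restore, ...) is dropped.
    return terms


sample = (
    "#N canvas 0 0 450 300 10;\n"
    "#X obj 10 20 osc~ 440;\n"
    "#X text 10 60 bandlimited oscillator;\n"
    "#X connect 0 0 1 0;\n"
)
print(terms_for_patch(sample))
```

With prefixed object terms in the index, a query for only the prefixed form would match patches that actually contain that object, while a plain-word query would also match comments that merely mention it.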


> Now we have an index so
> 
> *** Search ***
> Run the search.  Depending on speed, I might just keep it the way it is, showing ALL results instead of
> the Google way of 10 per page or whatever.
> 
> *** Search by Category ***
> This will be nicer than it is currently: instead of cryptic regexp text showing up in the search bar, it will just be
> the prefixed keyword, like "Kbandlimited" or "Ksignal".  That's easy enough to grasp that I don't think we'll need
> some special syntax for category searches; newbies can just depend on the home page links.  Plus, if they want to
> search for several categories at once, they can quickly figure out it's just a matter of putting a "K" in front of
> the category, and they are way less likely to generate a Tcl error than they would be screwing around inside a
> regexp.  (I could even make a mouse binding, like <ctrl-click>, that adds a category to the search bar without
> triggering a search, so they can use that to gang several together.)

What about "category:bandlimited"?  The K seems arbitrary and hard to remember.
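
To illustrate the idea, here is a rough sketch of translating a readable "field:value" word into the prefixed term before the query is run. Xapian's QueryParser offers an add_prefix method for this kind of field-to-prefix mapping; whether the Tcl bindings expose it is an assumption I have not checked, so the sketch below is plain Python with no Xapian calls. The FIELD_PREFIXES table and the translate_query name are hypothetical, and the "A" prefix for author is an assumed choice.

```python
# Hypothetical mapping from user-facing field names to term prefixes.
# "K" comes from the quoted proposal; "A" for author is an assumption.
FIELD_PREFIXES = {"category": "K", "author": "A"}


def translate_query(query):
    """Rewrite "field:value" words into prefixed terms; leave other words alone."""
    out = []
    for word in query.split():
        field, sep, value = word.partition(":")
        prefix = FIELD_PREFIXES.get(field.lower())
        if sep and value and prefix:
            out.append(prefix + value.lower())
        else:
            out.append(word)  # unknown field or plain word: pass through
    return " ".join(out)


print(translate_query("category:bandlimited osc~"))
```

The home page links could then put "category:bandlimited" in the search bar, and several categories could still be ganged together by typing more "category:" words.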


> Also, if I understand the tclxapian interface correctly, I can just hand off a Tcl string to Xapian so the
> search plugin can get out of Tcl "quoting hell".  (Thus, much less chance of generating errors because of
> malformed lists.)

That sounds very nice too.  Sounds to me like this would be a large
improvement.  Once you start committing some code, I'll try to find the time
to add Xapian to the Mac and Windows builds so people can start using/testing
early.

.hc


