[PD] Using Xapian for the Pd Search Plugin
Hans-Christoph Steiner
hans at at.or.at
Sat Jan 26 03:39:26 CET 2013
On 01/24/2013 04:59 PM, Jonathan Wilkes wrote:
> I've looked a bit at the Xapian API. Here's my preliminary route to changing
> the search plugin to use Xapian.
>
> ***Build the Index***
> * Read the file for each doc.
> regsub out all "#X foo number number" stuff since it won't help the search
> * Optional: prefix all object names with a XAPIAN prefix so that the user can search for instances of objects if
> they want. Additionally include the object names unprefixed so they count toward a score when the user isn't
> searching just for objects
This sounds quite interesting. What do you mean by searching for instances of objects?
> * Prefix all the pd META stuff so that users can search by category, author, etc., and also include it unprefixed
> so that again it counts toward a general score when not searching for a particular field
> * Include the following as the document data: base directory, filename, pd META KEY/values pairs. I include the
> pd META stuff in the doc data since we want to display some of it (keywords, maybe other stuff in the future) in
> the search results.
>
> Then it's trivial to check for database existence, and only build it if it's not there. (Maybe just have the last link
> on the homepage be "Rebuild Index".)
That all sounds good.
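The indexing steps quoted above could be sketched roughly as follows. This is only an illustration in plain Python, not the plugin's Tcl code, and it does not call Xapian. The prefix strings ("XOBJ", "K", "A"), the function name terms_for_patch, and the assumption that META entries appear as "#X text x y KEY value ..." comments are all made up for the example. Splitting records on ";" is a simplification that ignores escaped semicolons.

```python
import re

# Hypothetical prefixes, chosen only for this illustration.
OBJECT_PREFIX = "XOBJ"
META_PREFIXES = {"KEYWORDS": "K", "AUTHOR": "A"}

# Matches "#X obj x y ..." and "#X text x y ..." and drops the coordinates.
RECORD = re.compile(r"#X (obj|text) -?\d+ -?\d+\s+(.*)", re.DOTALL)


def terms_for_patch(patch_text):
    """Return the list of index terms for one patch file's text."""
    terms = []
    for record in patch_text.split(";"):
        match = RECORD.match(record.strip())
        if match is None:
            continue  # canvases, connections, etc. do not help the search
        kind, body = match.groups()
        words = body.split()
        if not words:
            continue
        if kind == "obj":
            # Object name both prefixed (object-only search) and unprefixed.
            terms.append(OBJECT_PREFIX + words[0])
            terms.append(words[0])
        elif words[0] in META_PREFIXES:
            # META values both prefixed (field search) and unprefixed.
            prefix = META_PREFIXES[words[0]]
            for value in words[1:]:
                terms.append(prefix + value.lower())
                terms.append(value.lower())
        else:
            # Ordinary comment text counts toward the general score.
            terms.extend(word.lower() for word in words)
    return terms
```

Each returned term would then be added to the Xapian document for that file, with the base directory, filename and META pairs stored as the document data.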
> Now we have an index so
>
> *** Search ***
> Search. Depending on speed, I might just keep it the way it is, showing ALL results instead of the Google way of 10 per page or whatever.
>
> *** Search by Category ***
> This will be nicer than it is currently-- instead of cryptic regexp text showing up in the search bar, it will just be
> the prefixed keyword, like "Kbandlimited" or "Ksignal". That's easy enough to grasp that I don't think we'll need
> some special syntax for category searches-- newbies can just depend on the home page links. Plus, if they want to search for several categories at once they can quickly figure out it's just a matter of putting a
> "K" in front of the category, and they are way less likely to generate a tcl error than they would be screwing around inside a regexp. (I could even make a mouse binding, like <ctrl-click>, that adds a category to the search bar without triggering a search, so they can use that to gang several together.)
What about "category:bandlimited"? The K seems arbitrary and hard to remember.
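A minimal sketch of the "category:bandlimited" idea, assuming the index keeps the "K" prefix internally and the search bar only translates the readable field name. The field table and the function name expand_fields are hypothetical. Xapian's QueryParser can do this kind of mapping itself through add_prefix and add_boolean_prefix; whether the Tcl bindings expose those calls would need checking.

```python
# Hypothetical mapping from user-facing field names to index term prefixes.
FIELD_PREFIXES = {"category": "K", "author": "A"}


def expand_fields(query):
    """Rewrite 'field:value' words into prefixed terms; leave other words alone."""
    out = []
    for word in query.split():
        field, sep, value = word.partition(":")
        prefix = FIELD_PREFIXES.get(field.lower())
        if sep and value and prefix:
            out.append(prefix + value.lower())
        else:
            out.append(word)  # unknown field or plain word: pass through
    return " ".join(out)
```

With this, a user types "category:signal category:bandlimited" and the plugin searches for "Ksignal Kbandlimited", so the prefix never has to be remembered.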
> Also, if I understand the tclxapian interface correctly, I can just hand off a tcl string to Xapian so the search-plugin can get out of tcl "quoting-hell". (Thus, much less chance of generating errors because of malformed
> lists.)
That sounds very nice too. Sounds to me like this would be a large
improvement. Once you start committing some code, I'll try to find the time
to add xapian to the Mac and Windows builds so people can start using/testing
early.
.hc