[PD] Using Xapian for the Pd Search Plugin

Jonathan Wilkes jancsika at yahoo.com
Sat Jan 26 03:55:08 CET 2013


----- Original Message -----

> From: Hans-Christoph Steiner <hans at at.or.at>
> To: pd-list at iem.at
> Cc: 
> Sent: Friday, January 25, 2013 9:39 PM
> Subject: Re: [PD] Using Xapian for the Pd Search Plugin
> 
> 
> 
> On 01/24/2013 04:59 PM, Jonathan Wilkes wrote:
>>  I've looked a bit at the Xapian API.  Here's my preliminary route to
>>  changing the search plugin to use Xapian.
>> 
>>  ***Build the Index***
>>  * Read the file for each doc.
>>    regsub out all "#X foo number number" stuff since it won't help the
>>    search
>>  * Optional: prefix all object names with a XAPIAN prefix so that the user
>>    can search for instances of objects if they want.  Additionally include
>>    the object names unprefixed so they count toward a score when the user
>>    isn't searching just for objects
> 
> This sounds quite interesting, how do you mean searching for instances of 
> objects?

Well, we can index "clip~" as an ordinary search term, but we can additionally
add it to the db with a prefix (something like XOclip~) when it comes from a
line like "#X obj 20 10 clip~" in the document.  (Basically you normalize all
the document's search terms to lower case, so an upper-case prefix marks a
term as belonging to a particular field.)
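
Here's a rough sketch of what I mean by indexing each object name twice.  It's
in Python only for illustration (the plugin itself is tcl), it assumes one Pd
statement per line, and the function name and the "XO" constant are just
placeholders taken from the example above, not anything Xapian defines.

```python
OBJECT_PREFIX = "XO"  # hypothetical prefix, following the "XOclip~" example


def terms_for_line(line):
    """Return the index terms for one line of a .pd file.

    Assumes the whole statement sits on a single line.
    """
    tokens = line.strip().rstrip(";").split()
    # Only "#X <kind> <x> <y> <content...>" lines carry searchable content.
    if len(tokens) < 5 or tokens[0] != "#X":
        return []
    kind, words = tokens[1], tokens[4:]
    if kind == "obj":
        # Lower-case the name, then emit it both bare and prefixed.
        name = words[0].lower()
        return [name, OBJECT_PREFIX + name]
    if kind == "text":
        # Comment text only counts toward the general score.
        return [w.lower() for w in words]
    return []
```

So "#X obj 20 10 clip~ -1 1;" would give both "clip~" and "XOclip~", and the
bare term still counts when the user isn't searching just for objects.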

I suppose we could also make use of the numbers in "#X obj 20 10", since terms
with lower coordinates sit closer to the top-left corner of the patch and are
more prominent.
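
Purely as a guess at how that could work: turn the coordinates into a small
integer boost and use it as the term's within-document frequency when adding
it to the document.  The cap and the falloff below are made-up numbers, and
the function name is hypothetical; none of this is tested against Xapian.

```python
import math

MAX_BOOST = 5     # hypothetical cap on the extra weight
FALLOFF_PX = 100  # hypothetical: lose one point of boost per 100 pixels


def position_weight(x, y):
    """Return a weight from 1 to MAX_BOOST, higher nearer the top-left corner."""
    distance = math.hypot(x, y)  # straight-line distance from (0, 0)
    return max(1, MAX_BOOST - int(distance // FALLOFF_PX))
```

With those numbers an object at (20, 10) gets the full boost, and anything
more than a few hundred pixels out just counts once.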

> 
> 
>>  * Prefix all the pd META stuff so that users can search by category,
>>    author, etc., and also include it unprefixed so that again it counts
>>    toward a general score when not searching for a particular field
>>  * Include the following as the document data: base directory, filename,
>>    pd META KEY/values pairs.  I include the pd META stuff in the doc data
>>    since we want to display some of it (keywords, maybe other stuff in the
>>    future) in the search results.
>> 
>>  Then it's trivial to check for database existence, and only build it if
>>  it's not there.  (Maybe just have the last link on the homepage be
>>  "Rebuild Index".)
> 
> Sounds all good.
> 
> 
>>  Now we have an index so
>> 
>>  *** Search ***
>>  Search.  Depending on speed, I might just keep it the way it is, showing
>>  ALL results instead of the Google way of 10 per page or whatever.
>> 
>>  *** Search by Category ***
>>  This will be nicer than it is currently-- instead of cryptic regexp text
>>  showing up in the search bar, it will just be the prefixed keyword, like
>>  "Kbandlimited" or "Ksignal".  That's easy enough to grasp that I don't
>>  think we'll need some special syntax for category searches-- newbies can
>>  just depend on the home page links.  Plus, if they want to search for
>>  several categories at once they can quickly figure out it's just a matter
>>  of prefixing a "K" in front of the category and are way less likely to
>>  generate a tcl error as they would be screwing around inside a regexp.
>>  (I could even make a mousebinding, like <ctrl-click> will add a category
>>  to the search bar without triggering a search, so they can use that to
>>  gang several together.)
> 
> What about "category:bandlimited"?  The K seems arbitrary and hard to
> remember.

Yeah, I'm just being lazy because the "K" prefix is how it's actually stored in
the database, and the main user interface is clicking a link.  It'd basically
just be a regsub there, so it's not too hard to use your syntax.
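
Something along these lines, sketched in Python rather than tcl just to show
the substitution.  The "K" prefix is the one mentioned above; the table and
the function name are hypothetical, and anything that isn't a known field
name is left alone.

```python
import re

# Maps the user-facing field name to the prefix stored in the database.
FIELD_PREFIXES = {"category": "K"}


def rewrite_query(query):
    """Turn "category:bandlimited" into "Kbandlimited"; leave the rest alone."""
    def swap(match):
        prefix = FIELD_PREFIXES.get(match.group(1).lower())
        if prefix is None:
            # Unknown field name, e.g. the "http" in a URL: keep it as typed.
            return match.group(0)
        return prefix + match.group(2).lower()

    return re.sub(r"\b(\w+):(\S+)", swap, query)
```

The home page links could then generate the "category:" form too, so what
shows up in the search bar is the same thing the user would type.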

> 
> 
>>  Also, if I understand the tclxapian interface correctly, I can just hand
>>  off a tcl string to Xapian so the search-plugin can get out of tcl
>>  "quoting-hell".  (Thus, much less chance of generating errors because of
>>  malformed lists.)
> 
> That sounds very nice too.  Sounds to me like this would be a large
> improvement.  Once you start committing some code, I'll try to find the time
> to add xapian to the Mac and Windows builds so people can start using/testing
> early.

Well, this is all at the pre-testing stage.  Hopefully there are no weird snags
in all this.  But the documentation seems pretty straightforward so far.

-Jonathan

> 
> .hc
> 