[PD] Using Xapian for the Pd Search Plugin
Jonathan Wilkes
jancsika at yahoo.com
Sat Jan 26 03:55:08 CET 2013
----- Original Message -----
> From: Hans-Christoph Steiner <hans at at.or.at>
> To: pd-list at iem.at
> Cc:
> Sent: Friday, January 25, 2013 9:39 PM
> Subject: Re: [PD] Using Xapian for the Pd Search Plugin
>
>
>
> On 01/24/2013 04:59 PM, Jonathan Wilkes wrote:
>> I've looked a bit at the Xapian API. Here's my preliminary route to
>> changing the search plugin to use Xapian.
>>
>> ***Build the Index***
>> * Read the file for each doc.
>>   regsub out all "#X foo number number" stuff since it won't help the
>>   search
>> * Optional: prefix all object names with a XAPIAN prefix so that the user
>>   can search for instances of objects if they want. Additionally include
>>   the object names unprefixed so they count toward a score when the user
>>   isn't searching just for objects
>
> This sounds quite interesting, how do you mean searching for instances of
> objects?
Well, we can put "clip~" in the search terms, but we can additionally add it to the
db with a prefix (something like XOclip~) when it originated from the
document as "#X obj 20 10 clip~". (Basically you normalize all the document
search terms to lower case, so then upper case denotes certain fields.)
I suppose we could also make use of the numbers in "#X obj 20 10", since terms with
lower coordinates are closer to the top left corner and are therefore more
prominent.
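
To make the prefixing idea concrete, here is a minimal sketch in plain Python (no Xapian calls). It assumes the "XO" prefix from the example above; the function name, the regular expression, and the stripping of a trailing semicolon are hypothetical and are not taken from the actual search plugin.

```python
import re

# Hypothetical prefix, taken from the "XOclip~" example above.
OBJECT_PREFIX = "XO"

# Matches lines such as "#X obj 20 10 clip~ -1 1;" and captures the object name.
OBJ_LINE = re.compile(r"^#X obj -?\d+ -?\d+ (\S+)")


def terms_for_line(line):
    """Return the index terms for one line of a Pd patch (illustrative only)."""
    match = OBJ_LINE.match(line)
    if not match:
        return []
    # Lower-case the name so that upper case is reserved for field prefixes.
    name = match.group(1).rstrip(";").lower()
    # Unprefixed term for general scoring, prefixed term for object searches.
    return [name, OBJECT_PREFIX + name]


print(terms_for_line("#X obj 20 10 clip~ -1 1;"))
```

Each object line yields two terms, so a plain search for clip~ still matches while a search restricted to objects can look for the prefixed form only.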
>
>
>> * Prefix all the pd META stuff so that users can search by category,
>>   author, etc., and also include it unprefixed so that again it counts
>>   toward a general score when not searching for a particular field
>> * Include the following as the document data: base directory, filename, pd
>>   META KEY/values pairs. I include the pd META stuff in the doc data since
>>   we want to display some of it (keywords, maybe other stuff in the future)
>>   in the search results.
>>
>> Then it's trivial to check for database existence, and only build it if
>> it's not there. (Maybe just have the last link on the homepage be
>> "Rebuild Index".)
>
> Sounds all good.
>
>
>> Now we have an index so
>>
>> *** Search ***
>> Search. Depending on speed, I might just keep it the way it is, showing
>> ALL results instead of the Google way of 10 per page or whatever.
>>
>> *** Search by Category ***
>> This will be nicer than it is currently-- instead of cryptic regexp text
>> showing up in the search bar, it will just be the prefixed keyword, like
>> "Kbandlimited" or "Ksignal". That's easy enough to grasp that I don't
>> think we'll need some special syntax for category searches-- newbies can
>> just depend on the home page links. Plus, if they want to search for
>> several categories at once they can quickly figure out it's just a matter
>> of prefixing a "K" in front of the category and are way less likely to
>> generate a tcl error as they would be screwing around inside a regexp. (I
>> could even make a mousebinding, like <ctrl-click> will add a category to
>> the search bar without triggering a search, so they can use that to gang
>> several together.)
>
> What about "category:bandlimited"? The K seems arbitrary and hard to
> remember.
Yeah, I'm just being lazy because the "K" prefix is how it's actually stored in the
database, and the main user interface is clicking a link. It'd basically just
be a regsub there so not too hard to use your syntax.
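
As a rough illustration of that substitution, here is a sketch in plain Python rather than Tcl. The "category:" syntax and the "K" prefix come from the messages above; the function name and the exact pattern are hypothetical.

```python
import re

# "K" is the keyword prefix mentioned above; "category:" is the syntax
# proposed in the quoted question. The pattern itself is an assumption.
CATEGORY = re.compile(r"\bcategory:(\S+)")


def rewrite_query(query):
    """Turn user-facing "category:foo" into the stored "Kfoo" form."""
    return CATEGORY.sub(lambda m: "K" + m.group(1).lower(), query)


print(rewrite_query("category:bandlimited osc~"))
```

Xapian's QueryParser also has an add_prefix method that maps a field name to a term prefix, which might make a hand-written substitution unnecessary; whether the Tcl bindings expose it the same way is not checked here.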
>
>
>> Also, if I understand the tclxapian interface correctly, I can just hand
>> off a tcl string to Xapian so the search-plugin can get out of tcl
>> "quoting-hell". (Thus, much less chance of generating errors because of
>> malformed lists.)
>
> That sounds very nice too. Sounds to me like this would be a large
> improvement. Once you start committing some code, I'll try to find the time
> to add xapian to the Mac and Windows builds so people can start using/testing
> early.
Well, this is all pre-testing stage. Hopefully there are no weird snags in all this. But the
documentation seems pretty straightforward so far.
-Jonathan
>
> .hc
>