[PD] matching queries with text in [text search]

Thu Dec 1 13:25:55 CET 2016

When [text search] searches through a text file, it seems to evaluate the query against each line of text term by term. If any query term does not match the corresponding term in the line, the result for that line is negative.

I am looking for a search algorithm that does the opposite. I want it to match the searchable text against the query term by term, and reject it the moment it finds a mismatch. This would mean that extra terms in the query are allowed to pass, while queries that are shorter than the text are rejected, even if they match.

This should be clearer with examples:

-- Searching for "cats 1 2 3" in a text containing the line "cats" returns negative. I want it to return positive.

-- Searching for "dogs" in a text containing the line "dogs a b c" returns positive. I want it to return negative.
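
To pin down the rule behind these two examples, here is a minimal sketch in Python rather than Pd. It is only an illustration of the matching rule, not of how [text search] works internally. The function name line_matches is made up for this sketch, and it assumes terms are separated by whitespace and compared as plain strings (so "1" and "1.0" would count as different terms).

```python
def line_matches(query, line):
    """Return True if every term of `line` equals the query term at the same position.

    Extra trailing terms in the query are ignored; a line that is longer
    than the query can never be fully matched, so it is rejected.
    """
    query_terms = query.split()
    line_terms = line.split()
    # The line has terms the query cannot cover: reject.
    if len(line_terms) > len(query_terms):
        return False
    # Compare the line against the leading terms of the query.
    return query_terms[:len(line_terms)] == line_terms


print(line_matches("cats 1 2 3", "cats"))   # True
print(line_matches("dogs", "dogs a b c"))   # False
```

In other words, a line matches when it is a prefix of the query, which is the reverse of the behaviour described above.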

At first glance, it seems like you can achieve this using [text search]'s numbered arguments. For example, [text search my-text 0] will return positive for the "cats" example, because it only compares the first term. But I don't think that this can lead to a general solution, because the number of terms to compare is fixed by the arguments, while the lines in the file vary in length. For instance, if the same text file contained the line "giraffe 86", then I would want the query "giraffe" to come up negative, because the full line "giraffe 86" has not been matched. But with the 0 argument it will come up positive, because only the first term is compared.

I've made an abstraction that solves this problem by dividing the text file according to line length, and then making multiple passes with the query. It works well enough, but I'm wondering if there's a more efficient way. In the context I need it, it will be used fast and frequently, so I want to make sure that it works as quickly as possible.
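
For readers without the attachment, the idea of dividing the file by line length can be sketched roughly as follows, again in Python rather than Pd. This is not the attached abstraction, only an illustration under assumptions: the names build_index and search are hypothetical, terms are compared as plain strings, and returning -1 for "no match" is just this sketch's convention. Each group of equal-length lines is stored in a dictionary, so one pass per line length is a single lookup instead of a scan.

```python
from collections import defaultdict


def build_index(lines):
    """Group lines by term count: {length: {tuple_of_terms: line_number}}."""
    index = defaultdict(dict)
    for number, line in enumerate(lines):
        terms = tuple(line.split())
        if terms:  # skip empty lines
            # Keep the first line number if the same line occurs twice.
            index[len(terms)].setdefault(terms, number)
    return index


def search(index, query):
    """Return the lowest matching line number, or -1 if no line matches.

    One lookup is made per line length that is not longer than the query.
    """
    query_terms = tuple(query.split())
    hits = [
        index[n][query_terms[:n]]
        for n in index
        if n <= len(query_terms) and query_terms[:n] in index[n]
    ]
    return min(hits) if hits else -1


index = build_index(["cats", "dogs a b c", "giraffe 86"])
print(search(index, "cats 1 2 3"))  # 0
print(search(index, "giraffe"))     # -1
```

The index is built once, so repeated queries cost one dictionary lookup per distinct line length in the file, which may matter if the search is called fast and frequently.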

I'm attaching the abstraction, for anyone who wants to take a look. Can you think of a better way of making this search, perhaps using the numbered arguments after all, or using lists instead of [text]? I assume that lists will be much slower, but perhaps there's a way.

If what I've written here is unclear, please take a look at the examples in the attachment; they should clear things up.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: msearch.zip
Type: application/zip
Size: 3744 bytes
Desc: msearch.zip
URL: <http://lists.puredata.info/pipermail/pd-list/attachments/20161201/db550aff/attachment.zip>