[PD] Mailing list archive search: Remove attachment.htm results?

IOhannes m zmoelnig zmoelnig at iem.at
Tue Apr 23 12:05:44 CEST 2024


On 4/23/24 08:56, Peter P. wrote:
> Hi,
> 
> The search function on https://lists.puredata.info/pipermail/pd-list//
> is great, but a large number of search results are of the form
> "/pipermail/pd-list/attachments/20160527/3e480100/attachment.html"
> and are more or less unreadable.
> 
> I would like to suggest excluding them from the index or from the
> results, if possible.


what you are seeing are the HTML parts of emails that were sent with both an HTML and a plaintext version.

typically, a mail client that composes HTML mail will include the same 
information in a plaintext part of the email. but nothing really 
enforces this (if you usually read plaintext emails, you will know the 
"you need an HTML-capable mail client to read this" message).
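As an illustration of the structure described above, here is a minimal sketch using Python's standard `email` package. It builds one message that carries both a plaintext and an HTML alternative, and one that carries HTML only, then looks for the plaintext part. The message contents and the helper name `plaintext_body` are made up for this example; it does not describe how the list archive itself processes mail.

```python
from email.message import EmailMessage

def plaintext_body(msg):
    """Return the text/plain body of a message, or None if it has none."""
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else None

# A typical HTML mail: the client adds a plaintext alternative itself,
# which turns the message into multipart/alternative.
both = EmailMessage()
both.set_content("hello pd-list\n")
both.add_alternative("<html><body><div>hello pd-list</div></body></html>\n",
                     subtype="html")

# Nothing forces a client to do so: this message carries HTML only.
html_only = EmailMessage()
html_only.set_content("<html><body><div>hello pd-list</div></body></html>\n",
                      subtype="html")

print(both.get_content_type())          # multipart/alternative
print(plaintext_body(both), end="")     # hello pd-list
print(plaintext_body(html_only))        # None
```

For the HTML-only message, the HTML part is the only place the text exists, which is why dropping HTML parts from a search index can lose information.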

as such, i do not think there's anything wrong with including the HTML 
parts in the search results (even if they are rendered verbatim), as 
they might contain information that is otherwise unavailable.

however, what is definitely wrong is that "sort by relevance" does not 
really do what it promises: at least for me, HTML attachments with a 
relevance of 73% are sorted before plaintext posts with 100% relevance.
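For comparison, the ordering one would expect from "sort by relevance" is simply descending by score. The sketch below uses two hypothetical hits with the scores mentioned above; the URLs are placeholders, and nothing here reflects how the archive's search engine actually computes or sorts relevance.

```python
# Hypothetical search hits as (url, relevance in percent).
# The URLs are illustrative placeholders, not real archive entries.
hits = [
    ("/pipermail/pd-list/attachments/example/attachment.html", 73),
    ("/pipermail/pd-list/example-post.html", 100),
]

def sort_by_relevance(results):
    """Order results from most to least relevant; ties keep their input order."""
    return sorted(results, key=lambda hit: hit[1], reverse=True)

for url, score in sort_by_relevance(hits):
    print(f"{score:3d}%  {url}")
```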

in any case, i'm afraid this is rather low priority.
in the meantime, you could try filtering out the HTML attachments by 
excluding a common HTML tag (e.g. adding ` -div ` to the search terms 
works for me; however, this will also exclude mails that discuss the 
use of [div]...)
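A different technique, filtering on the result URL instead of on a search term, avoids the [div] problem because it never looks at the message text. The sketch below assumes you have the result URLs as a list (for example, copied out of a results page); the search form itself offers no such option. It also assumes HTML parts live under an `attachments/` directory and end in `.html` or `.htm`, as in the example URL quoted above. The second URL and the helper names are hypothetical. Keep in mind that this drops exactly the HTML parts that might hold otherwise unavailable information.

```python
from urllib.parse import urlparse

def is_html_attachment(url: str) -> bool:
    """Assumed layout: scrubbed HTML parts sit under .../attachments/ and end in .html/.htm."""
    path = urlparse(url).path
    return "/attachments/" in path and path.endswith((".html", ".htm"))

def drop_html_attachments(urls):
    """Keep only results that are not scrubbed HTML attachments."""
    return [u for u in urls if not is_html_attachment(u)]

results = [
    "https://lists.puredata.info/pipermail/pd-list/attachments/20160527/3e480100/attachment.html",
    "https://lists.puredata.info/pipermail/pd-list/example-post.html",  # hypothetical post URL
]
print(drop_html_attachments(results))
```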


msfdgasdr
IOhannes