[PD] Fastest way to find lines in text file

cyrille henry ch at chnry.net
Wed Mar 22 16:46:20 CET 2017


If your text file is composed of 2 columns of numbers, you can speed up the search with some preprocessing.

1 : sort the index column (already done in your example)
2 : create 2 tables: a "start index table" (position of the first line for each index) and a "number of occurrences table" (how many lines have that index)
In your example, the "start index table" would be 0 at 345594, 5 at 345595, 15 at 345596, 16 at 345598
the "number of occurrences table" would be: 5 at 345594, 10 at 345595, 1 at 345596, 4 at 345598
3 : put column 2 of your text file in a "data table"

Now, when searching for 345595, you just have to [tabread table1] (start index) and [tabread table2] (number of occurrences) at position 345595, and a small [until] loop then reads the data table only where needed.
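The three-table preprocessing and the lookup can be sketched outside Pd as follows. This is a minimal Python illustration, assuming the file is already sorted by its first column; Python dicts stand in for the Pd arrays read with [tabread], and the names build_index and lookup are hypothetical.

```python
# Sketch of the "start index / number of occurrences / data" tables.
# Dicts stand in for the Pd arrays (table1, table2); build_index and
# lookup are hypothetical names used only for illustration.

SAMPLE = """\
345594 577427
345594 567267
345594 528911
345594 534435
345594 523087
345595 374384
345595 377303
345595 380544
345595 379911
345595 557020
345595 552396
345595 562487
345595 460842
345595 428449
345595 424095
345596 447676
345598 579883
345598 379495
345598 379039
345598 380328
"""


def build_index(lines):
    """Steps 2 and 3: one pass over lines ASSUMED sorted by column 1 (step 1)."""
    start = {}  # index -> position of its first line ("start index table")
    count = {}  # index -> number of lines with that index ("number of occurrences table")
    data = []   # column 2 in file order ("data table")
    for position, line in enumerate(lines):
        key_text, value_text = line.split()
        key = int(key_text)
        if key not in start:
            start[key] = position
            count[key] = 0
        count[key] += 1
        data.append(int(value_text))
    return start, count, data


def lookup(key, start, count, data):
    """Read both tables at `key`, then only the matching slice of `data`."""
    first = start.get(key)
    if first is None:
        return []  # index not present in the file
    # Equivalent of the small [until] loop: read count[key] consecutive entries.
    return [data[first + offset] for offset in range(count[key])]


start, count, data = build_index(SAMPLE.splitlines())
print(start[345595], count[345595])        # → 5 10
print(lookup(345596, start, count, data))  # → [447676]
```

The preprocessing runs once over the whole file; after that, each query touches only count[key] entries of the data table instead of scanning all 2,539,592 lines.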

cheers
c

On 22/03/2017 at 14:34, Jack wrote:
> I guess my 2 previous mails were clear enough.
> But I will answer each point:
>
> 1) My previous mails:
> I need to find every line of a text file containing a word.
> The text file has 2.539.592 lines.
> Right now, I am using [msgfile] from zexy because I can find a line, skip a
> line and find again ... until the end of the text file.
> But I am wondering if there is another object (in another library)
> that is faster and specialized for this job?
> ...
> The text file has only two "strings" per line.
> Here are 20 lines of the text file:
>
> 345594 577427
> 345594 567267
> 345594 528911
> 345594 534435
> 345594 523087
> 345595 374384
> 345595 377303
> 345595 380544
> 345595 379911
> 345595 557020
> 345595 552396
> 345595 562487
> 345595 460842
> 345595 428449
> 345595 424095
> 345596 447676
> 345598 579883
> 345598 379495
> 345598 379039
> 345598 380328
>
> 2) See above
> 3) See above
> 4) See above
> 5) Linux/Ubuntu 16.10/Pd 0.47.1
> 6) Now you're pushing it :)
>
> ++
>
> Jack
>
>
>
>
> On 22/03/2017 at 13:31, Lorenzo Sutton wrote:
>> Hi,
>>
>> On 22/03/2017 13:01, Jack wrote:
>>> I need to find all instances that match the first row.
>>> It is not possible with [text search], if I am right.
>>
>> I think you should outline your use case/problem in more detail. This
>> is good practice when asking for support on the Mailing List.
>>
>> Example:
>>
>> 1) I have a text file where each line contains two integers separated
>> by a space (" ") char - such as (possibly paste a part of the file on
>> pastebin or similar too).
>> 213214 12313
>> 123223 13213
>>
>> 2) My file is [always/at least/circa/ ...] 2,539,592 lines long
>>
>> 3) My algorithm should find all subsequent lines matching the first line
>> in the file and return [all line numbers for matches / the total count
>> of matched lines / ...]
>>
>> 3) I want the algorithm to be [as fast as possible / run in under 1
>> second / run in under 1ms / ... ]
>>
>> 4) I [want to / do not need to] use Pd Vanilla
>>
>> 5) My patch should run on [All platforms / Windows / OSX / Linux / ...]
>>
>> 6) My patch should run [on potentially any machine / on a Raspberry Pi /
>> on a 1990s 386 machine / on my digital toaster where I have compiled a
>> custom version of Pd / ... ]
>>
>> :)
>>
>>
>>> ++
>>>
>>> Jack
>>>
>>>
>>>
>>> On 22/03/2017 at 08:27, Liam Goodacre wrote:
>>>> You can also use [text search], although it's not so easy to find more
>>>> than the first instance. If you don't mind taking an extra step, you
>>>> could give each line a third term, which is the line number. Then you
>>>> can use the "> 3" argument for [text search] to find matches s
>>>>
>>>>
>>>>
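The line-number trick suggested above can be illustrated with a small Python emulation. This is not Pd's [text search] and does not reproduce its exact "> 3" semantics; it only sketches the idea of tagging each line with its number and repeatedly asking for the next matching line after the previous hit. The helper names tag_with_line_numbers, next_match and find_all are hypothetical.

```python
# Python emulation (NOT Pd's [text search]) of the "third term = line number" idea.
# tag_with_line_numbers, next_match and find_all are hypothetical helper names.


def tag_with_line_numbers(lines):
    """Turn "key value" lines into (key, value, line_number) triples."""
    tagged = []
    for line_number, line in enumerate(lines):
        key_text, value_text = line.split()
        tagged.append((int(key_text), int(value_text), line_number))
    return tagged


def next_match(tagged, key, after):
    """Return the first triple with this key and a line number > after, or None."""
    for triple in tagged:  # tagged is in line-number order, so the first hit is the closest
        if triple[0] == key and triple[2] > after:
            return triple
    return None


def find_all(tagged, key):
    """Collect every match by restarting the search just past the previous hit."""
    matches = []
    after = -1
    while True:
        hit = next_match(tagged, key, after)
        if hit is None:
            return matches
        matches.append(hit)
        after = hit[2]


# Deliberately unsorted: this approach does not need the file to be sorted.
lines = ["345594 577427", "345595 374384", "345594 567267", "345596 447676", "345594 528911"]
tagged = tag_with_line_numbers(lines)
print(find_all(tagged, 345594))
# → [(345594, 577427, 0), (345594, 567267, 2), (345594, 528911, 4)]
```

Each call to next_match rescans from the beginning, so this sketch costs a full pass per match; it shows the mechanics of the idea rather than a fast implementation.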
>>>> ------------------------------------------------------------------------
>>>> *From:* Pd-list <pd-list-bounces at lists.iem.at> on behalf of Jack
>>>> <jack at rybn.org>
>>>> *Sent:* 21 March 2017 18:14
>>>> *To:* pd-list at lists.iem.at
>>>> *Subject:* [PD] Fastest way to find lines in text file
>>>>
>>>> Hello,
>>>>
>>>> I need to find every line of a text file containing a word.
>>>> The text file has 2.539.592 lines.
>>>> Right now, I am using [msgfile] from zexy because I can find a line, skip a
>>>> line and find again ... until the end of the text file.
>>>> But I am wondering if there is another object (in another library)
>>>> that is faster and specialized for this job?
>>>> Thanks.
>>>> ++
>>>>
>>>> Jack
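For comparison, the "find a line, skip it, find again until the end" approach amounts to one full linear pass over the file per query. The following is a plain Python sketch of that baseline, not zexy's [msgfile]; it assumes "containing a word" means the word appears as one of the whitespace-separated fields, and scan_for_word is a hypothetical name.

```python
# Brute-force baseline: one pass over every line per query.
# Plain Python sketch, not zexy's [msgfile]; scan_for_word is a hypothetical name.
import io


def scan_for_word(text_file, word):
    """Return (line_number, fields) for each line that has `word` as one of its fields."""
    hits = []
    for line_number, line in enumerate(text_file):
        fields = line.split()
        if word in fields:
            hits.append((line_number, fields))
    return hits


# io.StringIO stands in for an opened text file.
sample = io.StringIO("345594 577427\n345595 374384\n345594 567267\n")
print(scan_for_word(sample, "345594"))
# → [(0, ['345594', '577427']), (2, ['345594', '567267'])]
```

Every query reads all lines of the file, which is the cost that a one-time index avoids.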
>>>>
>>>>
>>>> _______________________________________________
>>>> Pd-list at lists.iem.at mailing list
>>>> UNSUBSCRIBE and account-management ->
>>>> https://lists.puredata.info/listinfo/pd-list
>>>>
>>>
>>
>
