[PD] Fastest way to find lines in text file

Wed Mar 22 17:01:37 CET 2017

Good idea!
I just need to sort the text file first (in fact, the file is not totally
ordered) ;)
Thanks.
Talking this through gave me a new idea about a good method to adopt. :)
++

Jack
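
For the sorting step mentioned above, here is a minimal sketch in Python (outside Pd). It assumes every useful line holds exactly two whitespace-separated integers and that the file fits in memory. The name `sort_pairs` is a hypothetical helper, not something from this thread.

```python
from typing import Iterable, List, Tuple


def sort_pairs(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Parse "index value" lines and return them sorted by the index column."""
    pairs: List[Tuple[int, int]] = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # assumption: blank or malformed lines are simply skipped
        pairs.append((int(fields[0]), int(fields[1])))
    # list.sort is stable, so lines sharing an index keep their original order.
    pairs.sort(key=lambda pair: pair[0])
    return pairs
```

Sorting only on the first column is enough here, because the lookup only needs equal indexes to sit next to each other.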


On 22/03/2017 at 16:46, cyrille henry wrote:
> If your text file is composed of two columns of numbers, you can optimize
> the search with some preprocessing.
> 
> 1 : sort by the index column (already done in your example)
> 2 : create two tables: one for the start index of each index value, and
> one for its number of occurrences.
> In your example, the "start index table" would hold 0 at 345594, 5 at
> 345595, 15 at 345596, 16 at 345598,
> and the "number of occurrences table" would hold 5 at 345594, 10 at
> 345595, 1 at 345596, 4 at 345598
> 3 : put column 2 of your text file in a "data table"
> 
> Now, when searching for 345595, you just have to [tabread table1] and
> [tabread table2] at position 345595, and with a small until loop you
> read the data table only where needed.
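
The three steps above can be sketched in Python (outside Pd) as follows. This is only an illustration under stated assumptions: the pairs are already grouped by the first column, and dictionaries stand in for the two Pd tables that would be read with [tabread]. The names `parse_pairs`, `build_tables` and `lookup` are hypothetical, and `SAMPLE` is the 20-line excerpt quoted in this thread.

```python
from typing import Dict, List, Tuple

# The 20 sample lines from this thread, already grouped by the first column.
SAMPLE = """\
345594 577427
345594 567267
345594 528911
345594 534435
345594 523087
345595 374384
345595 377303
345595 380544
345595 379911
345595 557020
345595 552396
345595 562487
345595 460842
345595 428449
345595 424095
345596 447676
345598 579883
345598 379495
345598 379039
345598 380328
"""


def parse_pairs(text: str) -> List[Tuple[int, int]]:
    """Turn "index value" lines into (index, value) tuples."""
    pairs = []
    for line in text.splitlines():
        if line.strip():
            index, value = line.split()
            pairs.append((int(index), int(value)))
    return pairs


def build_tables(
    pairs: List[Tuple[int, int]],
) -> Tuple[Dict[int, int], Dict[int, int], List[int]]:
    """Build the start table, the occurrence-count table and the data table.

    Assumption: equal indexes are contiguous in `pairs` (the file is sorted).
    """
    start: Dict[int, int] = {}
    count: Dict[int, int] = {}
    data: List[int] = []
    for position, (index, value) in enumerate(pairs):
        if index not in start:
            start[index] = position  # first line where this index appears
            count[index] = 0
        count[index] += 1
        data.append(value)  # column 2 goes into the data table
    return start, count, data


def lookup(
    index: int, start: Dict[int, int], count: Dict[int, int], data: List[int]
) -> List[int]:
    """Return every column-2 value for `index`, reading only the needed slice."""
    if index not in start:
        return []
    first = start[index]
    return data[first:first + count[index]]


pairs = parse_pairs(SAMPLE)
start, count, data = build_tables(pairs)
print(lookup(345595, start, count, data))
```

Each search then costs two table reads plus one read per matching line, instead of a scan over all 2,539,592 lines.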
> 
> cheers
> c
> 
> On 22/03/2017 at 14:34, Jack wrote:
>> I guess my two previous mails were clear enough.
>> But I will answer each point:
>>
>> 1) My previous mails:
>> I need to find every line of a text file containing a word.
>> The text file has 2,539,592 lines.
>> Right now I am using [msgfile] from zexy because I can find a line, skip
>> a line and find again ... until the end of the text file.
>> But I am wondering whether there is another object (in another library)
>> that is faster and specialized for this work?
>> ...
>> The text file has only two "strings" per line.
>> Here are 20 lines of the text file:
>>
>> 345594 577427
>> 345594 567267
>> 345594 528911
>> 345594 534435
>> 345594 523087
>> 345595 374384
>> 345595 377303
>> 345595 380544
>> 345595 379911
>> 345595 557020
>> 345595 552396
>> 345595 562487
>> 345595 460842
>> 345595 428449
>> 345595 424095
>> 345596 447676
>> 345598 579883
>> 345598 379495
>> 345598 379039
>> 345598 380328
>>
>> 2) See above
>> 3) See above
>> 4) See above
>> 5) Linux/Ubuntu 16.10/Pd 0.47.1
>> 6) Now you're exaggerating :)
>>
>> ++
>>
>> Jack
>>
>>
>>
>>
>> On 22/03/2017 at 13:31, Lorenzo Sutton wrote:
>>> Hi,
>>>
>>> On 22/03/2017 13:01, Jack wrote:
>>>> I need to find all instances that match the first row.
>>>> It is not possible with [text search], if I am right.
>>>
>>> I think you should outline your use case/problem in more detail. This
>>> is good practice when asking for support on the mailing list.
>>>
>>> Example:
>>>
>>> 1) I have a text file where each line contains two integers separated
>>> by a space (" ") char, such as the following (possibly paste a part of
>>> the file on pastebin or similar too).
>>> 213214 12313
>>> 123223 13213
>>>
>>> 2) My file is [always/at least/circa/ ...] 2,539,592 lines long
>>>
>>> 3) My algorithm should find all subsequent lines matching the first line
>>> in the file and return [all line numbers for matches / the total count
>>> of matched lines / ...]
>>>
>>> 3) I want the algorithm to be [as fast as possible / run in under 1
>>> second / run in under 1ms / ... ]
>>>
>>> 4) I [want to / do not need to] use Pd Vanilla
>>>
>>> 5) My patch should run on [All platforms / Windows / OSX / Linux / ...]
>>>
>>> 6) My patch should run [on potentially any machine / on a Raspberry Pi /
>>> on a 1990s 386 machine / on my digital toaster where I have compiled a
>>> custom version of Pd / ... ]
>>>
>>> :)
>>>
>>>
>>>> ++
>>>>
>>>> Jack
>>>>
>>>>
>>>>
>>>> On 22/03/2017 at 08:27, Liam Goodacre wrote:
>>>>> You can also use [text search], although it's not so easy to find more
>>>>> than the first instance. If you don't mind taking an extra step, you
>>>>> could give each line a third term, which is the line number. Then you
>>>>> can use the "> 3" argument for [text search] to find matches s
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> *From:* Pd-list <pd-list-bounces at lists.iem.at> on behalf of Jack
>>>>> <jack at rybn.org>
>>>>> *Sent:* 21 March 2017 18:14
>>>>> *To:* pd-list at lists.iem.at
>>>>> *Subject:* [PD] Fastest way to find lines in text file
>>>>>
>>>>> Hello,
>>>>>
>>>>> I need to find every line of a text file containing a word.
>>>>> The text file has 2,539,592 lines.
>>>>> Right now I am using [msgfile] from zexy because I can find a line,
>>>>> skip a line and find again ... until the end of the text file.
>>>>> But I am wondering whether there is another object (in another
>>>>> library) that is faster and specialized for this work?
>>>>> Thanks.
>>>>> ++
>>>>>
>>>>> Jack
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Pd-list at lists.iem.at mailing list
>>>>> UNSUBSCRIBE and account-management ->
>>>>> https://lists.puredata.info/listinfo/pd-list