[PD] Fastest way to find lines in text file

cyrille henry ch at chnry.net
Wed Mar 22 17:10:08 CET 2017



On 22/03/2017 at 17:01, Jack wrote:
> Good idea!
> I just need to sort the text file first (in fact, the file is not fully
> sorted) ;)
> Thanks.
> Talking about this has given me a new idea of which method to adopt. :)

Since you can do it offline (not in real time), I think Python has a sort function that can do this easily.
Or try LibreOffice.
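
A minimal sketch of the Python route, assuming the file is plain whitespace-separated "index value" lines with no trailing semicolons. The names sort_pairs and sort_file and the file paths are hypothetical, not from this thread.

```python
def sort_pairs(lines):
    """Sort 'index value' lines numerically by the first column.

    list.sort() is stable, so values that share an index keep
    the order they had in the input.
    """
    # Split each non-empty line into its whitespace-separated fields.
    rows = [line.split() for line in lines if line.strip()]
    rows.sort(key=lambda row: int(row[0]))
    return [" ".join(row) for row in rows]


def sort_file(src_path, dst_path):
    # Hypothetical helper: both paths are placeholders chosen by the caller.
    with open(src_path) as src:
        sorted_lines = sort_pairs(src)
    with open(dst_path, "w") as dst:
        dst.write("\n".join(sorted_lines) + "\n")
```

Calling sort_file("unsorted.txt", "sorted.txt") would read everything into memory, which should be fine for a few million short lines on a desktop machine.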

cheers
c

> ++
>
> Jack
>
>
>
> On 22/03/2017 at 16:46, cyrille henry wrote:
>> If your text file is made up of 2 columns of numbers, you can optimize
>> the search with some preprocessing.
>>
>> 1 : sort by the index column (already done in your example)
>> 2 : create 2 tables: the start position and the number of occurrences of each index
>> in your example, the "start index table" would be 0 at 345594, 5 at
>> 345595, 15 at 345596, 16 at 345598
>> the "number of occurrences table" would be: 5 at 345594, 10 at
>> 345595, 1 at 345596, 4 at 345598
>> 3 : put column 2 of your text file in a "data table"
>>
>> Now, when searching for 345595, you just have to [tabread table1] and
>> [tabread table2] at position 345595, and with a small until loop you
>> read the data table only where needed.
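
The three-table scheme described above can be sketched outside Pd as follows. This is an illustration only: it uses Python dicts where the mail uses Pd tables read with [tabread], it assumes the pairs are already sorted by index, and build_tables and lookup are hypothetical names.

```python
def build_tables(pairs):
    """Build the start, count and data tables from (index, value) pairs.

    Assumes `pairs` is already sorted by index, so all values for one
    index sit next to each other in `data`.
    """
    start = {}   # index -> position of its first value in `data`
    count = {}   # index -> number of values stored for that index
    data = []    # second column of the file, in file order
    for position, (index, value) in enumerate(pairs):
        if index not in start:
            start[index] = position
            count[index] = 0
        count[index] += 1
        data.append(value)
    return start, count, data


def lookup(index, start, count, data):
    """Return every value stored for `index`, or an empty list if absent."""
    if index not in start:
        return []
    first = start[index]
    # Read only the slice that belongs to this index.
    return data[first:first + count[index]]
```

With the 20 example lines from this thread, start maps 345595 to 5 and count maps it to 10, so lookup reads only positions 5 to 14 of the data table instead of scanning the whole file.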
>>
>> cheers
>> c
>>
>> On 22/03/2017 at 14:34, Jack wrote:
>>> I guess my 2 previous mails were clear enough.
>>> But I will answer each point:
>>>
>>> 1) My previous mails:
>>> I need to find every line of a text file containing a word.
>>> The text file has 2.539.592 lines.
>>> Right now I am using [msgfile] from zexy, because I can find a line, skip a
>>> line and find again ... until the end of the text file.
>>> But I am wondering if there is another object (in another library) that is
>>> faster and specialized for this job?
>>> ...
>>> The text file has only two "strings" per line.
>>> Here are 20 lines of the text file:
>>>
>>> 345594 577427
>>> 345594 567267
>>> 345594 528911
>>> 345594 534435
>>> 345594 523087
>>> 345595 374384
>>> 345595 377303
>>> 345595 380544
>>> 345595 379911
>>> 345595 557020
>>> 345595 552396
>>> 345595 562487
>>> 345595 460842
>>> 345595 428449
>>> 345595 424095
>>> 345596 447676
>>> 345598 579883
>>> 345598 379495
>>> 345598 379039
>>> 345598 380328
>>>
>>> 2) See above
>>> 3) See above
>>> 4) See above
>>> 5) Linux/Ubuntu 16.10/Pd 0.47.1
>>> 6) You're pushing it :)
>>>
>>> ++
>>>
>>> Jack
>>>
>>>
>>>
>>>
>>> On 22/03/2017 at 13:31, Lorenzo Sutton wrote:
>>>> Hi,
>>>>
>>>> On 22/03/2017 13:01, Jack wrote:
>>>>> I need to find all instances that match the first row.
>>>>> It is not possible with [text search], if I am right.
>>>>
>>>> I think you should outline your use case/problem in more detail. This
>>>> should be a good practice when asking for support on the Mailing List.
>>>>
>>>> Example:
>>>>
>>>> 1) I have a text file where each line contains two integers separated
>>>> by a space (" ") char - such as (possibly paste a part of the file on
>>>> pastebin or similar too).
>>>> 213214 12313
>>>> 123223 13213
>>>>
>>>> 2) My file is [always/at least/circa/ ...] 2,539,592 lines long
>>>>
>>>> 3) My algorithm should find all subsequent lines matching the first line
>>>> in the file and return [all line numbers for matches / the total count
>>>> of matched lines / ...]
>>>>
>>>> 3) I want the algorithm to be [as fast as possible / run in under 1
>>>> second / run in under 1ms / ... ]
>>>>
>>>> 4) I [want to / do not need to] use Pd Vanilla
>>>>
>>>> 5) My patch should run on [All platforms / Windows / OSX / Linux / ...]
>>>>
>>>> 6) My patch should run [on potentially any machine / on a Raspberry Pi /
>>>> on a 1990s 386 machine / on my digital toaster where I have compiled a
>>>> custom version of Pd / ... ]
>>>>
>>>> :)
>>>>
>>>>
>>>>> ++
>>>>>
>>>>> Jack
>>>>>
>>>>>
>>>>>
>>>>> On 22/03/2017 at 08:27, Liam Goodacre wrote:
>>>>>> You can also use [text search], although it's not so easy to find more
>>>>>> than the first instance. If you don't mind taking an extra step, you
>>>>>> could give each line a third term, which is the line number. Then you
>>>>>> can use the "> 3" argument for [text search] to find matches s
>>>>>>
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>> *From:* Pd-list <pd-list-bounces at lists.iem.at> on behalf of Jack
>>>>>> <jack at rybn.org>
>>>>>> *Sent:* 21 March 2017 18:14
>>>>>> *To:* pd-list at lists.iem.at
>>>>>> *Subject:* [PD] Fastest way to find lines in text file
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I need to find every line of a text file containing a word.
>>>>>> The text file has 2.539.592 lines.
>>>>>> Right now I am using [msgfile] from zexy, because I can find a line, skip a
>>>>>> line and find again ... until the end of the text file.
>>>>>> But I am wondering if there is another object (in another library) that is
>>>>>> faster and specialized for this job?
>>>>>> Thanks.
>>>>>> ++
>>>>>>
>>>>>> Jack
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Pd-list at lists.iem.at mailing list
>>>>>> UNSUBSCRIBE and account-management ->
>>>>>> https://lists.puredata.info/listinfo/pd-list
>>>>>>


