[PD] Fastest way to find lines in text file

Jack jack at rybn.org
Wed Mar 22 17:41:25 CET 2017


On 22/03/2017 at 17:10, cyrille henry wrote:
> 
> 
> On 22/03/2017 at 17:01, Jack wrote:
>> Good idea!
>> I just need to sort the text file (in fact, the file is not fully
>> sorted) ;)
>> Thanks.
>> Discussing this topic has given me a new idea about the right method
>> to adopt. :)
> 
> Since you can do it in a non-real-time way, I think Python has a sort
> function that can do this easily.
> Or try with LibreOffice.

Or on the command line:
$ sort -k1 -g linksIdOK.txt
++

Jack
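
For the Python route mentioned above, a minimal sketch could look like the following. It assumes every line holds two whitespace-separated integers, as in the sample quoted further down; the function name is made up for illustration. It only compares the first field, and lines sharing an index keep their original order, so tie-breaking may differ from the sort command.

```python
def sort_lines_numerically(lines):
    """Sort "index value" lines by their first field, compared as an integer."""
    # sorted() is stable: lines with the same index keep their input order.
    return sorted(lines, key=lambda line: int(line.split()[0]))


# Four lines taken from the sample in this thread, deliberately shuffled.
sample = ["345598 579883", "345594 577427", "345595 374384", "345594 567267"]
sorted_sample = sort_lines_numerically(sample)
print("\n".join(sorted_sample))
```

For the real file, the lines would be read from linksIdOK.txt and the result written to a new file before loading it in Pd.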


> 
> cheers
> c
> 
>> ++
>>
>> Jack
>>
>>
>>
>> On 22/03/2017 at 16:46, cyrille henry wrote:
>>> If your text file is composed of 2 columns of numbers, you can optimize
>>> the search with some preprocessing.
>>>
>>> 1 : sort by the index column (already done in your example)
>>> 2 : create 2 tables: one for the start index and one for the number of
>>> occurrences of each index
>>> in your example, the "start index table" would be 0 at 345594, 5 at
>>> 345595, 15 at 345596, 16 at 345598
>>> the "number of occurrences table" would be: 5 at 345594, 10 at
>>> 345595, 1 at 345596, 4 at 345598
>>> 3 : put column 2 of your text file in a "data table"
>>>
>>> Now, when searching for 345595, you just have to [tabread table1] and
>>> [tabread table2] at position 345595, and with a small [until] loop you
>>> read the data table only where needed.
>>>
>>> cheers
>>> c
>>>
>>> On 22/03/2017 at 14:34, Jack wrote:
>>>> I thought my 2 previous mails were clear enough.
>>>> But I will answer each point:
>>>>
>>>> 1) My previous mails:
>>>> I need to find every line of a text file containing a given word.
>>>> The text file has 2.539.592 lines.
>>>> Currently, I am using [msgfile] from zexy because I can find a line,
>>>> skip a line and find again ... until the end of the text file.
>>>> But I am wondering if there is another, faster object (in another
>>>> library) specialized for this job?
>>>> ...
>>>> The text file has only two "strings" per line.
>>>> Here are 20 lines of the text file:
>>>>
>>>> 345594 577427
>>>> 345594 567267
>>>> 345594 528911
>>>> 345594 534435
>>>> 345594 523087
>>>> 345595 374384
>>>> 345595 377303
>>>> 345595 380544
>>>> 345595 379911
>>>> 345595 557020
>>>> 345595 552396
>>>> 345595 562487
>>>> 345595 460842
>>>> 345595 428449
>>>> 345595 424095
>>>> 345596 447676
>>>> 345598 579883
>>>> 345598 379495
>>>> 345598 379039
>>>> 345598 380328
>>>>
>>>> 2) See above
>>>> 3) See above
>>>> 4) See above
>>>> 5) Linux/Ubuntu 16.10/Pd 0.47.1
>>>> 6) You're pushing it :)
>>>>
>>>> ++
>>>>
>>>> Jack
>>>>
>>>>
>>>>
>>>>
>>>> On 22/03/2017 at 13:31, Lorenzo Sutton wrote:
>>>>> Hi,
>>>>>
>>>>> On 22/03/2017 13:01, Jack wrote:
>>>>>> I need to find all instances that match the first row.
>>>>>> That is not possible with [text search], if I am right.
>>>>>
>>>>> I think you should outline your use case/problem in more detail. This
>>>>> should be a good practice when asking for support on the Mailing List.
>>>>>
>>>>> Example:
>>>>>
>>>>> 1) I have a text file where each line contains two integers separated
>>>>> by a space (" ") char - such as (possibly paste a part of the file on
>>>>> pastebin or similar too).
>>>>> 213214 12313
>>>>> 123223 13213
>>>>>
>>>>> 2) My file is [always/at least/circa/ ...] 2,539,592 lines long
>>>>>
>>>>> 3) My algorithm should find all subsequent lines matching the first
>>>>> line
>>>>> in the file and return [all line numbers for matches / the total count
>>>>> of matched lines / ...]
>>>>>
>>>>> 3) I want the algorithm to be [as fast as possible / run in under 1
>>>>> second / run in under 1ms / ... ]
>>>>>
>>>>> 4) I [want to / do not need to] use Pd Vanilla
>>>>>
>>>>> 5) My patch should run on [All platforms / Windows / OSX / Linux /
>>>>> ...]
>>>>>
>>>>> 6) My patch should run [on potentially any machine / on a Raspberry
>>>>> Pi /
>>>>> on a 1990s 386 machine / on my digital toaster where I have compiled a
>>>>> custom version of Pd / ... ]
>>>>>
>>>>> :)
>>>>>
>>>>>
>>>>>> ++
>>>>>>
>>>>>> Jack
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 22/03/2017 at 08:27, Liam Goodacre wrote:
>>>>>>> You can also use [text search], although it's not so easy to find more
>>>>>>> than the first instance. If you don't mind taking an extra step, you
>>>>>>> could give each line a third term, which is the line number. Then you
>>>>>>> can use the "> 3" argument for [text search] to find matches s
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>> *From:* Pd-list <pd-list-bounces at lists.iem.at> on behalf of Jack
>>>>>>> <jack at rybn.org>
>>>>>>> *Sent:* 21 March 2017 18:14
>>>>>>> *To:* pd-list at lists.iem.at
>>>>>>> *Subject:* [PD] Fastest way to find lines in text file
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I need to find every line of a text file containing a given word.
>>>>>>> The text file has 2.539.592 lines.
>>>>>>> Currently, I am using [msgfile] from zexy because I can find a line,
>>>>>>> skip a line and find again ... until the end of the text file.
>>>>>>> But I am wondering if there is another, faster object (in another
>>>>>>> library) specialized for this job?
>>>>>>> Thanks.
>>>>>>> ++
>>>>>>>
>>>>>>> Jack
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Pd-list at lists.iem.at mailing list
>>>>>>> UNSUBSCRIBE and account-management ->
>>>>>>> https://lists.puredata.info/listinfo/pd-list