[NTLUG:Discuss] Short script question

Fred James fredjame at fredjame.cnc.net
Tue May 4 12:07:29 CDT 2010


Bobby Wrenn wrote:
> Bobby Wrenn wrote:
>> I know this will be trivial to someone who deals with this sort of 
>> thing every day. However, I do not fall into that category.
>>
>> I have been looking on the web for pointers on doing this and have
>> come up dry. Usually you want to delete duplicate lines, but I need
>> to do the opposite: find lines in a tab-delimited file that are
>> partial matches and save the matches to a new file, something like
>> this:
>>
>> read a line into buffer 1
>> find another line that matches the regex of the line in buffer 1 and
>> put it in buffer 2
>> find another line that matches the regex of the line in buffer 1 and
>> put it in buffer 3
>> repeat to the end of the file
>> append all the buffered lines to another file
>> clear the buffers
>> go to the next line and do it again until the end of the file
>>
>> The file is tab-delimited, and the regex will take the first word, the
>> first tab, the next word, a space, and the first three characters
>> (letters or digits) of the following word as the search criteria. The
>> rest of the line can be anything. In other words, the part to match is
>> everything up to and including the first three characters of the
>> second word after the first tab.
>>
>> Can someone point me in the right direction? Perhaps to an online
>> tutorial that covers something like this? I've looked at sed and
>> awk, but all the examples I can find assume that you want to remove
>> duplicates.
>>
>> Thanks in advance
>> Bobby Wrenn
> Starting to answer my own question: I have the regex that will select
> the line,
> ^([A-Z0-9]+\t)([A-Z0-9]+ [A-Z0-9]{3}).*
> So I can search for a match to \1, but then I have to copy the rest of
> the line that does not match \2, then append both lines to a file and
> recurse.
Bobby Wrenn
'grep' should do what you want in terms of writing out all the complete
lines in which a match is found ... so ... maybe you could ...
    (1) read the part(s) of the lines in the original file that you want
        to match into a "pattern_file"
    (2) use grep with the -f option to read the patterns from
        pattern_file, and maybe -n to get line numbers as well
???
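One catch with the two steps above: if the pattern file holds the leading part of every line, then every line matches its own pattern and the whole file comes back. The patterns have to be limited to leading parts that occur on more than one line. Here is a rough sketch of that idea in Python rather than grep, only to illustrate the logic. The function names and the sample data are made up for the example, and the regex is adapted from the one quoted above on the assumption that the fields contain only uppercase letters and digits.

```python
import re
from collections import Counter

# Key: first field, tab, next word, space, first three characters of the
# following word (adapted from the regex quoted in the thread; assumes
# uppercase letters and digits only).
KEY_RE = re.compile(r"^([A-Z0-9]+\t[A-Z0-9]+ [A-Z0-9]{3})")


def line_key(line):
    """Return the leading part of a line used for matching, or None."""
    m = KEY_RE.match(line)
    return m.group(1) if m else None


def find_partial_matches(lines):
    """Return (line_number, line) pairs for lines whose key occurs more than once.

    `lines` must be a list (it is scanned twice).
    """
    # Step 1, the "pattern file": keys shared by at least two lines.
    counts = Counter(k for k in map(line_key, lines) if k is not None)
    patterns = {k for k, n in counts.items() if n > 1}
    # Step 2, the "grep -n -f" pass: keep matching lines with 1-based numbers.
    return [(i, line) for i, line in enumerate(lines, start=1)
            if line_key(line) in patterns]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real file.
    sample = [
        "AB12\tWIDGET 123A blue",
        "AB12\tWIDGET 123B red",
        "CD34\tGADGET 999 green",
        "AB12\tWIDGET 456 blue",
    ]
    for number, line in find_partial_matches(sample):
        print(f"{number}:{line}")
```

The matches come out in their original file order. If you want them grouped by the matching part, sorting the result by line_key(line) would do it.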
Hope this helps - or did I miss the point altogether?
Regards
Fred James



