[NTLUG:Discuss] Short script question

Tue May 4 10:17:45 CDT 2010

Bobby Wrenn wrote:
> I know this will be trivial to someone who deals with this sort of 
> thing every day. However, I do not fall into that category.
>
> I have been looking on the web for pointers on doing this and have 
> come up dry. Usually you want to delete duplicate lines, but I need to 
> do the opposite. I need to find lines in a tab-delimited file that 
> are partial matches and save the matches to a new file, something like 
> this:
>
> read a line into buffer 1
> find another line that matches the regex of the line in buffer 1; put 
> it in buffer 2
> find another line that matches the regex of the line in buffer 1; put 
> it in buffer 3
> repeat to the end of the file
> append all the buffered lines to another file
> clear the buffers
> go to the next line and do it again until the end of the file
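The buffer steps above can be sketched in Python roughly like this. It is a minimal sketch, not the poster's code: `collect_partial_matches` is a hypothetical name, the "regex of the line" is passed in as a key function (the usage line uses a fixed-width placeholder key), and the `used` set is an addition so a group is not reported a second time when the loop reaches its later members.

```python
from typing import Callable, Hashable, List, Set


def collect_partial_matches(
    lines: List[str], key: Callable[[str], Hashable]
) -> List[List[str]]:
    """Take each line as "buffer 1", scan the remaining lines for ones
    with the same key, and keep the group if anything matched."""
    groups: List[List[str]] = []
    used: Set[int] = set()  # added: indexes already placed in a group
    for i, first in enumerate(lines):
        if i in used:
            continue
        buffer = [first]  # "buffer 1"
        for j in range(i + 1, len(lines)):
            if j not in used and key(lines[j]) == key(first):
                buffer.append(lines[j])  # buffers 2, 3, ...
                used.add(j)
        if len(buffer) > 1:  # only keep lines that had a partner
            groups.append(buffer)  # "append all the buffered lines"
        # the buffer is "cleared" by rebinding it on the next iteration
    return groups


if __name__ == "__main__":
    rows = ["AB\tX1 Y23a", "CD\tZ9 Q88", "AB\tX1 Y23b"]
    # Placeholder key: the first 9 characters, e.g. "AB\tX1 Y23".
    print(collect_partial_matches(rows, key=lambda s: s[:9]))
    # prints [['AB\tX1 Y23a', 'AB\tX1 Y23b']]
```

This rescans the rest of the file for every line, so it does on the order of n² comparisons; that is fine for small files but slow for large ones.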
>
> The file is tab-delimited, and the regex will take the first word, the 
> first tab, the next word, a space, and the first three letters or 
> digits of the word after that as the search criteria. The rest of the 
> line can be any characters. The part to match is everything up to and 
> including the first three characters of the second word after the first tab.
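The key described here can also be pulled out without a regex, by splitting on the first tab and then on spaces. A minimal sketch, assuming the second field holds at least two space-separated words; `match_key` is a hypothetical helper name, and lines that do not have that shape return `None`.

```python
from typing import Optional


def match_key(line: str) -> Optional[str]:
    """Return the part of a line to compare: the first field, the tab,
    the next word, a space, and the first three characters of the word
    after that. Returns None if the line does not have that shape."""
    first_field, tab, rest = line.partition("\t")
    if not tab:
        return None  # no tab: not a data line
    words = rest.split(" ", 2)  # [second word, third word, remainder]
    if len(words) < 2 or not words[0] or len(words[1]) < 3:
        return None  # too short to supply the full key
    return f"{first_field}\t{words[0]} {words[1][:3]}"
```

For example, `match_key("ABC\tDEF 123XYZ rest")` gives `"ABC\tDEF 123"`; two lines with equal, non-`None` keys are partial matches in the sense described above.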
>
> Can someone point me in the right direction? Perhaps an online 
> tutorial that covers something like this? I've looked at sed and 
> awk, but all the examples I can find assume you want to remove 
> duplicates.
>
> Thanks in advance
> Bobby Wrenn
>
Starting to answer my own question. I have the regex that will select 
the line:
^([A-Z0-9]+\t)([A-Z0-9]+ [A-Z0-9][A-Z0-9][A-Z0-9]).*
So I can search for a match to \1, but then I have to copy the rest of 
the line that does not match \2, append both lines to a file, and 
repeat.
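Rather than rescanning the file for every line, the lines can be bucketed by their matched prefix in a single pass with a dictionary. This is a swapped-in technique, not the buffer loop from the post. The pattern follows the regex above, written with plain `[A-Z0-9]` classes (inside brackets a `|` is a literal character, not alternation) and `{3}` for the three repeated classes; groups 1 and 2 together form the key. `KEY_RE`, `group_by_key`, `write_matches` and the file paths are hypothetical names for this sketch.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# First field + tab, then a word, a space, and three letters/digits.
# The trailing ".*" is unnecessary because re.match only anchors the start.
KEY_RE = re.compile(r"^([A-Z0-9]+\t)([A-Z0-9]+ [A-Z0-9]{3})")


def group_by_key(lines: Iterable[str]) -> List[List[str]]:
    """Bucket lines by their matched prefix; keep buckets with 2+ lines."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        m = KEY_RE.match(line)
        if m:  # lines that do not fit the pattern are skipped
            buckets[m.group(1) + m.group(2)].append(line)
    # dicts keep insertion order, so groups follow first appearance
    return [group for group in buckets.values() if len(group) > 1]


def write_matches(in_path: str, out_path: str) -> None:
    """Read in_path and append every group of partial matches to out_path."""
    with open(in_path, encoding="utf-8") as src:
        lines = [line.rstrip("\n") for line in src]
    with open(out_path, "a", encoding="utf-8") as dst:
        for group in group_by_key(lines):
            dst.write("\n".join(group) + "\n")
```

Calling `write_matches("input.txt", "matches.txt")` (hypothetical file names) appends each group of two or more matching lines to `matches.txt`. Like the original pattern, it only matches uppercase letters and digits in the key.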