[NTLUG:Discuss] Short script question
Bobby Wrenn
bobby at wrennest.com
Tue May 4 10:17:45 CDT 2010
Bobby Wrenn wrote:
> I know this will be trivial to someone who deals with this sort of
> thing every day. However, I do not fall into that category.
>
> I have been looking on the web for pointers on doing this and have
> come up dry. Usually you want to delete duplicate lines. But I need to
> do the opposite. I need to find lines in a tab delimited file which
> are partial matches and save the matches to a new file, something like
> this:
>
> read a line into buffer 1
> find another line that matches the regex of the line in buffer 1 and
> put it in buffer 2
> find another line that matches the regex of the line in buffer 1 and
> put it in buffer 3
> repeat to the end of the file
> append all the buffered lines to another file
> clear the buffers
> go to the next line and do it again until the end of the file
>
> The file is tab delimited. The regex should capture, as the search
> criteria, the first word, the first tab, the next word, a space, and
> the first three characters/numbers of the word after that. The rest of
> the line can be any characters. In other words, the part to match is
> everything up to and including the first three characters of the
> second word after the first tab.
>
> Can someone point me in the right direction? Perhaps an online
> tutorial that might cover something like this. I've looked at sed and
> awk but all the examples I can find expect that you want to remove
> duplicates.
>
> Thanks in advance
> Bobby Wrenn
Starting to answer my own question. I have a regex that will select
the line:
^([A-Z0-9]+\t)([A-Z0-9]+ [A-Z0-9][A-Z0-9][A-Z0-9]).*
So I can search for a match to \1, but then I have to copy the rest of
the line that does not match \2, then append both lines to a file, and
repeat.
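
One way to avoid the buffer-by-buffer search is to read the file once and
group lines by the part that has to match. The following is only a rough
Python sketch, not a tested solution for the real data. It assumes the key
is the first word, the tab, the next word, a space, and three more
characters, all uppercase letters or digits, as described in the message.
The pattern is based on the one in the message, but without the "|" inside
the brackets, because inside a character class "|" matches a literal pipe.
The names KEY_RE, group_partial_matches and write_matches are made up for
this example.

```python
import re
from collections import defaultdict

# Assumed key: first word, tab, next word, space, then three characters.
# Only uppercase letters and digits are allowed, as in the message's regex.
KEY_RE = re.compile(r'^([A-Z0-9]+\t[A-Z0-9]+ [A-Z0-9]{3})')


def group_partial_matches(lines):
    """Return lists of lines sharing the same key, in first-seen order."""
    groups = defaultdict(list)
    for line in lines:
        m = KEY_RE.match(line)
        if m:
            # Lines with an identical key end up in the same list.
            groups[m.group(1)].append(line)
    # Keep only keys that occur on more than one line.
    return [g for g in groups.values() if len(g) > 1]


def write_matches(in_path, out_path):
    """Write every group of partially matching lines to out_path."""
    with open(in_path) as src:
        lines = src.read().splitlines()
    with open(out_path, "w") as dst:
        for group in group_partial_matches(lines):
            for line in group:
                dst.write(line + "\n")
```

Calling write_matches("input.txt", "matches.txt") (both file names are
placeholders) would write each set of partially matching lines together,
and lines whose key appears only once would be left out. This is the
opposite of removing duplicates, and it needs only one pass over the file
instead of a new search for every line.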