[NTLUG:Discuss] Short script question
Fred James
fredjame at fredjame.cnc.net
Tue May 4 18:56:20 CDT 2010
Bobby Wrenn wrote:
(omissions for brevity)
> Pretty close. Only what I want is a new file with all the lines where
> the regex matches more than one line. I don't want to remove
> duplicates I want a list of line where the first part of the line is
> duplicated in more than one line.
>
> Regards,
> Bobby
Let me spin this one more time, but a little differently ...
(1) copy the original_file to copy_file
(2) use sed (or whatever you like) to modify copy_file - keeping
every line but chipping off the '.*' part of each
... Now follow this thought ... if the pattern in line 1 of copy_file
also matches lines 10 and 2000 in original_file, then the pattern in line
10 of copy_file will also match line 2000 in original_file, and at
minimum you could wind up with ...
line 1 once
line 10 twice
line 2000 twice
... which I doubt is optimal(?) ... so ...
(3) sort copy_file > copy_file_0
(4) uniq copy_file_0 > copy_file
... and then, in a loop (such as in a sh/bash shell script) ...
(5) for each line in copy_file
(5.1) grep line original_file > temporary_file
(5.2) if line count of temporary_file (wc -l) > 1
(5.3) then cat temporary_file >> duplicates_file
... once that is complete ...
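(6) grep -vxFf duplicates_file original_file > new_file
(-F and -x so each line of duplicates_file is matched literally and
only against whole lines, not as a regex or a substring)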
... the result(s) ...
duplicates_file contains all the lines that matched any of the
regexes where more than one match was found
new_file is original_file with all the lines in duplicates_file
removed
... and you still have original_file in case you have to back up and try
again
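For what it's worth, steps (1) through (6) might look something like the
sketch below in sh. The actual regex is not shown in this thread, so this
assumes the "first part" of a line is its first whitespace-delimited field
and that those fields contain no regex metacharacters; swap in your own
pattern as needed. The function name find_dups and its three arguments are
made up for illustration, and mktemp is assumed to be available.

```shell
#!/bin/sh
# Sketch only. Assumptions: the key is the first whitespace-delimited
# field of each line, and keys contain no regex metacharacters.
# find_dups is a hypothetical name; arguments are
#   original_file duplicates_file new_file
find_dups() {
    original_file=$1
    duplicates_file=$2
    new_file=$3
    copy_file=$(mktemp)
    temporary_file=$(mktemp)

    # (1)-(4): keep only the key from each line, then sort and de-duplicate
    sed 's/[[:space:]].*$//' "$original_file" | sort -u > "$copy_file"

    # start with an empty duplicates_file
    : > "$duplicates_file"

    # (5): for each key, collect the lines of original_file that start with it
    while IFS= read -r key; do
        [ -n "$key" ] || continue
        grep -E "^${key}([[:space:]]|\$)" "$original_file" > "$temporary_file"
        # (5.2)-(5.3): keep them only if the key occurs on more than one line
        if [ "$(wc -l < "$temporary_file")" -gt 1 ]; then
            cat "$temporary_file" >> "$duplicates_file"
        fi
    done < "$copy_file"

    # (6): original_file minus every line listed in duplicates_file,
    # matched literally (-F) and as whole lines (-x)
    grep -vxFf "$duplicates_file" "$original_file" > "$new_file"

    rm -f "$copy_file" "$temporary_file"
}

# Example call (file names are placeholders):
#   find_dups original_file duplicates_file new_file
```

original_file is only ever read, so it is still there if you have to back
up and try again.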
More clear mud? Anywhere near what you are looking for?
Regards
Fred James