[NTLUG:Discuss] Short script question

Tue May 4 18:56:20 CDT 2010

Bobby Wrenn wrote:
(omissions for brevity)
> Pretty close. Only what I want is a new file with all the lines where 
> the regex matches more than one line. I don't want to remove 
> duplicates; I want a list of lines where the first part of the line is 
> duplicated in more than one line.
>
> Regards,
> Bobby
> Bobby Wrenn

Let me spin this one more time, but a little differently ...
    (1) copy the original_file to copy_file
    (2) use sed (or whatever you like) to modify copy_file - keeping 
every line but chipping off the '.*' part of each, so only the leading 
pattern remains
... Now follow this thought ... if the pattern in line 1 of copy_file 
also matches lines 10 and 2000 in original_file, then the pattern in line 
10 of copy_file will also match line 2000 in original_file, and at 
minimum you could wind up with ...
    line 1 once
    line 10 twice
    line 2000 twice
... which I doubt is optimal(?) ... so ...
    (3) sort copy_file > copy_file_0
    (4) uniq copy_file_0 > copy_file
... and then, in a loop (such as in a sh/bash shell script) ...
    (5) for each line in copy_file
    (5.1) grep line original_file > temporary_file
    (5.2) if the line count of temporary_file (wc -l) > 1
    (5.3) then cat temporary_file >> duplicates_file
... once that is complete ...
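    (6) grep -vxFf duplicates_file original_file > new_file
        (-F and -x make grep treat each line of duplicates_file as a 
literal, whole-line match rather than as a regex)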
... the result(s) ...
       duplicates_file contains all the lines that matched any of the 
regexes for which more than one match was found
       new_file is original_file with all the lines in duplicates_file 
removed
... and you still have original_file in case you have to back up and try 
again
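
For what it's worth, steps (1) through (6) might be sketched as the sh 
script below. It is only a sketch under assumptions that are not in the 
thread: the "first part" of a line is taken to be everything before the 
first space, the keys contain no regex metacharacters, and the sample 
data is made up. Steps (1) to (4) are collapsed into one pipeline instead 
of editing copy_file in place.

```shell
#!/bin/sh
# Sketch of steps (1)-(6). Run it in an empty scratch directory: it
# overwrites the files named below.
# ASSUMPTION: the key ("first part") is everything before the first space.

# Hypothetical sample data standing in for original_file.
cat > original_file <<'EOF'
alpha one
beta two
alpha three
gamma four
beta five
EOF

# (1)-(4): chip off everything after the key, then sort and drop
# repeated keys so each key is tested only once.
sed 's/ .*//' original_file | sort | uniq > copy_file

# (5): collect every original line whose key occurs more than once.
: > duplicates_file
while IFS= read -r key; do
    # ASSUMPTION: $key has no regex metacharacters and is followed by a space.
    grep -- "^$key " original_file > temporary_file
    if [ "$(wc -l < temporary_file)" -gt 1 ]; then
        cat temporary_file >> duplicates_file
    fi
done < copy_file

# (6): the remaining lines go to new_file.
# -F = fixed strings, -x = whole-line match, -v = invert, -f = patterns from file.
grep -vxFf duplicates_file original_file > new_file

echo "duplicates_file:"; cat duplicates_file
echo "new_file:";        cat new_file
```

With the sample data above, duplicates_file ends up holding the two 
alpha lines followed by the two beta lines, and new_file holds only 
"gamma four". To use a different notion of "first part", change the sed 
expression and the grep pattern together so they agree on where the key 
ends.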

More clear mud?  Anywhere near what you are looking for?
Regards
Fred James