[NTLUG:Discuss] sed awk question
David Stanaway
david at stanaway.net
Mon Aug 12 14:03:13 CDT 2002
On Mon, 2002-08-12 at 11:34, Chris Cox wrote:
> Bobby Wrenn wrote:
>
> >My reading led me to believe that awk was best for working with fields rather
> >than whole lines.
> >
> >My question last week made me think perl was better and more
> >versatile than either.
> >
> Usually true. Perl won't be the better choice in cases where
> you cannot be sure of Perl's availability... which is becoming quite
> a small list btw. As more and more Unix vendors start shipping perl
> as a standard part of their base OS install, there will be fewer and
> fewer reasons to use things like sed and awk. There might still be
> some cases where memory and space are involved though.... or for
> old folks like myself that think naturally in terms of sed and
> awk. Since perl is evolving (esp. v6)... I'll probably defer
> on doing much perl. The radical changes to perl REs, while
> certainly technically better, are a headache and a half.
>
> >
> >
> >So, here is my question this week. I need to sort a comma-delimited file and
> >return only those lines where the first field is unique. Currently I am
> >pulling the file into Access and eliminating duplicates there. There has to be
> >a better way.
> >
> Not as easy as some would like to think...
> (This actually eliminates duplicates... technically you asked for data that
> ONLY had unique entries.... that could be done too.) When duplicates are
> found, it outputs the first line for each key in sorted order.
>
> sort -t, -k1,1 csg.data | awk -F, '
> # Print a line only when its first field differs from the previous one.
> {
>     if ($1 != lastsaved) {
>         print;
>         lastsaved=$1;
>     }
> }'
>
> The uniq program does not provide the flexibility to deal with field
> delimited files.
>
Sorry, but that is incorrect:
dstanawa@ciderbox:~$ cat test.txt
aa,wrong,dsdf
asd,333,rewse
cc,wrong,sdsd
aa,f00f,tt
afg,sdf,sdaf
cc,dws,44
dstanawa@ciderbox:~$ sort -t, -k1 test.txt | uniq -t, -W1
aa,f00f,tt
afg,sdf,sdaf
asd,333,rewse
cc,dws,44
The problem is picking which entry gets printed. Here the file is
lexicographically sorted on the whole line, so for each key the line
that sorts first is the one that survives.
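
If you would rather keep whichever line appears first in the original
file, something along these lines might do it. This is only a sketch: it
assumes a sort that supports -s (stable), which GNU and BSD sort do, and
first_per_key is just a name I made up for illustration. LC_ALL=C is
there so the keys compare byte by byte.

```shell
# Same sample data as test.txt in the transcript.
cat > test.txt <<'EOF'
aa,wrong,dsdf
asd,333,rewse
cc,wrong,sdsd
aa,f00f,tt
afg,sdf,sdaf
cc,dws,44
EOF

# first_per_key (hypothetical helper): one line per distinct first field.
# -k1,1 limits the sort key to the first field only.
# -s keeps lines with equal keys in their original file order.
# awk prints a line only when its first field differs from the previous one.
first_per_key() {
    LC_ALL=C sort -s -t, -k1,1 "$1" |
        awk -F, 'NR == 1 || $1 != last { print; last = $1 }'
}

first_per_key test.txt
```

With the sample file this keeps aa,wrong,dsdf and cc,wrong,sdsd rather
than the f00f and dws lines, because those come first in the file.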
--
David Stanaway