[NTLUG:Discuss] Why would a command stop working?

Steve Baker sjbaker1 at airmail.net
Mon Dec 31 00:45:05 CST 2001


Rick Matthews wrote:

> Apparently one of my input files contains some garbage.

I'd be surprised if that stopped 'uniq' from working - unless the garbage
is actually within one of the supposedly duplicated lines and not in the other.

> These should be
> straight text files and they are being sorted by the full line (no
> options used with sort or uniq). How can I validate the format of the
> input files prior to processing? (I need to check to see if there is a
> grep option to select only text lines...)?

You could use 'tr' to delete characters in the ranges \000 to \011,
\013 to \037 and \177 to \377.  That should leave you with a clean
ASCII file...unless of course some of this 'garbage' is in the form
of printable characters.
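For instance, something like this should do it (just a sketch - the
filenames are placeholders, and the octal-range syntax assumes GNU 'tr'):

      tr -d '\000-\011\013-\037\177-\377' < input.txt > clean.txt

That deletes every byte except newline (\012) and the printable ASCII
range, so whatever is left is plain text.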

I suppose the most likely thing is that one file contains TAB characters
and the other has spaces - or perhaps they have different line endings
(e.g. if one file came from a UNIX/Linux box and the other from a Windoze
machine or an old-style Mac).

You can fix those things using 'tr' to delete the offending characters
or translate them into spaces.  'sort' also has an option to ignore
leading blanks.
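For example (a sketch - the filenames are placeholders, and '-b' is the
standard 'sort' flag for ignoring leading blanks):

      tr '\t' ' '  < file1 > file1.fixed    # each TAB becomes a space
      tr -d '\r'   < file2 > file2.fixed    # strip DOS carriage returns
      sort -b file1.fixed file2.fixed | uniq > merged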

Uniq has some options for that kind of thing too - but they are
pretty much useless unless 'sort' has already placed the lines
that you wish to eliminate into consecutive order.
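For instance, GNU 'uniq' has '-i' (ignore case) and '-f N' (skip the
first N fields), but they only help if the duplicates end up on
consecutive lines, e.g. (a sketch, with placeholder filenames):

      sort -f file1 file2 | uniq -i > merged
      # 'sort -f' folds case, so 'uniq -i' sees case-differing
      # duplicates next to each other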

> > Is there some reason your script doesn't just do?:
> >       sort -u file1 > file2

BEWARE: Some older versions of 'sort' don't have '-u'.
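On those, the traditional pipeline should give the same result
(whole-line comparison, no options):

      sort file1 | uniq > file2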

----------------------------- Steve Baker -------------------------------
Mail : <sjbaker1 at airmail.net>   WorkMail: <sjbaker at link.com>
URLs : http://www.sjbaker.org
       http://plib.sf.net http://tuxaqfh.sf.net http://tuxkart.sf.net
       http://prettypoly.sf.net http://freeglut.sf.net
       http://toobular.sf.net   http://lodestone.sf.net



