[NTLUG:Discuss] scratching head

Fri Feb 28 06:29:12 CST 2003

Steve Baker wrote:

> Fred James wrote:
>
>>
>> Very close - the "..." in this case just means any string of 
>> characters that you want to search for, though I admit that isn't 
>> standard notation.  For example:
>> find . -type f | xargs file | grep -i text | cut -f1 -d: | xargs grep "hello"
>
>
> Ah - that makes more sense...but it's still going to search non-text
> files when the filename contains the string 'text'...which will result
> in some interesting issues in some cases.
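
One way around the filename problem described above is to test only the
description that 'file' prints and never the name. This is a minimal sketch,
assuming a 'file' that supports -b (brief output, no filename prefix); the
function name list_text_files is made up for illustration.

```shell
# Hypothetical helper: print the regular files under a directory whose
# 'file' description mentions "text". The file name itself is never matched.
list_text_files() {
    find "${1:-.}" -type f -print | while IFS= read -r f; do
        # -b drops the "name:" prefix, so only the description is tested
        if file -b -- "$f" | grep -qi text; then
            printf '%s\n' "$f"
        fi
    done
}
```

Something like "list_text_files . | xargs grep hello" then behaves like the
original pipeline, though it runs 'file' once per file and, like the
original, still trips over names containing spaces or newlines.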
>
>> ...as a side note, experience has shown me that on certain systems you 
>> may want to include something like "grep -v proc" (this seems to be 
>> system dependent), especially if you start at / in your search, to 
>> avoid getting hung up in some endless mess.  Maybe someone could shed 
>> some light on that?
>
>
> Well, the '/proc' directory on Linux (at least) is not really a set
> of files on disk somewhere - each 'file' is generated on-the-fly from
> the kernel somewhere.  So when you open the file and read it, those I/O
> requests get routed to some status-producing module somewhere.
>
> Since a number of the 'files' contain things like the complete contents
> of a program's address space, 'find' and 'grep' are very likely to turn
> up some "interesting" things that'll tie your program into knots.
>
> Reading these 'files' can also screw up some programs.  In some older
> versions of the kernel, you couldn't run 'more' or 'less' on the files
> in /proc.  That seems to have been fixed in more recent kernels...but
> these are still VERY strange files.  'ls -l' says most of them are of
> zero length - yet 'wc -c' does not agree!
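
The size mismatch is easy to see for yourself. This is a small sketch that
assumes Linux with GNU coreutils ('stat -c' is a GNU option) and uses
/proc/version purely as an example entry.

```shell
# Linux-only sketch: compare the size recorded for a /proc entry with the
# number of bytes that actually come back when it is read.
f=/proc/version
reported=$(stat -c %s "$f")   # size from the inode, the figure 'ls -l' shows
actual=$(wc -c < "$f")        # bytes produced when the entry is read
echo "$f: reported=$reported actual=$actual"
```

The reported size comes from metadata, while the content is only produced
at read time, which is why the two numbers disagree.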
>
> The problem with your script is that the 'file' program's idea of what
> is 'text' is pretty broad and 'find's idea of what is a "regular file"
> is also rather lax - so you are hitting a bunch of things that are
> patently NOT simple text files - then stuffing a bunch of random garbage
> into grep.
> If you get any kind of match for your test string in an essentially
> binary file, you get a LOT of crap appearing on your output.  Possibly
> gigabytes of crap.
>
> Doing this right is really a hard problem since there is a fine line
> between what is 'text' and what is crap...and computers are not good
> at telling the difference!
>
> ---------------------------- Steve Baker -------------------------
> HomeEmail: <sjbaker1 at airmail.net>    WorkEmail: <sjbaker at link.com>
> HomePage : http://www.sjbaker.org
> Projects : http://plib.sf.net    http://tuxaqfh.sf.net
>            http://tuxkart.sf.net http://prettypoly.sf.net
>
One of my instructors was fond of saying "know your context" - in this 
case that would mean: don't go looking for text where you know there 
isn't any.  Yes, it is true that this little one-liner is far from 
perfect (it would need a program to fix that, I believe), but as 
one-liners go I think it is pretty neat.
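
For what it is worth, here is a rough sketch of what such a program might
look like. It swaps in a different technique from the one-liner: instead of
asking 'file' what is text, it lets grep skip binary files itself. It assumes
GNU grep (for -I and -H) and a find that supports -path; the name textgrep
is made up for illustration.

```shell
# Hypothetical wrapper: search regular files under a directory for a pattern,
# never descending into /proc and ignoring files grep considers binary.
# Usage: textgrep PATTERN [START_DIR]
textgrep() {
    pattern=$1
    start=${2:-.}
    # -path /proc -prune only has an effect when START_DIR is "/" (or another
    # absolute path above /proc); -I makes grep treat binary files as
    # non-matching; -H always prefixes each match with its file name.
    find "$start" -path /proc -prune -o -type f \
        -exec grep -I -H -e "$pattern" {} +
}
```

Called as "textgrep hello /", this stays out of /proc and prints only
matches from files grep judges to be text. That judgement is still a
heuristic, so the fine line between text and garbage remains.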

-- 
(new signature withheld, awaiting author's approval)