[NTLUG:Discuss] CD Turning 25

Sat Aug 18 21:34:16 CDT 2007

On 8/18/07, John K. Taber <jktaber at charter.net> wrote:
> On Sat, 2007-08-18 at 12:34 -0700, Fred wrote:
> > On Fri, 17 Aug 2007 Dennis Rice <dennis at dearroz.com> wrote:
> >
> > > So how does one ensure that data is not lost (as I have done with my
> > > old Syquest) as technology advances?  What are your thoughts and
> > > predictions?  What would you do to ensure that information is not
> > > lost?
> >
> > Start with a big laser and a sheet of granite, then etch all your data onto
> > the granite. Bury it in the sand. It worked for the Egyptians. Their data
> > has lasted over 5000 years. Of course, their "lasers" were chisels, but
> > they work the same.
> >
> > Anything else sucks in comparison, so if you're not using that method,
> > be prepared for the "Syquest effect".
> >
> > Fred
>
> Except that succeeding generations of Egyptians did not appreciate those
> old stone obelisks, steles, inscribed monuments, and so on. They smashed
> them up, crushed them and used the remains to build roads and other
> monuments. The famed Rosetta Stone was just a piece of an old obelisk
> used as road bed.
>
> What is needed is a record retention policy.
>
> I inherited family photos going back to the 1870s. They were baffling.
> Who were all those people dressed in funny clothes? Nobody thought of
> recording on the back of the pictures who they were, date, place, and
> occasion for the photo.
>
> I daresay for all the digital photos NTLUG members have on their drives
> there is no record of who, date, place, and occasion. Well, have you?
>
> I had a relative who survived almost to 100, and she identified some of
> the photo people for me. That was nice, but even so the identified pictures
> didn't mean much to me. So that was my second cousin three times
> removed. Great. Never heard of her.
>
> She misremembered some, misidentified several, so we cannot be sure of
> her identifications. They are probably 75% correct I estimate. After
> all, at the time she was 90 and those people in the photos were friends
> and relatives she hadn't seen in 60 years.
>
> I hate to say this but most data SHOULD disappear. For some perverse
> reason people in the computer business feel that every bit in every byte
> is precious and should be preserved forever. Preservation should be the
> exception, not the rule.
>
> My advice is work out what should be preserved -- remember, preservation
> is the exception. This data should be needed in the future rather than
> gee, maybe, this might could be nice to have. Needed, not nice. And once
> past its need, get rid of it.
>
> <\end rant>
>
> John

Very similar to an experience my family had,
which makes me agree with your conclusions.
Easy to say, hard to do while you are alive.
Eliminating memories is probably best left to your survivors.
It's the period between when the Information starts to pile up
and when survivor cleanup commences that needs help.

We did several Enterprise studies of the costs of keeping Stored
Information versus deleting it. These studies built on many good
earlier studies by others.
What we found is that, past a certain point in time, the cost of removing
the Information exceeds the cost of keeping it. At least in the Enterprise.
For the SOHO?
That point in time is when no one quickly remembers what the
Information is by looking at the name, or its Storage location, and
what its value is to ongoing business.

Here's the experience my family had:
In response to:
"Oh, and one more thing. Robert Pearson's first comment above ["CAS (Content
Addressed Storage) requires organizing the Storage by Content. This requires
some, if not a whole lot of, advance knowledge of the Information being
stored."] truly invites a reaction, as the exact opposite is true. Real CAS
explicitly does **not** require organizing Storage by Content."

I guess I am a little confused by all of this.
You have my respect as an acknowledged expert in CAS.
I don't claim to be.
My goal is to build consensus and make CAS better and more viable.

I fully agree with your statement about the Information, once it is in the CAS.
My statement was directed to these areas:
1) Determining Stored Information that is a candidate for CAS
2) Getting it in the CAS
3) Once the Information is in the CAS, dealing with the same
issues "non-CAS" Storage has, like "hot-spots", bottlenecks, updates, migrations,
replications, and synchronizations.
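To make the distinction concrete, here is a minimal Python sketch of content addressing. It assumes SHA-256 as the hash and an in-memory dict as the backing store; real CAS products differ in both, and the class name `ContentStore` is hypothetical. The caller never classifies the data: the address is computed from the bytes, and identical bytes land at the same address. Note that nothing here touches hot spots, migration, or replication; those remain separate problems.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store (hypothetical): each object's address
    is the SHA-256 digest of its bytes. Real CAS systems may use other
    hash functions and on-disk layouts."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address comes from the content itself, so no advance
        # knowledge of what the data "means" is needed to store it.
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address and is kept once.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Re-hashing on read doubles as an integrity check.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError(f"object {address} is corrupted")
        return data

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()
a = store.put(b"seismic survey 0001")
b = store.put(b"seismic survey 0001")  # duplicate content, same address
print(a == b, len(store))  # True 1
```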

Is CAS exempt from these "Management" issues?

How long would it take to migrate 30 TB of Seismic Information to CAS?
The same as for "regular" Storage?
I can tell you how long a 30 TB "snapshot" takes to commit, and that data is not
being hashed for Content.
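Part of the answer is plain arithmetic: ingesting into a CAS means reading and hashing every byte at least once. The sketch below hashes a stream in fixed-size chunks (so a huge file never has to fit in memory) and estimates a lower bound on ingest time. The 500 MB/s throughput is a hypothetical figure, not a measurement, and the estimate ignores network, write, and verification time. At that rate, 30 TB is 30e12 / 500e6 = 60,000 seconds, or about 16.7 hours of hashing alone.

```python
import hashlib
from typing import BinaryIO

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary, illustrative choice


def content_address(stream: BinaryIO) -> str:
    """Hash a stream incrementally with SHA-256 (assumed hash function)."""
    digest = hashlib.sha256()
    # iter() with a sentinel keeps reading until read() returns b"".
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    return digest.hexdigest()


def hours_to_hash(total_bytes: float, bytes_per_second: float) -> float:
    """Lower bound on ingest time: every byte must be read and hashed."""
    return total_bytes / bytes_per_second / 3600


# Hypothetical throughput of 500 MB/s applied to 30 TB (decimal units).
print(round(hours_to_hash(30e12, 500e6), 1))  # 16.7
```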

Here is a real-world example: my Uncle died. His wife had preceded him by a few
years.
My Uncle loved to take pictures. He had thousands of pictures.
He had some made into movies. He loved to show these at the family reunions. One
whole room was his for the slide shows and the home theatre.
And he took more pictures all the time.
He knew all the people in the pictures and had stories to tell about the person,
the place, and the picture.

I helped his children transfer all these pictures to CD. Not every recipient had
DVD capability.
But we really didn't know what to do with them. They seemed very valuable to us
somehow. Like a valuable piece of history.
They should have been made with a VCR, or transferred to a VCR, so there would
be audio, but that technology wasn't available until much later.
Between all of us we could probably identify 10% of the people and places.
We physically archived the pictures, movies and master DVD, made the CDs and
disbanded.
That video Information has become valueless.
The technology is there to view them at any time. To what purpose?
Nobody derives any value from viewing them.

We put them on the CAS.
Suddenly they became hugely popular as background for TV shows. They were
constantly being accessed. The TV people wanted to edit them and add comments.
Now I've got many versions of the originals that have become "fixed" that need
to go on the CAS. I am out of CAS.
Does CAS offer versioning software?
Does it work with popular versioning software?
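One way to get versioning over content-addressed objects is to never modify an object at all. Git, for example, stores file contents as immutable objects named by their hash and records history in separate commit objects. The sketch below uses a simpler, hypothetical variant: a mutable index mapping a human-chosen name to the list of addresses it has had. The class `VersionedStore` and the name `reunion_1962` are illustrative only, not the API of any CAS product. Every edited version is a complete new object here (no delta compression), which is exactly how a CAS fills up.

```python
import hashlib


class VersionedStore:
    """Toy versioning on top of content addressing (hypothetical design):
    blobs are immutable and keyed by SHA-256, while a small mutable index
    maps each name to its history of addresses."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}
        self.history: dict[str, list[str]] = {}

    def commit(self, name: str, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(address, data)
        versions = self.history.setdefault(name, [])
        # Skip no-op commits where the content did not change.
        if not versions or versions[-1] != address:
            versions.append(address)
        return address

    def latest(self, name: str) -> bytes:
        return self.blobs[self.history[name][-1]]

    def version(self, name: str, n: int) -> bytes:
        # Versions are numbered from 0 (the original).
        return self.blobs[self.history[name][n]]


vs = VersionedStore()
vs.commit("reunion_1962", b"original footage")
vs.commit("reunion_1962", b"footage + TV comments")
print(len(vs.history["reunion_1962"]))  # 2
```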

Then I got a chance to buy hundreds of hours of old TV sitcom video. We put it
on the CAS figuring it would only be read.
Wrong! Thousands of "stills" have been extracted, edited, and produced.
These need to be stored in a related fashion to the CAS originals.
Is the hash on a still from a video the same as the video hash?
There are all kinds of Information related to the original Content, stored
on the CAS in a Content-related fashion.
I keep hearing database. It was in a database.
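On the still-frame question: with a cryptographic hash such as SHA-256, the hash of part of the content has no usable relationship to the hash of the whole, so a still will not share the video's address. In practice a still is a decoded and re-encoded frame rather than a raw byte slice, so its bytes differ even more. The relationship has to be recorded somewhere else, which is where the database comes in. The sketch below uses short byte strings as stand-ins for media and a hypothetical SQLite table named `derived`.

```python
import hashlib
import sqlite3


def address(data: bytes) -> str:
    """Content address, assuming SHA-256."""
    return hashlib.sha256(data).hexdigest()


# Stand-ins for real media: a "video" and a "still" cut from it.
video = b"frame0frame1frame2frame3"
still = video[6:12]  # just the bytes of frame1

# The still's address shares nothing recognizable with the video's.
assert address(still) != address(video)

# The parent/child link therefore lives outside the CAS,
# here in a small in-memory SQLite table (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE derived (child TEXT PRIMARY KEY, parent TEXT, kind TEXT)")
db.execute(
    "INSERT INTO derived VALUES (?, ?, ?)",
    (address(still), address(video), "still"),
)
children = db.execute(
    "SELECT child FROM derived WHERE parent = ?", (address(video),)
).fetchall()
print(children == [(address(still),)])  # True
```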

Plus they want to insert "human realistic" pixel people in place of the
original actors to avoid paying any royalties. So now I have
the original, modified originals and "human realistic" copies. They are all the
same content but carry different levels of value. Maybe these are not good
candidates for CAS?

What about the level of Information High Availability? Information Integrity?
Disaster Recovery? Business Continuance?
Worst of all, Findability?
Each Unit of Information stored on the CAS is subject to all these demands.

"Vendor A" said CAS was the wrong solution. What I needed was "5 nines (99.999%)"
active Storage. And lots of it. He bought me lunch.
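For scale, "five nines" is a tight budget. An availability A allows a fraction (1 - A) of the year as downtime; with a 365.25-day year of 525,960 minutes, 99.999% leaves about 5.3 minutes per year. A small Python helper for the arithmetic (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def allowed_downtime_minutes(availability: float) -> float:
    """Yearly downtime budget implied by an availability fraction."""
    return (1 - availability) * MINUTES_PER_YEAR


for label, a in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
    print(f"{label}: {allowed_downtime_minutes(a):.1f} min/year")
```

Each extra nine cuts the budget by a factor of ten, from hours per year to minutes.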