How Our Information Repository Works
I have finally come up with a solution that I am happy with for keeping our data organized. This is a home-use system, probably overkill for most, but I think I built it fairly intelligently and thriftily. I’m a big believer in making sure you never lose the data if you are going to bother organizing it.
The first thing I did was stop relying on “off-line” backup media like CD-Rs and DVD-Rs for routine backups. I still use them for semiannual HDD snapshots, but only for sections of my filesystems that I know change a lot or stuff that can be relatively self-contained (MP3, DVD collections).
The only time I should ever have to access these CDs/DVDs is in a “disaster recovery” mode. No more piles of optical media cluttering my desk. Like the “paperless” office, this isn’t always reality but there is a lot less clutter than there used to be.
I put everything online, or at least everything that I care about and would not want to lose. I have an NFS server (Linux of course, currently RH9) that has /home mounted on a large dedicated spindle. In addition, there is a second spindle of the same size in a removable drive tray, and the contents of the first spindle are rsync’d to it every night. I also intend to run AIDE (a free equivalent to Tripwire) against the first volume to detect “bit rot”. This is not a hot swap configuration; for home use that would be unnecessary.
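To make the “bit rot” idea concrete, here is a rough Python sketch of the principle. It is not how AIDE itself is implemented or configured; the function names are made up for illustration. It records a checksum, size and modification time for every file, and later flags any file whose contents changed even though its size and timestamp did not, which is the signature of silent corruption rather than a normal edit.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each regular file under root to (mtime_ns, size, sha256)."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            st = os.stat(path)
            rel = os.path.relpath(path, root)
            manifest[rel] = (st.st_mtime_ns, st.st_size, file_digest(path))
    return manifest


def suspect_files(old, new):
    """List files whose hash changed while mtime and size stayed the same."""
    suspects = []
    for rel, (mtime, size, digest) in new.items():
        if rel not in old:
            continue  # new file, nothing to compare against
        old_mtime, old_size, old_digest = old[rel]
        if old_mtime == mtime and old_size == size and old_digest != digest:
            suspects.append(rel)
    return sorted(suspects)
```

In practice the first manifest would be saved somewhere safe (ideally not on the disk being checked) and compared against a fresh one on a schedule. Files that were legitimately edited get a new timestamp and are ignored here.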
There is a third hard disk of the same size in another removable tray that gets swapped with the first removable tray on a regular basis. At any given time I have one of these removable drives in a safety deposit box at the bank. I rotate them once every 2-3 weeks depending on what I have on the go in my life, sometimes more often if a lot of data has changed over a short period of time. For example, after uploading all my camera CF cards to the system after a long weekend, I will run the backup, swap the drive out and head to the bank.
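Because the backup drive is removable, a nightly job should probably check that a drive is actually mounted before mirroring onto it. The sketch below is one way that could look, not my actual script: the paths /home and /mnt/backup and the function names are assumptions for illustration. It uses rsync’s -a (archive) and --delete options so the target ends up as an exact mirror, and refuses to run if the target is not a mount point.

```python
import os
import subprocess


def build_rsync_command(source, target, dry_run=False):
    """Return the rsync argument list that mirrors source into target."""
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        cmd.append("--dry-run")
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd += [source.rstrip("/") + "/", target.rstrip("/") + "/"]
    return cmd


def mirror(source, target, is_mounted=os.path.ismount, run=subprocess.run,
           dry_run=False):
    """Mirror source onto target, but only if target is a mounted filesystem."""
    if not is_mounted(target):
        # Without this check, rsync would happily fill the empty mount point
        # on the root filesystem when the tray is out.
        raise RuntimeError(f"{target} is not mounted; is the drive tray in?")
    cmd = build_rsync_command(source, target, dry_run=dry_run)
    run(cmd, check=True)  # raises CalledProcessError if rsync fails
    return cmd


# Example (hypothetical paths), e.g. called from a nightly cron job:
#     mirror("/home", "/mnt/backup")
```

Note that --delete removes files from the mirror once they are gone from the source, so the drive sitting at the bank is what protects against accidental deletions.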
Also, the home directories on my family’s main computer are NFS mounted off this NFS shared /home directory. This main “terminal” is a Mac G5 running OS X. It has a large S-ATA disk (160 GB) in it but all it contains is Mac OS and the few apps I run (PS, iLife tools, some GNU tools via fink). I intend to use the local disk for streaming/mixing analog video once I get around to converting the family’s old Hi8 tapes. (Remember, /home is NFS mounted at 100MB/s, so local disk would be better for AV work.)
The nice thing about this model is that there is only one machine that needs real administration. The Mac takes care of itself for the most part, so I can concentrate on the server.
You might balk at the cost of having two IDE HDDs “doing nothing”, but try pricing an equivalent tape system that can handle hundreds of GBs of data as easily. I think an SDLT tape drive goes for about $3k CDN around here, which is much more expensive. Also, in my case I built the system incrementally to spread the costs around. I went small on the disks (120GB). In retrospect I wish I had bought larger drives, especially now that I am putting DVDs online…
Now, the data organization part…
I use a combination of postnuke and gallery, and soon mythtv. These are just the tools I use; any CMS and image/music/video cataloging system(s) should do the same thing.
All photos get catalogued and annotated in gallery. This makes them searchable. I have separate apache virtual hosts set up for completely disparate types of photos. Examples: all my family photos are in one “repository”; all my off-roading and car and truck “porn” (~30k photos/videos of “naked” cars and trucks) are in another separate repository. Both of these collections are accessible from the same CMS (postnuke) and as such I create articles/stories in the CMS to create online simplified access to a collection of text, video, images, music, etc. It’s kind of ironic, but we are basically creating the high tech equivalent of “scrapbooking” by doing this.
That is how far I have gotten with organizing my data and it works. The following is the stuff I still need to finish.
I also have a lot of PDF/PS documents. In theory gallery can be used to catalog these as well but I haven’t hacked it yet to support PDF/PS. I don’t tend to have a lot of ASCII text documents around; if I did I’d probably add a web interface to CVS or something of that nature. To me e-mail is not important enough to catalog. I’ll do the occasional print to PDF of a message but otherwise they mostly get circular filed after 30 days. I suppose “print to fax” might be an alternative to PDF if you didn’t want to use it because of proprietary concerns. (I don’t care personally; I’ll stick to MP3 over OGG as well.)
The actual on-line browsing of MP3s, home video and DVDs will be done with mythtv running on the same box but output directly to the TV in the family room. The reason why I want to do it with a mythtv type box (freevo is another alternative sw, btw) is so that I can simplify access to this information for the rest of my family. Currently it is a PITA to even get a DVD playing in the DVD player as there are three different remotes to deal with. Pretty fugly, especially for the non-technical members of the household. MythTV will make it possible to get a single interface to do everything, independent of our Mac as well. Additionally, since mythtv has a web browsing capability, I am hoping that I can also browse the gallery photo collections there. It’s cozier to be on the couch sharing with the family than hunched around a computer display.
I haven’t gotten very far with how the cataloguing of MP3s works in Myth. Everything is currently in iTunes. I’m not an audiophile so this is a secondary purpose to me. I have about 50 GB of my own MP3s but I rarely listen to them. My only requirement is to be able to simply point and click on an album or playlist.
Anyway, I think that is the extent of what I’m doing (or will be doing) in a nutshell to keep stuff organized.
Links: Storing Your Digital Images By Vincent Bockaer
Updated on 13 February 2006: Added link to dpreview.com that has a good discussion on backup strategies and media types.
