Scrubbing those hard drives clean

School of Hard (Drive) Knocks

by ray on February 2, 2010

One of the problems with having a Windows Home Server with 12 hard drives is that hard drives do fail and there are a dozen chances for that to happen. Add to that the 18TB of data those drives can hold and things are further complicated. The odds are not in my favor and it’s only a matter of time. I’d been thinking about that recently and had just begun looking at ways to monitor drive health when I came across some health problems while testing out some tools.

I installed a WHS add-in called Home Server Smart that shows the SMART stats from the hard drives. Sure enough, there were a couple bad or pending bad sectors on a couple drives. But all drives have bad sectors and manufacturers plan for it and re-allocate the sectors to some spares. I’ll monitor those drives and if the bad sectors increase I’ll act. But there was one drive with 160 bad sectors. And sure enough, checking the system log showed this drive had a bad block.
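The "monitor and act if the count increases" rule above can be written down as a small check. This is only a sketch of my rule of thumb, not anything Home Server Smart does internally: the function name and the `hard_limit` cutoff of 100 are made up for illustration, and the counts are assumed to be typed in by hand from the add-in's screen.

```python
def should_act(history, hard_limit=100):
    """Decide whether a drive needs attention based on its bad-sector counts.

    history: bad-sector counts in chronological order (oldest first).
    hard_limit: hypothetical cutoff for "too many", not a manufacturer figure.
    """
    if not history:
        return False
    latest = history[-1]
    # Any growth since the first reading counts as "increasing".
    growing = len(history) > 1 and latest > history[0]
    return growing or latest >= hard_limit


# A drive holding steady at a couple of bad sectors: keep watching.
print(should_act([2, 2, 2]))
# The problem drive from this post: 160, then 171, then 176.
print(should_act([160, 171, 176]))
```

The point of checking both growth and an absolute limit is that a drive can already be in bad shape the first time you look at it, which is exactly what happened here.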

Here’s the Home Server Smart screen from today, which shows the drive up to 171 bad sectors this morning:

HomeServerSmart

Now, in retrospect the right thing to do would have been to remove this disk immediately and get a replacement. But I figured I’d be smart. It was time to get the hard drives under control and test them all. So I pulled out a spare 1TB drive and began running SpinRite at level 4 on it. For 1 TB, this would take an estimated 2 days to complete the testing. My plan was to replace one of the external 2TB drives first and then run SpinRite on that 2TB drive before using it to replace the problem drive. Naturally the problem drive was internal and inside a drive cage that would require removal to get at the drive. So the fewer times into the machine the better. But that assumed the drive lasted. I figured the problem wasn’t new, just newly noticed.

That was a couple days ago and I’ve been monitoring the drive since then. There weren’t any new bad sectors the first couple of days. I felt confident because I had file duplication on, so one drive failure wouldn’t lose data. I also had full recent backups of everything. I replaced the 2TB external drive and began running SpinRite on the freed-up 2TB drive. I was almost home free.

Then last night things began to go very, very wrong. Streaming video or file copies would stop for no apparent reason. No heavy disk usage, no heavy CPU load, no high memory usage. I went into Home Server Smart again and the bad sectors were up to 176. So I immediately began the drive removal process on that drive.

It was slow. Very slow. Painfully slow. 1GB per hour slow. I didn’t have 1,400+ hours for it to go through the drive removal process. But it was late so I decided to give it the night to see if it improved. I woke up to find it made no real progress. So I used the Shutdown command (since I couldn’t use the WHS console to do it) and powered off the server and powered it on. I figured drive duplication would save me from losing files. Although my fear was some corrupt data that would go unnoticed for months.
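For the record, here is the back-of-the-envelope math behind that "1,400+ hours" figure. It's a rough sketch that assumes about 1,400 GB left to move (inferred from the hours and the 1GB per hour rate above) and a constant copy rate, which a failing drive certainly doesn't guarantee. The function name is just for illustration.

```python
def removal_eta_hours(data_gb, rate_gb_per_hour):
    """Estimate hours needed to move data_gb at a constant rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return data_gb / rate_gb_per_hour


hours = removal_eta_hours(1400, 1)   # assumed ~1,400 GB at 1 GB per hour
days = hours / 24
print(f"{hours:.0f} hours, about {days:.0f} days")
# prints "1400 hours, about 58 days"
```

Nearly two months for one drive removal is why waiting it out was never an option.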

After the startup I began the drive removal again. After giving it the day while I was at work it was faster, but not by much. At the current rate it would take another 6 days to remove the drive. So I again shut it down. Then I went inside and pulled the drive cable so it would go missing.

When the server came back up it told me the drive was missing, so it got that part right. I started the drive removal process again. It’s been running several hours now. The progress bar has moved, but since there’s no actual data shown I can’t tell how far along it is. Perfmon does show some heavy read/write drive activity that looks like file copies, so I’m going to be patient. With the drive gone I now have some files that aren’t duplicated. I figure it’s duplicating or verifying those files. A normal drive removal would last overnight for this much data so I figure I need to wait until morning at the least. While I have them backed up, I’d hate to have to figure out which ones were lost if they get corrupted, so I’ll let WHS do its thing for a while.

Lessons Learned (the important ones):

  1. Test all drives before installing them. The problem drive is one of the new ones. Whether SpinRite would have caught the problem or not is unknown since it took a couple weeks to manifest itself, but as of now any new drive gets SpinRite Level 5 before it goes into the server (or my PCs for that matter). I’d been lazy and impatient in the past. I’d go through the drive removal process the night before the new drive was due to arrive and slap the new one in as soon as it arrived. No more.
  2. Write down the drive serial numbers (and the specific model numbers). Somewhere along the line my drive mappings got messed up. When I thought I was removing the drive from external bay 4 I was actually removing the one from bay 3. So when I rebooted I got a drive missing error. Luckily that was easily fixed by popping the drive back in. Still, it’s easily avoidable as the Disk Management add-in I use shows drive model and serial numbers (at least for most drives).
  3. Having Windows Home Server offline for drive removals really sucks. Timing-wise this wasn’t too bad because I haven’t needed the files on it (although if it were online I’d be streaming video now). But the server has basically been offline since Sunday night.
  4. Hard drives hate me at the moment. A 2TB drive I ordered for another project, but was going to use as a replacement, arrived DOA on Thursday and its replacement didn’t arrive until today. I can’t remember the last time I got a drive that was literally DOA and wouldn’t even spin up when I pulled it out of the box. Hopefully it’s just one of those things and not a manufacturing issue with a batch of them. The replacement is still in its unopened antistatic bag.
  5. If a hard drive seems to be going bad pull it immediately. Either replace it or run diagnostics on it. I left the drive in because I figured the bad sectors weren’t new. I figured I’d “be smart” and minimize the effort and time opening up the server case. I waited until there were new bad sectors but by then I was already in trouble.
  6. It’s not a new lesson but it reinforces my current beliefs. Windows Home Server file duplication is a good thing. Backups are a good thing. File duplication is not a backup.
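Lesson 2 is really just a lookup table. Here is a minimal sketch of what "writing it down" could look like. The bay labels echo the ones in this post, but every model and serial number below is a made-up placeholder and the function name is hypothetical. The idea is to confirm the bay from the serial number the add-in reports before pulling anything.

```python
# Hypothetical inventory: bay label -> (model, serial). All values are made up.
DRIVE_INVENTORY = {
    "external bay 3": ("EXAMPLE-2TB", "SN-AAAA1111"),
    "external bay 4": ("EXAMPLE-2TB", "SN-BBBB2222"),
}


def bay_for_serial(serial, inventory=DRIVE_INVENTORY):
    """Return the bay holding the drive with this serial, or None if unknown."""
    for bay, (_model, known_serial) in inventory.items():
        if known_serial == serial:
            return bay
    return None


# Look up the serial reported for the failing drive before opening the case.
print(bay_for_serial("SN-BBBB2222"))
```

An unknown serial returns `None`, which is the cue to stop and re-check the list instead of guessing at a bay.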

For more information about the Home Server Smart add-in you can see the reviews at HomeServerLand or We Got Served. There’s no sense in me repeating their reviews. I’ll just add my endorsement of the plug-in. It’s a simple but well-designed plug-in that does its thing without getting in the way. The plug-in is free but donations are accepted. I threw a small donation their way to encourage these types of add-ins.


Yet Another Windows Home Server Disk Upgrade

January 9, 2010
Hard Drive Platter thumbnail

Just last week I wrote that I’d just upgrade the drives in my Windows Home Server as I needed the space, and I really didn’t think I’d need the space for a while. Well, I copied up a bunch of files and my free space dropped below 2TB which is where I start considering upgrading a [...]

Read the full article →

Optimizing My Website – Low Hanging Fruit

January 7, 2010
Rocket ship thumbnail

In this article I look at the 4 main areas which I could easily optimize to improve my website performance. Image optimization, CSS optimization, JavaScript optimization and plugin optimization were all used to improve the performance of my WordPress site.

Read the full article →

Domains Up For Auction At Bido

January 4, 2010
WWW in gold thumbnail

Over the past year or so I’ve been looking at domaining among other potential ways to turn my interest in websites and operating systems into income producers. At least enough income to make it a self-supporting hobby. One of the ways to make some money with domains is to put them up at a domain [...]

Read the full article →

The OS Quest Trail Log #47: State of the Quest

January 3, 2010
Happy New Year 2010 Thumbnail

This isn’t one of those year in review or predictions for the year ahead posts. I hate those. But this is the time of year I look at my current computing situation and muse about where I want to go the next year. Not goals, not predictions, just some practical reviews – if I want [...]

Read the full article →

Website Performance Tools

December 18, 2009
Red sports car thumbnail

I mentioned in my previous Trail Log that my look at the new Site Performance feature in Google Webmaster Tools led me down a rat hole looking into ways to optimize my website. Since I use WordPress I approached this as WordPress optimization, but the reality was most of this was basic web optimization. [...]

Read the full article →

The OS Quest Trail Log #46: Housekeeping Edition

December 13, 2009
Santa having a beer thumbnail

It’s been over a month since I’ve done a Trail Log and I got some time for blog updates this weekend so I might as well do one. The day job has kept me busy and pretty well burnt out by the time I get home so I haven’t dived in depth into anything for [...]

Read the full article →

Google DNS – Close But No Cigar

December 12, 2009
WWW in gold

Among Google’s recent announcements was their introduction of Google Public DNS. I’ve been using OpenDNS and have no complaints. Well, actually I recently found I had defaulted back to using my ISP’s DNS (Comcast), probably during a router firmware upgrade. When I switched back to OpenDNS I also didn’t notice a difference over Comcast. I [...]

Read the full article →

Google Wants Our Photos In The Cloud

December 11, 2009
Compact Digital Camera thumbnail

Google currently has a deal going that offers a free Eye-Fi card when you lease 200GB of storage from them for a year. When I first saw it, it seemed like a pretty good deal, and I hate to pass up a good deal. But it’s less of a deal if I don’t really need [...]

Read the full article →

Windows Home Server Power Pack 3

November 25, 2009
WHS Versions

Windows Home Server Power Pack 3 was released Tuesday and I’ve applied the upgrade. While I didn’t have any serious problems it didn’t go exactly as planned. Just problems enough to cause some momentary heart palpitations.

Read the full article →

The OS Quest Trail Log #45: Windows 7 Unleashed Edition

October 25, 2009
Pot of gold thumbnail

Microsoft seems to have found its pot of gold with the release of Windows 7. It seems to be getting universally positive reviews. My experience with it is still that it just works, which is what I want from an operating system. After all, let’s face it – all it is is an OS. Amazon [...]

Read the full article →