Thursday, February 3, 2011

Hard drives continuously failing in Fujitsu RX300 server

Hi,
We look after a Fujitsu RX300 S4 server that has 6 x 500GB SATA drives in a RAID-6 array, running from an LSI MegaRAID card (built into the motherboard).

A couple of weeks ago, one hard drive flagged itself as being faulty (orange light on the drive bay, MegaRAIDcli software shows a firmware status of "Failed"). We ordered and replaced the drive, but after the rebuild started, a different drive flagged itself as faulty.

This has happened 3 times now: twice it flagged a different drive as faulty, and once it flagged a drive that we had already replaced.

At the moment, two drives are showing faults - we don't know if the drives are actually failing, or whether the backplane or RAID card is at fault.

Has anyone experienced this before? Any tips on what to do next? We have a call in to Fujitsu, but wondered if anyone out there had any pointers.

  • I feel for you. This kind of hardware problem is extremely stressful and annoying to debug.

    Back in 2002 I had the "joy" of debugging a similar problem. After wayyyyy too much "Let's replace a HD" and similar server massaging, the backplane turned out to be the actual fault. But that was an IBM server and a completely different story, anyway.

    If possible, test the "faulty" drives in another server and see if they function normally there. My gut tells me that in your case it's not about the drives and something else is broken. Drives tend not to break like that.

    fistameeny : Just got off the phone with Fujitsu. They are aiming to replace the backplane and upgrade the firmware on the controller and possibly the drives. Will post back with an update.
    gWaldo : I was just going to suggest that you call them and check on the firmware! Ensure your backups are good, but I wouldn't "test the drives elsewhere" until you know that they are. Putting the drives in another array will break their association with your current array. If you can live without that server for a while, I'd consider shutting it down. I'd also ask Fujitsu if your data is at risk and if shutting it down would help; escalate to a tech or engineer a few steps up - this isn't a question for the helpdesk!
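
One way to get a first hint on the "drives or backplane?" question without pulling any disks is to compare the per-slot error counters that the controller already keeps. Below is a minimal Python sketch that parses text shaped like the output of `MegaCli -PDList -aALL`. The sample text and its numbers are invented for illustration, the field names are assumed to match what the tool prints (they may differ between versions), and the `hint` rule is only a rough heuristic, not a diagnosis: media errors usually point at the disk itself, while "other" errors on otherwise clean drives more often point at cabling, the backplane, or the controller.

```python
# Sample text shaped like `MegaCli -PDList -aALL` output.
# All values here are invented for illustration only.
SAMPLE = """\
Enclosure Device ID: 252
Slot Number: 0
Media Error Count: 0
Other Error Count: 14
Predictive Failure Count: 0
Firmware state: Failed

Enclosure Device ID: 252
Slot Number: 1
Media Error Count: 37
Other Error Count: 0
Predictive Failure Count: 2
Firmware state: Online, Spun Up
"""

# Fields we care about (assumed names; check them against your own output).
FIELDS = (
    "Media Error Count",
    "Other Error Count",
    "Predictive Failure Count",
    "Firmware state",
)


def parse_pdlist(text):
    """Return {slot_number: {field: value}} from PDList-style text."""
    drives = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Slot Number":
            # A new slot starts a new drive record.
            current = drives.setdefault(int(value), {})
        elif current is not None and key in FIELDS:
            current[key] = int(value) if value.isdigit() else value
    return drives


def hint(drive):
    """Rough heuristic (an assumption, not vendor guidance)."""
    if drive.get("Media Error Count", 0) > 0 or drive.get("Predictive Failure Count", 0) > 0:
        return "suspect drive"
    if drive.get("Other Error Count", 0) > 0:
        return "suspect cabling/backplane/controller"
    return "no errors logged"


if __name__ == "__main__":
    for slot, drive in sorted(parse_pdlist(SAMPLE).items()):
        print(slot, drive.get("Firmware state"), hint(drive))
```

To use it on a real system, save the tool's output to a file and pass the file contents to `parse_pdlist` instead of `SAMPLE`. If several slots show "other" errors but no media errors, that is consistent with the backplane theory above, but it is still worth confirming with the vendor.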
