Saturday, January 29, 2011

RAID for a SCM server

I need to buy/build a server to host our Subversion repository (FYI: I am a dev/not an IT guy). Obviously this is mission critical, and needs to have high network and disk i/o performance. Our repository is currently 5GB and we support 20 devs. The server was going to be Windows 2008, but Linux is an option if it is a compelling and simpler/easier solution.

CLARIFICATION: The 5GB repository is about 2GB source, and yes, it needs to handle 20 devs doing multiple small commits, logs, histories, and checkouts all day long. (How do I clarify source commits? A few C# files here and there, with a few lines of changes? Pretty standard stuff.)

UPDATE: Budget: I was hoping to get by with $2,000 or less, only because I don't think we need to spend that much. However, if it takes $5,000, then that is what it takes. This is our LIFE. But if $2500 gets 100% and $5000 gets 103%, it isn't worth the extra money.

My first priority, of course, is data integrity. If a drive fails, I want to have the machine stop writes and be able to put a new drive in quickly to have the machine back up and running as fast as possible. (I can deal with a few hours of downtime, but not a few hours of "work" during the downtime).

I don't think I need (or want) RAID 5, as the rebuild cost seems too high/complicated.

At a minimum, I could use RAID 1, and have a backup disk (clearly one not from the same batch or even maker ;-)

RAID 1+0 looks like it might be faster? Is it worth the complexity?

Can someone point me to some suggestions and best practices for managing a RAID array? In particular, for whatever solution is offered, how do I manage the disk failure? Is there software that can notify me (email/pager) if a drive dies? Software that will prevent writes to the disk at that point?

Any other things I need to think of?

UPDATE: My question is this: what are the trade-offs between hardware RAID and Windows Server 2008 software RAID for RAID 1+0 with respect to speed, management (of a dead disk), and alerts on disk failure?

Thanks

  • Your repository is 5GB, but what is the frequency of your commits / updates and the rough size of those?

    How much money are we working with here? This really is the most important first question you should ask yourself.

    RAID 1 or 1+0 with one or two hot spares would be ideal, I think. That way, if a drive does fail, the RAID card will automatically begin rebuilding the array onto the hot spare. You would then just buy a new drive that matches the ones already installed and replace the bad one with it.

    : Updated with budget and commit information.
    : Hardware or software? Why one or the other? What about notification and controlling of what happens during a drive failure?
    From Zero0ne
  • If a drive fails, I want to have the machine stop writes and be able to put a new drive in

    RAID controllers typically don't operate like this. If a drive fails, the controller marks the array as degraded, and continues to let the array operate (but at a lower speed as it needs to do more error handling).

    I don't think I need (or want) RAID 5, as the rebuild cost seems too high/complicated.

    Generally RAID 5 and 6 are perfectly valid choices; the rebuild cost is rarely incurred. The bigger drawback is that the write performance of RAID 5/6 can be rather low.

    I could use RAID 1

    For 20 users, with decent disks I guess this would be fine.

    RAID 1+0 looks like it might be faster? Is it worth the complexity?

    Yes, RAID 1+0 is faster, and does not have any significant additional complexity -- this is one of the most frequently used RAID levels and all good controllers have a mature implementation of it. In a perfect world, a 4-disk RAID 1+0 could have 4x the read performance and 2x the write performance of a single drive. One thing, though: cost goes up as you need at least 4 drives, and the usable storage is only half of the total drive capacity.
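The "perfect world" figures above can be written down as a small calculation. This is only a sketch of the idealized arithmetic (reads spread over every drive, writes limited by the number of mirror pairs); the `RaidEstimate` and `estimate` names are made up for illustration, and real controllers will fall short of these multipliers.

```python
from dataclasses import dataclass


@dataclass
class RaidEstimate:
    usable_drives: float    # usable capacity, in units of one drive
    read_multiplier: float  # ideal read throughput relative to one drive
    write_multiplier: float # ideal write throughput relative to one drive


def estimate(level: str, drives: int) -> RaidEstimate:
    """Idealized capacity/throughput for RAID 1 and RAID 1+0 (hypothetical helper)."""
    if level == "1":
        if drives != 2:
            raise ValueError("this sketch models RAID 1 as a two-drive mirror")
        # Either copy can serve a read; every write goes to both drives.
        return RaidEstimate(1, 2, 1)
    if level == "10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 1+0 needs an even number of drives, at least 4")
        pairs = drives / 2
        # Reads can hit any drive; writes are striped across the mirror pairs.
        return RaidEstimate(pairs, drives, pairs)
    raise ValueError("only levels '1' and '10' are modeled here")


print(estimate("10", 4))
```

For four drives this reproduces the 4x read / 2x write figures, with two drives' worth of usable space.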

    how do I manage the disk failure. Is there software that can notify me (email/pager)

    Comes with the controller if you buy a decent one; you just have to install the management software and set it up for email notifications. Additionally you can put a hot-spare drive on the controller, so that it will rebuild right away (note that performance goes down during rebuild).

    3 tips:

    • Measure your current disk I/O pattern and performance needs on your existing server (perfmon etc). Don't go overboard on RAID if your actual disk I/O isn't that high. 20 users is not much, but of course Subversion may need more disk I/O than one would think.
    • Buy a name-brand server (Dell, HP, IBM, etc), don't DIY. It is almost never worth it for a generic standard server.
    • Remember, RAID != backup. You seem a little fixated on the disk failure scenario -- RAID provides you with a higher uptime for the server and more disk I/O, but you still need proper backups.
    3dinfluence : +1 for mentioning the backup thing... I was about to post another answer just calling out the fact that this question seems to treat RAID as a replacement for backups.
    : 1. We are currently using an off site repository, so we don't have anything to measure with (until we get something in house) 2. I am sold on COTS hardware. It is fully baked 3. Yes, this is not backup, just an ability to recover quickly from a disaster.
    : I am still confused about software vs hardware RAID and the software to manage it. Is there a disadvantage to using a SATA card, and Windows Server 2008 software to manage it? Or should I use a hardware card and its software? What are the pros and cons of each?
    Chris S : Software RAIDs tend to require bringing the server down to replace a disk. Hardware RAIDs from Dell or HP don't. I'd recommend something like an HP ML310 with SC40Ge, a pair of SAS HDs in RAID 1, and Server 2008 R2 Std Ed. It'll cost about $2500 and last you for at least 3 years.
    Jesper Mortensen : @teleball: "Software RAID" (by which I mean something the OS handles) is cheap and pretty good. The downsides are as Chris S writes: a) RAID'ing the OS boot partition is very hard or impossible, b) recovering after a disk failure requires a reboot, c) the management apps tend to be designed as 'local' to the PC, e.g. drive failure notifications are logged to syslog but the management apps can't send them as emails. If you have 20 devs working on this, then their salary totally dominates the HW cost -- get a small name-brand server with a 'real' hardware RAID controller.
  • I would recommend a hardware RAID 1+0 setup. This will give you good performance and redundancy/failure tolerance at the expense of costing a bit more (more drives are required vs. RAID 5).

    A mirrored RAID volume has 2 copies of all of the data, so if a drive fails you still have an accessible copy. You don't need to block disk access on a drive failure. You can configure your system with "hot spare" drives that sit unused until a drive failure occurs, then spring to life and automagically take the place of the failed drive. This should give you a fully-functional RAID volume that can tolerate another drive failure and buy you enough time to replace the failed drive. In order for the RAID 1+0 volume to completely fail, you would need to have multiple drive failures within a short amount of time (which is typically quite uncommon).
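To put a rough number on "multiple drive failures", the sketch below enumerates every possible two-drive failure in a RAID 1+0 array and counts how many of them take out both halves of the same mirror pair. It assumes failures are independent and equally likely for every drive, which is a simplification; the function name and the drive-numbering scheme are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations


def fatal_two_drive_fraction(drives: int) -> Fraction:
    """Fraction of two-drive failures that destroy a RAID 1+0 array."""
    if drives < 4 or drives % 2:
        raise ValueError("RAID 1+0 needs an even number of drives, at least 4")
    # Assumption: drives 2k and 2k+1 form mirror pair k.
    mirror_pairs = {(d, d + 1) for d in range(0, drives, 2)}
    failures = list(combinations(range(drives), 2))
    # The array is lost only when both failed drives belong to one mirror pair.
    fatal = sum(1 for pair in failures if pair in mirror_pairs)
    return Fraction(fatal, len(failures))


print(fatal_two_drive_fraction(4))  # 1/3
```

Under these assumptions, only one in three double failures is fatal for a 4-drive array, and a hot spare shrinks the window in which a second failure can happen at all.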

    Most hardware RAID controllers come with management software that can alert you on failures.

    Most of my server experience is using HP products, so I'm mainly speaking from that point of view (although most other brands do something similar).

    Chris S : Most drives from several years ago would be fast enough for this application. There's no need to go to RAID 10 to eke out the last bit of performance.
    From
  • I cannot see from your description where the high network or disk I/O load could occur. 5 GB is a very small repository for SCM, C# files are just a few KB in size, and 20 devs are no problem at all. You should therefore concentrate on the reliability of your setup: a server with redundant power supplies and RAID 1 should be fine. Your main concern should be disaster recovery, but as you are probably aware, that is not something a RAID setup will buy you.

    : I want fast checkouts of 2GB of data (over the network) as well as fast history/log queries.
    Chris S : My 4 year old server can pump out 2GB in <10 seconds. A newer server could do better. Worry about reliability, your throughput requirements are fairly conservative.
    From
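As a rough sanity check on the checkout discussion above, the time to move a 2GB working copy can be estimated from the link speed alone. This sketch assumes decimal gigabytes, a single network link, and no protocol or Subversion overhead unless an `efficiency` factor is supplied; the function name is hypothetical.

```python
def transfer_seconds(size_gb: float, link_mbit_per_s: float, efficiency: float = 1.0) -> float:
    """Lower bound on transfer time for size_gb over a link of the given speed."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    size_bits = size_gb * 1e9 * 8  # decimal GB -> bits
    return size_bits / (link_mbit_per_s * 1e6 * efficiency)


for name, mbit in [("100 Mbit Ethernet", 100), ("Gigabit Ethernet", 1000)]:
    print(f"{name}: {transfer_seconds(2, mbit):.0f} s")
```

At line rate a gigabit link needs about 16 seconds for 2GB, so for a full checkout the network is likely to matter at least as much as the choice of RAID level.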
