Consumer hard drive and hardware RAID problems

A few years ago I was lucky enough to have the opportunity to buy a second-hand hardware RAID-5 card (an Areca 1220) for a very low price. Since then I have used the card, along with several sets of drives (originally 300GB, then 750GB and currently 1TB), in a dedicated server PC as a large network file-store for the family's music, photos, videos and back-ups.

Before I obtained the hardware card I tried using software RAID, but found the results very disappointing. The server has a low-power, single-core CPU which isn't really up to the task of acting as a RAID-5 engine. Whilst I've heard plenty of times that RAID isn't a back-up, this is a case where only a cheap solution will do. RAID-5 offers protection from single drive failure, which is good enough for my purposes. The dedicated card offers an enormous performance advantage, but in practice this isn't very important. The features it adds, however, are! The Areca card offers an OS-independent RAID solution, which counts for a lot. It also offers online capacity expansion and RAID-level migration (so, for example, I could upgrade to RAID-6). Both of these features are much less simple with cheaper solutions.

So, you might think, what's the problem? The answer: the lack of options from hard disk manufacturers…

Ever since using the Areca card I have suffered from occasional drive "failures". After powering the server off and on again, the drive reappears as fully functional. I then have to spend many hours rebuilding the array from degraded back to normal. After much searching I have diagnosed the problem, but am unable to properly solve it.

Hard drive manufacturers provide a range of drives for different purposes. The typical drives most of us buy are consumer-level drives. The manufacturers also offer enterprise-class drives designed for servers which have intensive use patterns and 24/7 uptime. These drives are often physically identical, but have undergone additional testing and are supplied with slightly different firmware, optimised for server workloads.

One of these firmware features is Error Recovery Control (ERC). This feature is also called CCTL (Command Completion Time Limit) by Samsung and Hitachi and TLER (Time-Limited Error Recovery) by Western Digital. All drives suffer the occasional error at a physical level, which could be caused by things like stray cosmic rays. These errors are handled by redundancy built into the way the drive stores data, but occasionally one can be severe enough to cause problems reading data. Normal consumer drives will spend a prolonged period attempting to read the damaged data to recover it. They then map it to a new part of the drive and everything continues as normal. However, this delay can cause severe problems in enterprise environments, so enterprise drives will time out their self-repair attempts after a short period (usually 7 seconds or so) and report the error to the RAID controller. The RAID controller then handles the error by recalculating the data using the other drives in the array. This prevents large delays in sending data, but requires the presence of other drives and a RAID controller.
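To make "recalculating the data using the other drives" concrete, here is a minimal sketch of the XOR parity idea behind RAID-5. It is not how the Areca firmware is actually written; the block contents and the function name `xor_blocks` are made up for illustration, and a real array rotates parity across drives and works on much larger stripes.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks from one stripe (made-up contents for illustration).
data = [b"musi", b"phot", b"vide"]

# The parity block is the XOR of all data blocks and lives on another drive.
parity = xor_blocks(data)

# Suppose the drive holding data[1] reports a read error instead of stalling.
# XOR-ing every surviving block with the parity block recovers the lost one.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Here `rebuilt` ends up equal to the missing block, which is why a drive that quickly admits the error is enough for the controller to carry on.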

So, I have a proper hardware RAID card. It expects to hear back from drives within no more than 7–8 seconds regardless of an error. I also have consumer hard drives, which attempt to repair their own errors for a long period. So when an error occurs the drive tries to fix it, doesn't respond within 7–8 seconds, and the RAID controller then assumes the drive has failed and kicks it out of the array.
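The mismatch can be sketched as a toy model. The 8-second controller timeout and 7-second ERC limit come from the figures above; the 30-second consumer recovery time is purely an assumed number for illustration, and `controller_verdict` is a hypothetical name, not anything a real controller exposes.

```python
def controller_verdict(drive_recovery_s, controller_timeout_s=8.0, erc_limit_s=None):
    """Say what happens after a read error.

    drive_recovery_s: how long the drive would need to recover on its own.
    erc_limit_s: the drive's ERC limit, or None for a consumer drive
                 that keeps retrying with no limit.
    """
    # With ERC, the drive stops retrying once its limit is reached.
    if erc_limit_s is None:
        busy_s = drive_recovery_s
    else:
        busy_s = min(drive_recovery_s, erc_limit_s)

    # The controller gives up on any drive that stays silent too long.
    if busy_s > controller_timeout_s:
        return "drive dropped from array"
    # The drive answered in time, but only to report that it gave up.
    if erc_limit_s is not None and drive_recovery_s > erc_limit_s:
        return "error reported, block rebuilt from parity"
    return "drive recovered the data itself"

consumer = controller_verdict(30.0)                     # no ERC limit
enterprise = controller_verdict(30.0, erc_limit_s=7.0)  # ERC at 7 seconds
```

With these assumed numbers the consumer drive is dropped from the array, while the enterprise drive reports the error in time and the array stays healthy, even though the underlying fault is identical.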

So, the obvious solutions would be either to tell the RAID controller to wait longer without kicking a drive out, OR tell the drive to give up after 7 seconds like an enterprise drive… Infuriatingly, neither is possible!

I have searched extensively, but I can't find any proper RAID-5 cards which allow the user to change how long they will wait for a drive. In the past there were some WD drives which could have the TLER feature enabled with a utility released by WD called WD-TLER, but recently WD have disabled this option, presumably to "protect" the huge markup on their enterprise drives (which are double the price for the same hardware).

Some people have found ways to temporarily enable ERC on some drives using either HDAT2, smartctl or hdparm; however, these do not support my RAID card under Windows, and the change is lost if the PC is power cycled.
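For anyone whose drives are directly visible to the operating system, the smartctl route looks roughly like the sketch below. It relies on smartctl's `-l scterc,READTIME,WRITETIME` option, which takes the limits in tenths of a second. The device path `/dev/sdX` is a placeholder, the helper names are my own, and it only works if the drive's firmware still accepts the command, which (as noted above) many consumer drives no longer do.

```python
import subprocess

def scterc_command(device, read_s=7.0, write_s=7.0):
    """Build the smartctl call that requests an ERC limit on one drive."""
    # smartctl expects the limits in tenths of a second, so 7 s becomes 70.
    read_ds = round(read_s * 10)
    write_ds = round(write_s * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]

def apply_scterc(device):
    """Run the command; needs smartmontools installed and admin rights."""
    return subprocess.run(scterc_command(device), capture_output=True, text=True)

# Only print the command here; /dev/sdX is a placeholder, not a real device.
print(" ".join(scterc_command("/dev/sdX")))  # prints: smartctl -l scterc,70,70 /dev/sdX
```

Because the setting does not survive a power cycle, something like `apply_scterc` would have to be run again at every boot for each drive in the array.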

For users like myself who need large-capacity storage and the features offered by a hardware RAID-5 solution, but who do not need 24/7 uptime, long warranties or drives designed for heavy-duty usage, there is currently NO appropriate solution. It's about time either a drive manufacturer addressed this market (by releasing a consumer drive with ERC enabled for a small premium, e.g. 15%) or a RAID-card manufacturer addressed the market by offering a card with the option to increase the time before drives are timed out. Creating either of these solutions is trivial; a simple firmware tweak would do the job.

Until then, I advise others to avoid using hardware RAID cards with consumer drives, and given the price premium of enterprise drives I recommend avoiding hardware RAID altogether.

