Consumer hard drive and hardware RAID problems

A few years ago I was lucky enough to get the chance to buy a second-hand hardware RAID-5 card (an Areca 1220) for a very low price. Since then I have used the card with several sets of drives (originally 300 GB, then 750 GB and currently 1 TB) in a dedicated server PC that acts as a large network file store for the family's music, photos, videos and back-ups.

Before I obtained the hardware card I tried using software RAID, but found the results very disappointing. The server has a low-power, single-core CPU which isn't really up to the task of acting as a RAID-5 engine. Whilst I've heard plenty of times that RAID isn't a back-up, this is a case where only a cheap solution will do. RAID-5 offers protection from a single drive failure, which is good enough for my purposes. The dedicated card offers an enormous performance advantage, but in practice this isn't very important. The features it adds, however, are! The Areca card offers an OS-independent RAID solution, which counts for a lot. It also offers online capacity expansion and RAID-level migration (so, for example, I could upgrade to RAID-6). Both of these features are much less simple with cheaper solutions.

So, you might think, what's the problem? The answer: the lack of options from hard disk manufacturers…

Ever since I started using the Areca card I have suffered from occasional drive "failures". After powering off and on, the drive reappears as fully functional. I then have to spend many hours rebuilding the array from degraded back to normal. After much searching I have diagnosed the problem, but am unable to properly solve it.

Hard drive manufacturers provide a range of drives for different purposes. The typical drives most of us buy are consumer-level drives. The manufacturers also offer enterprise-class drives designed for servers, which have intensive use patterns and 24/7 uptime. These drives are often physically identical to the consumer models, but have undergone additional testing and are supplied with slightly different firmware, optimised for server workloads.

One of these firmware features is Error Recovery Control (ERC). It is also called CCTL (Command Completion Time Limit) by Samsung and Hitachi, and TLER (Time-Limited Error Recovery) by Western Digital. All drives suffer the occasional error at a physical level, which could be caused by things like stray cosmic rays. These errors are handled by redundancy built into the way the drive stores data, but occasionally one can be severe enough to cause problems reading data. Normal consumer drives will spend a prolonged period attempting to read the damaged data to recover it. They then map it to a new part of the drive and everything continues as normal. However, this delay can cause severe problems in enterprise environments, so enterprise drives will time out their self-repair attempts after a short period (usually 7 seconds or so) and report the error to the RAID controller. The RAID controller then handles the error by recalculating the data using the other drives in the array. This prevents large delays in sending data, but requires the presence of other drives and a RAID controller.
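To make the "recalculating the data using the other drives" step concrete, here is a minimal Python sketch of how RAID-5 style parity can rebuild one unreadable block. It is an illustration only: the block contents and the `xor_blocks` helper are made up for this example and say nothing about how the Areca firmware is actually implemented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks in one stripe (illustrative contents, not real drive data).
data = [b"musi", b"phot", b"vide"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data)

# Suppose the drive holding data[1] reports a read error: the missing block
# is the XOR of the surviving data blocks and the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Because XOR is its own inverse, any single missing block in a stripe can be recovered this way, which is why RAID-5 survives exactly one drive failure.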

So, I have a proper hardware RAID card. It expects to hear back from drives within no more than 7–8 seconds, regardless of an error. I also have consumer hard drives, which attempt to repair their own errors for a long period. So when an error occurs the drive tries to fix it, doesn't respond within 7–8 seconds, and the RAID controller then assumes the drive has failed and kicks it out of the array.
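The mismatch can be sketched as a toy model in Python. The 8 second controller limit and 7 second ERC limit follow the figures above; the 30 second recovery time is an assumed number chosen purely for illustration, and both function names are hypothetical.

```python
CONTROLLER_TIMEOUT_S = 8.0  # the card expects an answer within roughly 7-8 s


def drive_response_time(recovery_needed_s, erc_limit_s=None):
    """Seconds until the drive answers; capped at the ERC limit if one is set."""
    if erc_limit_s is None:
        # Consumer behaviour: keep retrying for as long as recovery takes.
        return recovery_needed_s
    # Enterprise behaviour: give up at the ERC limit and report the error.
    return min(recovery_needed_s, erc_limit_s)


def controller_verdict(response_s, timeout_s=CONTROLLER_TIMEOUT_S):
    """What the controller does after waiting response_s for the drive."""
    if response_s <= timeout_s:
        return "drive kept in array"
    return "drive dropped from array"


# A bad sector that would need 30 s of retries (assumed figure).
consumer = controller_verdict(drive_response_time(30.0))
enterprise = controller_verdict(drive_response_time(30.0, erc_limit_s=7.0))
```

In this model the same physical error leaves the ERC-enabled drive in the array, while the drive without ERC is dropped and forces a full rebuild.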

So, the obvious solutions would be either to tell the RAID controller to wait longer before kicking a drive out, OR to tell the drive to give up after 7 seconds like an enterprise drive… Infuriatingly, neither is possible!

I have searched extensively, but I can't find any proper RAID-5 cards which allow the user to change how long they will wait for a drive. In the past there were some WD drives which could have the TLER feature enabled with a utility released by WD called WD-TLER, but recently WD have disabled this option, presumably to "protect" the huge markup on their enterprise drives (which are double the price for the same hardware).

Some people have found ways to temporarily enable ERC on some drives using HDAT2, smartctl or hdparm. However, these do not support my RAID card under Windows, and the change is lost if the PC is power cycled.
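For readers whose drives and controller are reachable by smartctl, the temporary setting looks roughly like the sketch below. It only builds and prints the command rather than running it. The `scterc_command` helper and the `/dev/sda` device path are placeholders for illustration; whether a given drive accepts the setting depends on its firmware.

```python
import shlex


def scterc_command(device, read_seconds=7.0, write_seconds=7.0):
    """Build the smartctl argument list that sets SCT Error Recovery Control.

    smartctl takes the limits in tenths of a second, so 7 s becomes 70.
    """
    read_ds = round(read_seconds * 10)
    write_ds = round(write_seconds * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


# "/dev/sda" is a placeholder device path, not one taken from this article.
command = scterc_command("/dev/sda")
print(shlex.join(command))
```

Since the setting does not survive a power cycle, it would have to be reapplied on every boot, for example from a start-up script.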

For users like myself who need large-capacity storage and the features offered by a hardware RAID-5 solution, but who do not need 24/7 uptime, long warranties or drives designed for heavy-duty usage, there is currently NO appropriate solution. It's about time that either a drive manufacturer addressed this market (by releasing a consumer drive with ERC enabled for a small premium, e.g. 15%) or a RAID-card manufacturer addressed it by offering a card with the option to increase the time before drives are timed out. Creating either of these solutions is trivial; a simple firmware tweak would do the job.

Until then, I advise others to avoid using hardware RAID cards with consumer drives, and given the price premium of enterprise drives I recommend avoiding hardware RAID altogether.

Have some ideas of your own? Feel free to leave a comment below! If you would like to subscribe, please use the subscribe link in the menu at the top right. You can also share this post with your friends using the social links below. Cheers.
