1内部硬盘徽标硬件RAID和消费者的硬盘问题

以前我几年是幸运的,足有购买二手硬件RAID机会 5 卡 (一个槟榔 1220) 一个非常低的价格. 从那时起,我拥有多套驱动器所使用的卡一起 (原本300gig, 然后750gig目前1Tb的) 在一个专用的服务器PC作为一个大的网络文件的存储,为家庭的音乐, 照片, 视频和后备.

Before I obtained the hard­ware card I tried using soft­ware raid, but found the res­ults very dis­ap­point­ing. The serv­er has a low power, single core cpu which isn’t really up to the task of act­ing as a raid‑5 engine. 虽然我听说过很多次,那 RAID 不是备份, this is a case where only a cheap solu­tion will do. RAID-5 offers pro­tec­tion from single drive fail­ure, which is good enough for my pur­poses. The ded­ic­ated card offers an enorm­ous per­form­ance advant­age, but in prac­tice this isn’t very import­ant. The fea­tures it adds how­ever are! 槟榔卡提供 操作系统 inde­pend­ent raid solu­tion which counts for a lot. It also offers online capa­city expan­sion and raid-level migra­tion (所以, 例如, I could upgrade to raid‑6). Both of these fea­tures are much less simple with cheap­er solutions.

所以, 你可能认为, what’s the prob­lem. 答案: the lack of options from Hard Disk manufacturers…

Ever since using the Areca card I have suffered from occa­sion­al drive “fail­ures”. Upon power­ing off and on the drive reappears as fully func­tion­al. I then have to spend many hours rebuild­ing the array from degraded back to nor­mal. After much search­ing I have dia­gnosed the prob­lem, but am unable to prop­erly solve it.

Hard Drive man­u­fac­tur­ers provide a range of drives for dif­fer­ent pur­poses. The typ­ic­al drives most of us buy are con­sumer level drives. The man­u­fac­tur­ers also offer enter­prise-class drives designed for serv­ers which have intens­ive use pat­terns and 24.7 运行时间. These drives are often phys­ic­ally identic­al, but have under­gone addi­tion­al test­ing and are sup­plied with slightly dif­fer­ent firm­ware, optim­ised for serv­er workloads.

One of these fea­tures is Error Recov­ery Con­trol (ERC). This fea­ture is also called CCTL (Com­mand Com­ple­tion Time Lim­it) by Sam­sung and Hita­chi and TLER (Time-Lim­ited Error Recov­ery) by West­ern Digit­al. All drives suf­fer the occa­sion­al error at a phys­ic­al level, which could be caused by things like stray cos­mic rays. These errors are handled by redund­ancy built into the way the drive stores data, but occa­sion­ally one can be severe enough to cause prob­lems read­ing data. Nor­mal con­sumer drives will spend a pro­longed peri­od attempt­ing to read the dam­aged data to recov­er it. They then map it to a new part of the drive and everything con­tin­ues as nor­mal. 然而, this delay can cause severe prob­lems in enter­prise envir­on­ments, so enter­prise drives will time-out their self-repair attempts after a short peri­od (usu­ally 7 秒左右) and report the error to the raid con­trol­ler. The raid con­trol­ler then handles the error by recal­cu­lat­ing the data using the oth­er drives in the array. This pre­vents large delays in send­ing data, but requires the pres­ence of oth­er drives and a raid controller.

所以, I have a prop­er hard­ware raid card. It expects to hear back from drives with­in no more than 7–8 seconds regard­less of an error. I also have con­sumer hard drives, which attempt to repair their own errors for a long peri­od. 因此,当出现错误的驱动器会尝试修复它, does­n’t respond with­in 7–8 seconds, and the raid con­trol­ler than assumes the drive has failed and kicks it out of the array.

所以, the obvi­ous solu­tions would be either to tell the raid con­trol­ler to wait longer without kick­ing a drive out, 或者告诉驱动器后放弃 7 seconds like an enter­prise drive… Infuri­at­ingly, 也不是可能的!

I have searched extens­ively, but I can­’t find any prop­er raid‑5 cards which allow the user to change how long they will wait for a drive. In the past there were some WD drives which could have the TLER fea­ture enabled with a util­ity released by WD called WD-TLER, but recently WD have dis­abled this option, pre­sum­ably to “pro­tect” the huge markup on their enter­prise drives (这是两倍相同的硬件价格)

Some people have found ways to tem­por­ar­ily enable ERC on some drives using either HDAT2, Smart­CTL 或者 hdparm的, how­ever these do not sup­port my RAID card under Win­dows, 如果PC上电循环的更改将丢失.

For users like myself that need a large capa­city stor­age, and the fea­tures offered by a hard­ware raid‑5 solu­tion, 但是,这并不需要 24.7 运行时间, long war­ranties or drives designed for heavy duty usage there is cur­rently NO appro­pri­ate solu­tion. Its about time either a drive man­u­fac­turer addressed this mar­ket (by releas­ing a con­sumer drive with ERC enabled for a small, e.g. 15%, premi­um) or a raid-card man­u­fac­turer addressed the mar­ket by offer­ing a card with the option to increase the time before drives are timed out. Cre­at­ing either of these solu­tions is trivi­al, a simple firm­ware tweak would do the job.

直到那时, I advise oth­ers to avoid using hard­ware raid cards with con­sumer drives, and giv­en the price premi­um of enter­prise drives I recom­mend­ing avoid­ing hard­ware raid altogether.

Leave a Reply

条评论