Monday, March 13, 2017

a layer of indirection

I forgot the disclaimer: you should not do this, ever :)

We had a cluster for Hadoop experiments at uni and no resources to replace all the faulty disks at the time (20-30% of more than 150 disks were faulty to some degree, going by their SMART values), so this was kind of an experiment. All the data we used was available and backed up outside of that cluster. The problem was that with ext4, certain disks always switched to read-only after running a job, and that was a major hassle because each affected node had to be touched by hand. HDFS is 3x replicated and checksummed, and the disks usually kept working fine for quite a while after the first bad sector.

So we switched to ZFS, ran weekly scrubs, only replaced disks that didn't survive the scrub in reasonable time or failed at an unreasonable rate, and bumped up the HDFS checksum reads so that everything is control-read once a week. The working directory for the layer above (MapReduce and the like) got a dataset with copies=2, so intermediate data stays intact within reasonable limits. This was for learning and research purposes, where top speed and 100% integrity didn't matter and uptime and usability were more important: the on-disk metadata had to be sound, but the data on any single disk didn't matter that much. It was quite a ride, and the cluster has long since been replaced.
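For reference, a minimal sketch of what that setup could look like. The pool and dataset names here are made up; the HDFS property is the standard DataNode block-scanner period, whose default is three weeks:

    # Hypothetical pool/dataset names; copies=2 keeps two copies of every
    # block on the same vdev, which helps against bad sectors but does
    # nothing against a whole-disk failure.
    zfs create -o copies=2 tank/mapred-local

    # Weekly scrub, e.g. from root's crontab (Sundays at 02:00):
    0 2 * * 0  /sbin/zpool scrub tank

    # HDFS side, in hdfs-site.xml: have the DataNode block scanner
    # verify every block once a week instead of the default 504 hours:
    #   <property>
    #     <name>dfs.datanode.scan.period.hours</name>
    #     <value>168</value>
    #   </property>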

Just thought it's interesting how far you can push that. In the end it worked, but it turned out there is no magic: disks die sooner or later, and sometimes they take the whole node with them.

Don't go to eBay and buy broken disks believing that ZFS will make them work. Some survive for a while, most die fast, and some exhibit strange behaviour.

That RAIDZ setup is more or less for "let's see where this goes" purposes; backups are in place, and it's not a production system.



from lizard's ghost http://ift.tt/2nlAtCj
