Friday was funny… I had a two-node test cluster in our storage lab, and I was out of the office, delivering a presentation at the Oracle Open Storage forum on Wednesday the 28th in Rio de Janeiro and Thursday the 29th in São Paulo (more on that soon). Meanwhile, a co-worker was trying to run some tests with one of the servers, and ran into the “amnesia” scenario on that cluster.
So he removed one of the rpool disks (hardware RAID 1) and imported it on a machine at his desk, trying to debug the problem and see if something was wrong with the ZFS root pool. Well, everything was fine… so let’s put the disk back in place…

On Friday I was back and trying to get on with my tests, but the cluster was down (the amnesia was still there). One node was up, and the other was saying: pool checksum verification failed! WTF?
So, after a little conversation, and finding out about the “import procedure”… SDC: Silent Data Corruption (1, 2). Conventional RAID systems are blind to the data they are supposed to protect: importing the pool on another machine changed that one disk, so the two halves of the mirror no longer matched, and the hardware RAID had no way to notice. Well, it was just a matter of removing the disk again, and node 2 came back to life. Cluster up and running again!
Why are we using hardware RAID in the first place? ;-)
Long story…