I talk too much…
Ok, that was funny… well, not really, because I lost a lot of time on it. I have no experience with Oracle ASM (Automatic Storage Management), and back when I was a system administrator I managed the Oracle volumes as filesystems. Raw devices at that time had the “performance” thing, but were a nightmare for the DBAs to manage, so they would soon ask for a normal filesystem.
So, these days there is Oracle ASM, and I could see that the Oracle Database and Cluster installation/configuration procedure is still the same “crap” as always. I really don’t like the kind of application that wants to do everything, as if nobody else could do the task right. So it does the database thing, the operating system thing, the filesystem, the NFS client, etc…
Some LUNs were being configured for Oracle RAC (it was not my business ;-). I was watching a co-worker go through the procedure with the format utility and said to him (my first mistake! ;-):
“Oh, I never use the Solaris format utility anymore, why are you using it?”
“Well, that is still the procedure from Oracle…”
So I talked again (yeah, another mistake):
“Ok, but then write a script that uses a utility for this task… Solaris has one, what is the name…”
Well, I could not remember, so I spoke again (reading the docs he was reading, and not keeping my mouth shut):
“You just need one partition, let’s write a simple script here and use ZFS to do the job.”
^– Big mistake!
I wrote a little script that did a zpool create on each disk and then a zpool destroy. Simple and fast. But (there is always a but) I talk too much, so I kept following the procedure, watching the database creation and the tests. And looking at the disks’ behaviour I thought: something is wrong here… We were running transaction tests to measure the performance of new hardware, and there were no synchronous writes on the ZFS log. Really good performance, but not real. My co-worker asked me what the problem was; he did not understand my concern. Oracle should be doing the right thing, as always. But sync writes without the SSDs being used? Something was wrong. I think you already know what the problem was, but the fact is that I lost more time before I realized the cause.
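For reference, the “little script” boils down to something like the sketch below. It is a reconstruction under assumptions, not the original: the pool name tmpinit and the disk names c2t0d0/c2t1d0 are made up for illustration, and the helper names are hypothetical. It only prints the commands unless dry_run is turned off.

```python
import subprocess


def init_commands(disks, pool="tmpinit"):
    """Return the zpool create/destroy command pair for each disk.

    `pool` is a throwaway pool name (hypothetical); the pool exists only
    long enough for ZFS to label the disk, then it is destroyed.
    """
    commands = []
    for disk in disks:
        commands.append(["zpool", "create", pool, disk])
        commands.append(["zpool", "destroy", pool])
    return commands


def run(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing zpool call
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Disk names are placeholders; replace them with the real LUNs.
    run(init_commands(["c2t0d0", "c2t1d0"]))
```

Keeping the command list separate from the execution makes it easy to review what would be done to each LUN before touching anything.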
First I did a simple test: I used one LUN to create a ZFS filesystem and ran an iozone sync and O_DSYNC test (Oracle uses O_DSYNC), and everything worked fine. Then we looked at the Oracle process that handles the log (with pfiles), and O_DSYNC was there. So, something was wrong in the underlying parts… and then, the light… ;-)
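This is not the iozone test itself, just a minimal Python sketch of the same idea: write the same blocks once with plain buffered writes and once with the file opened O_DSYNC, and compare the average time per write. The function names, the probe file name and the block size/count are all assumptions chosen for illustration. If the O_DSYNC numbers look almost as good as the buffered ones on storage that has no protected cache, that is a hint that something below is acknowledging writes too early.

```python
import os
import tempfile
import time


def timed_writes(path, extra_flags=0, count=200, block_size=8192):
    """Write `count` blocks to `path`; return average seconds per write."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | extra_flags
    fd = os.open(path, flags, 0o600)
    block = b"\0" * block_size
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed / count


def compare_sync_latency(directory, count=200):
    """Return (buffered, dsync) average write latency for `directory`."""
    path = os.path.join(directory, "dsync_probe.dat")  # hypothetical name
    try:
        buffered = timed_writes(path, count=count)
        # O_DSYNC: each write returns only after the data is reported stable
        dsync = timed_writes(path, extra_flags=os.O_DSYNC, count=count)
    finally:
        if os.path.exists(path):
            os.remove(path)
    return buffered, dsync


if __name__ == "__main__":
    # A temporary directory is used here; point it at the filesystem
    # you actually want to probe.
    with tempfile.TemporaryDirectory() as d:
        buffered, dsync = compare_sync_latency(d)
        print(f"buffered: {buffered * 1e6:.1f} us/write")
        print(f"O_DSYNC:  {dsync * 1e6:.1f} us/write")
```

The absolute numbers mean little on their own; the interesting part is the ratio between the two, measured on the same device.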
ZFS enables the write cache on the disks because it knows how to handle it, and a zpool destroy does not disable it again. Using the great tool from Robert Milkowski (WCE), I could see that all the LUNs had the write cache enabled (you can use format -e to get/set it too).
Well, with the write cache disabled everything worked fine. In the end, I think an old problem I had with AVS may have the same root cause… If you followed the OpenSolaris community, and specifically the OHAC community, you know that I wrote an agent to be used with Sun Cluster and AVS. I faced a problem where the AVS configuration was lost after a reboot, and other users were facing the same problem. I could not confirm this, and IIRC I did one AVS configuration without using ZFS to create the slice, but I’m not sure. If I did not, it would be nice to run that test and see if the problem persists.
So, that is the tip: if you use ZFS like me to initialize a raw disk, take care with the write cache it leaves enabled, because you may then put software like ASM on top that does not know how to handle it, and does not care to (at least) disable it.
Did you ever run hdparm -W on your disks? ;-)