DO NOT TRY IT IN PRODUCTION. USE AT YOUR OWN RISK!

FIFO


In my last post I wrote about some issues related to disksort:
1) Performance (Latency)
2) Consistency
In my D script I’m printing the buf sector when the sd driver receives it, that is, before any sorting.
That’s important because I’m trying to understand why we have some label updates in the middle of my D script’s output, and it confirms that this has nothing to do (yet) with the sd.c disksort algorithm. I think one thing we can guarantee is that the sd driver is receiving the commands in that order.
– But is ZFS sending the label updates in the wrong order?
– Or do we have different groups of transactions in that report?
I need to figure it out, but I did some tests on a really idle server, and the commands are issued perfectly in sync with the spa_sync. Maybe commands are being mixed somewhere and I’m seeing that at the sd driver layer. Anyway, I don’t have the right data to affirm anything… ;-)
Edited (13/sep/2010):
I realized I’m not separating reads from writes, so the commands issued after the label updates can be reads. And actually that is what I think they are… what we need now is to print that information to confirm it, and if so, the only thing left to understand is the label update order (L0 and L2… L1 and L3).

From a performance perspective, as I mentioned in that post too, I solved a latency problem on some workloads by tuning the zfs_vdev_max_pending parameter from “35” to “10” (and that’s the new default value for ZFS anyway). But for latencies to be predictable, I think a FIFO algorithm would be a better approach, solving the problem as a whole (even with 10 on the waitq we still have sorting enabled).
So, I have disabled disksort on the disks, and here are the steps I took to change disksort_disabled on a running system (I did not find this information anywhere, so I’m sharing it hoping it can be useful for somebody else):
OBS:
This is just for one disk; with this method you need to repeat it for each instance (disk) you want. In this case I’m doing it for instance 1, or sd1;
I’m changing kernel variables on a running server, so do not do it in production, or use these instructions at your own risk;
The changes made here do not persist across a server reboot;
At the end of this post there are some scripts to help set this parameter on more than one disk;

# echo '*sd_state::softstate 1 | \
::print -at "struct sd_lun"' | mdb -k

...
 a lot of output
...

So, let’s see just the disksort_disabled option:

# echo '*sd_state::softstate 1 | \
::print -at "struct sd_lun"' | \
mdb -k | grep disksort

ffffff09090b77e7.3 unsigned un_f_disksort_disabled :1 = 0

The .3 is a specific bit at the address ffffff09090b77e7, so let’s see the whole byte:

# echo '*sd_state::softstate 1 | \
::print -at "struct sd_lun"' | \
mdb -k | grep  ffffff09090b77e7

ffffff09090b77e7 unsigned un_f_pm_is_enabled :1 = 0
ffffff09090b77e7.1 unsigned un_f_watcht_stopped :1 = 0
ffffff09090b77e7.2 unsigned un_f_pkstats_enabled :1 = 0x1
ffffff09090b77e7.3 unsigned un_f_disksort_disabled :1 = 0
ffffff09090b77e7.4 unsigned un_f_lun_reset_enabled :1 = 0
ffffff09090b77e7.5 unsigned un_f_doorlock_supported :1 = 0
ffffff09090b77e7.6 unsigned un_f_start_stop_supported :1 = 0
ffffff09090b77e7.7 unsigned un_f_reserved1 :1 = 0

So, the Byte is: 0000 0100 (Decimal 4). Let’s confirm…

# echo '0xffffff09090b77e7/B' | mdb -k

0xffffff09090b77e7:             4
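
To see how that single byte maps back to the eight flags listed above, here is a minimal sketch in plain shell arithmetic. The name decode_flags is a hypothetical helper of mine (it is not part of mdb), and the flag names and their order are simply copied from the ::print output above, bit 0 first.

```shell
# Hypothetical helper: print each flag bit of the one byte taken from
# struct sd_lun. Flag names and bit order are copied from the ::print
# output above; nothing here talks to mdb or the kernel.
decode_flags() {
    byte=$1    # decimal or 0x-prefixed hex, e.g. 4 or 0x4
    bit=0
    for name in un_f_pm_is_enabled un_f_watcht_stopped \
                un_f_pkstats_enabled un_f_disksort_disabled \
                un_f_lun_reset_enabled un_f_doorlock_supported \
                un_f_start_stop_supported un_f_reserved1; do
        # Shift the byte right by the bit position and keep the lowest bit.
        echo ".$bit $name = $(( ($byte >> $bit) & 1 ))"
        bit=$((bit + 1))
    done
}

decode_flags 4
```

With the value 4, only the .2 line (un_f_pkstats_enabled) comes out as 1, which matches what mdb printed.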

So, to change the disksort_disabled to 1, we need to change that Byte to: 0000 1100 (Decimal 12). Let’s do it!

# echo 'ffffff09090b77e7/v0t12' | mdb -kw

0xffffff09090b77e7:             0x4     =       0xc
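
The arithmetic behind going from 0x4 to 0xc is just switching on bit 3 while leaving the other bits alone. A small sketch of that, assuming nothing beyond shell arithmetic; set_bit and clear_bit are hypothetical helper names, not mdb commands.

```shell
# Hypothetical helper: return the byte with the given bit switched on.
# $1 = current byte (decimal or 0x-prefixed hex), $2 = bit position 0..7
set_bit() {
    echo $(( $1 | (1 << $2) ))
}

# Hypothetical helper: return the byte with the given bit switched off,
# which is what going back to sorting would need.
clear_bit() {
    echo $(( $1 & ~(1 << $2) & 0xff ))
}

set_bit 4 3      # prints 12: un_f_disksort_disabled is bit .3
clear_bit 12 3   # prints 4: back to the original byte
```

The same calculation gives the right value for a disk whose byte is not 4 to begin with, which is why I prefer it over hardcoding 12.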

Ok, let’s see if that works…

# echo '*sd_state::softstate 1 | \
::print -at "struct sd_lun" un_f_disksort_disabled' | mdb -k 

ffffff09090b77e7.3 unsigned un_f_disksort_disabled :1 = 0x1

We can print the whole Byte to confirm too:

# echo '*sd_state::softstate 1 | \
::print -at "struct sd_lun"' | \
mdb -k | grep  ffffff09090b77e7

ffffff09090b77e7 unsigned un_f_pm_is_enabled :1 = 0
ffffff09090b77e7.1 unsigned un_f_watcht_stopped :1 = 0
ffffff09090b77e7.2 unsigned un_f_pkstats_enabled :1 = 0x1
ffffff09090b77e7.3 unsigned un_f_disksort_disabled :1 = 0x1
ffffff09090b77e7.4 unsigned un_f_lun_reset_enabled :1 = 0
ffffff09090b77e7.5 unsigned un_f_doorlock_supported :1 = 0
ffffff09090b77e7.6 unsigned un_f_start_stop_supported :1 = 0
ffffff09090b77e7.7 unsigned un_f_reserved1 :1 = 0

Here is a sample script if you need to configure this for more disks (with the same configuration):

x=1; while [ $x -lt $NRDISKS ]; do echo -n "sd$x: "; \
for y in `echo "*sd_state::softstate 0t$x | \
::print -at 'struct sd_lun' un_f_pm_is_enabled" | \
mdb -k | awk '{print $1}'`; do echo "$y/v0t12" | \
mdb -kw; done;  let x=x+1; done

sd1: 0xffffff09090b77e7:             0x4     =       0xc
sd2: 0xffffff09102d0e27:             0x4     =       0xc
sd3: 0xffffff0910780c27:             0x4     =       0xc
sd4: 0xffffff0910ba04e7:             0x4     =       0xc
sd5: 0xffffff0910998367:             0x4     =       0xc
sd6: 0xffffff0932189067:             0x4     =       0xc
sd7: 0xffffff0932188be7:             0x4     =       0xc
sd8: 0xffffff0932188767:             0x4     =       0xc
sd9: 0xffffff09321882e7:             0x4     =       0xc
sd10: 0xffffff0932300e27:             0x4     =       0xc
sd11: 0xffffff09323009a7:             0x4     =       0xc
sd12: 0xffffff0932300527:             0x4     =       0xc
sd13: 0xffffff09323000a7:             0x4     =       0xc
sd14: 0xffffff09322ffc27:             0x4     =       0xc
sd15: 0xffffff09322ff7a7:             0x4     =       0xc
sd16: 0xffffff09322ff327:             0x4     =       0xc
sd17: 0xffffff090f79f7a7:             0x4     =       0xc
sd18: 0xffffff090f79fc27:             0x4     =       0xc
sd19: 0xffffff090f79f327:             0x4     =       0xc
sd20: 0xffffff0930adb0e7:             0x4     =       0xc
sd21: 0xffffff09322fc9e7:             0x4     =       0xc
sd22: 0xffffff09322fc567:             0x4     =       0xc
sd23: 0xffffff09322fc0e7:             0x4     =       0xc
...
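
The loop above writes 12 to every disk, which is only right when the byte currently holds 4 on each of them (the "same configuration" case). A more careful variant would read the byte first and only OR in bit 3. Here is a sketch of the part that builds the write command; mk_write_cmd is a hypothetical helper name, it takes the address and the current byte in hex (the way addr/B prints it), and it only prints the command without running mdb.

```shell
# Hypothetical helper: build the mdb write command that sets bit .3
# (un_f_disksort_disabled) while preserving the other seven flag bits.
# $1 = byte address, $2 = current byte value in hex, as printed by addr/B
mk_write_cmd() {
    cur=${2#0x}                 # tolerate an optional 0x prefix
    new=$(( 0x$cur | 0x8 ))     # 0x8 is the mask for bit .3
    echo "$1/v0t$new"
}

mk_write_cmd ffffff09090b77e7 4     # prints ffffff09090b77e7/v0t12
mk_write_cmd ffffff09090b77e7 6     # prints ffffff09090b77e7/v0t14
```

On the live system the printed command would be piped into mdb -kw, with the current value taken from something like echo "$addr/B" | mdb -k | awk '{print $2}', assuming the output format shown earlier in this post.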

So you can check the new value:

x=1; while [ $x -lt $NRDISKS ]; do echo -n "sd$x: "; \
echo "*sd_state::softstate 0t$x | \
::print -at 'struct sd_lun' un_f_disksort_disabled" | \
mdb -k; let x=x+1; done

sd1: ffffff09090b77e7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd2: ffffff09102d0e27.3 unsigned un_f_disksort_disabled :1 = 0x1
sd3: ffffff0910780c27.3 unsigned un_f_disksort_disabled :1 = 0x1
sd4: ffffff0910ba04e7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd5: ffffff0910998367.3 unsigned un_f_disksort_disabled :1 = 0x1
sd6: ffffff0932189067.3 unsigned un_f_disksort_disabled :1 = 0x1
sd7: ffffff0932188be7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd8: ffffff0932188767.3 unsigned un_f_disksort_disabled :1 = 0x1
sd9: ffffff09321882e7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd10: ffffff0932300e27.3 unsigned un_f_disksort_disabled :1 = 0x1
sd11: ffffff09323009a7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd12: ffffff0932300527.3 unsigned un_f_disksort_disabled :1 = 0x1
sd13: ffffff09323000a7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd14: ffffff09322ffc27.3 unsigned un_f_disksort_disabled :1 = 0x1
sd15: ffffff09322ff7a7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd16: ffffff09322ff327.3 unsigned un_f_disksort_disabled :1 = 0x1
sd17: ffffff090f79f7a7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd18: ffffff090f79fc27.3 unsigned un_f_disksort_disabled :1 = 0x1
sd19: ffffff090f79f327.3 unsigned un_f_disksort_disabled :1 = 0x1
sd20: ffffff0930adb0e7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd21: ffffff09322fc9e7.3 unsigned un_f_disksort_disabled :1 = 0x1
sd22: ffffff09322fc567.3 unsigned un_f_disksort_disabled :1 = 0x1
sd23: ffffff09322fc0e7.3 unsigned un_f_disksort_disabled :1 = 0x1
...
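
With many disks it is easier to list only the ones where the flag did not change. A minimal sketch, assuming the check loop's output keeps the "sdN: address.3 ... = value" format shown above; still_sorting is a hypothetical helper name.

```shell
# Hypothetical helper: read the "sdN: addr.3 ... = VALUE" lines produced
# by the check loop and print the instances where disksort is still on.
still_sorting() {
    awk '/un_f_disksort_disabled/ && ($NF == "0" || $NF == "0x0") {
        sub(":", "", $1)    # turn "sd2:" into "sd2"
        print $1
    }'
}

# Sample input in the format above (sd2 edited to 0 just for the demo).
printf '%s\n' \
  'sd1: ffffff09090b77e7.3 unsigned un_f_disksort_disabled :1 = 0x1' \
  'sd2: ffffff09102d0e27.3 unsigned un_f_disksort_disabled :1 = 0' \
  | still_sorting
```

On a real system you would pipe the check loop itself into still_sorting; an empty result means every listed disk has the flag set.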

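OBS: You just need to set the “NRDISKS” variable before running the loops. Note that both loops start at instance 1 and stop at NRDISKS - 1, so set it to one more than the highest sd instance you want to change.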

To make this permanent, you will need to repeat this method after each reboot, or find an /etc/system global parameter for it (which is not recommended). To be continued…

peace