The One

Ok, if you are a noob C programmer, you will find this post really fun. Seriously, it will make you LOL. If you are a noob, really noob “XYZ” programmer, maybe you will find it helpful… maybe… no. You will LOL too…
So, once again we have a core file sitting around on our servers. That is serious stuff: we may have a really big problem and a difficult debugging task ahead… a good opportunity to blog about it and learn new things about gdb/mdb and DTrace… cool! Let’s start soft: gdb.
Something simple like gdb --core, so I could see which program generated it:

Core was generated by `/usr/local/bin/myprogram'.

Hmmm, that is a ten-line program… signal 8? Arithmetic exception… better not blog about this. ;-)
Let’s recompile myprogram with options that make life easier for newbies like me (-g adds debugging symbols)…

gcc -g -o myprogram myprogram.c -lm -lumem

Ok, now let’s look at it again (gdb /usr/local/bin/myprogram --core=newcore):

GNU gdb 6.3.50_2004-11-23-cvs
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-pc-solaris2.11"...
Core was generated by `/usr/local/bin/myprogram'.
Program terminated with signal 8, Arithmetic exception.
Reading symbols from /lib/
Loaded symbols for /lib/
Reading symbols from /lib/
Loaded symbols for /lib/
Reading symbols from /lib/
Loaded symbols for /lib/
#0  0x08051854 in main (argc=1, argv=0x8047e8c) at myprogram.c:144
144            variance = somae / (*ptw - 1);

The more I look, the more I’m disappointed with myself (vi myprogram.c)…

(gdb) print variance
$6 = 4278169904
(gdb) print somae
$7 = 0
(gdb) print *ptw
$8 = 1

You did not initialize variance, “Neo”… so it holds whatever garbage happened to be there: Zion, or maybe the Matrix itself…
Ok, the “Chosen One” is dividing by zero (*ptw - 1 = 0). We do test whether *ptw is greater than zero, and it is, but this code calculates the average and the standard deviation, Morpheus, and the sample variance divides by n - 1, so we need at least two values (that’s enough to realize we need another savior, or we are lost).
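Assuming somae holds the sum of squared deviations from the mean (the post doesn’t say exactly what it is), the sample-variance formula shows why a single value (n = 1) is fatal:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2

n = 1:\quad \bar{x} = x_1,
\qquad
\sum_{i=1}^{1} \left(x_1 - x_1\right)^2 = 0,
\qquad
s^2 = \frac{0}{0}
```

That matches what gdb printed: somae = 0 and *ptw - 1 = 0. Since the division trapped instead of quietly producing NaN, the operands are presumably integers.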
*ptw is a pointer to the total number of writes, and we have a dataset with just one write in a 180-second period (something interesting, at least: a really idle “log” share, a topic for another post).
So, it was a matter of changing the code from if (*ptw > 0) to if (*ptw > 1), and adding an else branch that sets the average latency to the value of that single write (and the standard deviation to zero). And doing the same for *ptr (reads)…
Ok, stop complaining and do the right thing… Cypher…