After some digging around, we found the fix. Kernel 2.6 has a new parameter, “min_free_kbytes”, which lets the kernel keep a minimum amount of memory free for its own use. This kept the kernel from choking when the servers were hit with sudden spikes of heavy jobs.
I set my server’s “min_free_kbytes” to 4096 kbytes, which was the recommended value. It’s more of a trial-and-error setting, so I’ll have to monitor the server over a period of time and increase the value if needed until I hit the sweet spot.
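For the monitoring part, one rough signal is how close free memory gets to the reserve during a spike. The sketch below is only an illustration: `report_free` is a hypothetical helper name, and it assumes the usual `MemFree:` line in /proc/meminfo.

```shell
# Hypothetical helper: print free memory next to the configured reserve.
# Both paths can be overridden, which is handy for trying it on copies.
report_free() {
    meminfo="${1:-/proc/meminfo}"
    reserve_file="${2:-/proc/sys/vm/min_free_kbytes}"
    # The MemFree line looks like "MemFree:   123456 kB"; field 2 is the number.
    free_kb=$(awk '/^MemFree:/ {print $2}' "$meminfo")
    reserve_kb=$(cat "$reserve_file")
    echo "MemFree=${free_kb}kB min_free_kbytes=${reserve_kb}kB"
}
```

Running `report_free` every minute or so from a loop or cron job during busy hours should give a feel for whether the reserve needs raising.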
How to set it?
To have the new value take effect immediately, write it to the “/proc/sys/vm/min_free_kbytes” file as root. Remember: after a reboot, the change will be forgotten.
echo "4096" > /proc/sys/vm/min_free_kbytes
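It can help to see the value before and after the change. Here is a minimal sketch of that; `set_min_free` is a made-up helper name, not a standard tool, and the optional second argument exists only so it can be tried against a scratch file instead of the real /proc entry.

```shell
# Hypothetical helper: set_min_free <kbytes> [file]
# Writes the new value and shows the old and new settings.
set_min_free() {
    value="$1"
    target="${2:-/proc/sys/vm/min_free_kbytes}"
    # Refuse anything that is not a plain non-negative integer.
    case "$value" in
        ''|*[!0-9]*) echo "not a number: $value" >&2; return 1 ;;
    esac
    echo "before: $(cat "$target")"
    echo "$value" > "$target" || return 1
    echo "after:  $(cat "$target")"
}
```

Calling `set_min_free 4096` as root does the same thing as the echo above, with a sanity check on the value first.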
To make it permanent, add “vm.min_free_kbytes=4096” to the /etc/sysctl.conf file.
echo "vm.min_free_kbytes=4096" >> /etc/sysctl.conf
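One catch with the append above: run it twice and the setting ends up in the file twice. A sketch of a repeatable version follows; `persist_min_free` is a hypothetical helper name, and it assumes the setting, if already present, sits on a line of its own starting with `vm.min_free_kbytes`.

```shell
# Hypothetical helper: persist_min_free <kbytes> [conf_file]
# Drops any existing vm.min_free_kbytes line, then adds the new one.
persist_min_free() {
    value="$1"
    conf="${2:-/etc/sysctl.conf}"
    tmp=$(mktemp) || return 1
    # Copy every line except an existing vm.min_free_kbytes setting.
    # grep exits non-zero when nothing is left to copy, hence "|| true".
    grep -v '^[[:space:]]*vm\.min_free_kbytes' "$conf" 2>/dev/null > "$tmp" || true
    echo "vm.min_free_kbytes=$value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

After `persist_min_free 4096`, running `sysctl -p` as root loads /etc/sysctl.conf right away, so there is no need to wait for a reboot to confirm the entry is read correctly.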