Nov 4, 2012

Zabbix poller processes more than 75% busy and queue delay (III)

Let's complete the last article about Zabbix poller processes more than 75% busy and queue delay. In this section, I am going to tackle the part of the client, that is, those things which can be modified on the agent so as to remove or attenuate the issues mentioned in the first article.

Remember that this is the continuation of the two previous articles:


First up, I changed the number of pre-forked instances of the Zabbix client which process passive checks (StartAgents) to 64. This parameter is really meaningful, because its default value is 5, that is to say, only five processes will be started in order to obtain the data requested by the server. So if you have a lot of items and a small monitoring period (as my case), you will need more processes to be able to attend all requests.

root@zabbix-client:~# cat /etc/zabbix/zabbix_agentd.conf
...
StartAgents=64

So let's see now in the graphs, how this change impacts on the results. Let's first with the Zabbix server performance.




And then, the Zabbix data gathering process.




As you can see on the first picture, the server has gone from a Zabbix queue of 30 to 0 (although you can observe 5 on the figure, think that the graph has been cut out). And on the second one, the Zabbix busy poller processes went from 24% to 0%.

Other parameters that you can play with are the number of seconds that the data can be stored in the buffer and its maximum number of values.

root@zabbix-client:~# cat /etc/zabbix/zabbix_agentd.conf
...
BufferSend=3600

BufferSize=65535

Also keep in mind that you should have a small value for the timeout (I am using five seconds on my installation).

Lastly, in order to solve the problem that I mentioned in the first article about from time to time, the processes break down and the zabbix agent is stopped, I developed a simple bash script to work around this issue.

root@zabbix-client:~# tail -f /var/log/zabbix/zabbix_agentd.log
...
zabbix_agentd [17271]: [file:'cpustat.c',line:155] lock failed: [22] Invalid argument
 17270:20121015:092010.216 One child process died (PID:17271,exitcode/signal:255). Exiting ...
...
 17270:20121015:092012.216 Zabbix Agent stopped. Zabbix 2.0.3 (revision 30485).


root@zabbix-client:~# cat /etc/zabbix/monitor_zabbix.sh
#!/bin/bash

while [ 1 ];
do
        if ! pgrep -f "/usr/local/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf" &> /dev/null ; then
                /etc/zabbix/zabbix.sh start
        fi
        sleep 15
done

This script is run in batch mode and takes care of monitoring the status of the agent processes and starting over when they drop . It uses another bash script to start and stop the agents.

root@zabbix-client:~# cat /etc/zabbix/zabbix.sh
#!/bin/bash

case $1 in
        "start")
                taskset -c $(($(cat /proc/cpuinfo | grep processor | wc -l) - 1)) /usr/local/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf;;
        "stop")
                pkill -f "/usr/local/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf";;
        *)
                printf "./zabbix.sh start|stop\n\n"
esac


No comments:

Post a Comment