Remember that this is the continuation of the two previous articles:
- Zabbix poller processes more than 75% busy and queue delay (I)
- Zabbix poller processes more than 75% busy and queue delay (II)
First, I raised StartAgents, the number of pre-forked instances of the Zabbix agent that process passive checks, to 64. This parameter really matters because its default value is low (3 in Zabbix 2.0, 5 in the older 1.8 series), so only a handful of processes are started to obtain the data requested by the server. If you have a lot of items and a short monitoring interval (as in my case), you will need more processes to serve all the requests.
root@zabbix-client:~# cat /etc/zabbix/zabbix_agentd.conf
...
StartAgents=64
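To get a feeling for how many agent processes a given load needs, a back-of-the-envelope estimate can help. The following is only a rough sketch: estimate_start_agents is a hypothetical helper (not part of Zabbix), the average check time is a figure you have to measure or guess yourself, and it assumes the checks are spread evenly over the interval, which is rarely the case in practice.

```shell
#!/bin/bash
# Rough lower bound for StartAgents (hypothetical helper, not part of Zabbix).
# Args: number of passive items, polling interval in seconds,
#       assumed average time in milliseconds the agent spends on one check.
estimate_start_agents() {
    local items=$1 interval=$2 check_ms=$3
    # Total busy time per interval (ms) divided by the interval (ms), rounded up.
    echo $(( (items * check_ms + interval * 1000 - 1) / (interval * 1000) ))
}

# Example with made-up numbers: 3000 items, 30 s interval, 200 ms per check.
estimate_start_agents 3000 30 200   # prints 20
```

Treat the result as a starting point and leave generous headroom, since requests tend to arrive in bursts rather than evenly.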
Now let's see in the graphs how this change affects the results. Let's start with the Zabbix server performance.
And then, the Zabbix data gathering process.
As you can see in the first picture, the Zabbix queue on the server has gone from 30 to 0 (the figure still shows 5, but bear in mind that the graph has been cropped). In the second one, the Zabbix busy poller processes went from 24% to 0%.
Other parameters that you can tune are the maximum number of seconds that data may be kept in the buffer (BufferSend) and the maximum number of values that the buffer can hold (BufferSize).
root@zabbix-client:~# cat /etc/zabbix/zabbix_agentd.conf
...
BufferSend=3600
BufferSize=65535
Also keep in mind that the timeout should be set to a small value (I am using five seconds on my installation).
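After touching several parameters, it is handy to list what is actually set in the configuration file. This is a minimal sketch: conf_value is a hypothetical helper (not a Zabbix tool), and it assumes that the timeout mentioned above is the Timeout parameter of the agent configuration file.

```shell
#!/bin/bash
# Print the value of an uncommented "Name=value" line in a config file.
# conf_value is a hypothetical helper, not a Zabbix tool.
conf_value() {
    local file=$1 name=$2
    # Keep only lines starting with "Name=", strip that prefix,
    # and print the last match found in the file.
    sed -n "s/^${name}=//p" "$file" | tail -n 1
}

conf=/etc/zabbix/zabbix_agentd.conf
if [ -r "$conf" ]; then
    # Timeout is assumed here to be the agent-side timeout.
    for p in StartAgents BufferSend BufferSize Timeout; do
        printf '%s=%s\n' "$p" "$(conf_value "$conf" "$p")"
    done
fi
```

A parameter printed with an empty value is not set in the file, so the agent falls back to its built-in default for it.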
Lastly, in the first article I mentioned that, from time to time, one of the agent processes dies and the Zabbix agent stops. I developed a simple bash script to work around this issue.
root@zabbix-client:~# tail -f /var/log/zabbix/zabbix_agentd.log
...
zabbix_agentd [17271]: [file:'cpustat.c',line:155] lock failed: [22] Invalid argument
17270:20121015:092010.216 One child process died (PID:17271,exitcode/signal:255). Exiting ...
...
17270:20121015:092012.216 Zabbix Agent stopped. Zabbix 2.0.3 (revision 30485).
root@zabbix-client:~# cat /etc/zabbix/monitor_zabbix.sh
#!/bin/bash
# Restart the Zabbix agent whenever its process is no longer running.
while true; do
    if ! pgrep -f "/usr/local/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf" &> /dev/null; then
        /etc/zabbix/zabbix.sh start
    fi
    sleep 15
done
This script runs in the background, monitors the status of the agent processes and starts them again when they die. It uses another bash script to start and stop the agents.
root@zabbix-client:~# cat /etc/zabbix/zabbix.sh
#!/bin/bash
case "$1" in
    start)
        # Pin the agent to the last CPU core (core numbers start at 0).
        taskset -c $(( $(grep -c processor /proc/cpuinfo) - 1 )) /usr/local/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf
        ;;
    stop)
        pkill -f "/usr/local/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf"
        ;;
    *)
        printf "./zabbix.sh start|stop\n\n"
        ;;
esac
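If you want to know how often the agent actually dies, the watchdog loop can be split into a single pass that logs each restart. The following is just a sketch of that idea and not the script I run: check_once, agent_alive and restart_agent are hypothetical names, and the loop only starts when the script is called with the argument run.

```shell
#!/bin/bash
AGENT_CMD="/usr/local/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf"

# Succeeds while a process with the agent command line exists.
agent_alive() { pgrep -f "$AGENT_CMD" > /dev/null 2>&1; }

# Starts the agent through the start/stop script.
restart_agent() { /etc/zabbix/zabbix.sh start; }

# check_once IS_ALIVE_FN RESTART_FN: run one watchdog pass.
# Calls RESTART_FN (and logs a line to stderr) only if IS_ALIVE_FN fails.
check_once() {
    local is_alive=$1 restart=$2
    if "$is_alive"; then
        return 0
    fi
    echo "$(date '+%F %T') agent not running, restarting" >&2
    "$restart"
}

# Loop forever only when explicitly asked to: ./watchdog.sh run
if [ "${1:-}" = "run" ]; then
    while true; do
        check_once agent_alive restart_agent
        sleep 15
    done
fi
```

Redirecting stderr to a file when launching it leaves a timestamped record of every restart, which can be compared with the agent log.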
After following every article we still have the zabbix busy poller processes around 85%
What can be the problem?