Jul 25, 2011

Tuning Zabbix to improve its performance (I)

I have been looking forward to writing this article. I think it is going to be really useful for Zabbix administrators.

When you only have to monitor a small group of machines, it is enough to install Zabbix (either from the repositories or from the source code) without modifying any parameter. But when the number of monitored machines or items is very large, you need to tune some values in the operating system, the database and Zabbix itself. Otherwise your system may misbehave or fail to deliver the expected performance.

Below you can see the status of my Zabbix server at work (Zabbix 1.8.5 with MySQL 5.1, on 64-bit Ubuntu 11.04). I am monitoring around 430 devices, including servers and switches, and you can see that the required server performance (new values per second) is really high: 1687.
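
To give an idea of where a figure like this comes from: as far as I know, Zabbix estimates the required performance by adding up, for every enabled item, one new value per update interval. The sketch below reproduces that arithmetic. The function name and the mix of items are hypothetical and only illustrate the calculation; they are not the real items of my server.

```python
def required_nvps(item_groups):
    """Estimate new values per second.

    item_groups is a list of (number_of_items, update_interval_seconds)
    pairs; each item contributes 1 / interval values per second.
    """
    return sum(count / interval for count, interval in item_groups)


# Hypothetical mix of items, NOT taken from the server described here.
groups = [
    (3000, 30),   # 3000 items polled every 30 s  -> 100 values/s
    (6000, 60),   # 6000 items polled every 60 s  -> 100 values/s
    (1200, 300),  # 1200 items polled every 5 min ->   4 values/s
]

print(required_nvps(groups))  # 204.0

# Using the two figures quoted in this article:
print(round(1687 / 430, 2))   # about 3.92 new values per second per device
```

The useful consequence is that lengthening the update interval of a large group of items reduces the load in direct proportion.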




This configuration would not be possible with a default Zabbix installation. Note also the hardware of the server: 4 vCPUs (2.66 GHz), 8 GB of RAM and 254 GB of storage.

First of all, we are going to take a look at several graphs of the server. Let's get started with the memory consumption during a typical day. The figure shows that the average available memory is around 1.73 GB and that the system is not swapping.




Regarding the CPU, I have chosen a period of 6 hours in order to explain the concept of housekeeping in Zabbix. As you can see in the next chart, the normal CPU usage is about 20-25%, but every hour there is a sharp increase. This coincides with a rise in input/output operations.




Housekeeping is a task run by Zabbix that takes care of removing outdated data from the history, alerts and alarms tables. Taking a look at the Zabbix log, you can find out how many records are deleted from the database on each run.

root@zbx01:~# egrep 'housekeeper|Deleted' /var/log/zabbix/zabbix_server.log
1599:20110719:230307.692 Executing housekeeper
1599:20110719:231127.392 Deleted 1522478 records from history and trends
1599:20110720:001127.393 Executing housekeeper
1599:20110720:001927.742 Deleted 1480673 records from history and trends
...
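
The same log lines also tell you how long each housekeeper run takes. Here is a minimal sketch that pairs every "Executing housekeeper" line with the following "Deleted ..." line. It assumes the log format shown above (PID:YYYYMMDD:HHMMSS.mmm message); the function names are my own and not part of Zabbix.

```python
import re
from datetime import datetime

# PID:YYYYMMDD:HHMMSS.mmm message
LOG_LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")
DELETED = re.compile(r"Deleted (\d+) records")


def parse_timestamp(date, time):
    """Turn '20110719' and '230307.692' into a datetime."""
    return datetime.strptime(date + time, "%Y%m%d%H%M%S.%f")


def housekeeper_runs(lines):
    """Return a list of (duration_in_seconds, deleted_records) per run."""
    runs = []
    started = None
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines such as '...' or anything unparseable
        _pid, date, time, message = match.groups()
        when = parse_timestamp(date, time)
        if message.startswith("Executing housekeeper"):
            started = when
            continue
        deleted = DELETED.search(message)
        if deleted and started is not None:
            duration = (when - started).total_seconds()
            runs.append((duration, int(deleted.group(1))))
            started = None
    return runs


# The four lines from the log excerpt above.
sample = [
    "1599:20110719:230307.692 Executing housekeeper",
    "1599:20110719:231127.392 Deleted 1522478 records from history and trends",
    "1599:20110720:001127.393 Executing housekeeper",
    "1599:20110720:001927.742 Deleted 1480673 records from history and trends",
]

for duration, records in housekeeper_runs(sample):
    print(f"{records} records deleted in {duration:.1f} s")
```

For the excerpt above, each run takes roughly eight minutes, and the second one starts one hour after the first one finishes.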

This procedure is configured through several parameters in the zabbix_server.conf file.
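
As a quick way to check what your own server is using, the sketch below reads key=value pairs in the style of zabbix_server.conf and reports the housekeeping settings. The three parameter names are the ones I recall from the 1.8 series, so please verify them against the configuration file shipped with your version. The sample values are illustrative only and the function names are hypothetical.

```python
# Parameter names as I recall them from Zabbix 1.8; verify them against
# the zabbix_server.conf shipped with your version.
HOUSEKEEPING_KEYS = (
    "HousekeepingFrequency",
    "MaxHousekeeperDelete",
    "DisableHousekeeping",
)

# Illustrative content only, not the configuration of the server above.
SAMPLE = """\
# How often the housekeeper runs, in hours
HousekeepingFrequency=1
MaxHousekeeperDelete=500
# DisableHousekeeping=1
"""


def read_settings(text):
    """Parse key=value lines, ignoring blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def housekeeping_settings(text):
    """Return the housekeeping keys, marking the ones that are not set."""
    settings = read_settings(text)
    return {key: settings.get(key, "(not set)") for key in HOUSEKEEPING_KEYS}


for key, value in housekeeping_settings(SAMPLE).items():
    print(f"{key} = {value}")
```

To run it against a real installation, replace SAMPLE with the content of your own file, for example with open("/etc/zabbix/zabbix_server.conf").read(); that path is an assumption and depends on how Zabbix was installed.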

The load average graph also reflects this behavior: the load average (1 min) reaches peaks of up to 1.30.




And finally, the following graph shows the status of the Zabbix cache during a week. Its values are also appropriate.




In the next article, I will explain how to correctly set up the parameters related to the Linux kernel, MySQL and Zabbix.

