Too many processes on Zabbix server: how to fix

I am monitoring certain FreeIPA servers that normally run up to roughly 460 processes. This fires the “Too many processes on {HOST.NAME}” trigger from the “Template OS Linux” template.

The trigger expression checks whether the average number of processes over five minutes is greater than 300:

{Template OS Linux:proc.num[].avg(5m)}>300

What is the best way to override this value in another template that I created specifically for this class of servers, and which also inherits from “Template OS Linux” and “Template IPA Servers”?

asked Jul 6, 2016 at 21:21

farhany

Probably user macros. You would add a user macro in the original template:

{$TRIGGER_THRESHOLD_PROCESSES_RUNNING}

Then you would modify the trigger expression like this:

{Template OS Linux:proc.num[].avg(5m)}>{$TRIGGER_THRESHOLD_PROCESSES_RUNNING}

Then you could define a user macro with the same name on the lower level template – or even individual hosts – with a different value.

The user macro name is up to you, as long as it follows the syntax rules.
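A minimal sketch of how such an override plays out, assuming the usual precedence in which a host-level macro wins over one defined on a nearer template, which in turn wins over one defined on a template further up. The dictionaries, the value 600, and the helper names are hypothetical illustrations, not Zabbix API calls.

```python
from statistics import mean

MACRO = "{$TRIGGER_THRESHOLD_PROCESSES_RUNNING}"


def resolve_macro(name, *scopes):
    """Return the value from the first scope that defines the macro.

    Scopes are passed most specific first (host, then templates from
    nearest to farthest), mirroring the override order described above.
    """
    for scope in scopes:
        if name in scope:
            return scope[name]
    raise KeyError(name)


def too_many_processes(samples, threshold):
    """Mimic proc.num[].avg(5m) > threshold for samples from the last 5 minutes."""
    return mean(samples) > threshold


# Hypothetical values: the base template keeps 300, the IPA template raises it.
template_os_linux = {MACRO: 300}
template_ipa = {MACRO: 600}
host = {}  # no host-level override

threshold = resolve_macro(MACRO, host, template_ipa, template_os_linux)
samples = [455, 460, 458, 462, 459]  # illustrative proc.num readings
print(threshold, too_many_processes(samples, threshold))
```

With the original fixed value of 300 the same samples would fire the trigger; with the overridden macro they do not.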

answered Jul 6, 2016 at 21:47

Richlv

If you consistently have more than 300 processes on all the systems that you monitor, you can also just edit the original template. As an example, 450 is a better value for my use case, since I always have at least 160 processes shown in brackets (typically kernel threads) that don’t reflect what I’m actually trying to monitor: user space. Pick a number appropriate to your environment rather than copying my example.
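To see how much of the process count comes from bracketed entries, the command column of a process listing can be split into two groups. A minimal sketch, assuming each entry is a command string as ps prints it; the sample list is made up and `split_bracketed` is a hypothetical helper.

```python
def split_bracketed(commands):
    """Split command strings into bracketed entries (e.g. [kthreadd]) and the rest."""
    bracketed, others = [], []
    for cmd in commands:
        cmd = cmd.strip()
        if cmd.startswith("[") and cmd.endswith("]"):
            bracketed.append(cmd)
        else:
            others.append(cmd)
    return bracketed, others


# Illustrative command column, not taken from a real host.
sample = [
    "[kthreadd]",
    "[ksoftirqd/0]",
    "/usr/sbin/sshd -D",
    "/usr/sbin/httpd",
    "[kworker/0:1]",
]
bracketed, others = split_bracketed(sample)
print(f"{len(bracketed)} bracketed, {len(others)} other")
```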

I changed this value by going to the original template, which you can find from Configuration –> Templates –> Template OS Linux –> Triggers –> Too many processes on {HOST.NAME}, and then changing the value from 300 to a more reasonable value for your specific need.

One thing to note is that not all servers in an environment are the same, and if your environment has multiple different types of servers, the user macros solution is going to be far superior.

answered Sep 12, 2017 at 20:37

Vladinatrix

Zabbix reports an error: Too many processes on Xining City

Error message: Too many processes on Xining City (the server's name)

Cause of the error:

The number of processes on the server exceeds 300, which is the default threshold. Some servers easily run more than 300 processes, so here we adjust the “Too many processes on {HOST.NAME}” trigger, changing the value from the original 300 to 3000.

Solution: change the threshold of the “Too many processes on {HOST.NAME}” trigger.

1. Open the trigger: Configuration –> Host –> Template OS Linux –> Triggers –> Too many processes on {HOST.NAME}

2. In “Too many processes on {HOST.NAME}”, change the value to 3000 (the original value is 300).

3. Wait for the next update; the status then returns to normal.

This article is reproduced from a 51CTO blog, original link: http://blog.51cto.com/solin/1861531. If you need to reprint it, please contact the original author yourself.

On a newly deployed Zabbix server, the Monitoring page showed a “Too many processes on Zabbix server” alarm right after launch. Much of the advice found online did not solve it, so I handled it based on experience.

1. Run ps -axf to review the processes and see which kind is most numerous.

16838 ? S 0:00 \_ /usr/local/sbin/zabbix_server: configuration syncer [waiting sec for processes]
16839 ? S 0:00 \_ /usr/local/sbin/zabbix_server: db watchdog [synced alerts config in 0.001120 sec, idle sec]
16840 ? S 0:00 \_ /usr/local/sbin/zabbix_server: poller #1 [got 0 values in 0.000004 sec, idle 1 sec]
16842 ? S 0:00 \_ /usr/local/sbin/zabbix_server: poller #2 [got 0 values in 0.000004 sec, idle 1 sec]
16843 ? S 0:00 \_ /usr/local/sbin/zabbix_server: poller #3 [got 0 values in 0.000004 sec, idle 1 sec]
16844 ? S 0:00 \_ /usr/local/sbin/zabbix_server: poller #4 [got 0 values in 0.000004 sec, idle 1 sec]
16845 ? S 0:00 \_ /usr/local/sbin/zabbix_server: poller #5 [got 3 values in 0.001429 sec, idle 1 sec]
16846 ? S 0:00 \_ /usr/local/sbin/zabbix_server: poller #6 [got 0 values in 0.000004 sec, idle 1 sec]
16847 ? S 0:00 \_ /usr/local/sbin/zabbix_server: poller #7 [got 0 values in 0.000004 sec, idle 1 sec]
16848 ? S 0:00 \_ /usr/local/sbin/zabbix_server: poller #8 [got 0 values in 0.000003 sec, idle 1 sec]
16849 ? S 0:00 \_ /usr/local/sbin/zabbix_server: poller #9 [got 0 values in 0.000004 sec, idle 1 sec]
16850 ? S 0:00 \_ /usr/local/sbin/zabbix_server: poller #10 [got 0 values in 0.000003 sec, idle 1 sec]
16851 ? S 0:00 \_ /usr/local/sbin/zabbix_server: unreachable poller #1 [got 0 values in 0.000027 sec, idle 5 sec]
16852 ? S 0:00 \_ /usr/local/sbin/zabbix_server: trapper #1 [processed data in 0.000000 sec, waiting for connection]
16853 ? S 0:00 \_ /usr/local/sbin/zabbix_server: trapper #2 [processed data in 0.000000 sec, waiting for connection]
16854 ? S 0:00 \_ /usr/local/sbin/zabbix_server: trapper #3 [processed data in 0.100752 sec, waiting for connection]
16855 ? S 0:00 \_ /usr/local/sbin/zabbix_server: trapper #4 [processed data in 0.000000 sec, waiting for connection]
16856 ? S 0:00 \_ /usr/local/sbin/zabbix_server: trapper #5 [processed data in 0.000000 sec, waiting for connection]
16857 ? S 0:00 \_ /usr/local/sbin/zabbix_server: icmp pinger #1 [got 0 values in 0.000004 sec, idle 5 sec]
16858 ? S 0:00 \_ /usr/local/sbin/zabbix_server: icmp pinger #2 [got 0 values in 0.000004 sec, idle 5 sec]
16859 ? S 0:00 \_ /usr/local/sbin/zabbix_server: icmp pinger #3 [got 0 values in 0.000004 sec, idle 5 sec]
16860 ? S 0:00 \_ /usr/local/sbin/zabbix_server: icmp pinger #4 [got 0 values in 0.000005 sec, idle 5 sec]
16861 ? S 0:00 \_ /usr/local/sbin/zabbix_server: icmp pinger #5 [got 0 values in 0.000003 sec, idle 5 sec]
16862 ? S 0:00 \_ /usr/local/sbin/zabbix_server: alerter [sent alerts: 0 success, 0 fail in 0.000436 sec, idle [sec]
16863 ? S 0:00 \_ /usr/local/sbin/zabbix_server: housekeeper [deleted 2757 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit it
16864 ? S 0:00 \_ /usr/local/sbin/zabbix_server: timer #1 [processed 0 triggers, 0 events in 0.000000 sec, 0 maint.periods in 0.000000 sec,
16865 ? S 0:00 \_ /usr/local/sbin/zabbix_server: http poller #1 [got 0 values in 0.001521 sec, idle 5 sec]
16866 ? S 0:00 \_ /usr/local/sbin/zabbix_server: discoverer #1 [processed 0 rules in 0.000441 sec, idle sec]
16867 ? S 0:00 \_ /usr/local/sbin/zabbix_server: discoverer #2 [processed 0 rules in 0.000576 sec, idle sec]
16868 ? S 0:00 \_ /usr/local/sbin/zabbix_server: discoverer #3 [processed 0 rules in 0.000486 sec, idle sec]
16869 ? S 0:00 \_ /usr/local/sbin/zabbix_server: discoverer #4 [processed 0 rules in 0.000771 sec, idle sec]
16870 ? S 0:00 \_ /usr/local/sbin/zabbix_server: discoverer #5 [processed 0 rules in 0.000450 sec, idle sec]

On my own server, most of the processes turned out to belong to the Zabbix service itself.
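Counting a listing like this by eye gets tedious, so a short script can group it by process type. A sketch, assuming lines shaped like the ps output above, where the type is the text between `zabbix_server:` and the first `#` or `[`. The function name and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Captures the process type after "zabbix_server:", e.g. "poller" or "icmp pinger".
TYPE_RE = re.compile(r"zabbix_server:\s*([^#\[]+)")


def count_process_types(ps_lines):
    """Count zabbix_server child processes per type; other lines are ignored."""
    counts = Counter()
    for line in ps_lines:
        match = TYPE_RE.search(line)
        if match:
            counts[match.group(1).strip().lower()] += 1
    return counts


# Shortened, illustrative lines in the same shape as the listing above.
sample = [
    "16840 ? S 0:00 /usr/local/sbin/zabbix_server: poller #1 [got 0 values]",
    "16842 ? S 0:00 /usr/local/sbin/zabbix_server: poller #2 [got 0 values]",
    "16843 ? S 0:00 /usr/local/sbin/zabbix_server: poller #3 [got 0 values]",
    "16852 ? S 0:00 /usr/local/sbin/zabbix_server: trapper #1 [processed data]",
    "16857 ? S 0:00 /usr/local/sbin/zabbix_server: icmp pinger #1 [got 0 values]",
    "16862 ? S 0:00 /usr/local/sbin/zabbix_server: alerter [sent alerts]",
    "  901 ? Ss 0:00 /usr/sbin/sshd -D",
]
counts = count_process_types(sample)
for name, number in counts.most_common():
    print(f"{number:3d}  {name}")
```

The types with the highest counts are the ones whose Start* parameter is worth checking in the configuration file.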

2. From previous experience, the number of processes a service starts is usually configurable in its configuration file. A look at the Zabbix configuration file did indeed turn up such parameters:

# Advanced parameters

### Option: StartPollers
# Number of pre-forked instances of pollers.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartPollers=5

Description: the number of child processes started at initialization. The more there are, the higher the throughput of the server side, and the greater the consumption of system resources.

### Option: StartDiscoverers
# Number of pre-forked instances of discoverers.
#
# Mandatory: no
# Range: 0-250
# Default:
# StartDiscoverers=1

Note: this sets the number of processes used for host auto-discovery. Consider increasing it if a single agent manages more than 500 machines (direct agent scenarios only).

Several other parameters work the same way; which ones to change depends on what the ps output shows. For reasons I could not determine, my values were set particularly high; they should be adjusted as needed.

After adjusting the parameters for the process types in question, restarting the Zabbix service solves the problem.
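Before restarting, it can help to list which Start* parameters are actually set. A sketch, assuming the `Name=Value` syntax with `#` comments used by zabbix_server.conf; the sample text is illustrative, not a real server's configuration, and `active_start_params` is a hypothetical helper.

```python
def active_start_params(conf_text):
    """Return {name: int} for uncommented Start* parameters in the given text."""
    params = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not Name=Value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if name.startswith("Start") and value.isdigit():
            params[name] = int(value)
    return params


sample_conf = """\
# StartPollers=5
StartPollers=10
StartTrappers=5
StartDiscoverers=5
LogFile=/tmp/zabbix_server.log
"""
params = active_start_params(sample_conf)
print(params, sum(params.values()))
```

The sum gives a rough count of the worker processes these settings request, which can be compared with the trigger threshold.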

This article is from the “bit accumulation” blog; please be sure to keep this source link: http://16769017.blog.51cto.com/700711/1761002

Too many processes on Zabbix server workaround

Good day.

Many people who install a Zabbix server with the default settings run into this message in the admin panel:
Zabbix poller processes more than 75% busy

A little theory:
poller: the process that polls the agents. You need a lot of them when monitoring a large network, and in a few other cases :-)

What matters for us:
If you have good hardware, you can afford to be a little generous and give it a room instead of a mat by the door :)

How to fix it

Open the configuration file zabbix_server.conf:

/etc/zabbix/zabbix_server.conf

Find the StartPollers parameter and set it to a value higher than the default, for example 15 or 20, and do not forget to remove the comment character # in front of the parameter.

More processes put a heavier load on the server, so be careful that all of this does not leave you without precious resources.

The same applies to the “Zabbix unreachable poller processes more than 75% busy” error: for that one, increase the StartPollersUnreachable parameter. The values have to be chosen to suit your server and your needs.
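The edit described here (uncomment the parameter and give it a new value) can be sketched as a small text transformation. This assumes the `Name=Value` syntax of zabbix_server.conf; `set_param` is a hypothetical helper that works on a string, so writing the file back and restarting the server are left out, and the values 15 and 3 are only examples.

```python
import re


def set_param(conf_text, name, value):
    """Set name=value, uncommenting an existing '# name=...' line if present."""
    pattern = re.compile(
        rf"^[ \t]*#?[ \t]*{re.escape(name)}[ \t]*=.*$", re.MULTILINE
    )
    new_line = f"{name}={value}"
    if pattern.search(conf_text):
        # Replace only the first occurrence so stray duplicates stay visible.
        return pattern.sub(new_line, conf_text, count=1)
    # Parameter not present at all: append it at the end.
    return conf_text.rstrip("\n") + "\n" + new_line + "\n"


sample = "# Default:\n# StartPollers=5\n\n# StartPollersUnreachable=1\n"
updated = set_param(sample, "StartPollers", 15)
updated = set_param(updated, "StartPollersUnreachable", 3)
print(updated)
```

Matching on the full parameter name followed by `=` keeps `StartPollers` from accidentally rewriting the `StartPollersUnreachable` line.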

A tip from Captain Obvious

After raising the parameters above, it is a good idea to also increase CacheSize, the amount of memory used to store hosts and various kinds of items. Set CacheSize to, for example, 256M or even 512M; if you can spare it, 1024M is possible too.

Conclusion

If you suddenly see a message about a problem connecting to the database (DB) in some of the Dashboard blocks, increase the connection limit, since the server may be running out of DB connections.

Good luck :)

  • #1

Hi,

I monitor our FreeBSD production server using zabbix-agent and I keep getting this email:

Code:

Subject: PROBLEM: Too many processes on FreeBSD.mydomain.co.uk
Trigger: Too many processes on FreeBSD.mydomain.co.uk
Trigger status: PROBLEM
Trigger severity: Warning
Trigger URL:

Item values:

1. Number of processes (FreeBSD.mydomain.co.uk:proc.num[]): 327
2. *UNKNOWN* (*UNKNOWN*:*UNKNOWN*): *UNKNOWN*
3. *UNKNOWN* (*UNKNOWN*:*UNKNOWN*): *UNKNOWN*

Original event ID: 234

I am not sure what this means. What is classified as too many processes?
ps r|wc

I have 14 jails on that host:
1 Mail
1 Database
12 webjail

Could anyone please help me with how to handle this email notification?

Thank you

SirDice


  • #2

The default limit is rather low, so if you have 14 jails running there will be a lot more processes on the host itself. You’ll need to increase the limit on the host configuration in Zabbix.

  • Thread Starter

  • #3

You’ll need to increase the limit on the host configuration in Zabbix

Hi SirDice ,
What do you use to set the limit? Do you just pick a number or is there a methodology to follow?

  • #4

I tend to look at trends when it is running “as expected” and set the trigger significantly above that. This trigger is really looking for “out of the norm” activity, like a script gone awry and forking. Or disable this trigger, and just rely on either the running processes or load level triggers.

  • Thread Starter

  • #5

I tend to look at trends when it is running “as expected” and set the trigger significantly above

Would you check with ps r|wc or another command?

SirDice


  • #6

In Zabbix, go to “Latest data” and look up the process count for the host. There’s a “Graph” button on the right-hand side. If you click on it you’ll see a graph of the number of processes. Change the time period to a week or more and look at the graph itself. Look at the shape (straight line, more or less random, or something else). Then try to imagine what the graph will do in the future. Base your numbers on that.

Alert thresholds aren’t dynamic. So you’ll need to adjust the thresholds according to your usage patterns. For example, a typical firewall doesn’t have a lot of processes running and a host with 14 jails on it will have quite a lot. So for each host you will need to set the right numbers, the firewall needs a lower threshold than the host with the 14 jails.
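One way to turn “look at the graph and set the trigger significantly above” into a number is to take the peak of the observed history and add headroom. This is a sketch only: the 50% margin and the rounding step are assumptions chosen for illustration, not a rule from this thread, and the sample counts are made up.

```python
import math


def suggest_threshold(history, margin=0.5, step=50):
    """Peak of observed process counts plus a margin, rounded up to a multiple of step."""
    if not history:
        raise ValueError("history must not be empty")
    target = max(history) * (1 + margin)
    return math.ceil(target / step) * step


# Illustrative daily peaks of proc.num[] for one host over a week.
week_of_counts = [310, 327, 318, 305, 322, 330, 315]
print(suggest_threshold(week_of_counts))
```

A host whose graph is flat can live with a smaller margin; a host with regular spikes needs the history window to include those spikes.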
