| Cheshire Cat Computing http://steveshipway.org/forum/ |
|
| only heartbeats getting back to Nagios server http://steveshipway.org/forum/viewtopic.php?f=22&t=1637 |
Page 1 of 1 |
| Author: | jsbsmd [ Thu Sep 18, 2008 2:49 am ] |
| Post subject: | only heartbeats getting back to Nagios server |
Hoping someone can help me. I have nagios 3.03 running with nsca 2.7.2 on ubuntu 7.1. I have installed navevlog 1.8.1 on a Vista machine and a Windows 2000 server. Test connections work fine. The problem is that the only thing I see being forwarded to my Nagios server in the nagios.cmd file is the heartbeat command, e.g.:

[1221658966] PROCESS_SERVICE_CHECK_RESULT;jsb-pc2;EventLog Agent;1;HEARTBEAT [WARN #1]: Service starting

I cannot get any warnings or errors, even when I "generate test event". Here is what I see in my eventlog:

Sent NSCA notification to 192.168.0.191:5667
Service [EventLog Agent], status 1
Message: HEARTBEAT [WARN #1]: Service starting
Filter matching ( )

This is what I see when I generate a test event "critical":

The description for Event ID 1 from source NagiosEventLog Test cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer. If the event originated on another computer, the display information had to be saved with the event.
The following information was included with the event:
Test message
the message resource is present but the message is not found in the string/message table |
|
| Author: | stevesh [ Thu Sep 18, 2008 11:17 am ] |
| Post subject: | Re: only heartbeats getting back to Nagios server |
Since you're seeing entries in the NSCA log on the Nagios server, we know that NSCA is working and the agent is configured with the correct NSCA settings.

The 'generate test message' generates a test message in the local eventlog, which may (or may not, depending on your filters) cause an NSCA message to be sent. My guess is that your Filter definitions are not being matched and so the agent does not send an NSCA alert to Nagios. The heartbeats still come through as they do not depend on the filters.

Check your filter definitions. You can enable the 'debug mode' which lets you know what filter tests are being performed (it generates a LOT of extra application eventlog messages though). Create a very general filter (eg, all eventlogs, all statuses, match anything) and see if NSCA alerts are sent. Make sure the test message you generate matches one of your defined filters. Remember that some of the filter lines are regexps, some lists, and some just text. Don't use the wrong format or it will match nothing.

If you're still stuck, post a dump of the relevant part of your registry (use regedit and look under local machine/software/Cheshire Cat/Nagios) and let me see your filter definitions.

Steve |
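To illustrate why the wrong format "will match nothing", here is a minimal Python sketch. It is not the agent's code: the three function names and the exact matching rules (regexp search, comma-separated list, exact text equality) are assumptions made only for illustration. The point is that the same string behaves very differently depending on how the field interprets it.

```python
import re


def match_regex(pattern: str, value: str) -> bool:
    """Hypothetical: treat the filter field as a regular expression."""
    return re.search(pattern, value) is not None


def match_list(items: str, value: str) -> bool:
    """Hypothetical: treat the filter field as a comma-separated list."""
    allowed = [item.strip() for item in items.split(",")]
    return value in allowed


def match_text(text: str, value: str) -> bool:
    """Hypothetical: treat the filter field as plain text (exact match)."""
    return text == value


source = "NagiosEventLog Test"

# The same filter string, read as a regexp and then as plain text.
print(match_regex(r"Nagios.*Test", source))
print(match_text(r"Nagios.*Test", source))

# An event ID checked against a list, then against a range typed into a list field.
print(match_list("1,7,1000", "1"))
print(match_list("1-1000", "1"))
```

In this sketch the regexp written into a plain-text field and the range written into a list field both fail silently, which is the kind of symptom a catch-all filter helps to rule out.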
|
| Author: | jsbsmd [ Fri Sep 19, 2008 1:04 am ] |
| Post subject: | Re: only heartbeats getting back to Nagios server |
OK, I'm getting a bit farther ahead. I created a catch-all filter and now see the event in nagios.cmd, e.g.:

From eventvwr:

Sent NSCA notification to 192.168.0.191:5667
Service [Application EventLog], status 2
Message: Application [error] [NagiosEventLog Test #1]: Test message
Filter matching (Application Log)

nagios.cmd:

[1221738722] PROCESS_SERVICE_CHECK_RESULT;jsb-pc2;Application EventLog;2;Application [error] [NagiosEventLog Test #0]: Test message

However, on my Nagios screen (see attached .jpg) I only see "No messages in last 30mins". Here are my service definitions in Nagios:

define service{
    service_description     EventLog
    active_checks_enabled   0
    passive_checks_enabled  1
    flap_detection_enabled  0
    register                0
    is_volatile             0
    check_period            24x7
    max_check_attempts      1
    normal_check_interval   5
    retry_check_interval    1
    check_freshness         1
    freshness_threshold     1800
    check_command           check_dummy!0!No messages in last 30mins
#   contact_groups          YOUR_CONTACT_GROUP
    notification_interval   120
    notification_period     24x7
    notification_options    w,u,c,r
    stalking_options        w,c,u
    name                    EventLog
    register                0
}

define service{
    use                     EventLog
    service_description     Application EventLog
    host_name               jsb-pc2
}

define service{
    use                     EventLog
    service_description     System EventLog
    host_name               jsb-pc2
}

define service{
    use                     EventLog
    service_description     Security EventLog
    host_name               jsb-pc2
}
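For anyone comparing what arrives in nagios.cmd against their service definitions, here is a small Python sketch that splits a PROCESS_SERVICE_CHECK_RESULT line into its fields. The `PassiveResult` type and `parse_passive_result` helper are names made up for this example; the field order (timestamp, host, service description, return code, output) is the one visible in the line quoted above.

```python
import re
from typing import NamedTuple, Optional


class PassiveResult(NamedTuple):
    timestamp: int
    host: str
    service: str
    return_code: int
    output: str


# [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output
_LINE = re.compile(
    r"^\[(\d+)\] PROCESS_SERVICE_CHECK_RESULT;([^;]*);([^;]*);(\d+);(.*)$"
)


def parse_passive_result(line: str) -> Optional[PassiveResult]:
    """Return the parsed fields, or None if the line has a different shape."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    ts, host, service, code, output = m.groups()
    return PassiveResult(int(ts), host, service, int(code), output)


line = (
    "[1221738722] PROCESS_SERVICE_CHECK_RESULT;jsb-pc2;Application EventLog;2;"
    "Application [error] [NagiosEventLog Test #0]: Test message"
)
result = parse_passive_result(line)
print(result.host, "|", result.service, "|", result.return_code)
```

The host and service fields are the ones to compare, character for character, against `host_name` and `service_description` in the definitions.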
|
| Author: | stevesh [ Fri Sep 19, 2008 10:47 am ] |
| Post subject: | Re: only heartbeats getting back to Nagios server |
This is all looking good so far. Are you sure that your nagios.cmd is being processed? Check your nagios.cfg file and make sure you have external commands enabled with a reasonable frequency. The entry is definitely going in there correctly, and the hostname and service description are correct, so maybe the nagios.cmd is not being processed? |
|
| Author: | jsbsmd [ Sat Sep 20, 2008 3:34 am ] |
| Post subject: | Re: only heartbeats getting back to Nagios server |
OK, now I'm getting frustrated. Messages are coming into syslog and nagios.cmd, i.e.:

syslog:

Sep 19 10:13:05 ubuntu710s nsca[20559]: Handling the connection...
Sep 19 10:13:06 ubuntu710s nsca[20559]: SERVICE CHECK -> Host Name: 'jsb-pc2', Service Description: 'EventLog Agent', Return Code: '2', Output: 'HEARTBEAT [CRIT #2]: Service halting'
Sep 19 10:13:06 ubuntu710s nsca[20559]: End of connection...
Sep 19 10:13:11 ubuntu710s nsca[20560]: Handling the connection...
Sep 19 10:13:12 ubuntu710s nsca[20560]: SERVICE CHECK -> Host Name: 'jsb-pc2', Service Description: 'EventLog Agent', Return Code: '1', Output: 'HEARTBEAT [WARN #1]: Service starting'
Sep 19 10:13:12 ubuntu710s nsca[20560]: End of connection...
Sep 19 10:13:49 ubuntu710s nsca[20579]: Handling the connection...
Sep 19 10:13:49 ubuntu710s nsca[20579]: SERVICE CHECK -> Host Name: 'jsb-pc2', Service Description: 'Application EventLog', Return Code: '2', Output: 'Application [error] [NagiosEventLog Test #0]: Test message '

nagios.cmd:

[1221833629] PROCESS_SERVICE_CHECK_RESULT;jsb-pc2;Application EventLog;2;Application [error] [NagiosEventLog Test #0]: Test message

nagios.cmd is getting executed: I stopped the Apache server and it came through nagios.cmd and changed the status on the screen. It seems the problem is a filter/condition check on the Nagios server side.

Questions:

1. How does the check_dummy!0! command work if alerts are coming in as 1=warnings or 2=criticals? E.g.:

define service{
    service_description     EventLog
    active_checks_enabled   0
    passive_checks_enabled  1
    flap_detection_enabled  0
    register                0
    is_volatile             0
    check_period            24x7
    max_check_attempts      1
    normal_check_interval   5
    retry_check_interval    1
    check_freshness         1
    freshness_threshold     1800
    check_command           check_dummy!0!No messages in last 30mins
    contact_groups          YOUR_CONTACT_GROUP
    notification_interval   120
    notification_period     24x7
    notification_options    w,u,c,r
    stalking_options        w,c,u
    name                    EventLog
    register                0
}

define service{
    use                     EventLog
    service_description     Application EventLog
    host_name               jsb-pc2
}

define service{
    use                     EventLog
    service_description     System EventLog
    host_name               jsb-pc2
}

define service{
    use                     EventLog
    service_description     Security EventLog
    host_name               jsb-pc2
}

2. The delay between the message getting into nagios.cmd and the status updating on the screen seems quite long. Is there any way to make the status update on the screen faster?

3. The only message I see is "No messages in last 30mins". How will the alert change this message? nagios.cmd

4. I have the defined service "EventLog" in my templates.cfg file and the following in my windows.cfg file. Do they need to be in the same file, or does it really matter?

define service{
    use                     EventLog
    service_description     Application EventLog
    host_name               jsb-pc2
}

define service{
    use                     EventLog
    service_description     System EventLog
    host_name               jsb-pc2
}

define service{
    use                     EventLog
    service_description     Security EventLog
    host_name               jsb-pc2
}

Relevant part of my nagios.cfg:

# EXTERNAL COMMAND OPTION
# This option allows you to specify whether or not Nagios should check
# for external commands (in the command file defined below). By default
# Nagios will *not* check for external commands, just to be on the
# cautious side. If you want to be able to use the CGI command interface
# you will have to enable this.
# Values: 0 = disable commands, 1 = enable commands

check_external_commands=1

# EXTERNAL COMMAND CHECK INTERVAL
# This is the interval at which Nagios should check for external commands.
# This value works of the interval_length you specify later. If you leave
# that at its default value of 60 (seconds), a value of 1 here will cause
# Nagios to check for external commands every minute. If you specify a
# number followed by an "s" (i.e. 15s), this will be interpreted to mean
# actual seconds rather than a multiple of the interval_length variable.
# Note: In addition to reading the external command file at regularly
# scheduled intervals, Nagios will also check for external commands after
# event handlers are executed.
# NOTE: Setting this value to -1 causes Nagios to check the external
# command file as often as possible.

command_check_interval=15s
#command_check_interval=-1

# EXTERNAL COMMAND FILE
# This is the file that Nagios checks for external command requests.
# It is also where the command CGI will write commands that are submitted
# by users, so it must be writeable by the user that the web server
# is running as (usually 'nobody'). Permissions should be set at the
# directory level instead of on the file, as the file is deleted every
# time its contents are processed.

command_file=/usr/local/nagios/var/rw/nagios.cmd |
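The rules in the command_check_interval comment can be restated as a short Python sketch. This is only a reading of the comment text quoted above, not Nagios source code, and `command_check_seconds` is a name invented for the example.

```python
from typing import Optional


def command_check_seconds(value: str, interval_length: int = 60) -> Optional[int]:
    """Seconds between command-file checks, per the nagios.cfg comment.

    Returns None for -1, which the comment describes as
    "as often as possible".
    """
    value = value.strip()
    if value == "-1":
        return None
    if value.endswith("s"):
        # "15s" means actual seconds, independent of interval_length.
        return int(value[:-1])
    # A bare number is a multiple of interval_length.
    return int(value) * interval_length


print(command_check_seconds("15s"))  # 15
print(command_check_seconds("1"))    # 60
print(command_check_seconds("-1"))   # None
```

With the setting posted above (command_check_interval=15s), that reading gives a poll of the command file every 15 seconds.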
|
| Author: | jsbsmd [ Sat Sep 20, 2008 10:27 am ] |
| Post subject: | Re: only heartbeats getting back to Nagios server |
Okay, not sure what happened, but I kept thinking about Nagios not processing the nagios.cmd file. The alerts showed up in the file and in the syslog, but never went through. I deleted the nagios.cmd file, recreated it, and now things work fine. As soon as the message gets into the cmd file it is processed by Nagios. The byte size returns to 0 bytes by the time I look at it, but I know it gets through because I see it in the nagios.log file.

I'm happy about the solution, but stumped as to why the nagios.cmd file was the cause of the problem. Could it be a permissions issue? Is it supposed to be processed immediately and return to a zero byte size, or contain the last alert until a new one gets written into it? Who knows, but it works.

Thanks for all the help. I'm very new at Linux and Nagios but am an expert in monitoring using openview, mom, netiq, tng, etc. It pains me to see the prices of these agents out there. Where could I write some suggestions to enhance the product? |
|
| Author: | stevesh [ Mon Sep 22, 2008 11:51 am ] |
| Post subject: | Re: only heartbeats getting back to Nagios server |
The nagios.cmd file is not a real file, it is a pipe device. It will appear to have size zero. If it doesn't, then it is a real file, and something has gone wrong. Delete it and restart Nagios.

1. The check_command is run by the freshness checks (not by active checks) and can be used to reset the status to 0 (OK) if no messages have been received for a certain period of time.
2. This depends on your nagios.cfg polling frequency for the nagios.cmd, plus the refresh frequency of your web interface (in the cgi.cfg).
3. This is the message set by check_dummy (see 1 above) when the freshness timeout is completed. Change the command definition, or even disable freshness checks altogether.
4. Doesn't matter. |
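A quick way to tell a pipe from a real file is to look at the file type rather than the size. The sketch below uses Python's standard `os` and `stat` modules; `describe_command_file` is a helper name invented for this example, and the path is the `command_file` value from the nagios.cfg excerpt earlier in the thread.

```python
import os
import stat


def describe_command_file(path: str) -> str:
    """Report whether path is a named pipe, a regular file, or missing."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISFIFO(mode):
        return "fifo"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


# Path taken from the command_file setting posted above.
print(describe_command_file("/usr/local/nagios/var/rw/nagios.cmd"))
```

A result of "regular file" would correspond to the broken state described above, where results are written to the file but never picked up.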
|
| All times are UTC + 12 hours [ DST ] |