
How To Troubleshoot False Alerts in Netreo

By: Brian Olsen
January 17, 2024

Regardless of the attention given to configuring monitoring solutions, the dynamic nature of modern infrastructures can impact alert functionality. Optimizing network performance in complex, hybrid infrastructures that leverage SD-WANs, real-time provisioning and other advanced features is a constant challenge. So what should IT teams do when receiving false alerts or notifications that appear inaccurate?

Like so many tasks in network monitoring, utilizing a tried and true process is always the best place to start. Our first How To post of 2024 takes you through the same process used by our crack Customer Support team for Troubleshooting False Alerts in Netreo.

The Process

While step one is always identifying the source of the problem, doing so is easier said than done. And like pretty much every troubleshooting task, testing connectivity to the device in question is paramount.

A number of steps require a sample device, so have a sample device with a known IP address handy for reference. Knowing how the device was added to Netreo is very helpful, since similar devices are typically added the same way. You’ll also need to know whether the device is associated with a Service Engine. So before you begin …

  • Identify a sample device & IP address
  • Know how the device was added to Netreo (Scanning, CMDB, Manual, Import, API, Virtual Environment, Cloud Resource)
  • Identify the specific Service Engine associated with the device
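
One way to keep these prerequisites together while you troubleshoot is a small record like the sketch below. It is only an illustration: the class names, field names and sample values are hypothetical and are not part of Netreo. The list of add methods is taken from the bullet above.

```python
from dataclasses import dataclass
from enum import Enum
from ipaddress import ip_address
from typing import Optional


class AddMethod(Enum):
    """Ways a device can be added to Netreo, as listed above."""
    SCANNING = "Scanning"
    CMDB = "CMDB"
    MANUAL = "Manual"
    IMPORT = "Import"
    API = "API"
    VIRTUAL_ENVIRONMENT = "Virtual Environment"
    CLOUD_RESOURCE = "Cloud Resource"


@dataclass(frozen=True)
class SampleDevice:
    """Hypothetical note-taking record for the device used in the steps below."""
    name: str
    ip: str
    added_via: AddMethod
    service_engine: Optional[str] = None  # assumption: None means no Service Engine is involved

    def __post_init__(self) -> None:
        # ip_address() raises ValueError if the address is malformed
        ip_address(self.ip)


# Sample values are placeholders (192.0.2.0/24 is a documentation range)
device = SampleDevice("core-sw-01", "192.0.2.10", AddMethod.SCANNING, "se-branch-01")
print(device)
```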

Testing Connectivity

To make sure that the device is reachable by Netreo, test connectivity to the device from your Netreo instance using the Credential and Connectivity Test tool. Navigate to Administration -> Tools -> Credential and Connectivity Test.

  1. Select ICMP (Ping) and enter the IP address of the device
  2. If the device requires a Service Engine to be reached, select the appropriate SE
  3. Click Test and verify the results of the Ping request
    1. Focus on the “1 host up” part of the output. If this is present, the device is reachable
    2. If the result shows “0 host up” then the device is not reachable / there is no connectivity to the device from Netreo or the configured Netreo Service Engine
  4. Furthermore, you can use this same connectivity test to confirm if specific ports are open, filtered or closed on devices
    1. Ports to check
      1. SNMP – UDP/161
      2. Windows – TCP/5985
    2. If the result includes filtered or closed in the message, then follow up with Netreo Customer Support to ensure these ports are available
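
If you paste the test output into a ticket or script, the checks above can be sketched as two small helpers. This is a minimal sketch: the function names are hypothetical, and the sample strings only assume that the output contains the "host up" phrase and the port state words quoted in the steps above. The tool's full output format is not reproduced here.

```python
import re
from typing import Optional


def host_is_up(output: str) -> Optional[bool]:
    """True for '1 host up', False for '0 host up', None if the phrase is missing."""
    match = re.search(r"(\d+) hosts? up", output)
    if match is None:
        return None
    return int(match.group(1)) > 0


def port_needs_follow_up(output: str) -> bool:
    """True if the output mentions a filtered or closed port."""
    return re.search(r"\b(filtered|closed)\b", output, re.IGNORECASE) is not None


# Hypothetical sample text, not captured from a real Netreo instance
sample = "5985/tcp filtered\n1 host up"
print(host_is_up(sample))            # device answered the ping
print(port_needs_follow_up(sample))  # port state calls for a follow-up
```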

Testing Credentials

By testing credentials, users are able to identify whether the credentials applied to the device in question are functioning properly and determine if access methods are blocked in some way. Troubleshooting credentials is always recommended toward the start of troubleshooting, because it’s relatively easy.

Validating device credentials

Navigate to a device’s Overview page and click the gear symbol to go to the Admin page. Scroll to the bottom of the Admin page to see the credential fields.

If a field has a lock symbol, then the credentials in that field are applied from a template. To confirm which template was used, simply hover your cursor over the field and the template will be displayed. It’s a good idea to hover over each field to double check that all are correct.

Fields may also contain credentials that were entered directly rather than locked in from a device template. Two things to keep in mind at this step:

  1. Different devices will have different data in their credential fields
  2. Having an asterisk in the password field doesn’t mean that the field is populated

Testing successful authentication of a Device to templates or arbitrary credentials

To authenticate the device in question, go to Administration -> Tools -> Arbitrary Credentials Test. From here, try to make a connection to an endpoint over SNMP, SSH or Windows PowerShell. The Arbitrary Credentials Tool gives a more specific response than the Test SNMP or Test WMI options. SNMP and WMI devices respond with the name of the device if the test was successful. SSH devices respond by attempting to run an echo “hello world” command.

In the tool, provide the IP address of the sample device and fill in the credentials manually or select a template to use. You can also use a specific Service Engine if there is one that is expected to have access.

If you get an authentication failure, you’ll need to test the local password.
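
To cross-check an SSH result from outside Netreo, the same echo test can be approximated from a workstation that has the OpenSSH client installed. The sketch below is not how Netreo implements its test. The function names, host and user are hypothetical, and `BatchMode=yes` assumes key-based authentication, so a password-only account will fail here even if it works in Netreo.

```python
import subprocess
from typing import List


def build_ssh_echo_command(host: str, user: str, timeout_s: int = 5) -> List[str]:
    """Build an OpenSSH client command that runs echo "hello world" on the device."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # fail instead of prompting for a password
        "-o", f"ConnectTimeout={timeout_s}",  # give up quickly if the host is unreachable
        f"{user}@{host}",
        'echo "hello world"',
    ]


def ssh_echo_succeeds(host: str, user: str) -> bool:
    """Return True only if the remote shell echoed the expected text."""
    try:
        result = subprocess.run(
            build_ssh_echo_command(host, user),
            capture_output=True,
            text=True,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh client missing, or the connection hung
        return False
    return result.returncode == 0 and result.stdout.strip() == "hello world"
```

Calling `ssh_echo_succeeds("192.0.2.10", "netreo")` with your own host and account returns False on an authentication failure, a timeout or unexpected output, which narrows the problem to the credentials or the access path.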

Confirm Device Template Assignments

Your next step is to confirm that the device in question is using templates and which templates are assigned to it. Netreo uses cascading templates and administration can be tricky. Be sure to check out our Knowledge Base article on Device Template Administration for additional insights if you have any questions.

For troubleshooting, go to the Device Administration Page:

  1. Click the Advanced Options to see the Template section
  2. Make sure the template usage is enabled
  3. Verify that the correct Template is assigned
    1. If not, issues can be corrected by the following corrective actions
      1. Add the device to the correct Group Object (Device type, Sub type, Site, Category, Functional group)
      2. Add the Template to the correct Group Object (Device type, Sub type, Site, Category, Functional group)

Once corrected, perform a repoll of the device to make sure the change is effective.

Troubleshooting Service Engines

When troubleshooting Service Engines, identifying what Remote Poller a Device is set to is important. Navigate to the Device in question and click the gear icon to go to the Device Admin page. Click “Show Advanced Settings” and then look at the “Remote Poller” dropdown. This is where Remote Poller settings are changed and where you can confirm that the Remote Poller has proper network connectivity.

Check what Service Engine is associated with the Device

Identifying Devices associated with a particular Service Engine is very important. You can identify and confirm associations from the Service Engine Administration page by going to Administration -> System -> Service Engines. Click the “Netreo Remote Poller” or “Netreo Remote Collector” icon to see which devices are being polled by that Service Engine.

For Syslog, SNMP Traps and Netflow, another type of Service Engine role is used to show the relationships. These monitoring methods are “Pushed” by the Devices sending the information, rather than “Polled” by the monitoring system. Customers define which Service Engine receives this information by adjusting the configuration of the Device to point to the IP address of related Service Engines. You can see which Devices are recognized as having a Collector by clicking the “Netreo Logging Collector” or “Netreo Traffic Collector” link next to the Service Engine in question.

General Service Engine checks

Go to the Service Engine Administration page by navigating to Administration -> System -> Service Engines. If any of the Service Engines are showing a red alarm state, open that Service Engine, navigate to the Services tab and check whether the “Service Engine Status Check” Service Check is in a Critical state. Some messages are Host alarms and will only show at the bottom of the Services tab in the Services State History section. Make note of any recent critical alarms, in case help is required from Netreo Customer Support. Messages to look for include the following:

  • Processes not found [‘nf_worker’, ‘nf_listen’, ‘nf_result’, ‘nf_cache’]
  • No updates received in the last 10 minutes
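
If you copy the Services State History text into your notes, a short filter can pull out the two message types listed above. This is a minimal sketch: the function name, labels and sample history lines are hypothetical, and only the two message patterns come from the list above.

```python
import re
from typing import Iterable, List, Tuple

# Patterns based on the two messages listed above
PATTERNS = [
    ("missing_processes", re.compile(r"Processes not found")),
    ("stale_updates", re.compile(r"No updates received in the last \d+ minutes")),
]


def flag_service_engine_messages(messages: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (label, message) pairs for messages worth sharing with Customer Support."""
    flagged = []
    for message in messages:
        for label, pattern in PATTERNS:
            if pattern.search(message):
                flagged.append((label, message))
                break  # one label per message is enough
    return flagged


# Hypothetical history lines for illustration
history = [
    "Service Engine Status Check OK",
    "Processes not found ['nf_worker', 'nf_listen']",
    "No updates received in the last 10 minutes",
]
for label, message in flag_service_engine_messages(history):
    print(f"{label}: {message}")
```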

Incident Creation

Once you confirm configurations are applied correctly and Service Engines are working properly, confirm an Incident was created and that the notifications went out. In Netreo, for a notification to go out, an Incident must be created for the Alarms. Incidents are created only when enough soft critical alarms occur to create a hard critical alarm. This is configurable, but by default an Incident is typically created after 3 soft alarms.
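
The soft-to-hard escalation can be pictured with a small counter. This is a simplified model and not Netreo's implementation: it assumes the soft alarms must be consecutive and that a single good check result resets the count, neither of which is stated above. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlarmCounter:
    soft_threshold: int = 3        # default mentioned above: 3 soft alarms
    consecutive_critical: int = 0
    incident_open: bool = False

    def record(self, is_critical: bool) -> bool:
        """Record one check result; return True when this result opens an Incident."""
        if not is_critical:
            # Assumption: a recovery resets the count and closes the Incident
            self.consecutive_critical = 0
            self.incident_open = False
            return False
        self.consecutive_critical += 1
        if self.consecutive_critical >= self.soft_threshold and not self.incident_open:
            self.incident_open = True  # soft alarms have become a hard alarm
            return True
        return False


counter = AlarmCounter()
checks = [True, True, False, True, True, True, True]
opened = [counter.record(result) for result in checks]
print(opened)  # [False, False, False, False, False, True, False]
```

In this model a brief flap (two critical results followed by a recovery) never opens an Incident, which is why a device that alerts unexpectedly is worth checking against its configured threshold.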

When an Alarm is triggered in Netreo and an Action Group is associated with the object that is alarming (see How To: Troubleshooting No Alerts Received for details on Action Groups), an Incident is created and the assigned Action invoked. If an Incident has been created, follow the steps below to see if notifications were sent for that Incident.

Troubleshooting Steps

If there was no notification received:

  1. Go to Quick Views -> Active Incidents
  2. Click SearchView
  3. Filter the results by adding information such as the device name in Title
    1. If you have an Incident ID, you can use that in the Incident ID field
  4. Once you have found the Incident, open it and verify whether Notifications were sent and to which lists
  5. The Notification number is a link if greater than 0
    1. Clicking on that number shows you the history of the alerts sent and to whom

When you see notifications being sent for the Incident, validate that the email addresses being referenced are correct. If they are not correct, it could be an issue with templating and the Action Group being assigned to the managed object.

If you are getting too many notifications and see a large number of notifications in the Incident, document your findings and reach out to Netreo Customer Support.

If no notifications are sent for an incident, it may be that the Action Groups being applied are incorrect or not configured correctly. 
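
The three outcomes above can be summarized as a small decision helper. This is only a sketch of the reasoning: the function name, the `flood_threshold` value and the sample addresses are hypothetical, since what counts as "too many" notifications depends on your environment.

```python
from typing import List, Set


def next_step(sent_to: List[str], expected: Set[str], flood_threshold: int = 20) -> str:
    """Suggest a next step from the notification history of one Incident."""
    if not sent_to:
        return "Check that the correct Action Group is applied and configured"
    if len(sent_to) > flood_threshold:
        return "Document findings and contact Netreo Customer Support"
    if set(sent_to) - expected:
        # Someone outside the expected recipient list was notified
        return "Review templating and the Action Group assigned to the managed object"
    return "Notifications look correct"


# Hypothetical recipients for illustration
expected = {"noc@example.com"}
print(next_step([], expected))
print(next_step(["noc@example.com", "old-team@example.com"], expected))
```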

Finally, if you still haven’t identified the source of your false alert(s), reach out to Netreo Customer Support.

Conclusion

Netreo intelligent alert management is a cornerstone of effective incident management for many customers. By supporting anomaly detection, custom thresholds and other advanced features, Netreo ensures all alerts are truly meaningful, eliminates alert fatigue and fuels automated issue resolution.

Along with this post, our aforementioned August How To: Troubleshooting No Alerts Received is extremely useful for all Netreo customers with the DIY spirit. Leveraging helpful pointers and advanced capabilities, Netreo customers can eliminate alert fatigue and ensure every alert that comes through is meaningful. Of course, Customer Support is included with every Netreo deployment, so never hesitate to reach out for help with your specific environment.

If you’re not a current customer, see how the Netreo Platform delivers maximum value as your infrastructure management solution by Requesting a Demo Today!
