In this blog series, we have been looking at how finding blind spots in your data and security monitoring can help you in four areas: time, coverage, scheduling, and failure detection. Previously, we looked at how to detect anomalies in time, coverage, and scheduling that can affect our ability to deliver consistent monitoring results. Now let’s wrap up the series by discussing failure detection.
When should you build a system for failure detection?
- Normally when you finish a project with identified outcomes
- In the case of monitoring blind spots, once those monitors are in place, you build detection to find out when the monitors themselves stop working
How often should you check for failures?
- It depends on the severity of the detection and on the schedule of the underlying detection
- It is also contingent on the bottom line, i.e. how a failure would impact the business and operations
Where do you check for failures?
- Within Splunk, as that is the preferred place
- The best way to do it is to create a dashboard and/or alerts
- For example, include all of the searches used to monitor the failures or have dashboard searches saved as reports (so you can easily grab their status)
Read on as I show how you can create your own "failure detection" dashboard.
How to create your failure detection dashboard
- I borrowed a search from the Monitoring Console: Runtime Statistics panel to create my dashboard.
- The primary difference is that I created a lookup (saved_searches.csv) with all my search names related to my detections. This way, I have one dashboard to look at only my items. I could also use the same search to create an alert on any field of concern.
- NOTE: The CSV contains one column named “savedsearch_name” that ends up adding “savedsearch_name = <my saved search name>” for every saved search. This filters out all the other searches I am not interested in.
reference #1
reference #2
reference #3
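To make the NOTE about the lookup concrete, here is a minimal Python sketch of the filtering idea (not the Splunk search itself). It assumes a lookup with the single column `savedsearch_name` described above. The search names, the sample events, and the helper names `load_search_names`, `build_filter`, and `keep_listed` are all hypothetical, and the filter string only approximates what Splunk generates from the lookup.

```python
import csv
import io

# Hypothetical lookup contents. The only detail taken from the post is that
# the CSV has one column named "savedsearch_name".
LOOKUP_CSV = """savedsearch_name
Detect Missing Forwarders
Detect Skipped Searches
"""


def load_search_names(csv_text):
    """Return the non-empty saved search names from the lookup, in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = []
    for row in reader:
        name = (row.get("savedsearch_name") or "").strip()
        if name:
            names.append(name)
    return names


def build_filter(names):
    """Build an OR filter similar in spirit to what the lookup adds to the search."""
    terms = ['savedsearch_name="{}"'.format(n.replace('"', '\\"')) for n in names]
    return "(" + " OR ".join(terms) + ")" if terms else ""


def keep_listed(events, names):
    """Keep only events whose savedsearch_name appears in the lookup."""
    wanted = set(names)
    return [e for e in events if e.get("savedsearch_name") in wanted]


if __name__ == "__main__":
    names = load_search_names(LOOKUP_CSV)
    print(build_filter(names))
```

The point of the design is that the dashboard search itself never changes: adding or removing a row in the lookup is enough to change which saved searches are shown.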
- Don’t forget to include the new saved search in your lookup.
- Modify the search to create an alert:
reference #4
reference #5
- You may want to add a trigger action, such as an email to notify someone to investigate.
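The alerting idea can be sketched in Python as follows. This is only an illustration of the logic, not the Splunk alert: the record layout, the `status` values ("success", "skipped"), the `threshold` parameter, and the function name `evaluate` are assumptions made for the example. It flags watched searches that had runs with a non-success status, and also lists watched searches that produced no runs at all.

```python
from collections import Counter

# Assumption: a run counts as healthy only when its status is "success".
OK_STATUSES = {"success"}


def evaluate(runs, watched, threshold=1):
    """Return (alerts, missing) for the watched saved searches.

    alerts  maps a search name to its number of non-success runs, for
            searches with at least `threshold` such runs.
    missing lists watched searches with no runs in the data, sorted by name.
    """
    watched_set = set(watched)
    bad_runs = Counter()
    seen = set()
    for run in runs:
        name = run.get("savedsearch_name")
        if name not in watched_set:
            continue  # ignore searches that are not in the lookup
        seen.add(name)
        if run.get("status") not in OK_STATUSES:
            bad_runs[name] += 1
    alerts = {name: count for name, count in bad_runs.items() if count >= threshold}
    missing = sorted(watched_set - seen)
    return alerts, missing


if __name__ == "__main__":
    sample_runs = [
        {"savedsearch_name": "A", "status": "success"},
        {"savedsearch_name": "A", "status": "skipped"},
        {"savedsearch_name": "B", "status": "success"},
    ]
    alerts, missing = evaluate(sample_runs, ["A", "B", "C"])
    print(alerts, missing)
```

In Splunk, the notification itself would be handled by the alert's trigger action; the sketch covers only the decision about what is worth alerting on.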
YOUR RETURN ON INVESTMENT
To recap, failure detection is important from security, operations, management, and compliance perspectives because it offers the following:
- Peace of mind knowing expected outcomes can be delivered
- Confidence in systems
- Improved adoption of products and services through delivering consistent results
We will be providing more tips and expertise in future blogs!