In this blog series, we dive into how finding blind spots in your daily and security operations can help you with time, coverage, scheduling and failure detection. In this blog, we will uncover how important it is to maintain a schedule for the sake of your customers and stakeholders! Don't forget to read the first blog in the series on finding blind spots to gain insight on how to maintain time, and the second blog on coverage.
It's important to maintain a schedule regarding your Splunk security operations because it ensures key items are being executed as expected such as:
- Populating Searches
This is vital because if you are monitoring antivirus events, checking the schedule lets you confirm that the search ran successfully. Otherwise, the alert you expect may not fire because the search failed to run. If you monitor only part of your events, you miss the rest, and those gaps can hide major vulnerabilities.
When should you check for scheduling?
- Normally when you create an alert, report or populating search
- When significant changes are introduced into Splunk
- Every week, although every day is better
Where do you check for scheduling?
- Within the Splunk Monitoring Console Scheduler Activity
- The example below is part of the Splunk Monitoring Console Scheduler Activity Dashboard
There are 3 things to look out for in scheduling. Read more below to learn why they are important and how they may impact your security as well as your business operations!
- Skip Ratio
- Concurrency of Searches
- Average Execution Latency
3 items to look out for in scheduling
SKIP RATIO
- It is the share of scheduled searches that were skipped because they could not be executed during their schedule window.
- What is the goal KPI? You want it to be as close to 0 as possible
- Why is this important? Do not ignore skipped searches. If you ignore them, you are missing alerts and data, which will impact your business's bottom line, your security and more.
- Tip: Remember every search has a purpose and expectation. You can run this search to find some of the reasons that your searches are skipping:
- index=_internal sourcetype=scheduler savedsearch_name=* status=skipped | stats count BY reason
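To make the skip ratio KPI concrete, here is a minimal Python sketch of the arithmetic. It assumes you have exported one status value per scheduled run (for example from the scheduler search above); the function name and the sample data are hypothetical and not part of Splunk.

```python
from collections import Counter


def skip_ratio(statuses):
    """Return the fraction of scheduler runs whose status is 'skipped'.

    `statuses` is a list of status strings, one per scheduled run.
    This is an illustrative helper, not a Splunk API.
    """
    if not statuses:
        return 0.0
    counts = Counter(statuses)
    return counts["skipped"] / len(statuses)


# Hypothetical sample: four scheduled runs, one of them skipped.
runs = ["success", "skipped", "success", "success"]
print(f"skip ratio: {skip_ratio(runs):.0%}")
```

The closer this value is to 0, the fewer alerts and reports you are silently losing.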
CONCURRENCY OF SEARCHES
- It is the number of searches running at the same time.
- What to look for? Numbers greater than the denominator (Limit) in the “Concurrency of Scheduled Reports (Running/Limit)” panel in the Monitoring Console Scheduler Activity dashboard. When you run searches, Splunk limits the number of concurrent searches to preserve the performance of each search. In Splunk Cloud Platform, this concurrency limit is configured for you. The desired outcome is to balance the search schedule or increase resources so that Concurrency stays below the Limit.
- Why is this important? Having a number higher than the Limit (69 in this example) indicates that too many searches are scheduled to run at the same time. The “Maximum Concurrency of Scheduled Reports” panel shows when the Limit is exceeded. When that happens, your searches are liable to be skipped, so use this panel to make sure you don’t schedule too many reports/searches at the same time.
- Tip: Refer to Splunk docs on how the Limit is calculated and the impact on search performance
- Reference: The various causes of exceeding the Limit are outside the scope of this article but this presentation is a good place to start: Making the Most of the Splunk Scheduler--conf2017.key
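As a rough illustration of what "concurrency" measures, the sketch below counts the peak number of overlapping searches from a list of (start, end) times and compares it with a limit. The function name, the sample times and the limit value are assumptions made up for this example; Splunk calculates the real Limit for you.

```python
def max_concurrency(intervals):
    """Return the peak number of searches running at the same time.

    `intervals` is a list of (start, end) pairs in seconds.
    Illustrative helper only, not a Splunk API.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a search starts
        events.append((end, -1))    # a search ends
    # Tuples sort by time first; at equal times an end (-1) sorts before
    # a start (+1), so back-to-back searches do not count as overlapping.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


# Hypothetical schedule: three searches, two of which overlap at any time.
schedule = [(0, 10), (5, 15), (10, 20)]
limit = 2  # assumed limit for this example only
peak = max_concurrency(schedule)
print(f"peak concurrency {peak} / limit {limit}")
if peak > limit:
    print("Too many searches at once; consider staggering the schedule.")
```

Staggering start times (for example, moving some searches off the top of the hour) lowers the peak without removing any searches.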
AVERAGE EXECUTION LATENCY
- Latency is the difference between the scheduled time of a search and when it actually started.
- What to look for? Latency should be as close to 0 as possible (a lower number for higher priority searches).
- Why is this important? Latency delays alerting and can cause searches to be skipped.
- Refer to Configure the priority of scheduled reports - Splunk Documentation for information on reducing latency.
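The latency definition above is simple subtraction, averaged over runs. A minimal sketch, assuming each run is exported as a pair of epoch timestamps (scheduled time, actual start time); the function name and sample values are hypothetical:

```python
def average_latency(runs):
    """Return the mean delay, in seconds, between scheduled and actual start.

    `runs` is a list of (scheduled_time, start_time) pairs in epoch seconds.
    Illustrative helper only, not a Splunk API.
    """
    if not runs:
        return 0.0
    total = sum(start - scheduled for scheduled, start in runs)
    return total / len(runs)


# Hypothetical sample: one run started 2 s late, another 6 s late.
sample = [(100, 102), (200, 206)]
print(f"average execution latency: {average_latency(sample)} s")
```

Computing this separately for high-priority searches makes it easier to see whether the searches you care about most are the ones being delayed.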
YOUR RETURN ON INVESTMENT
To recap, scheduling is important from both a security and an operational point of view:
- From a security perspective:
- You could miss key security indicators because your search failed to run on schedule.
- You could get a false sense of security because your alert is scheduled to run while, in reality, it may never run.
- From an operational perspective:
- You could miss errors or problems that impact the business.
- Splunk user adoption could drop when expected alerts do not work.
We will be providing more tips and expertise in this blog series! Don't forget to read the first blog in the series on finding blind spots to gain insight on how to maintain time, and the second blog on coverage.