Author: Yan Zhai, Principal Security Scientist, Director of Data Science
A few weeks ago I was talking to a senior security operations director at a Fortune 100 company about deploying security analytics within a large corporate environment like his.
“Ahhh… not another sensor!” he complained. “Look, we have a lot of security tools already. Each of them are generating a lot of alerts, most of which are false alerts. The last thing I need in my job is to pour more false alerts into the workflows of my understaffed SOC team! What I want is something with 99 percent alert fidelity. If it throws an alert, it means something really, really bad has happened. Can you give me an analytics solution that delivers that?!”
My first thought was to lecture him about fidelity vs. sensitivity and the trade-off between noise and detection capabilities. Then I realized this would not be helpful at all. After all, nobody wants more noise when the bandwidth is already over-consumed. So instead, I smiled and said “Yup! I totally understand your situation, and security analytics is exactly what you need.”
I wasn’t crazy or lying, nor would I tune down the sensitivity of an analytical solution to meet his fidelity requirement. What I explained to him is how security analytics is not just another IDS sensor, and how a properly integrated analytics platform can help solve the problems in his operations.
In my explanation, I broke the process into three steps.
1. Reduce the volume of alerts while maintaining a proper level of monitoring.
Most security tools are notorious for generating tons of alerts if not properly tuned. For example, an out-of-the-box network IDS sensor can generate tens of thousands of alerts per day, most of which are noise. Proper tuning of these devices is critical for enterprise security.
Tuning is the process of placing sensors into the enterprise’s IT environment, observing and analyzing the alerts generated, and adjusting the parameters of individual signatures or rules to reduce the number of false positives. The tuning of security tools can significantly impact the effectiveness of the monitoring, as well as the quality and quantity of alerts coming into the SOC analyst’s workflow. While a lack of tuning will overwhelm analysts with a flood of alerts, overly aggressive tuning may impair the breadth and depth of monitoring of potential threats, i.e., create false negatives.
Tuning happens when a monitoring tool is first deployed, and is an ongoing process throughout the life of the tool because of the dynamic nature of today’s IT environments. Without prompt and ongoing maintenance, a well-tuned sensor can be saturated with false positives quickly.
Sensor tuning is especially challenging for large enterprise environments because they tend to have a wide variety of systems, applications, and services, which increases the chance that the signatures inside these sensors will misfire. A big risk with traditional sensor tuning in such large environments is that the tuning is often done against the whole infrastructure instead of against properly partitioned sub-environments. In such situations, a single source (e.g., an application, a host, or a subnet) triggering lots of false alerts may result in the signature being turned off for the entire enterprise. For example, a SIEM may have a rule configured to alert when it sees more than 10 logon failures within 60 seconds for any enterprise account. However, some test accounts or scanner service accounts tend to generate lots of failures. A SIEM engineer may simply raise the alert threshold to suppress the false positives, which, unfortunately, may render the rule ineffective in monitoring for real threats.
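To make the logon-failure example concrete, here is a minimal sketch (pure Python, with hypothetical accounts and counts) of why one global threshold fails and how a per-account threshold, derived from each account's own history, keeps the rule sensitive for everyone else:

```python
# Sketch (hypothetical data): a single global threshold either drowns in
# noise from scanner accounts or gets raised until it misses real attacks.
# Per-account thresholds avoid both failure modes.

# account -> logon failures per 60-second window, observed historically
observed = {
    "svc-scanner": [42, 38, 45],   # scanner service account: noisy by design
    "alice":       [0, 1, 0],
    "bob":         [0, 0, 2],
}

GLOBAL_THRESHOLD = 10  # "more than 10 failures in 60 seconds"

def per_account_thresholds(history, floor=10, slack=3.0):
    """Derive a threshold per account from its own baseline:
    mean + slack * max(1, mean), never below the global floor."""
    thresholds = {}
    for account, counts in history.items():
        mean = sum(counts) / len(counts)
        thresholds[account] = max(floor, mean + slack * max(1.0, mean))
    return thresholds

thresholds = per_account_thresholds(observed)

def alert(account, failures_in_window):
    # Unknown accounts fall back to the sensitive global floor.
    return failures_in_window > thresholds.get(account, GLOBAL_THRESHOLD)

print(alert("svc-scanner", 44))   # routine scanner noise: no alert
print(alert("svc-scanner", 500))  # genuine spike: still caught
print(alert("alice", 12))         # ordinary account keeps the strict floor
```

The baseline formula here is an illustrative assumption, not a product feature; the point is that the threshold follows each entity's observed behavior instead of being globally loosened.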
Now, with the help of analytics, tuning becomes completely different. It is no longer based on the subjective heuristics of security specialists. Instead, it is done dynamically, with machine-learning-driven insight into both historical and current activity and an up-to-date view of what’s necessary for the subject or object being evaluated. Security analytics is able to:
- Automatically partition environments and entities based on behaviors, enabling more granular, tailored monitoring rules and parameters that reduce alert volume while increasing monitoring efficacy.
- Dynamically set rule parameters based on up-to-date behavioral baselines, maintaining the proper level of monitoring as the environment changes.
This dynamic tuning process, which takes place automatically within analytics solutions, can significantly reduce the volume of alerts in a well-managed manner while avoiding the pitfalls of traditional, rule-based monitoring.
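As a toy illustration of the automatic-partitioning idea, the sketch below (pure Python, hypothetical behavior features, a deliberately naive greedy clustering in place of a real ML model) groups entities by behavior profile and then derives rule parameters per partition instead of one global setting:

```python
# Sketch (hypothetical features): partition entities by behavior, then tune
# each partition's rule parameters from that partition's own baseline.
import math

# entity -> behavior feature vector (mean daily events, distinct peers)
features = {
    "web-01":  (900, 40), "web-02":  (950, 38), "web-03": (870, 42),
    "wkst-17": (30, 5),   "wkst-22": (25, 4),   "wkst-31": (40, 6),
}

def partition(features, radius=200.0):
    """Greedy single-pass clustering: an entity within `radius` of an
    existing cluster centroid joins it; otherwise it starts a new cluster."""
    clusters = []  # each cluster is [centroid, member_names]
    for name, vec in features.items():
        for cluster in clusters:
            centroid, members = cluster
            if math.dist(centroid, vec) <= radius:
                members.append(name)
                n = len(members)
                # update centroid as a running mean of member vectors
                cluster[0] = tuple((c * (n - 1) + v) / n
                                   for c, v in zip(centroid, vec))
                break
        else:
            clusters.append([vec, [name]])
    return clusters

for centroid, members in partition(features):
    # Per-partition parameter: alert above 3x the partition's mean rate,
    # so busy web servers and quiet workstations get different thresholds.
    print(sorted(members), "-> alert above", round(centroid[0] * 3))
```

A production analytics platform would use far richer features and models; this only shows the shape of the outcome: servers and workstations land in different partitions, each monitored with parameters fitted to its own behavior.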
2. Improve the efficiency of incident response by providing relevant contexts.
Relevant context is critical for analysts to perform pattern recognition, validation, severity assessment, and situation awareness. Good contexts can help analysts quickly assess the severity of a situation and pinpoint the right data for investigation.
In general, contexts provided by an analytical solution can be classified into two categories: risk context and behavioral context.
Risk contexts are pretty straightforward: they are information that helps assess the risk associated with an incident and the entities involved. They usually focus on activities and assets of high criticality, value, or suspicion: for example, a user attempting to access many file shares containing sensitive data, a privileged command run through remote access, a large volume of sensitive data newly discovered on an endpoint, or intense connection activity against critical application servers. Risk contexts are important for analysts to prioritize or escalate investigations. They also help analysts form proper hypotheses about applicable threat scenarios.
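A minimal sketch of how such risk signals might feed prioritization (the signal names and weights are entirely hypothetical, chosen only to mirror the examples above):

```python
# Sketch (hypothetical weights): fold observed risk-context signals into a
# single score that orders the analyst's investigation queue.
RISK_WEIGHTS = {
    "sensitive_share_access": 40,
    "privileged_remote_command": 35,
    "sensitive_data_on_endpoint": 25,
    "critical_server_connections": 20,
}

def risk_score(signals):
    """Sum the weights of the observed risk signals, capped at 100."""
    return min(100, sum(RISK_WEIGHTS.get(s, 0) for s in signals))

incidents = [
    ("INC-101", ["sensitive_share_access", "privileged_remote_command"]),
    ("INC-102", ["critical_server_connections"]),
]

# Highest-risk incidents surface first in the triage queue.
for name, signals in sorted(incidents, key=lambda i: -risk_score(i[1])):
    print(name, risk_score(signals))
```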
Behavioral contexts are the patterns of user/entity behaviors, which may include:
- System and network access
- Data access and movements
- Application/command usage
- Security alerts and policy violations
Such contexts help analysts gain a more comprehensive view of the activity around the time the incident is detected.
When put together with historical records or peer groups, behavior patterns can also provide context along both the temporal and peer dimensions, helping analysts identify and understand the anomalies related to the incident.
Behavioral contexts also help pinpoint the time range, location, and related entities within a potential compromise so analysts can start their investigations with a more focused scope.
Behavioral contexts can likewise improve the efficiency of data collection and retention. Certain data may not be collected regularly, or retained for very long, because of the cost of gathering and/or storing it. For example, Windows security logs from workstations, packet captures, and firewall accept logs are usually not collected regularly or retained beyond 30 days, because the cost of collecting and storing such a large quantity of data isn’t justifiable compared to the value it provides. It is not uncommon to find that relevant logs have already been purged when an incident is detected weeks after it actually took place. In such situations, analysts are usually forced to follow slower, more resource-intensive forensic processes, such as disk image scanning and tape data recovery, to find the evidence they need.
With the help of behavioral analytics, we can:
- Extract more abstract, behavior metadata for longer retention time. Similar to how we use network flows instead of full packet captures to retain important network communications information, we can also use behavioral concepts, such as “scan,” “beacon,” and “exfiltration” to abstract a massive amount of data into much smaller footprints for less costly retention and better visibility.
- Focus on “smart” data gathering and retention based on recognized behavior patterns or anomalies. There are situations when it is not feasible to constantly collect certain data across the board, such as large configuration files or traffic dumps. With behavioral analytics, the data collection can be configured to trigger when abnormal behaviors are identified. In future investigations, behavioral contexts can be provided to analysts together with the related raw data.
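The abstraction idea can be sketched in a few lines. The heuristics below (port-count for "scan," low inter-arrival jitter for "beacon") are deliberately simplified stand-ins for real detection logic, and the flow records are hypothetical:

```python
# Sketch (simplified heuristics): compress raw connection records into
# behavioral labels such as "scan" and "beacon" for cheap long-term retention.
from collections import defaultdict
from statistics import pstdev

# (timestamp, src, dst, dst_port) -- hypothetical flow records
flows = [
    # 10.0.0.5 touches many ports on one host: looks like a port scan
    *[(100 + i, "10.0.0.5", "10.0.0.9", p) for i, p in enumerate(range(20, 60))],
    # 10.0.0.7 calls the same host/port every 300s: looks like a beacon
    *[(t, "10.0.0.7", "203.0.113.1", 443) for t in range(0, 3600, 300)],
]

def summarize(flows, scan_ports=25, beacon_jitter=5.0):
    by_pair = defaultdict(list)
    for ts, src, dst, port in flows:
        by_pair[(src, dst)].append((ts, port))
    events = []
    for (src, dst), recs in by_pair.items():
        ports = {p for _, p in recs}
        times = sorted(ts for ts, _ in recs)
        gaps = [b - a for a, b in zip(times, times[1:])]
        if len(ports) >= scan_ports:
            events.append({"behavior": "scan", "src": src, "dst": dst,
                           "ports": len(ports)})
        elif len(gaps) >= 3 and pstdev(gaps) <= beacon_jitter:
            events.append({"behavior": "beacon", "src": src, "dst": dst,
                           "period_s": sum(gaps) / len(gaps)})
    return events

# ~52 raw flow records collapse into 2 retained behavioral events
for event in summarize(flows):
    print(event)
```

The footprint reduction is the point: dozens of raw records become two small, queryable events that can be retained far longer than the packets themselves.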
In general, with the help of analytics solutions, we are able to preserve more “missing pixels” for analysts to reconstruct the crime scenes, and because context is provided automatically, analysts can use it to stop the crime before severe damage is done.
3. Liberate bandwidth and allow users to perform more advanced threat investigation.
Because an analytics platform is able to reduce the volume of alert flow by applying smart tuning, improve the efficiency of incident response investigation by providing relevant contexts, and enable smart logging to reduce the necessity for resource-intensive forensic procedures, analysts are freed to perform more advanced threat hunting and investigations. This is one of the most important drivers behind deploying an advanced analytics solution for cybersecurity!
There are two common ways to enable threat hunting with an analytics platform:
- Start with entity behavior anomalies, and identify potential threats by validating the legitimacy of these anomalies. This kind of hunting requires the analytics solution to provide comprehensive contexts and an efficient way to connect them together with each anomaly.
- Start with a newly discovered threat pattern, and search through past logs, alerts, and behaviors to find such patterns.
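The second method, sweeping a newly discovered pattern across retained data, can be sketched as follows. The event schema and the pattern (a "scan" followed within 24 hours by an "exfiltration" from the same host) are hypothetical, reusing the behavioral labels discussed earlier:

```python
# Sketch (hypothetical log schema): retrospective hunting -- take a newly
# published threat pattern and sweep it across retained behavioral events.
from datetime import datetime, timedelta

retained_events = [
    {"ts": datetime(2017, 5, 1, 9, 0),  "host": "wkst-17", "behavior": "scan"},
    {"ts": datetime(2017, 5, 1, 20, 0), "host": "wkst-17", "behavior": "exfiltration"},
    {"ts": datetime(2017, 5, 2, 11, 0), "host": "web-01",  "behavior": "scan"},
]

def hunt(events, first, then, within=timedelta(hours=24)):
    """Find hosts where behavior `first` is followed by `then` inside `within`."""
    hits = []
    seen = {}  # host -> timestamps of `first` behavior
    for e in sorted(events, key=lambda e: e["ts"]):
        if e["behavior"] == first:
            seen.setdefault(e["host"], []).append(e["ts"])
        elif e["behavior"] == then:
            for t0 in seen.get(e["host"], []):
                if timedelta(0) <= e["ts"] - t0 <= within:
                    hits.append((e["host"], t0, e["ts"]))
    return hits

for host, t0, t1 in hunt(retained_events, "scan", "exfiltration"):
    print(host, "scanned at", t0, "then exfiltrated at", t1)
```

This only works, of course, if the behavioral events were retained in the first place, which is exactly what the smart retention discussed in step 2 buys you.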
These two threat hunting methods are common because they’re basic. Identifying interesting behaviors, modeling behaviors that can surface meaningful anomalies within your environment, mapping behaviors and anomalies to corresponding threats, and properly scoring them are a whole new set of problems to solve. They call for more advanced techniques and technology, and they deserve their own series of articles. Stay tuned, as I’ll address advanced threat hunting techniques, and the technological requirements to execute them, in the coming months.