Author: Madhukar Govindaraju, SVP of Engineering
Now that it’s 2018, it’s safe to say that machine learning and AI have left the data center and entered businesses worldwide. Thanks to enormous growth in data volumes, significant advances in raw compute power, and scalable cloud-based infrastructure platforms, the performance of machine learning algorithms continues to improve. Today’s machine learning models and AI algorithms already support highly accurate machine vision, hearing, and speech; they can access global repositories of information and help users do their jobs more efficiently and effectively.
Research firm Gartner expects worldwide enterprise security spending to total $96.3 billion in 2018. Despite increased investment in cyber defense stacks and platforms, cyberattacks are very much on the rise: in just the first half of 2017, cyberattack volumes doubled.
Research from market analyst firm Juniper Research suggests that the rapid digitization of consumers’ lives and enterprise records will increase the total cost of data breaches to $2.1 trillion globally by 2019, almost four times their estimated cost in 2015. This makes cybercrime the greatest threat to every profession, company, and industry in the world. The same report states that nearly 60% of anticipated data breaches worldwide will occur in North America, but that this proportion will decrease over time as other countries become both richer and more digitized. By 2020, as more business infrastructure gets connected, the average cost of a data breach will exceed $150 million.
So, how do we tackle this huge problem?
Most of the information needed to uncover attacks is already distributed across a company’s databases, applications, systems, and worldwide network — particularly when the company runs solutions that detect various types of malicious activity via rules and policies, such as:
- One or more firewalls (FWs)
- Data loss prevention (DLP) solutions
- Endpoint detection and response (EDR) or endpoint protection and prevention (EPP) systems
- Security incident and event management (SIEM) systems
- Web application firewalls (WAFs)
- Cloud access security brokers (CASB)
- Identity and access management (IAM) solutions
Most of these traditional systems and next-generation SIEMs rely on rules, policies, and correlation heuristics to filter the noise. The problem is that investigating all the alerts generated by every device within a large enterprise’s global security infrastructure, and making sense of the data generated every day — sometimes down to the minute — is more than any organization can handle. The ability to cut through the noise and identify the real threats could make a huge difference and result in much better security outcomes. This is what predictive user and entity behavioral analytics (predictive UEBA) platforms can deliver.
Predictive UEBA platforms baseline normal activity across all users and hosts/devices, then leverage several machine learning algorithms to spot significant behavior changes and outliers. This is how predictive UEBA systems identify anomalous events and patterns that are normally missed. These platforms fill in the blanks to instantly put activity into context, so its implications can be understood within the organization’s unique environment. They also make it easier to identify hidden threats and support automated remediation to limit attack damage.
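The baselining idea above can be sketched in a few lines. This is a deliberately minimal illustration — a per-entity mean/standard-deviation baseline with a z-score threshold — not E8’s actual models, and the login-count metric is a hypothetical example:

```python
import statistics

def baseline(history):
    """Compute a simple per-entity baseline: the mean and standard
    deviation of an activity metric (e.g., daily login count)."""
    return statistics.mean(history), statistics.pstdev(history)

def is_anomalous(value, mean, stdev, threshold=3.0):
    """Flag values more than `threshold` standard deviations from baseline."""
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold

# Example: one user's daily login counts over two weeks
history = [4, 5, 3, 4, 6, 5, 4, 5, 3, 4, 5, 4, 6, 5]
mean, stdev = baseline(history)
print(is_anomalous(40, mean, stdev))  # True — a sudden spike stands out
print(is_anomalous(5, mean, stdev))   # False — within the normal range
```

Real platforms baseline many metrics per entity and learn thresholds rather than fixing them, but the principle — model normal, flag deviation — is the same.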
UEBA software provides analytics centered on user behavior, but it also covers other entities, such as endpoints, networks, and applications. Correlating the analyses across these entities makes the results more accurate and threat detection more effective. This is what predictive UEBA systems, such as E8’s Fusion Platform, can deliver at a scale of petabytes of data across large enterprise networks.
What can predictive UEBA do?
Predictive UEBA systems collect data from a number of sources and analyze the activity to identify and alert on hard-to-detect events that typically indicate:
- System or host compromise
- User account compromise
- Data loss or exfiltration
- Insider threats, such as privilege abuse
- Lateral movement and persistence
They can also help focus time and resources on the most significant activities taking place in the environment. When facing thousands of alerts, predictive UEBA applies sophisticated machine-learning-based risk models to drastically consolidate incidents, automate alert prioritization by intelligently scoring every activity, and guide security operations to mitigate the biggest threats first.
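As a rough sketch of the prioritization step, consider consolidating raw alerts per entity and ranking entities by a cumulative risk score. The alert types and severity weights below are hypothetical, and a weighted sum stands in for the learned risk models an actual platform would use:

```python
from collections import defaultdict

# Hypothetical severity weights per alert type — illustrative only.
WEIGHTS = {"failed_login": 1.0, "port_scan": 2.0,
           "privilege_escalation": 8.0, "data_exfil": 9.0}

def prioritize(alerts):
    """Consolidate raw (entity, alert_type) alerts and rank entities
    by total risk, highest first."""
    scores = defaultdict(float)
    for entity, alert_type in alerts:
        scores[entity] += WEIGHTS.get(alert_type, 0.5)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

alerts = [("host-17", "port_scan"), ("alice", "failed_login"),
          ("alice", "privilege_escalation"), ("alice", "data_exfil"),
          ("host-17", "failed_login")]
print(prioritize(alerts))
# [('alice', 18.0), ('host-17', 3.0)]
```

Five alerts collapse into two scored incidents, and security operations sees "alice" first — the consolidation and triage effect described above.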
Predictive UEBA analyzes billions of events originating from millions of users and devices/hosts in a global enterprise network, and generates insights into threats and behaviors. Machine learning models baseline metrics across user and device behaviors that are stored over several months, keeping historical behavioral insights intact to deliver context around new insight.
How does predictive UEBA scale to billions of events?
To leverage prediction capabilities, the advanced machine learning algorithms within predictive UEBA must analyze billions of events in a scalable way — not only picking out interesting behaviors, but also identifying the user and device from which each behavior originates. This is a very challenging problem to solve.
At E8 Security, we solve this by using an entity behavior graph composed of a very large number of entity objects (users and hosts) and entity property attributes (details about each, such as MAC address) that are densely interconnected with each other via meaningful behavioral relationships.
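In the abstract, such a graph pairs entity nodes (each carrying property attributes) with weighted edges that record observed behavioral relationships. The sketch below is a minimal illustration of that shape — the node names, fields, and relationship labels are hypothetical, not E8’s actual schema:

```python
# Minimal entity behavior graph: nodes are users and hosts with
# property attributes; edges record behavioral relationships.
graph = {
    "nodes": {
        "alice":   {"kind": "user", "dept": "finance"},
        "host-17": {"kind": "host", "mac": "00:1a:2b:3c:4d:5e"},
    },
    "edges": [
        # (source, target, relationship, observed count)
        ("alice", "host-17", "logs_in_to", 42),
    ],
}

def neighbors(graph, node):
    """Return the entities behaviorally connected to `node`."""
    return [t if s == node else s
            for s, t, _, _ in graph["edges"] if node in (s, t)]

print(neighbors(graph, "alice"))  # ['host-17']
```

At enterprise scale this structure holds millions of nodes and is densely interconnected, which is what makes the clustering techniques discussed next necessary.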
Most threat modeling and threat detection machine learning algorithms model pairwise similarities between entity data objects using a graph, so the data clustering problem can be translated into a graph clustering problem. E8’s Fusion Platform supports scalable and efficient clustering for large-scale entity graphs using several methods, including spectral clustering, which breaks large clusters of similar data points into smaller ones. Think of it like a huge project you need to complete: when you break the overall project into smaller tasks and groups of similar tasks, the project seems less daunting and becomes more easily executable.
This same principle applies. By subdividing the original graph into sub-graphs and compressing the sub-graphs into bipartite node sets — where the nodes fall into just two groups and every edge runs between the groups, never within them — we can identify and compute complex relationships much more easily. E8’s Fusion Platform can scale to a very large number of entities — users and hosts — and process billions of events as part of the hourly and daily data pipelines.
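The core of spectral clustering can be shown on a toy entity graph. This is a textbook sketch (Fiedler-vector partitioning of the graph Laplacian), not E8’s implementation, and the six-node graph is invented for illustration:

```python
import numpy as np

# Toy entity graph: 6 entities forming two dense communities,
# {0, 1, 2} and {3, 4, 5}, joined by a single weak link (2-3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

# Unnormalized graph Laplacian: L = D - A
L = np.diag(A.sum(axis=1)) - A

# The Fiedler vector (eigenvector of the second-smallest eigenvalue)
# splits the graph into two clusters by the sign of its entries.
eigvals, eigvecs = np.linalg.eigh(L)
clusters = (eigvecs[:, 1] > 0).astype(int)
print(clusters)  # entities 0-2 land in one cluster, 3-5 in the other
```

Applied recursively, this is how a large cluster is broken into smaller, more tractable sub-clusters, as the project analogy above describes.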
Entity graph clustering results are also used as building blocks for our threat models and algorithms to reduce complexity. E8’s machine learning models divide data into layers: the raw data that the platform ingests goes through a layer of processing to extract specific features, which are then consumed by the models, thereby reducing computational complexity. Systematic partitioning of data points and layered feature extraction are key design points that enable the extreme scalability within the E8 Fusion Platform.
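The layering idea can be sketched as two small functions: one layer condenses raw events into per-entity features, and the models consume only those features, never the full event stream. The event shapes, feature names, and trivial threshold model below are all hypothetical stand-ins:

```python
def extract_features(raw_events):
    """Layer 1: condense raw events into compact per-entity features,
    so downstream models never touch the full event stream."""
    features = {}
    for ev in raw_events:
        f = features.setdefault(ev["entity"], {"logins": 0, "bytes_out": 0})
        if ev["type"] == "login":
            f["logins"] += 1
        elif ev["type"] == "transfer":
            f["bytes_out"] += ev["bytes"]
    return features

def score(features, byte_limit=10_000_000):
    """Layer 2: a trivial stand-in model that consumes features only."""
    return {e: ("review" if f["bytes_out"] > byte_limit else "ok")
            for e, f in features.items()}

events = [
    {"entity": "bob", "type": "login"},
    {"entity": "bob", "type": "transfer", "bytes": 25_000_000},
    {"entity": "carol", "type": "login"},
]
print(score(extract_features(events)))
# {'bob': 'review', 'carol': 'ok'}
```

Because the model layer sees a handful of features per entity instead of billions of raw events, its computational cost stays bounded — the complexity reduction the paragraph above describes.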
Interested in learning more?
Take a Joyride with E8! Test out the Fusion Platform and see for yourself how the scalability and operational efficiency of predictive UEBA can improve your cybersecurity practice.