Author: Christophe Briguet, Head of Solutions Engineering and Pre-Sales Support
Do you remember your biology teacher? I do. I remember her showing slide after slide of animal anatomy and having to memorize the Linnaeus method of biological classification. I’m sure you all know what I’m talking about: kingdom, phylum, class, order, family, genus and species — the taxonomy that hierarchically organizes living things.
However, there’s more to a living thing than just its attributes and the classification under which it fits. If you’re going to truly describe a thing, or entity, you also need to represent its state of being, or its interaction with its environment. This is where an ontology is useful. An ontology serves the same purpose of knowledge representation, but in addition to taxonomy, it covers things like the attributes of entities as well as the relationships between entities and groups/classes. To stay in the same field, imagine a “classification” of birds based on their acoustic ability, ecological environment, interactions with others of their species, and how they interact with things outside their species. This would be an ontology.
Let’s take this concept of ontology — of describing behaviors — as it applies to humans and machines within an organization, in terms of being able to understand security and risk. How do you describe your security analytics requirements beyond their “ability to baseline user activity and identify anomalous user behavior”? How do you translate the outcome of such a system when it’s primarily based on mathematical abstractions and complex statistical concepts?
These are the questions we at E8 Security will answer for the broader cybersecurity community via my three-part blog series, “Not Lost in Translation.”
We can start by recognizing the need for a common language to reason about and characterize behaviors; next, we’ll review how SIEM technologies addressed this challenge for event correlation; and finally, we’ll introduce the idea of an ontology-driven approach for modeling behaviors. In later posts I will discuss what an ontology, specifically for behaviors, should look like, and apply it to a case study.
Our Current Language, and all its Limiting Glory
As Hina stated in her post just last week, behavior modeling techniques have been introduced into the market to complement SIEM and threat detection systems. They are based on the principle that every entity (individual, machine, system, etc.) can be characterized by its behaviors (habits), and that it may be possible to identify suspicious activity by discovering an unusual pattern or abnormal behavioral characteristics. We understand that behavior analytics goes beyond “known good / known bad” classification (Indicators of Compromise) and beyond simple statistics that don’t really capture the dynamic nature of networks (e.g. “μ + 3σ” or “if x is 10% above average, it’s bad”), but is there a clean formal model for entity behaviors? What model could ultimately be interpreted by machines and shared by threat researchers and data scientists? What model could be used to build out-of-the-box “security” content for behavior analytics products? What model could be used to craft data to test and prototype use cases?
SIEM vendors faced a similar challenge many years ago: to acquire heterogeneous sources and correlate high volumes of events on the fly, they had to come up with a way to translate them into a standardized format. This approach started the “supported device” race between vendors, where armies of developers and security experts were asked to write ad-hoc collectors and parsing rules to normalize events and manipulate the concepts provided by data sources in a homogenous way.
Inevitably, this approach led SIEM vendors to focus on data structure rather than semantics. Event data formats such as CEF (Common Event Format) and IDMEF [RFC 4765] were implemented in conjunction with event taxonomies. That was incredibly useful, but these formats were designed to be used by correlation rules with limited sets of operations and features (e.g. count, group-by, sequence, etc.).
Here’s an example of one such correlation rule to detect Session Hijacking:
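As an illustration, a session-hijacking rule typically alerts when the same session identifier appears from more than one source IP within a short time window — a classic group-by/count correlation. The sketch below is a minimal, hypothetical rendering of such a rule in Python; the field names (`session_id`, `src_ip`, `ts`) are assumptions, not any vendor’s actual schema.

```python
from collections import defaultdict

# Hypothetical normalized events: session ID, source IP, timestamp (seconds).
events = [
    {"session_id": "abc123", "src_ip": "10.0.0.5",    "ts": 100},
    {"session_id": "abc123", "src_ip": "10.0.0.5",    "ts": 160},
    {"session_id": "abc123", "src_ip": "203.0.113.9", "ts": 200},  # same session, new IP
]

def detect_session_hijacking(events, window=300):
    """Alert when one session ID is seen from more than one source IP
    within the time window -- a group-by/sequence correlation rule."""
    alerts = []
    seen = defaultdict(list)  # session_id -> list of (ts, src_ip)
    for e in events:
        # Keep only history inside the sliding window.
        history = [(t, ip) for t, ip in seen[e["session_id"]]
                   if e["ts"] - t <= window]
        ips = {ip for _, ip in history}
        if ips and e["src_ip"] not in ips:
            alerts.append((e["session_id"], ips | {e["src_ip"]}))
        history.append((e["ts"], e["src_ip"]))
        seen[e["session_id"]] = history
    return alerts

print(detect_session_hijacking(events))
```

Note how the rule only works because someone decided in advance exactly which fields to group by and what “suspicious” means.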
Taxonomy works well when you know what you are looking for, but doesn’t fit behavior-based models designed to identify and learn patterns such as in the example below:
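A behavior-based model, by contrast, learns what “normal” looks like from the data itself. The following is a minimal sketch (not any product’s method) of the idea: build a per-user baseline of login hours from history, then flag hours that fall outside it. The user names, data, and threshold are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical training data: historical login hours (0-23) per user.
history = {
    "alice": [9, 9, 10, 8, 9, 10, 11, 9, 10, 9],  # a daytime worker
}

def is_anomalous_hour(user, hour, history, min_fraction=0.05):
    """Flag an hour as anomalous if it accounts for less than min_fraction
    of the user's observed logins -- the baseline is learned, not written
    in advance as a correlation rule."""
    counts = Counter(history[user])
    total = sum(counts.values())
    return counts[hour] / total < min_fraction

print(is_anomalous_hour("alice", 3, history))  # 3 a.m. login, never observed
print(is_anomalous_hour("alice", 9, history))  # 9 a.m. login, very common
```

The contrast with the correlation rule is the point: here no analyst specified which hours are “bad” — the model infers it per entity.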
Twenty-five Different Words for “Snow”
In the security analytics world, characterization of entity behaviors tends to rely on natural language, which often summarizes the event so briefly that it oversimplifies and loses the meaning in the process. This is why I believe the community will benefit from the development of a behavior ontology, so the essence of the action isn’t lost in translation.
Imagine an ontology supported by a set of functions that can be used in combination with a specific syntax — an “Entity Behavior Modelling Language.” Such a language would support quantitative and qualitative operations (e.g. rare, unexpected, sudden, significant, excessive, increase/decrease, widely, contained, similar, fast, slow, common, uncommon, globally, locally, suspicious (IOC), typical, work hour, weekend, etc.).
With this language, security practitioners would be able to characterize the pattern and behaviors they are trying to identify with a given threat detection model. For example, “Find any unexpected access to critical assets by new employees during unusual working hours.”
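To make that example concrete, here is a purely hypothetical sketch of how such a query might be expressed as an embedded DSL in Python. Every name here (`BehaviorQuery`, `access`, `by`, `qualified`) is invented for illustration; no such language or product API exists in the original post.

```python
# Hypothetical embedded DSL for an "Entity Behavior Modelling Language".
class BehaviorQuery:
    def __init__(self):
        self.clauses = []

    def _add(self, key, value):
        self.clauses.append((key, value))
        return self  # return self to enable fluent chaining

    def access(self, target):
        return self._add("access", target)

    def by(self, entity):
        return self._add("by", entity)

    def qualified(self, *qualifiers):
        return self._add("qualifiers", qualifiers)

    def __repr__(self):
        return " ".join(f"{k}:{v}" for k, v in self.clauses)

# "Find any unexpected access to critical assets by new employees
#  during unusual working hours."
q = (BehaviorQuery()
     .access("critical_assets")
     .by("new_employees")
     .qualified("unexpected", "unusual_working_hours"))
print(q)
```

The qualifiers (“unexpected,” “unusual working hours”) are exactly the kind of quantitative/qualitative operators an ontology would have to define precisely, so that two products interpret them the same way.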
Ultimately, models defined and formalized with a behavior-based ontology would be machine readable, understandable by humans, and sharable between products and systems.
Have you faced similar problems in your own Security Analytics implementation? How have you addressed them?
In my next post, I’ll discuss what a behavior ontology should look like and what some of the requirements might be for creating it. Stay tuned…
Read more of Christophe Briguet’s “User & Entity Behavior Ontology” series: