Author: Christophe Briguet, Co-Founder, Head of Solutions Engineering and Pre-Sales Support
In my last blog post, I introduced the need for a behavior ontology, a common language to characterize and describe how things act, to help us fully capture the qualitative and quantitative essence of behaviors in a natural, human way.
In this post, I describe an ontology architecture we’ve been thinking about since the initial conception of E8 Security, when we were designing and refining the concepts behind E8’s technology.
Know Your Audience
Before we can begin on an architecture, we need to draw a line connecting the source to the people for whom we’re designing this ontology. On one end, we have the underlying data sources that will be integrated into our scope: the users and assets that are actively affected by cyber-behaviors and the information they contain, which is often what’s targeted in cyberattacks. On the other end, there is the audience; we need to figure out who they are and what kind of language they use so we can communicate behaviors effectively.
Our audience consists of two groups:
- Data scientists who would use a behavior ontology to define use cases, and to build and assess models for threat and risk detection, including the machine learning algorithms that go into them.
- Security practitioners who evaluate, implement, and operate security analytics technologies, and who would use a behavior ontology to describe and document use case requirements and characterize security incidents along with their appropriate responses.
Once we understand who our audience is, we can identify some of the questions they will need to ask about behaviors. Those questions tell us what kinds of answers our ontology needs to provide and help us better define the scope of the data sources we need to integrate.
Here are some examples of conceptual questions and the language in which they’re asked, based on existing use cases:
- Are new employees accessing critical assets during unusual working hours?
- Are there any users authenticating from unusual locations and accessing an unusual number of files?
- Have any devices recently started communicating with any unusual external hosts simultaneously?
- Are any users accessing critical business data exclusively via a VPN connection?
- Are there any devices communicating at regular intervals with an unpopular external host?
- Are any users regularly accessing systems that aren’t also being accessed by any of their peers with similar job roles?
- Have any executive users started to behave like an administrative user?
- Have any users started behaving like a user from a different authority position?
- Have any non-privileged users started behaving like an administrator user?
- Have any users started copying an unusual number of files since they gave their resignation notice?
Layers of the Ontology Cake
At first glance, it looks like we’ll need to integrate a lot of concepts, but if we break down the structure of these questions, we can more easily imagine a layered approach to analyzing them. At the lowest level, the “Foundational Elements” layer captures concepts related to entities and their contexts, such as whether the entity is a person or an asset, its geolocation, and its relationships to other entities. Above it, a mid-level “Abstracted Actions” layer characterizes the action performed, such as the kind of digital action taking place (authentication, email, search, etc.) or the objective of a software behavior (remote access, self-defense, evasion, etc.). At the top, the high-level “Meta-Behaviors” layer is as behavior-independent as possible and provides the quantitative and qualitative representation of the behavior. This layer also contains properties or features that cannot be captured by simple queries on log or event data.
Figure: Example structural model of an anomalous behavior-based ontology
To simplify the concept of these layers, let’s take an example from everyday life: a typical commute from home to the office in your car. The route, vehicle model, departure and arrival times, and locations would be the foundational elements at the bottom layer. The actions of driving a car, altering your route to avoid traffic, finding parking, stopping to pick up coffee, and filling the gas tank would be the abstracted actions at the middle layer. And finally, the typical route, typical starting time, typical commute duration, and typical audio program, along with how far a given commute departs from them (its “unusualness”), would be the meta-behaviors at the top layer.
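To make the top layer a little more concrete, here is a minimal sketch of how the “unusualness” of one commute could be quantified against a history of foundational observations. The commute durations, the `unusualness` function name, and the choice of a standard-deviation distance are all assumptions made for illustration; the ontology itself does not prescribe any particular measure.

```python
from statistics import mean, stdev

# Hypothetical commute durations in minutes for recent workdays
# (foundational elements: observed facts about each trip).
history_minutes = [32, 35, 31, 34, 33, 36, 32, 34]


def unusualness(observed: float, history: list[float]) -> float:
    """Return how many standard deviations `observed` lies from the typical value."""
    typical = mean(history)
    spread = stdev(history)
    if spread == 0:
        # No variation in the history: any deviation at all is unusual.
        return 0.0 if observed == typical else float("inf")
    return abs(observed - typical) / spread


# Meta-behavior: the same action (driving to work) scored against what is typical.
normal_day = unusualness(33, history_minutes)
detour_day = unusualness(58, history_minutes)
print(f"normal day: {normal_day:.2f}, detour day: {detour_day:.2f}")
```

Note that the score needs the whole history to be meaningful, which is why this kind of property does not fall out of a simple query on a single log entry.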
Let’s apply the same “layer” principle to one of the conceptual questions our audience might ask, and see how it would be represented with this three-layer ontology:
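As a rough sketch, the first question from the list above might be decomposed as follows. The `BehaviorQuestion` class, its field names, and the assignment of each phrase to a layer are illustrative assumptions rather than a fixed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorQuestion:
    """One conceptual question split across the three ontology layers."""
    text: str
    foundational_elements: tuple[str, ...]  # entities and their context
    abstracted_actions: tuple[str, ...]     # what kind of action is performed
    meta_behaviors: tuple[str, ...]         # what makes the behavior notable


# Illustrative decomposition; the layer assignments are assumptions.
question = BehaviorQuestion(
    text="Are new employees accessing critical assets during unusual working hours?",
    foundational_elements=("user: new employee", "asset: critical"),
    abstracted_actions=("access",),
    meta_behaviors=("unusual working hours",),
)

# Print the layers from top to bottom.
for layer in ("meta_behaviors", "abstracted_actions", "foundational_elements"):
    print(f"{layer}: {', '.join(getattr(question, layer))}")
```

Keeping the three layers as separate fields means the same meta-behavior (“unusual working hours”) could be reused with a different action or a different kind of entity.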
Within the realm of cybersecurity, both the foundational elements and the abstracted actions are already well-defined domains, and an existing ontology could be re-used for our behavior ontology. For example, An Insider Threat Indicator Ontology could be leveraged to represent our bottom layer of foundational elements (actor, action, information, asset), and The Malware Behavior Ontology could contribute models for our middle layer of abstracted actions for suspicious software behavior.
Besides proposing a set of terms and abstract concepts, our behavior ontology requires axioms — a set of accepted truths — to define the meaning of these terms, as well as their relationships and constraints. In addition, a certain loss of fidelity is expected when going from natural text to a model. On the one hand, the ontology might miss contextual clues and other detailed information, but on the other hand, it is free of ambiguity and redundancy; it is also understood entirely through analysis of its tokens and structure. This tradeoff is acceptable for the task we want to accomplish.
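One simple way to picture axioms is as constraints on which kinds of terms a relationship may connect. The sketch below uses the four bottom-layer concepts mentioned earlier (actor, action, information, asset); the relation names, the example terms, and the `violates_axioms` helper are hypothetical and stand in for whatever a real ontology language would provide.

```python
# Hypothetical axioms: each relation is only valid between the listed kinds of terms.
AXIOMS = {
    "performs": ("Actor", "Action"),       # an actor performs an action
    "targets": ("Action", "Asset"),        # an action targets an asset
    "contains": ("Asset", "Information"),  # an asset contains information
}

# Hypothetical terms and the concept each one is an instance of.
TERMS = {
    "alice": "Actor",
    "file_download": "Action",
    "hr_server": "Asset",
    "salary_records": "Information",
}


def violates_axioms(subject: str, relation: str, obj: str) -> bool:
    """Return True if the statement breaks a domain/range constraint."""
    if relation not in AXIOMS:
        return True  # unknown relations are not part of the ontology
    domain, range_ = AXIOMS[relation]
    return TERMS.get(subject) != domain or TERMS.get(obj) != range_


statements = [
    ("alice", "performs", "file_download"),
    ("file_download", "targets", "hr_server"),
    ("hr_server", "performs", "alice"),  # an asset cannot perform an actor
]
for s in statements:
    print(s, "INVALID" if violates_axioms(*s) else "ok")
```

Because every statement is checked purely against its terms and structure, there is no room for the ambiguity that natural-language descriptions carry.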
Ideally, our ontology should be built automatically via tools like machine learning or artificial intelligence, and because each organization is unique in terms of job roles and what “normal” is, it should be constructed dynamically and collectively curated or freely shared. Ultimately, a behavior ontology, in combination with the available cyber information, temporal dependencies, various time scales and modalities, and repetitive behavior patterns, will further improve the automated detection of anomalous and suspicious user and entity behavior, especially as we all come to realize that risky behaviors are not black and white, but shades of gray.
In my next post, I’ll discuss how data scientists and security practitioners would consume our proposed behavior ontology and how certain use cases might leverage it. Stay tuned…