At the TimeMachine* company there are two old friends, Bob** and Alice**. Bob, as a team manager, usually has a very busy schedule filled with meetings all day long. You can even find him working late into the night, trying to catch up on email he received during the day. Alice, on the other hand, has kept a more restricted schedule since she became a mom, with consistent office hours every day that work around her kid’s schedule.
Recently, the TimeMachine security team detected possible malicious software and data theft on Alice’s laptop. The team noticed anomalies in the time of day at which data was being accessed from the laptop. Data attacks were being executed at 3:00 AM, when Alice was home sleeping, not during her regular working hours. Furthermore, the security team reported that the attacks tried to access sensitive company database tables that Alice neither has access to nor has ever tried to access before.
How did the security team know to flag Alice’s laptop? Companies are equipped with cyber security systems that protect their data from being stolen. These systems autonomously learn employees’ typical behavior. If behavior later deviates from what is typical, the system raises doubt about the intent behind the activity. By combining Machine Learning (ML), which identified an anomaly in the time the data was accessed, with domain knowledge (i.e. which sensitive tables Alice’s laptop was trying to access), the system successfully detected a real attacker manipulating Alice’s laptop without her noticing. In this post, we’ll focus on the working-hours anomaly detector to show how a rather simple and intuitive protection method can be an effective tool for data security.
What does it mean to work during non-working hours?
Defining the working hours is not a trivial task. During COVID-19, the definition became even more complicated because of new “work from home” policies that many companies adopted which increased the flexibility to work at different hours for each employee.
Yet even in that state of “new normal”, people tend to have their daily routine which revolves around a consistent range of working hours. Hence, it is possible to define predictable work hours for individuals, and therefore when they would be likely to access data. A typical example would be working from morning until the afternoon and not working from evening until morning.
Once working-hours intervals are established, activity during someone’s typical non-working hours becomes suspicious. It could be the result of an outside attacker using a compromised employee’s credentials, as happened to Alice, or it could be that the employees themselves have malicious intentions and are trying to keep a low profile by working during non-standard hours.
Using knowledge about the typical working and non-working hours of individuals and their peers, and looking at the context, one can build a good understanding of how, when, and where an attack was likely to occur and reduce false positives (FP).
Profiling working hours per group
Before we can describe how a protection mechanism could be based on working hours anomalies, we need to determine the working hours.
It is natural to break down working hours into three logical groups:
Global working hours
Generally, people have a routine of working days and working hours. People tend to work five days a week, starting in the morning and gradually leaving in the afternoon, as illustrated in Figure 1. Hence this group represents the standard working hours of the entire organization. Other macro events that affect most of the workers, such as holidays, can also be seen at this scale.
Joint working hours
It is possible to divide the organization into subgroups based on features that some workers naturally share. A natural group could include teammates, peers from different teams who work together on the same project, or managers such as Bob who, while checking messages in the evening, may often need to access data to formulate responses. Groups could also reflect an “out of work” context, such as parents who tend to arrive at work early in the morning so they can leave relatively early to pick up their kids from school, as Alice does.
Individual working hours
Last but not least, the highest resolution is to focus on each individual separately, which captures each employee’s personal work-life pattern.
Figure 1: Count of all employees’ actions per day and per hour in a certain company. As can be seen, during the weekend (Saturday and Sunday) the count is low, and during the week people start working around 08:00 and gradually leave in the afternoon.
The first group is obtained by considering all employees in the company, whereas the last group is obtained per employee. The second group, however, needs some additional information, such as the Active Directory Department field if dividing by department. In cases where Active Directory data is missing, an ML clustering approach could be useful. For instance, measuring the L1 distance between all individuals’ working-hours distributions and grouping those that are close to each other can help to find these subgroups.
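The L1-distance idea can be sketched roughly as follows. This is a minimal illustration, not the method of any particular product: the extra employee "carol", the sample hours, the 0.5 threshold, and the greedy grouping rule (a simple stand-in for a proper clustering algorithm) are all assumptions made for the example.

```python
import numpy as np

def hourly_distribution(hours):
    """Turn a list of activity hours (0-23) into a normalized 24-bin histogram."""
    counts = np.bincount(np.asarray(hours), minlength=24).astype(float)
    return counts / counts.sum()

def group_by_l1(distributions, threshold=0.5):
    """Greedy grouping: an employee joins the first group whose first member
    is within `threshold` L1 distance; otherwise a new group is started."""
    groups = []
    for name, dist in distributions.items():
        for group in groups:
            representative = distributions[group[0]]
            if np.abs(dist - representative).sum() <= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Hypothetical hours at which each employee accessed data.
activity = {
    "alice": [8, 9, 10, 11, 12, 13, 14, 15],
    "carol": [8, 9, 10, 11, 12, 13, 14, 16],
    "bob": [10, 11, 14, 15, 16, 20, 21, 22],
}
distributions = {name: hourly_distribution(h) for name, h in activity.items()}
groups = group_by_l1(distributions)
print(groups)
```

With this toy data, Alice and the hypothetical Carol end up in one group while Bob, whose activity extends into the evening, forms his own. The result of a greedy rule like this depends on the order of employees and on the threshold, so a real system would likely prefer an established clustering method.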
Profiling working hours per time context
Though it might seem useful to define as an outlier anyone who works outside the main interval around the median of the activity count, this binary split is too naïve and lacks resolution. For example, working at 2:00 AM and working at 3:00 AM might ordinarily be equally suspicious, but if the latter occurred on the eve of a holiday it would be more suspicious.
Hence, the context of the working hour can help to gauge the significance of a security incident. Three context factors that could be taken into account to improve detection are:
Weekends
In most companies, the assumption that there is less activity on weekends holds true. We can measure the activity (Figure 2A), and by using unsupervised ML clustering methods we can define two subgroups of activity (Figure 2B): low activity (red dots) and high activity (green dots). Last, under the assumption that there are two weekend days each week, we can define the weekends as the two consecutive days with the fewest actions that recur every week, separated by five working days. This could be obtained by computing the cross-correlation between the weekend vector (i.e. [1,1,0,0,0,0,0,1,1,0….]) and the activity vector and taking the highest result (Figure 2C). This gives the best-aligned position between the vectors, indicating when the weekends occur.
Holidays / Fun company days outside the office
After detecting the weekends, which repeat regularly, each low-activity dot (red) that does not fall on a weekend can be considered a holiday / fun day outside the office.
Sleep time
Under the assumption that people sleep before they start their workday, the inactive hours preceding it are defined as sleep time. As an example, in Figure 3, 08:00 to 17:00 is the standard working-hours interval. Activity tapers off through the evening and night, but between 00:00 and 06:00 there is no activity at all, so this interval could be defined as the sleeping time of that individual.
Figure 2: Unsupervised learning helps to detect weekends and holidays. A. Per day, we count the number of all actions by all employees in the company. B. Using k-means with K=2, we detect two groups: one with a low count of actions (red dots) and a second with a high count of actions (green dots). C. Taking the highest result of a cross-correlation between a weekend vector, i.e. [1,1,0,0,0,0,0,1,1,0….], and the low-activity vector (red dots in B), we can detect the weekends. Each future data point that is not a weekend but has an activity count similar to the low-count group (i.e. beneath the black dashed line in B) is considered a holiday/fun company day outside the office.
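One simple way to approximate an individual's sleep time is to look for the longest stretch of hours with no activity, treating the clock as circular so that a stretch crossing midnight is handled. The sketch below is only an illustration under that assumption; the function name, the `max_count` parameter, and the hourly counts are made up for the example.

```python
def longest_quiet_run(hourly_counts, max_count=0):
    """Return (start_hour, length) of the longest circular run of hours whose
    activity count is <= max_count. `hourly_counts` must have 24 entries."""
    quiet = [count <= max_count for count in hourly_counts]
    if all(quiet):
        return 0, 24
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    # Walk around the clock twice so runs that wrap past midnight are found.
    for i in range(48):
        hour = i % 24
        if quiet[hour]:
            if run_len == 0:
                run_start = hour
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len

# Hypothetical action counts per hour of day (index 0 = 00:00-00:59).
hourly_counts = [0, 0, 0, 0, 0, 0, 1, 3, 20, 25, 24, 22,
                 18, 23, 21, 19, 15, 9, 4, 3, 2, 2, 1, 1]
sleep_start, sleep_length = longest_quiet_run(hourly_counts)
print(sleep_start, sleep_length)
```

For these counts the quiet stretch starts at hour 0 and lasts 6 hours, matching the 00:00 to 06:00 interval in the example. Raising `max_count` would tolerate a small amount of background activity, such as automated jobs.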
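The three steps in Figure 2 can be sketched in a few lines. This is a rough illustration, not the production implementation: the daily counts are invented, the k-means step is a small hand-rolled 1-D version with K=2, and the correlation step simply slides a weekly [1,1,0,0,0,0,0] template across the seven possible shifts and counts the overlap with the low-activity days.

```python
import numpy as np

def two_means_1d(values, iterations=20):
    """1-D k-means with K=2. Returns a boolean mask that is True for the
    low-count cluster. Assumes the data holds at least two distinct values."""
    values = np.asarray(values, dtype=float)
    low, high = values.min(), values.max()  # initial centroids
    for _ in range(iterations):
        is_low = np.abs(values - low) <= np.abs(values - high)
        low, high = values[is_low].mean(), values[~is_low].mean()
    return is_low

def weekend_offset(is_low):
    """Slide a weekly weekend template over the 7 possible shifts and return
    the shift that overlaps most with the low-activity days."""
    days = np.arange(len(is_low))
    scores = [int(np.sum((((days - shift) % 7) < 2) & is_low))
              for shift in range(7)]
    return int(np.argmax(scores))

# Made-up daily action counts for three weeks; day 0 is a Monday.
daily_counts = [950, 980, 1000, 970, 940, 40, 30,
                960, 990, 60, 975, 930, 45, 35,
                955, 985, 995, 965, 935, 50, 25]

is_low = two_means_1d(daily_counts)
offset = weekend_offset(is_low)
is_weekend = ((np.arange(len(daily_counts)) - offset) % 7) < 2
# Low-activity days that are not weekends are holiday / fun-day candidates.
holidays = np.flatnonzero(is_low & ~is_weekend).tolist()
print(offset, holidays)
```

With this toy data the best shift is 5 (Saturday and Sunday when day 0 is a Monday), and day 9 is flagged as a holiday candidate. A library implementation of k-means could replace the hand-rolled version; it is written out here only to keep the sketch self-contained.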
Detection Approach
Equipped with the ability to track each employee’s working hours and with knowledge of non-standard working hours per group (global, joint, individual) and per context (weekends, holidays/fun company days outside the office, sleep), we can combine them all to flag suspicious working hours per employee and per hour. By assigning a different severity to each interval violation, we can identify the most suspicious out-of-hours activity per employee and alert when there is a significant deviation from the expected working hours, as shown in Figure 3 by the red dot. We would like to emphasize that this detector tells only one side of the story, and an alert should be supported with additional information, such as the attempt to access sensitive tables in Alice’s case. We will address this point next.
Figure 3: An example of an individual’s working hours mapped onto the defined working and non-working hours intervals. The y-axis indicates the hour of the day and the x-axis indicates the date. Blue dots indicate activity; gray squares indicate individual, joint and global working hours; red squares indicate individual non-working hours; green squares indicate global non-working hours; yellow squares indicate the conjunction of global and joint non-working hours; black squares indicate the conjunction of individual, joint and global non-working hours; solid black lines indicate weekend days; and solid blue lines indicate holidays (here the 4th of July). The activity on 30-06-2019 at 23:00 (red dot) is the most suspicious hour, since the employee worked late on a Sunday night even though they usually do not work during weekends or holidays.
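The idea of giving each interval violation its own severity can be illustrated as a weighted sum over the group and context flags. The sketch below is purely hypothetical: the field names, the integer weights, and the two sample hours are assumptions chosen for the example, not values taken from a real detector.

```python
from dataclasses import dataclass

@dataclass
class HourContext:
    """Flags describing one hour of activity for one employee."""
    outside_global: bool       # outside the organization's working hours
    outside_joint: bool        # outside the peer group's working hours
    outside_individual: bool   # outside the employee's own working hours
    is_weekend: bool
    is_holiday: bool
    is_sleep_time: bool

# Hypothetical weights: violating the individual profile counts for more
# than violating the global one, and holidays/sleep count for more than weekends.
WEIGHTS = {
    "outside_global": 1,
    "outside_joint": 2,
    "outside_individual": 3,
    "is_weekend": 2,
    "is_holiday": 3,
    "is_sleep_time": 3,
}

def severity(ctx):
    """Sum the weights of every violated interval or context."""
    return sum(weight for name, weight in WEIGHTS.items() if getattr(ctx, name))

# Late Sunday night for someone who never works weekends.
sunday_night = severity(HourContext(True, True, True, True, False, False))
# A weekday evening that is only outside the global working hours.
weekday_evening = severity(HourContext(True, False, False, False, False, False))
print(sunday_night, weekday_evening)
```

Under these made-up weights, the late Sunday night activity scores far higher than an ordinary weekday evening, which matches the intuition behind the red dot in Figure 3. In practice the weights would need to be tuned against real data.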
Joint indicators as a complete system
It is intuitive that when someone is working outside of expected working hours, we should take a closer look. By profiling the working hours per group (global, joint, individual) and per context (weekends, holidays/fun company days outside the office, sleep), we are able to detect significant out-of-hours activity.
However, chances are that benign out-of-hours anomalies will occur from time to time. For instance, an anomaly occurred when Alice had to work late one night to prepare for her first presentation on a new initiative she developed. Another false positive occurred when Bob decided to skip his Wednesday morning yoga and begin working instead. Furthermore, a data breach could happen at any time of the day, so we cannot in principle count on working-hours detection alone. Hence, anomaly detection in working hours should serve as a recommendation that helps strengthen a suspicion, not as the sole trigger for an alarm.
As an example of the above, Imperva’s DRA (Database Risk Analytics) is a cyber security system that takes many factors into account and, using a sophisticated risk analytics platform, analyzes employees’ activity to provide insight about security risks. Out-of-hours detection is one of the many security incidents the system is able to detect; as more insights are triggered, the probability of an actual attack rises, which helps the SOC team decide what action should be taken.
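The principle that an out-of-hours anomaly should only strengthen a suspicion can be sketched as a simple corroboration rule. This is a deliberately minimal illustration and does not describe how any real product decides: the function name, the severity numbers, the threshold, and the indicator label are all hypothetical.

```python
def escalate_out_of_hours(severity, other_indicators, min_severity=5):
    """Escalate an out-of-hours anomaly only if it is severe enough AND at
    least one independent indicator (e.g. sensitive table access) agrees."""
    return severity >= min_severity and len(other_indicators) > 0

# Hypothetical cases based on the stories above.
alice_3am_attack = escalate_out_of_hours(8, ["sensitive_table_access"])
alice_late_presentation = escalate_out_of_hours(6, [])
bob_early_morning = escalate_out_of_hours(2, [])
print(alice_3am_attack, alice_late_presentation, bob_early_morning)
```

Only the 3:00 AM access that also touched sensitive tables is escalated; Alice's late night before her presentation and Bob's early start are not. The rule covers only the out-of-hours signal, so other indicators would still need their own path to an alert, since a breach can also happen during normal working hours.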
* Not a real company
** Not real people