You are the eventuality of an anomaly..

You are the eventuality of an anomaly..

In the movie Matrix reloaded, The Architect of Matrix explains to ‘neo’ what he is. “ You are the eventuality of an anomaly, which despite my sincerest efforts I have been unable to eliminate from what is otherwise a harmony of mathematical precision. While it remains a burden to sedulously avoid it, it is not unexpected, and thus not beyond a measure of control.”

‘The problem’ of this era is neither the scarcity of data nor the means to analyse it. We have blazed a path over last decade or so that we have moved to the other end of spectrum, we are being overwhelmed by it. In the world of automation and machine learning there is still a Digital campaign manager who is spending his entire day shifting through 100s of these automated reports trying to make sense out of the numbers. Since one have put lot of effort in making sure that these automations are correct, and expectedly so, almost no attention goes over validating these numbers in the report. But anomalies occur, it is inevitable.

So, if the architect found it impossible to avoid these anomalies so would you and me and hence let’s focus more on identifying them.

In this article we shall focus on how to find anomalies in a time series data.

Anomaly detection problem for time series is usually formulated as finding outlier data points relative to some standard or usual signal. While there could be many types of anomalies, in a time series we would most commonly associated them to a spike or a drop or an unexpected trend.

Types of anomaly:

1.       Additive: A sudden data breach happened and as a data security company you saw a huge spike in short period for new bookings. These types of anomalies are called additive.

Fig 1: Additive anomaly

2. Temporal: Your attribution engine went down and there was zero or very low numbers reported for your sales through some channels for some small period. These types of anomalies are called Temporal.

Fig 2: Temporal anomaly

There are basically two ways you could detect anomalies either label each point as anomalous or normal or predict the normal range for each time point and see the data fits into that range or not. In the second approach we are better setup to visualize a confidence interval and find the reasons for anomalous occurrences.

The simplest approach to identifying irregularities in data is to flag the data points that deviate from common statistical properties of a distribution, including mean, median, mode, and quantiles. Let’s say the definition of an anomalous data point is one that deviates by a certain standard deviation from the mean. Traversing mean over time-series data isn’t exactly trivial, as it’s not static. You would need a rolling window to compute the average across the data points. Technically, this is called a rolling average or a moving average, and it’s intended to smooth short-term fluctuations and highlight long-term ones.

These are the few algorithms we tried

1. STL decomposition

STL stands for seasonal-trend decomposition procedure based on Loess. This technique gives you an ability to split your time series data into three parts: seasonal, trend and residue. It works on seasonal time series.

It used median absolute deviation to get a more robust detection of anomalies.

Fig 3: Decomposing time series data in Seasonal, Trend and Residue

2. Classification and Regression Trees

1. Supervised: we used supervised learning to teach trees to classify anomaly and non-anomaly data points. the output from Moving avg methodology was taken as an input to train the model.

2. Unsupervised:  Second approach is to use unsupervised learning to teach ML model to predict the next data point in your series and have some confidence interval or prediction error as in the case of the STL decomposition approach. You can check if your data point lies inside or outside the confidence interval using Generalized ESD test, or Grubbs’ test.

Fig 4 : Comparing Predicted vs actual to identify anomaly

3. Neural Networks: Recurrent Neural Network was used for anomaly detection, where the model keeps the temporal data points in memory and uses it to learn about time-dependent structure in a series of data.

RNN looks for the pattern in after every periodic shift and compares with the patterns in the historical data to find the anomalous pattern even if the point in time values are in range.


Most Popular

Let's Connect

Please enable JavaScript in your browser to complete this form.

Join Factspan Community

Subscribe to our newsletter

Related Articles

Add Your Heading Text Here


Meta’s LLAMA 2 Vs Open AI’s ChatGPT

Explore the world of cutting-edge AI with a detailed analysis of Meta’s LLaMA and OpenAI’s ChatGPT. Uncover their workings, advantages, and considerations to help you make the right choice for your specific needs. Dive into the future of AI and its profound impact on content creation and data analysis.

Read More ...

Data Contract Implementation in a Kafka Project: Ensuring Data Consistency and Adaptability

Data contracts are essential for ensuring data consistency and adaptability in data engineering projects. This blog explains how to implement data contract in a Kafka project and how it can be utilized to solve data quality and inconsistency issues.

Read More ...

CDP: A band-aid solution?

Step into the world of Customer Data Platforms (CDPs) with our captivating blog, designed to guide you through every angle. Discover the origin story of CDPs – why they stepped into the spotlight. Uncover their true essence and explore the four common categories they belong to. Delve into real-life scenarios with eight compelling use cases that are revolutionizing businesses today. Tackle the question: are CDPs a quick fix or a sustainable solution? And don’t shy away from addressing the challenges that come with CDP territory. Wrapping it all up, you’ll find key takeaways that provide fresh insights into this dynamic technology.

Read More ...

The Magical Transformation: How Nike Used Marketing Intelligence to Win the Game

Discover how Marketing Intelligence and Generative AI shape effective strategies. Learn from Nike’s success against Adidas in 2018. Dive into personalized content, automation, and insights.

Read More ...

Web 3.0: Transforming the Future of E-commerce

With Web 3.0, users will experience heightened control over their data, leading to faster and safer transactions. For businesses, this paradigm shift will necessitate embracing AI, blockchain, and machine learning technologies to better connect with customers and thrive in this new era of digital commerce.

Read More ...

Unveiling Insights: Checking File Trend Analysis for Data Engineers

This blog highlights the significance of file trend analysis in data engineering, addressing challenges faced by professionals in managing and utilizing data effectively. It explores the benefits of file trend analysis, including performance optimization, data quality assurance, and decision-making support.

Read More ...