Anomaly Detection System


Introduction

The GBS solution, which is aimed at financial trading, has some fundamental requirements:

  • Guarantee fast response times and high performance for both end customers and bank / finance company operators
  • Guarantee the operational continuity of the whole system

This is done to:

  • Allow the bank / finance company to maximize revenues and avoid economic losses
  • Allow customers and bank operators to carry out the desired operations at the right time, without disruption
  • Guarantee customers high performance of the trading system functions (fast response times and no interruptions), to achieve ‘Customer Satisfaction’

To meet these needs, it is necessary to prevent any malfunction of the production environment, of the system, and of the entire architecture on which GBS runs (servers, equipment and networks).

Given the architectural complexity of GBS, these needs are all the more pressing because finding the root cause of a system malfunction or crash is extremely difficult.

To overcome these difficulties, it was decided to build software capable of recognizing when potentially critical situations begin. The team of systems engineers and architectural managers can then be alerted as soon as possible and can intervene in advance with changes that mitigate or avoid the risk of malfunction and crash.

Early detection of service disruptions
To identify scenarios that could lead to abnormal GBS operation as far in advance as possible, we have created a watchdog based on Machine Learning algorithms that recognizes non-standard system behaviors and metrics (AI-powered anomaly detection).

Thanks to its prediction capabilities, the watchdog can promptly send various types of alerts, such as e-mails and SMS messages, to the technical staff responsible for application management.
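The source does not say which Machine Learning algorithms the watchdog uses. The sketch below illustrates the underlying idea: learn what "normal" looks like for a metric and flag deviations from it. In place of an ML model it uses a simple statistical technique, a rolling z-score over a sliding window, and should be read as a minimal, hedged illustration only. The class name `RollingZScoreDetector` and its parameters (`window`, `threshold`, `min_samples`) are hypothetical.

```python
from collections import deque
from math import sqrt


class RollingZScoreDetector:
    """Flags a metric sample as anomalous when it deviates from the mean of
    the recent window by more than `threshold` standard deviations.

    Hypothetical stand-in for the watchdog's ML model: a simple statistical
    baseline, not the algorithm actually used by GBS.
    """

    def __init__(self, window: int = 60, threshold: float = 3.0, min_samples: int = 10):
        self.window = deque(maxlen=window)  # most recent samples only
        self.threshold = threshold
        self.min_samples = min_samples      # no verdicts until enough history

    def update(self, value: float) -> bool:
        """Score `value` against the current window, then add it to the window.
        Returns True if the value is considered anomalous."""
        anomalous = False
        n = len(self.window)
        if n >= self.min_samples:
            mean = sum(self.window) / n
            std = sqrt(sum((x - mean) ** 2 for x in self.window) / n)
            if std > 0:
                anomalous = abs(value - mean) / std > self.threshold
            else:
                # Perfectly flat history: any change at all is a deviation.
                anomalous = value != mean
        self.window.append(value)
        return anomalous
```

For example, if you fed it per-request response times, it would flag a sudden latency spike once enough history had been collected. An ML-based detector could replace it behind the same `update()` interface.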

The solution

System architecture
Based on the critical issues and needs described above, the solution uses the following architecture:

The modules in the architecture diagram are described below.

Data Gatherer REST API

This module allows GBS to send data to the watchdog system, enabling the collection of system metrics and other parameters via a REST API.
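The source does not specify the REST endpoint or its payload format. Suppose, hypothetically, that each subscribed service POSTs a JSON body containing a `service` name, an epoch `timestamp` and a `metrics` object. The Data Gatherer REST API might then validate submissions roughly as follows before passing them on. The names `parse_metric_payload`, `MetricSample` and `PayloadError` are illustrative only.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass
class MetricSample:
    service: str
    timestamp: float          # Unix epoch seconds (assumed format)
    metrics: Dict[str, float]


class PayloadError(ValueError):
    """Raised when a submitted payload does not match the assumed schema."""


def parse_metric_payload(body: str) -> MetricSample:
    """Validate the JSON body of a hypothetical POST /metrics request."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        raise PayloadError(f"invalid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise PayloadError("payload must be a JSON object")

    service = data.get("service")
    if not isinstance(service, str) or not service:
        raise PayloadError("'service' must be a non-empty string")

    ts = data.get("timestamp")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(ts, bool) or not isinstance(ts, (int, float)):
        raise PayloadError("'timestamp' must be a number (epoch seconds)")

    metrics = data.get("metrics")
    if not isinstance(metrics, dict) or not metrics:
        raise PayloadError("'metrics' must be a non-empty object")
    clean = {}
    for name, value in metrics.items():
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise PayloadError(f"metric {name!r} must be numeric")
        clean[name] = float(value)

    return MetricSample(service=service, timestamp=float(ts), metrics=clean)
```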

Data Gatherer Load Balancer
The load balancing module manages the number of Data Gatherer instances and routes messages to them, depending on the workload generated by the various subscribed services.
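The source does not describe how the load balancer decides the number of instances. One simple, assumed policy sizes the pool from the incoming message rate and a per-instance capacity, then spreads messages across the instances in round-robin order. `desired_instances`, `RoundRobinRouter` and the capacity figures are hypothetical.

```python
import itertools
import math
from typing import List


def desired_instances(msgs_per_sec: float, capacity_per_instance: float,
                      min_instances: int = 1, max_instances: int = 16) -> int:
    """Number of Data Gatherer instances needed for the current load,
    clamped to [min_instances, max_instances]."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(msgs_per_sec / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


class RoundRobinRouter:
    """Cycles incoming messages across the currently active instances."""

    def __init__(self, instances: List[str]):
        self.set_instances(instances)

    def set_instances(self, instances: List[str]) -> None:
        # Called again whenever the pool is scaled up or down.
        if not instances:
            raise ValueError("at least one instance is required")
        self._instances = list(instances)
        self._cycle = itertools.cycle(self._instances)

    def route(self) -> str:
        """Return the instance that should receive the next message."""
        return next(self._cycle)
```

Round-robin is only one option. A least-loaded or queue-depth-based strategy may suit uneven workloads better.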

Data Gatherer Instance

The Data Gatherer module physically collects log data from the subscribed services and stores it correctly in the database. It plays an important role in the anomaly detection architecture. It must be implemented to maximize throughput while minimizing its resource footprint, so that it can easily be instantiated multiple times in a microservice architecture like the one proposed.
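A common way to maximize write throughput with a small footprint is to batch inserts rather than write each record individually. The sketch below buffers records until either a size limit or an age limit is reached. It assumes a hypothetical `write_many` callable, for instance a database bulk-insert function. `BatchWriter`, `max_batch` and `max_delay` are illustrative names, not part of GBS.

```python
import time
from typing import Callable, List, Optional


class BatchWriter:
    """Buffers incoming records and writes them in batches.

    On each add(), the buffer is flushed if it holds `max_batch` records or
    if `max_delay` seconds have passed since its first record arrived.
    """

    def __init__(self, write_many: Callable[[List[dict]], None],
                 max_batch: int = 500, max_delay: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self._write_many = write_many  # e.g. a DB bulk-insert call (assumed)
        self._max_batch = max_batch
        self._max_delay = max_delay
        self._clock = clock            # injectable for testing
        self._buffer: List[dict] = []
        self._first_at: Optional[float] = None

    def add(self, record: dict) -> None:
        if not self._buffer:
            self._first_at = self._clock()
        self._buffer.append(record)
        if (len(self._buffer) >= self._max_batch
                or self._clock() - self._first_at >= self._max_delay):
            self.flush()

    def flush(self) -> None:
        """Write out everything currently buffered as one batch."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._first_at = None
            self._write_many(batch)
```

In this sketch the age limit is only checked when a new record arrives. A production version would typically also flush on a timer, so that a quiet service does not leave records buffered indefinitely.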

Reasoner

The Reasoner module analyzes the collected data and can trigger alarm and notification events. It works synchronously, periodically querying the database to carry out its analysis.

The algorithmic implementation of the Reasoner depends mostly on the results of the data analysis phase, to be carried out in the initial stages of the project.
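The Reasoner's periodic polling can be sketched as a loop with three steps. It queries the database for records newer than a checkpoint, analyzes them, and forwards any anomalies. The actual algorithm depends on the data analysis phase, so a fixed threshold stands in for it here. `Reasoner`, `fetch_since`, `notify` and the `(timestamp, value)` record shape are all hypothetical.

```python
import time
from typing import Callable, Iterable, List, Tuple

# A record is (timestamp, value); a hypothetical shape for illustration.
Record = Tuple[float, float]


class Reasoner:
    """Periodically pulls new records, checks them against a placeholder
    threshold rule, and hands each anomaly to `notify`."""

    def __init__(self, fetch_since: Callable[[float], Iterable[Record]],
                 notify: Callable[[str], None], threshold: float):
        self._fetch_since = fetch_since  # e.g. DB query "timestamp > checkpoint" (assumed)
        self._notify = notify
        self._threshold = threshold
        self._checkpoint = 0.0           # newest timestamp already analyzed

    def run_once(self) -> List[Record]:
        """One polling cycle: fetch, analyze, notify. Returns the anomalies found."""
        anomalies = []
        for ts, value in self._fetch_since(self._checkpoint):
            self._checkpoint = max(self._checkpoint, ts)
            if value > self._threshold:
                anomalies.append((ts, value))
                self._notify(f"value {value} above {self._threshold} at t={ts}")
        return anomalies

    def run_forever(self, interval_s: float = 30.0) -> None:
        """Repeat the polling cycle every `interval_s` seconds."""
        while True:
            self.run_once()
            time.sleep(interval_s)
```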

Alarm Manager

The Alarm Manager module sends the alarm messages and notifications generated by the Reasoner to the teams of systems engineers and architectural managers, who can then start corrective actions. One possible use is sending “system overload” warnings to the session manager’s Load Balancer, which can decide to increase the number of active sessions.

Since the “alarm & notification” service is strictly asynchronous, it is implemented on top of message queues.
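Below is a minimal sketch of a queue-based Alarm Manager. The source does not name a message broker, so Python's in-process `queue.Queue` stands in for one. The Reasoner puts alarm messages on the queue. A background worker then delivers each message through the notifier registered for its channel, for example e-mail or SMS. `start_alarm_manager`, the `channel` field and the `None` shutdown sentinel are assumptions made for illustration.

```python
import logging
import queue
import threading
from typing import Callable, Dict

log = logging.getLogger(__name__)


def start_alarm_manager(alarms: "queue.Queue",
                        channels: Dict[str, Callable[[dict], None]]) -> threading.Thread:
    """Consume alarm messages from `alarms` on a background thread and
    dispatch each one to the notifier registered for its channel.
    Putting `None` on the queue stops the worker."""

    def worker() -> None:
        while True:
            alarm = alarms.get()
            try:
                if alarm is None:
                    return  # shutdown sentinel
                send = channels.get(alarm.get("channel"))
                if send is None:
                    log.warning("no notifier for channel %r", alarm.get("channel"))
                else:
                    send(alarm)
            except Exception:
                # A failing notifier must not kill the worker.
                log.exception("failed to deliver alarm %r", alarm)
            finally:
                alarms.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

Producers only enqueue messages. A slow e-mail or SMS gateway therefore delays delivery but never blocks the Reasoner, which is the main reason to decouple the two with a queue.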

Reasoner REST API

This module provides GBS with a single access point through which it can interact with the Reasoner in order to:

  • Configure the Reasoner
  • Query the data generated by the Reasoner
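The source does not list the endpoints of the Reasoner REST API. Assume, hypothetically, one resource for configuration and one for querying results. The routing behind such an API could then look like the sketch below, with an in-memory dictionary in place of the Reasoner. The paths `/reasoner/config` and `/reasoner/anomalies`, the configuration keys and `dispatch` are illustrative only. A real service would sit behind an HTTP framework.

```python
import json
from typing import Any, Callable, Dict, Tuple

# Hypothetical in-memory state standing in for the Reasoner.
STATE: Dict[str, Any] = {
    "config": {"interval_s": 30, "threshold": 80.0},
    "anomalies": [],
}

Response = Tuple[int, str]           # (HTTP status, JSON body)
Handler = Callable[[str], Response]  # receives the raw request body


def get_config(_body: str) -> Response:
    return 200, json.dumps(STATE["config"])


def put_config(body: str) -> Response:
    """Update known configuration keys; reject unknown ones."""
    try:
        update = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "invalid JSON"})
    if not isinstance(update, dict):
        return 400, json.dumps({"error": "expected a JSON object"})
    unknown = set(update) - set(STATE["config"])
    if unknown:
        return 400, json.dumps({"error": f"unknown keys: {sorted(unknown)}"})
    STATE["config"].update(update)
    return 200, json.dumps(STATE["config"])


def get_anomalies(_body: str) -> Response:
    return 200, json.dumps(STATE["anomalies"])


ROUTES: Dict[Tuple[str, str], Handler] = {
    ("GET", "/reasoner/config"): get_config,
    ("PUT", "/reasoner/config"): put_config,
    ("GET", "/reasoner/anomalies"): get_anomalies,
}


def dispatch(method: str, path: str, body: str = "") -> Response:
    """Route a request to its handler, or return 404."""
    handler = ROUTES.get((method.upper(), path))
    if handler is None:
        return 404, json.dumps({"error": "not found"})
    return handler(body)
```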

Data DB
Given the heterogeneous nature of the data flows entering the system, a NoSQL database was adopted, possibly in “sharding” mode, to preserve write throughput for the data flows coming from the different subscribed services.
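With sharding, each record must be assigned to one shard. The sketch below shows one assumed approach: hashing a service identifier with a stable hash to pick a shard index. It does not model the sharding mechanism of any particular NoSQL product, and `shard_for` is a hypothetical helper.

```python
import hashlib


def shard_for(service_id: str, num_shards: int) -> int:
    """Map a service identifier to a shard index in [0, num_shards).

    Uses a stable hash (SHA-256) rather than Python's built-in hash(),
    which is randomized per process for strings and would send the same
    service to different shards after a restart.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(service_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Keying on the service alone keeps each service's data together, but it sends all writes from a very busy service to the same shard. Adding a finer-grained component, such as a record or time-bucket identifier, to the hashed key spreads that load, at the cost of scattering reads.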
