Anomaly Detection and Auto Error Correction using AI/ML



The client conducts flow surveys of underground wastewater systems using data collected from multiple flow meters. The meters generate flow and velocity readings at two-minute intervals, which are transmitted over a mobile network. The meters operate in hostile environments, which can result in erroneous or missing data points. Identifying and correcting these data points manually is error-prone and time-consuming. The purpose of this project is to use ML techniques to auto-correct the entire data sample to a high level of accuracy in a compute-efficient manner.

Executive Summary

The client conducts wastewater flow surveys for water utilities.

The collected data needs auto-correction to remove outliers from the data sets and to ensure consistency in the final survey reports.

A cloud solution built on Amazon SageMaker provides an end-to-end ML workflow that provisions compute-intensive servers only when they are needed. It also supports auto scaling, hyperparameter tuning and third-party libraries.



Raw pipe data (flow, depth and velocity) is collected and analysed for accuracy.

The data is segmented into component parts, uploaded securely to Amazon S3 buckets and made available to Amazon SageMaker.

Using ARIMA and linear regression models, the flow and velocity time series are analysed to identify anomalies and to predict corrected values from previously observed values.
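The idea of predicting each reading from the readings before it, and treating a large miss as an anomaly, can be illustrated with a minimal sketch. It is not the client's model: it assumes a plain autoregressive fit by least squares (the "AR" part of ARIMA, expressed as a linear regression on lagged values) using NumPy only. The function names, the number of lags, the threshold and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_ar(series, lags=3):
    """Fit y[t] = c + a1*y[t-1] + ... + ap*y[t-p] by ordinary least squares."""
    y = np.asarray(series, dtype=float)
    n = len(y)
    # Column k holds the value (k + 1) steps before each target sample.
    lagged = [y[lags - k - 1 : n - k - 1] for k in range(lags)]
    X = np.column_stack([np.ones(n - lags)] + lagged)
    coef, *_ = np.linalg.lstsq(X, y[lags:], rcond=None)
    residuals = y[lags:] - X @ coef
    return coef, residuals


def detect_and_correct(series, lags=3, threshold=5.0):
    """Flag readings far from the one-step prediction and replace them."""
    y = np.asarray(series, dtype=float)
    coef, residuals = fit_ar(y, lags)
    # Robust spread estimate (median absolute deviation), so that the
    # anomalies themselves do not inflate the cut-off.
    mad = np.median(np.abs(residuals - np.median(residuals)))
    scale = 1.4826 * mad
    corrected = y.copy()
    flags = np.zeros(len(y), dtype=bool)
    for t in range(lags, len(y)):
        # Most recent value first, matching the column order used in fit_ar.
        # Already-corrected values are used as history for later predictions.
        history = corrected[t - lags : t][::-1]
        predicted = coef[0] + coef[1:] @ history
        if abs(y[t] - predicted) > threshold * scale:
            flags[t] = True
            corrected[t] = predicted
    return corrected, flags


# Synthetic example (assumed data): one day of two-minute samples with
# a slow daily cycle, small sensor noise and one injected spike.
rng = np.random.default_rng(0)
t = np.arange(720)
clean = 10.0 + 2.0 * np.sin(2 * np.pi * t / 720)
raw = clean + rng.normal(0.0, 0.05, size=t.size)
raw[100] += 3.0

corrected, flags = detect_and_correct(raw)
print("flagged sample indices:", np.flatnonzero(flags))
```

A full ARIMA model would add differencing and moving-average terms on top of this autoregressive step. The sketch also does not handle missing readings, which would need to be filled or skipped before fitting.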

The predicted values are checked against an error rate to evaluate the accuracy of the model. The predicted data sets are then converted into CSV files and stored in S3 buckets, from where they are loaded into Amazon QuickSight for data visualisation and presentation.
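The evaluation and export step can be sketched as follows. The case study does not say which error metric was used, so this example assumes three common ones (MAE, RMSE and MAPE). The function names, the CSV column names and the sample values are hypothetical, and the upload to S3 is left out.

```python
import csv
import io
import math
from datetime import datetime, timedelta


def error_metrics(actual, predicted):
    """Return MAE, RMSE and MAPE (percent) for two equal-length sequences."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # MAPE is undefined where the reference value is zero; skip those samples.
    ratios = [abs(e / a) for a, e in zip(actual, errors) if a != 0]
    mape = 100.0 * sum(ratios) / len(ratios) if ratios else float("nan")
    return {"mae": mae, "rmse": rmse, "mape": mape}


def write_predictions_csv(handle, start, predicted, step_minutes=2):
    """Write one row per reading, timestamped at the meter's sampling interval."""
    writer = csv.writer(handle)
    writer.writerow(["timestamp", "predicted_flow"])
    for i, value in enumerate(predicted):
        stamp = start + timedelta(minutes=step_minutes * i)
        writer.writerow([stamp.isoformat(), f"{value:.3f}"])


# Assumed sample values, for illustration only.
observed = [10.0, 20.0]
predicted = [11.0, 18.0]
metrics = error_metrics(observed, predicted)

# An in-memory buffer stands in for the CSV file that would be stored in S3.
buffer = io.StringIO()
write_predictions_csv(buffer, datetime(2024, 1, 1), predicted)
print(metrics)
print(buffer.getvalue())
```

Writing to a file-like object keeps the export independent of where the CSV ends up, whether that is a local file or an object uploaded to a bucket.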

Benefits to Organisation

The effort required to correct data samples for flow reporting has been reduced by a factor of several hundred, and the accuracy of the data has improved. This enables the client to expand its service delivery capacity and client base at no extra cost.

Using AWS reduces the complexity and development cost of the solution. The compute-intensive resources needed to run the ML algorithms are provisioned only for the short periods when reports are being run, which reduces operational costs.