# A Survey of Outlier Detection Methodologies

@article{Hodge2004ASO, title={A Survey of Outlier Detection Methodologies}, author={Victoria J. Hodge and Jim Austin}, journal={Artificial Intelligence Review}, year={2004}, volume={22}, pages={85-126} }

Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set and as…

#### 2,872 Citations

A Survey of Outlier Detection Methods in Network Anomaly Identification

- Computer Science
- Comput. J.
- 2011

A comprehensive survey and comparison of well-known distance-based, density-based and other techniques for outlier detection is presented. Definitions of outliers are provided, and their detection based on supervised and unsupervised learning is discussed in the context of network anomaly detection.

A Comparative Study of Outlier Detection Algorithms

- Computer Science
- MLDM
- 2009

This paper presents a comprehensive analysis of three outlier detection methods: Extensible Markov Model (EMM), Local Outlier Factor (LOF) and LCS-Mine. The analysis covers time complexity and outlier detection accuracy.

Outlier Detection in Multiple Linear Regression

- Business
- 2014

Outlier detection, as a branch of data mining, has many important applications and deserves more attention from the data mining community. Outliers are normally treated as noise that needs to be removed…

Outlier detection based on neighborhood proximity.

- Computer Science
- 2010

A novel scheme for classifying and combining various outlier detectors in order to exploit their respective advantages is presented, and it is pointed out that this method yields better detection accuracy than existing ones on high-dimensional datasets.

Comparative Study of Outlier Detection Approaches

- Computer Science
- 2018 International Conference on Inventive Research in Computing Applications (ICIRCA)
- 2018

This paper presents a study of the various algorithms used recently in the literature for outlier detection, classified as supervised, unsupervised and semi-supervised.

Different Outlier Detection Algorithms in Data Mining: A Review

- Geography
- 2014

An outlier is defined as an observation that deviates too much from other observations. The identification of outliers can lead to the discovery of useful and meaningful knowledge. Outlier detection has…

Robust and Unsupervised Anomaly Detection for Multivariate Dataset

- 2020

Anomaly detection (also outlier detection [1]) is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data.[1]…

A simple sequential outlier detection with several residuals

- Computer Science
- 2015 23rd European Signal Processing Conference (EUSIPCO)
- 2015

This paper focuses on sequential (on-line) outlier detection schemes that are based on the 'delete-replace' approach, and demonstrates that three different types of residuals can be used to design the outlier detection scheme to achieve accurate sequential estimation: marginal residual, conditional residual, and contribution.

A STUDY ON DIFFERENT APPROACHES OF OUTLIER DETECTION IN DATA MINING

- Engineering
- 2015

Data mining is a process of extracting knowledge from large databases. Knowledge is appreciated as the ultimate power nowadays and is considered a very important factor for the success of any…

Outlier Detection: Applications And Techniques

- Engineering
- 2012

Outliers, once regarded as noisy data in statistics, have turned out to be an important problem that is being researched in diverse fields of research and application domains. Many outlier…

#### References

SHOWING 1-10 OF 85 REFERENCES

Outlier detection for high dimensional data

- Computer Science
- SIGMOD '01
- 2001

New techniques for outlier detection which find the outliers by studying the behavior of projections from the data set are discussed.

Novelty detection using extreme value statistics

- Mathematics
- 1999

Extreme value theory is a branch of statistics that concerns the distribution of data of unusually low or high value, i.e. in the tails of some distribution. These extremal points are important in…

Robust Decision Trees: Removing Outliers from Databases

- Computer Science
- KDD
- 1995

This paper examines C4.5, a decision tree algorithm that is already quite robust (few algorithms have been shown to consistently achieve higher accuracy), and extends the pruning method to fully remove the effect of outliers, which results in improvement on many databases.

Algorithms for Mining Distance-Based Outliers in Large Datasets

- Computer Science
- VLDB
- 1998

This paper provides formal and empirical evidence showing the usefulness of DB-outliers and presents two simple algorithms for computing such outliers, both having a complexity of O(kN²), k being the dimensionality and N being the number of objects in the dataset.
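As a rough illustration of where the quadratic cost comes from, the sketch below uses a naive nested loop. It assumes the commonly cited DB(p, D) definition, which this listing does not spell out: an object is an outlier if at least a fraction p of the dataset lies farther than distance D from it. The function name `db_outliers` and the sample data are hypothetical, and this is not one of the paper's two algorithms.

```python
import math

def db_outliers(points, p, D):
    """Return indices of points for which at least a fraction p of the
    dataset lies farther than D away (assumed DB(p, D) definition)."""
    n = len(points)
    outliers = []
    for i, a in enumerate(points):
        # One pass over all N objects per point; each distance costs O(k)
        # for k dimensions, giving O(k N^2) work overall.
        far = sum(1 for b in points if math.dist(a, b) > D)
        if far >= p * n:
            outliers.append(i)
    return outliers

# Hypothetical toy data: four clustered points and one distant point.
sample = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(db_outliers(sample, p=0.7, D=1.0))  # -> [4]
```

Only the distant point qualifies here: four of the five objects are farther than D from it, while each clustered point has just one object beyond D.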

Unsupervised Profiling Methods for Fraud Detection

- Business
- 2002

Credit card fraud falls broadly into two categories: behavioural fraud and application fraud. Application fraud occurs when individuals obtain new credit cards from issuing companies using false…

Detecting graph-based spatial outliers: algorithms and applications (a summary of results)

- Mathematics, Computer Science
- KDD '01
- 2001

This paper defines statistical tests, analyzes the statistical foundation underlying the approach, designs several fast algorithms to detect spatial outliers, and provides a cost model for outlier detection procedures.

A Linear Method for Deviation Detection in Large Databases

- Computer Science
- KDD
- 1996

The problem of finding deviations in large databases is described, a formal description of the problem is given, and a linear algorithm for detecting deviations is presented, using the implicit redundancy of the data.

Informal identification of outliers in medical data

- Computer Science
- 2000

The removal of outliers increased the descriptive classification accuracy of discriminant analysis functions and the nearest neighbour method, while the predictive ability of these methods decreased somewhat.

Efficient algorithms for mining outliers from large data sets

- Computer Science
- SIGMOD '00
- 2000

A novel formulation for distance-based outliers that is based on the distance of a point from its kth nearest neighbor is proposed, and the top n points in this ranking are declared to be outliers.
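The ranking described above can be sketched in a few lines. This is a brute-force illustration under stated assumptions (Euclidean distance, k smaller than the number of points, hypothetical function names and data), not the efficient algorithms the paper proposes.

```python
import math

def kth_nn_distance(points, k):
    """For each point, return the distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, sorted ascending.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def top_n_outliers(points, k, n):
    """Rank points by k-th nearest-neighbour distance, largest first,
    and return the indices of the top n."""
    scores = kth_nn_distance(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]

# Hypothetical toy data: four clustered points and one distant point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(top_n_outliers(data, k=2, n=1))  # -> [4]
```

Unlike a fixed distance threshold, this formulation needs only k and n as parameters, and it yields a ranking, so the most extreme points surface first.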

Procedures for Detecting Outlying Observations in Samples

- Mathematics
- 1969

Procedures are given for determining statistically whether the highest observation, the lowest observation, the highest and lowest observations, the two highest observations, the two lowest…