  ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions

Li, Z., Zhao, Y., Hu, X., Botta, N., Ionescu, C., Chen, G. H. (2023): ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions. - IEEE Transactions on Knowledge and Data Engineering, 35, 12, 12181-12193.
https://doi.org/10.1109/TKDE.2022.3159580

Files

Li, Botta et al. ECOD Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions.pdf (Any fulltext), 4MB
 
File Permalink:
-
Name:
Li, Botta et al. ECOD Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions.pdf
Description:
-
Visibility:
Private
MIME-Type / Checksum:
application/pdf
Technical Metadata:
Copyright Date:
-
Copyright Info:
-
License:
-

Creators

 Creators:
Li, Zheng (1), Author
Zhao, Yue (1), Author
Hu, Xiyang (1), Author
Botta, Nicola (2), Author
Ionescu, Cezar (1), Author
Chen, George H. (1), Author
Affiliations:
(1) External Organizations, ou_persistent22
(2) Potsdam Institute for Climate Impact Research, ou_persistent13

Content

Free keywords: outlier detection; anomaly detection; distributed learning; scalability; empirical cumulative distribution function
 Abstract: Outlier detection refers to the identification of data points that deviate from a general data distribution. Existing unsupervised approaches often suffer from high computational cost, complex hyperparameter tuning, and limited interpretability, especially when working with large, high-dimensional datasets. To address these issues, we present a simple yet effective algorithm called ECOD (Empirical-Cumulative-distribution-based Outlier Detection), which is inspired by the fact that outliers are often the “rare events” that appear in the tails of a distribution. In a nutshell, ECOD first estimates the underlying distribution of the input data in a nonparametric fashion by computing the empirical cumulative distribution per dimension of the data. ECOD then uses these empirical distributions to estimate tail probabilities per dimension for each data point. Finally, ECOD computes an outlier score of each data point by aggregating estimated tail probabilities across dimensions. Our contributions are as follows: (1) we propose a novel outlier detection method called ECOD, which is both parameter-free and easy to interpret; (2) we perform extensive experiments on 30 benchmark datasets, where we find that ECOD outperforms 11 state-of-the-art baselines in terms of accuracy, efficiency, and scalability; and (3) we release an easy-to-use and scalable (with distributed support) Python implementation for accessibility and reproducibility.
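
The three steps named in the abstract (per-dimension empirical CDFs, per-dimension tail probabilities, aggregation across dimensions) can be illustrated with a minimal Python sketch. This is not the authors' released implementation. The function name `ecdf_tail_scores` is hypothetical, and the aggregation rule used here (summing negative log tail probabilities over dimensions and keeping the larger of the left-tail and right-tail totals) is an assumption, since the abstract does not spell out how the probabilities are combined.

```python
import numpy as np


def ecdf_tail_scores(X):
    """Score each row of X by how far it sits in the per-dimension tails.

    Hypothetical helper for illustration; not the authors' released code.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    left = np.empty((n, d))
    right = np.empty((n, d))
    for j in range(d):
        col = X[:, j]
        # Left tail: empirical CDF, the fraction of samples <= x.
        left[:, j] = np.searchsorted(np.sort(col), col, side="right") / n
        # Right tail: the fraction of samples >= x (ECDF of the negated column).
        right[:, j] = np.searchsorted(np.sort(-col), -col, side="right") / n
    # Assumed aggregation: sum negative log tail probabilities over dimensions,
    # then keep the larger of the left-tail and right-tail totals.
    left_score = -np.log(left).sum(axis=1)
    right_score = -np.log(right).sum(axis=1)
    return np.maximum(left_score, right_score)


# Toy data: five points near the origin and one point far out in both dimensions.
X = np.array([
    [0.0, 0.1],
    [0.1, -0.2],
    [-0.1, 0.2],
    [0.2, 0.0],
    [-0.2, -0.1],
    [8.0, 9.0],
])
scores = ecdf_tail_scores(X)
print(scores)
print("most outlying row:", int(np.argmax(scores)))
```

On this toy data the last row lies in the extreme right tail of both dimensions, so it receives the largest score. Each tail probability is at least 1/n, so the logarithms stay finite and no smoothing is needed in this sketch.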

Details

Language(s): eng - English
 Dates: 2022-03-05, 2022-03-16, 2023-12-01
 Publication Status: Finally published
 Pages: -
 Publishing info: -
 Table of Contents: -
 Rev. Type: Peer
 Identifiers: DOI: 10.1109/TKDE.2022.3159580
PIKDOMAIN: RD4 - Complexity Science
Organisational keyword: RD4 - Complexity Science
Model / method: Machine Learning
MDB-ID: No data to archive
 Degree: -

Source 1

Title: IEEE Transactions on Knowledge and Data Engineering
Source Genre: Journal, SCI, Scopus
 Creator(s):
Affiliations:
Publ. Info: -
Pages: -
Volume / Issue: 35 (12)
Sequence Number: -
Start / End Page: 12181 - 12193
Identifier: CoNE: https://publications.pik-potsdam.de/cone/journals/resource/transactions-knowledge-data-engineering
Publisher: Institute of Electrical and Electronics Engineers (IEEE)