Outlier Analysis by Charu C. Aggarwal


This book provides comprehensive coverage of the field of outlier analysis from a computer science point of view. It integrates methods from data mining, machine learning, and statistics within the computational framework and therefore appeals to multiple communities. The chapters of this book can be organized into three categories:
  • Basic algorithms: Chapters 1 through 7 discuss the fundamental algorithms for outlier analysis, including probabilistic and statistical methods, linear methods, proximity-based methods, high-dimensional (subspace) methods, ensemble methods, and supervised methods.
  • Domain-specific methods: Chapters 8 through 12 discuss outlier detection algorithms for various domains of data, such as text, categorical data, time-series data, discrete sequence data, spatial data, and network data.
  • Applications: Chapter 13 is devoted to various applications of outlier analysis. Some guidance is also provided for the practitioner.
The second edition of this book is more detailed and is written to appeal to both researchers and practitioners. Significant new material has been added on topics such as kernel methods, one-class support-vector machines, matrix factorization, neural networks, outlier ensembles, time-series methods, and subspace methods. It is written as a textbook and can be used for classroom teaching.


Best mathematical & statistical books

S Programming

S is a high-level language for manipulating, analysing and displaying data. It forms the basis of two highly acclaimed and widely used data analysis software systems, the commercial S-PLUS(R) and the Open Source R. This book provides an in-depth guide to writing software in the S language under either or both of those systems.

IBM SPSS for Intermediate Statistics: Use and Interpretation, Fifth Edition (Volume 1)

Designed to help readers analyze and interpret research data using IBM SPSS, this user-friendly book shows readers how to choose the appropriate statistic based on the design; perform intermediate statistics, including multivariate statistics; interpret output; and write about the results. The book reviews research designs and how to assess the accuracy and reliability of data; how to determine whether data meet the assumptions of statistical tests; how to calculate and interpret effect sizes for intermediate statistics, including odds ratios for logistic regression analysis; how to compute and interpret post-hoc power; and an overview of basic statistics for those who need a review.

An Introduction to Element Theory

A fresh alternative for describing segmental structure in phonology. This book invites students of linguistics to challenge and reconsider their existing assumptions about the form of phonological representations and the place of phonology in generative grammar. It does this by offering a comprehensive introduction to Element Theory.

Algorithmen von Hammurapi bis Gödel: Mit Beispielen aus den Computeralgebrasystemen Mathematica und Maxima (German Edition)

This book offers a historically oriented introduction to algorithmics, that is, the study of algorithms, in mathematics, computer science, and beyond. Its distinguishing features and aims are an elementary and intuitive presentation, attention to the historical development, and motivation of the concepts and methods through concrete, meaningful examples that draw on modern tools (computer algebra systems, the Internet).

Additional info for Outlier Analysis

Sample text

As in the case of autoregressive models of continuous data, it is possible to use (typically Markovian) prediction-based techniques to forecast the value of a single position in the sequence. Deviations from forecasted values are identified as contextual outliers. It is often desirable to perform the prediction in real time in these settings. In other cases, anomalous events can be identified only by variations from the normal patterns exhibited by the subsequences over multiple time stamps. This is analogous to the problem of unusual shape detection in time-series data, and it represents a set of collective outliers.
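The prediction-based idea above can be illustrated with a minimal sketch. It assumes a first-order Markov model with Laplace smoothing and scores each position by the negative log-probability of the observed symbol given its predecessor. The function names (`fit_markov`, `position_scores`), the smoothing constant `alpha`, and the toy sequences are illustrative assumptions, not taken from the book.

```python
from collections import Counter, defaultdict
import math


def fit_markov(sequence, alphabet, alpha=1.0):
    """Estimate first-order transition probabilities with Laplace smoothing.

    `alpha` is an assumed smoothing constant so unseen transitions keep
    a small non-zero probability.
    """
    counts = defaultdict(Counter)
    for prev, cur in zip(sequence, sequence[1:]):
        counts[prev][cur] += 1
    k = len(alphabet)
    probs = {}
    for prev in alphabet:
        total = sum(counts[prev].values()) + alpha * k
        probs[prev] = {cur: (counts[prev][cur] + alpha) / total for cur in alphabet}
    return probs


def position_scores(sequence, probs):
    """Return -log P(cur | prev) for every position after the first.

    A high score means the symbol deviates strongly from the forecast.
    """
    return [-math.log(probs[prev][cur]) for prev, cur in zip(sequence, sequence[1:])]


# Toy data: training shows strict alternation; the test sequence repeats "B" once.
train = list("ABABABABABABABABABAB")
test = list("ABABBABAB")
alphabet = sorted(set(train) | set(test))

model = fit_markov(train, alphabet)
scores = position_scores(test, model)

# scores[i] belongs to test[i + 1], so shift by one to index into `test`.
worst = max(range(len(scores)), key=scores.__getitem__) + 1
print(worst, test[worst], round(scores[worst - 1], 3))
```

In this toy setting the repeated `B` receives the highest score because the training sequence never contains a `B` to `B` transition. A real-time variant would update the transition counts incrementally instead of fitting once on a fixed training sequence.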

This is referred to as the Positive-Unlabeled Classification (PUC) problem in machine learning. This variation is still quite similar to the fully supervised rare-class scenario, except that the classification model needs to be more cognizant of the contaminants in the negative (unlabeled) class.
  • Only instances of a subset of the normal and anomalous classes may be available, but some of the anomalous classes may be missing from the training data [388, 389, 538]. Such situations are quite common in scenarios such as intrusion detection in which some intrusions may be known, but other new types of intrusions are continually discovered over time.

For example, network intrusion events may cause aggregate change points in a network stream. On the other hand, individual point novelties may or may not correspond to aggregate change points. The latter case is similar to multidimensional anomaly detection with an efficiency constraint for the streaming scenario. Methods for anomaly detection in time-series data and multidimensional data streams are discussed in Chapter 9.

Discrete Sequences

Many discrete sequence-based applications such as intrusion detection and fraud detection are clearly temporal in nature.
