By Honghua Dai, Ramakrishnan Srikant, Chengqi Zhang

This book constitutes the refereed proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2004, held in Sydney, Australia in May 2004.

The 50 revised full papers and 31 revised short papers presented were carefully reviewed and selected from a total of 238 submissions. The papers are organized in topical sections on classification; clustering; association rules; novel algorithms; event mining, anomaly detection, and intrusion detection; ensemble learning; Bayesian network and graph mining; text mining; multimedia mining; text mining and Web mining; statistical methods, sequential data mining, and time series mining; and biomedical data mining.

**Additional info for Advances in Knowledge Discovery and Data Mining: 8th Pacific-Asia Conference, PAKDD 2004, Sydney, Australia, May 26-28, 2004, Proceedings**

**Sample text**

We call this method SVMs with heterogeneous feature kernels (denoted SVM-HF). The complete pseudo-code is shown in Figure 1. This approach is directly related to our previous work on Cross-Training [4], where label mappings between two different taxonomies help in building better classification models for each of the taxonomies. Testing: During application, all test documents are classified using S(0). For each document, the transformed scores are appended in the new columns with appropriate scaling.
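The step of appending transformed scores as new columns can be illustrated with a minimal sketch. It is not the authors' pseudo-code from Figure 1: the function name `append_scaled_scores`, the `scale` parameter, and the sample values are all hypothetical, and the score transformation itself is assumed to have happened already.

```python
def append_scaled_scores(term_vector, label_scores, scale):
    """Return the term features followed by the label scores times `scale`.

    `scale` is a hypothetical knob standing in for the "appropriate scaling"
    mentioned in the text; the source does not give its value.
    """
    return list(term_vector) + [scale * s for s in label_scores]


# Illustrative values only: two term weights and two first-stage scores.
terms = [0.6, 0.8]
scores = [1.5, -0.5]
augmented = append_scaled_scores(terms, scores, scale=0.5)
```

The augmented vector keeps the original term columns untouched, so a second-stage classifier sees both the text features and the scaled scores side by side.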

Definition 5. A cell is called isolated if its neighboring cells are all cells. Data points contained in isolated sparse cells are defined as noise. A proper density-connected equivalent subclass usually contains most data points of each cluster. The data points of a sparse cell may be either boundary points of some cluster or outliers. Whether a sparse cell is isolated is therefore the key to distinguishing border points from outliers. Outliers are essentially the points of isolated sparse cells.
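A small sketch may help make the distinction between border points and outliers concrete. It rests on several assumptions that are not in the excerpt: a two-dimensional grid, an 8-cell neighborhood, a density threshold `min_points`, and the reading that a cell is isolated when all of its neighboring cells are empty. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def assign_cells(points, cell_size):
    """Group 2-D points by the integer coordinates of their grid cell."""
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    return cells


def is_isolated(cell, cells):
    """Assumed reading: isolated means every one of the 8 neighbors is empty."""
    cx, cy = cell
    for dx, dy in product((-1, 0, 1), repeat=2):
        if (dx, dy) != (0, 0) and cells.get((cx + dx, cy + dy)):
            return False
    return True


def find_noise(points, cell_size, min_points):
    """Collect the points that lie in sparse cells that are also isolated."""
    cells = assign_cells(points, cell_size)
    noise = []
    for cell, members in cells.items():
        if len(members) < min_points and is_isolated(cell, cells):
            noise.extend(members)
    return noise


points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (1.1, 0.2), (5.5, 5.5)]
noise = find_noise(points, cell_size=1.0, min_points=3)
```

Here the point at (1.1, 0.2) sits in a sparse cell next to a dense one, so it is treated as a candidate border point, while (5.5, 5.5) sits in a sparse cell with no occupied neighbors and is reported as noise.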

The scaling factor: The differential scaling of term and feature dimensions has special reasons. This applies a special kernel function to documents during training S(1). The kernel function in linear SVMs gives the similarity between two document vectors. When document vectors are scaled to unit norm, this becomes simply the cosine of the angle between the two document vectors, a standard IR similarity measure.
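The relationship between the linear kernel and cosine similarity, and the effect of scaling one block of dimensions differently, can be checked numerically. The sketch below uses only the standard library; the vectors, the extra feature columns, and the factor `gamma` are illustrative assumptions, not values from the paper.

```python
import math


def dot(a, b):
    """Linear kernel: the plain dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def unit_normalize(vec):
    """Scale a vector to unit Euclidean norm (zero vectors are returned as-is)."""
    norm = math.sqrt(dot(vec, vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


doc_a = [3.0, 0.0, 4.0]
doc_b = [1.0, 2.0, 2.0]

# On unit-norm vectors the linear kernel equals the cosine of the angle.
kernel_value = dot(unit_normalize(doc_a), unit_normalize(doc_b))
cosine_value = dot(doc_a, doc_b) / (
    math.sqrt(dot(doc_a, doc_a)) * math.sqrt(dot(doc_b, doc_b))
)

# Hypothetical extra feature columns, scaled by an assumed factor gamma.
gamma = 0.5
feat_a = [1.0, -1.0]
feat_b = [2.0, 0.5]
aug_a = unit_normalize(doc_a) + [gamma * x for x in feat_a]
aug_b = unit_normalize(doc_b) + [gamma * x for x in feat_b]

# The kernel on the augmented vectors splits into a term part and a
# feature part weighted by gamma squared.
combined = dot(aug_a, aug_b)
decomposed = kernel_value + gamma ** 2 * dot(feat_a, feat_b)
```

Under these assumptions, scaling the appended columns by `gamma` is equivalent to using a sum of two linear kernels in which the feature kernel carries weight `gamma ** 2`, which is one way to read the "special kernel function" mentioned above.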