WEKA Classification Algorithms

1.7 GPL (GNU General Public License)    
WEKA Classification Algorithms is a WEKA Plug-in.





It provides implementations of a number of artificial neural network (ANN) and artificial immune system (AIS) based classification algorithms for the WEKA (Waikato Environment for Knowledge Analysis) machine learning workbench.

The WEKA platform was chosen for implementing these algorithms because I think it's an excellent piece of free software. WEKA is required to run the algorithms provided in this project, and is included in the download. This is an open source project (released under the GPL), so the source code is available.


Algorithms included:
Learning Vector Quantization (LVQ)
Self-Organizing Map (SOM)
Feed-Forward Artificial Neural Network (FF-ANN)
Artificial Immune Recognition System (AIRS)
Clonal Selection Algorithm (CLONALG)

What is Learning Vector Quantization?
A competitive learning algorithm, described as a supervised version of Kohonen's Self-Organizing Map (SOM) algorithm
The goal of the algorithm is to approximate the distribution of each class using a reduced number of codebook vectors, positioned so as to minimise classification errors
Codebook vectors become exemplars for a particular class, attempting to represent class boundaries
The algorithm does not construct a topographical ordering of the dataset (there is no concept of explicit neighbourhood in LVQ as there is in the SOM algorithm)
The algorithm was proposed by Kohonen in 1986 as an improvement over Labelled Vector Quantization
The algorithm is associated with the neural network class of learning algorithms, though it works quite differently from conventional feed-forward networks such as those trained with Back Propagation
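
To make the description above concrete, here is a minimal sketch of the basic LVQ1 update rule in plain Python. It is not the code shipped with this plug-in. The function names, the linearly decaying learning rate, and the choice to initialise each codebook vector from a random training sample of its class are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(codebook, x):
    """Index of the codebook entry whose vector is closest to x."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(codebook[i][0], x))


def train_lvq1(samples, labels, vectors_per_class=1,
               learning_rate=0.3, epochs=30, seed=0):
    """Train a codebook of (vector, class) pairs with the LVQ1 rule."""
    rng = random.Random(seed)

    # Assumption: start each codebook vector at a random sample of its class.
    codebook = []
    for cls in sorted(set(labels)):
        members = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(vectors_per_class):
            codebook.append((list(rng.choice(members)), cls))

    total_steps = epochs * len(samples)
    step = 0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # Assumption: learning rate decays linearly towards zero.
            alpha = learning_rate * (1.0 - step / total_steps)
            step += 1
            x, label = samples[i], labels[i]
            vector, vector_class = codebook[nearest_index(codebook, x)]
            # Pull the winner towards x if the classes match, push it away otherwise.
            sign = 1.0 if vector_class == label else -1.0
            for d in range(len(vector)):
                vector[d] += sign * alpha * (x[d] - vector[d])
    return codebook


def classify(codebook, x):
    """Predict the class of the nearest codebook vector."""
    return codebook[nearest_index(codebook, x)][1]
```

Only the single nearest codebook vector is adjusted for each training sample, which is why there is no neighbourhood function as in the SOM. Classification is then a nearest-neighbour lookup against the (small) codebook rather than against the full training set.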

What are some advantages of the Learning Vector Quantization algorithm?
The model is trained significantly faster than other neural network techniques like Back Propagation
It is able to summarise or reduce large datasets to a smaller number of codebook vectors suitable for classification or visualisation
Able to generalise features in the dataset, providing a level of robustness
Can approximate just about any classification problem as long as the attributes can be compared using a meaningful distance measure
Unlike nearest neighbour techniques, not limited in the number of dimensions in the codebook vectors
Normalisation of input data is not required (though normalisation may improve accuracy if attribute values vary greatly)
Can handle data with missing values
The generated model can be updated incrementally
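
Two of the points above, optional normalisation and tolerance of missing values, can be illustrated with a short sketch. How this plug-in actually treats missing values is not stated here, so the approach below (skipping any attribute that is missing in either vector) is only one plausible assumption, and the function names and the use of `None` to mark a missing value are made up for the example.

```python
def min_max_normalise(samples):
    """Rescale each attribute to [0, 1]; None marks a missing value and is kept."""
    lows, highs = [], []
    for column in zip(*samples):
        present = [v for v in column if v is not None]
        lows.append(min(present, default=0.0))
        highs.append(max(present, default=0.0))

    normalised = []
    for row in samples:
        new_row = []
        for v, lo, hi in zip(row, lows, highs):
            if v is None:
                new_row.append(None)      # leave missing values missing
            elif hi == lo:
                new_row.append(0.0)       # constant attribute carries no information
            else:
                new_row.append((v - lo) / (hi - lo))
        normalised.append(new_row)
    return normalised


def distance_ignoring_missing(a, b):
    """Squared Euclidean distance over attributes present in both vectors."""
    return sum((x - y) ** 2
               for x, y in zip(a, b)
               if x is not None and y is not None)
```

Without rescaling, an attribute measured in the hundreds would dominate a Euclidean distance over one measured in single digits, which is the situation where normalising first is likely to help.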

What are some disadvantages of the Learning Vector Quantization algorithm?
A useful distance measure must be available for all attributes (Euclidean distance is usually used for numeric attributes)
Model accuracy is highly dependent on the initialisation of the model as well as the learning parameters used (learning rate, training iterations, and so on)
Accuracy is also dependent on the class distribution in the training dataset; a good distribution of samples is needed to construct useful models
It is difficult to determine a good number of codebook vectors for a given problem
Last updated on October 2nd, 2007
