WEKA Classification Algorithms is a WEKA Plug-in.
It provides implementations of a number of artificial neural network (ANN) and artificial immune system (AIS) based classification algorithms for the WEKA (Waikato Environment for Knowledge Analysis) machine learning workbench.
WEKA was chosen as the platform for these algorithms because I think it's an excellent piece of free software. WEKA is required to run the algorithms provided in this project and is included in the download. This is an open source project (released under the GPL), so the source code is available. The following algorithms are provided:
· Learning Vector Quantization (LVQ)
· Self-Organizing Map (SOM)
· Feed-Forward Artificial Neural Network (FF-ANN)
· Artificial Immune Recognition System (AIRS)
· Clonal Selection Algorithm (CLONALG)
What is Learning Vector Quantization?
· A competitive learning algorithm by Kohonen, often described as a supervised version of his Self-Organizing Map (SOM) algorithm
· The goal of the algorithm is to approximate the distribution of each class using a reduced number of codebook vectors, positioned so as to minimise classification errors
· Codebook vectors become exemplars for a particular class, and together they approximate the boundaries between classes
· The algorithm does not construct a topographical ordering of the dataset (there is no concept of explicit neighbourhood in LVQ as there is in the SOM algorithm)
· The algorithm was proposed by Kohonen in 1986 as an improvement over Labelled Vector Quantization
· The algorithm is associated with the neural network class of learning algorithms, though it works quite differently from conventional feed-forward networks trained with Back Propagation
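The competitive learning described above can be illustrated with a minimal sketch of LVQ1, the simplest variant of the algorithm, written in Python rather than the plug-in's Java. It assumes numeric attributes, Euclidean distance, codebook vectors initialised by copying random training samples of each class, and a linearly decaying learning rate; the function names and default parameter values are hypothetical and not taken from the plug-in. For each training sample, the nearest codebook vector (the winner) is moved towards the sample if their classes match and away from it otherwise.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(codebook, x):
    """Index of the codebook vector closest to x (the competitive step)."""
    return min(range(len(codebook)), key=lambda i: euclidean(codebook[i][0], x))


def train_lvq1(data, n_per_class=1, learning_rate=0.3, iterations=1000, seed=0):
    """Hypothetical LVQ1 trainer; data is a list of (vector, label) pairs.

    Returns a codebook: a list of (vector, label) pairs.
    """
    rng = random.Random(seed)
    labels = sorted({label for _, label in data})
    # Initialise codebook vectors by copying random training samples of each class.
    codebook = []
    for label in labels:
        samples = [list(v) for v, l in data if l == label]
        for _ in range(n_per_class):
            codebook.append((list(rng.choice(samples)), label))
    for t in range(iterations):
        # Learning rate decays linearly towards zero over the training run.
        rate = learning_rate * (1.0 - t / iterations)
        x, target = rng.choice(data)
        vec, cls = codebook[best_matching_unit(codebook, x)]
        # Attract the winner if its class matches the sample, repel it otherwise.
        sign = 1.0 if cls == target else -1.0
        for j in range(len(vec)):
            vec[j] += sign * rate * (x[j] - vec[j])
    return codebook


def classify(codebook, x):
    """Predict the class of the nearest codebook vector."""
    return codebook[best_matching_unit(codebook, x)][1]
```

Classification is just a nearest-codebook lookup: only the single winning vector is updated at each step, and no neighbourhood function is involved, unlike the SOM.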
What are some advantages of the Learning Vector Quantization algorithm?
· The model is trained significantly faster than other neural network techniques like Back Propagation
· It is able to summarise or reduce large datasets to a smaller number of codebook vectors suitable for classification or visualisation
· Able to generalise features in the dataset, providing a level of robustness
· Can approximate just about any classification problem as long as the attributes can be compared using a meaningful distance measure
· As with nearest neighbour techniques, the codebook vectors are not limited in their number of dimensions
· Normalisation of input data is not required (though normalising may improve accuracy if attribute value ranges vary greatly)
· Can handle data with missing values
· The generated model can be updated incrementally
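The last two advantages can be illustrated with another hypothetical Python sketch. It assumes missing values are represented as None and are simply skipped, both in the distance calculation and in the update; this is one simple strategy, not necessarily how the plug-in handles missing values. Updating the model incrementally then amounts to applying a single LVQ1 step for each new instance, without retraining from scratch. The function names are illustrative only.

```python
import math


def distance_ignoring_missing(a, b):
    """Euclidean distance over attributes present (not None) in both vectors."""
    total = 0.0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # skip missing attribute values
        total += (x - y) ** 2
    return math.sqrt(total)


def update(codebook, x, label, rate=0.1):
    """Apply one LVQ1 step for a single new instance (hypothetical helper).

    codebook is a list of (vector, class) pairs and is modified in place.
    Attributes missing from x leave the winning vector's value unchanged.
    Returns the winning (vector, class) pair.
    """
    winner = min(codebook, key=lambda cb: distance_ignoring_missing(cb[0], x))
    vec, cls = winner
    # Attract the winner if classes match, repel it otherwise.
    sign = 1.0 if cls == label else -1.0
    for j, xj in enumerate(x):
        if xj is not None:
            vec[j] += sign * rate * (xj - vec[j])
    return winner
```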
What are some disadvantages of the Learning Vector Quantization algorithm?
· A meaningful distance measure is needed for every attribute (Euclidean distance is usually used for numeric attributes)
· Model accuracy is highly dependent on the initialisation of the model and on the learning parameters used (learning rate, number of training iterations, etc.)
· Accuracy also depends on the class distribution in the training dataset; a good distribution of samples is needed to construct useful models
· It is difficult to determine a good number of codebook vectors for a given problem
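For datasets that mix numeric and nominal attributes, one common choice (swapped in here for illustration, in the style of the heterogeneous Euclidean-overlap metric) combines range-normalised differences for numeric attributes with a 0/1 overlap for nominal ones. The sketch assumes the minimum and maximum of each numeric attribute are known from the training data; the function and its `ranges` argument are hypothetical. Range normalisation also stops attributes with large values from dominating the distance.

```python
import math


def mixed_distance(a, b, ranges):
    """Distance over mixed numeric/nominal attributes (illustrative only).

    ranges[j] is a (min, max) pair for numeric attribute j, or None if
    attribute j is nominal. Numeric differences are divided by the range;
    nominal attributes contribute 0 if equal and 1 otherwise.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            d = 0.0 if x == y else 1.0
        else:
            lo, hi = r
            span = hi - lo
            # A constant attribute (zero range) contributes nothing.
            d = (x - y) / span if span > 0 else 0.0
        total += d * d
    return math.sqrt(total)
```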