Clj-ml 0.4.0 API documentation

Machine Learning library for Clojure built around Weka and friends



This namespace contains functions for building classifiers with several classification
algorithms: Bayes networks, multilayer perceptrons, decision trees and support vector
machines. Some of these classifiers have incremental versions, so they can be built
without holding all of the dataset's instances in memory.
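The incremental idea can be illustrated without clj-ml. The sketch below is a hypothetical plain-Clojure example, not the library's API. It assumes a naive Bayes model (the simplest Bayes network) over nominal attributes, whose statistics are only counts, so they can be updated one instance at a time with reduce while instances are read from a (possibly lazy) sequence. The names update-counts and instances and the toy attributes are made up for illustration.

```clojure
;; Hypothetical illustration, not clj-ml's API: the statistics a naive Bayes
;; model needs over nominal attributes are counts, so they can be updated
;; one instance at a time without keeping earlier instances around.
(defn update-counts
  "Adds one instance (a map of attribute -> value plus a :class key) to counts.
  Class totals go under :class; per-class attribute/value counts go under
  :attr, keyed by [class attribute value]."
  [counts instance]
  (let [cls (:class instance)]
    (reduce-kv (fn [acc attr v]
                 (update-in acc [:attr [cls attr v]] (fnil inc 0)))
               (update-in counts [:class cls] (fnil inc 0))
               (dissoc instance :class))))

;; Toy data (hypothetical); in practice this could be a lazy seq read from disk.
(def instances
  [{:outlook :sunny :windy false :class :no}
   {:outlook :rainy :windy true  :class :no}
   {:outlook :sunny :windy false :class :yes}])

;; reduce consumes the sequence one element at a time
(def counts (reduce update-counts {} instances))
```

Because each step only reads the current counts and one instance, the same reduce works on a sequence far larger than memory.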

Functions for evaluating the resulting classifiers, either with cross-validation or
against a supplied dataset, are also provided.

A sample use of the API for classifiers is shown below:

 (use 'clj-ml.classifiers)

 ; Building a classifier using a C4.5 decision tree
 (def *classifier* (make-classifier :decision-tree :c45))

 ; We set the class attribute for the loaded dataset.
 ; *dataset* is supposed to contain a set of instances.
 (dataset-set-class *dataset* 4)

 ; Training the classifier
 (classifier-train *classifier* *dataset*)

 ; We evaluate the classifier on a separate test dataset, *testset*
 (def *evaluation* (classifier-evaluate *classifier* :dataset *dataset* *testset*))

 ; We retrieve some data from the evaluation result
 (:kappa *evaluation*)
 (:root-mean-squared-error *evaluation*)
 (:precision *evaluation*)

 ; A trained classifier can be used to classify new instances
 (def *to-classify* (make-instance *dataset*  {:class :Iris-versicolor
                                               :petalwidth 0.2
                                               :petallength 1.4
                                               :sepalwidth 3.5
                                               :sepallength 5.1}))

 ; We retrieve the index of the class value assigned by the classifier
 (classifier-classify *classifier* *to-classify*)

 ; We retrieve the value assigned by the classifier and assign it
 ; to the instance being classified
 (classifier-label *classifier* *to-classify*)

A classifier can also be evaluated using cross-validation (here with 10 folds):

 (classifier-evaluate *classifier* :cross-validation *dataset* 10)
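To make k-fold evaluation concrete, here is a minimal plain-Clojure sketch of how a data set can be partitioned into k train/test pairs. It is illustrative only and is not what clj-ml or Weka runs internally: Weka randomizes the instances and, for a nominal class, stratifies the folds, while this sketch does neither. The function name k-fold-splits is hypothetical.

```clojure
;; Hypothetical sketch of unstratified k-fold splitting (not clj-ml's API).
;; Instance i is placed in test fold (mod i k); every other fold trains on it.
(defn k-fold-splits
  "Returns a seq of k [train test] pairs of vectors."
  [k instances]
  (let [indexed (map-indexed vector instances)]
    (for [fold (range k)]
      [(vec (for [[i x] indexed :when (not= fold (mod i k))] x))
       (vec (for [[i x] indexed :when (= fold (mod i k))] x))])))

;; Usage: 3 folds over 7 toy "instances"
(def splits (k-fold-splits 3 (range 7)))
;; The first pair trains on [1 2 4 5] and tests on [0 3 6].
```

Each instance appears in exactly one test fold, so averaging the k evaluations uses every instance for testing once.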

Finally, a classifier can be stored in a file for later use:

 (use 'clj-ml.utils)

 (serialize-to-file *classifier*


This namespace contains functions for building clusterers with different clustering
algorithms. The k-means, Cobweb and expectation maximization (EM) algorithms are
currently supported.

Some of these algorithms support building the clustering incrementally, without holding
the full data set in main memory. Functions for evaluating a clusterer and for clustering
new instances are also provided.
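As a hedged illustration of what the k-means algorithm does, the sketch below performs one Lloyd iteration in plain Clojure: assign each point to its nearest centroid, then move each centroid to the mean of its points. It is not clj-ml's API (the library delegates the real work to Weka), and the function names are hypothetical.

```clojure
;; Hypothetical plain-Clojure sketch of one k-means (Lloyd) iteration.
(defn squared-distance [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn nearest-centroid
  "Index of the centroid closest to point."
  [centroids point]
  (apply min-key #(squared-distance (nth centroids %) point)
         (range (count centroids))))

(defn mean-vector
  "Component-wise mean of a non-empty collection of equal-length vectors."
  [points]
  (let [n (count points)]
    (apply mapv (fn [& xs] (/ (reduce + xs) n)) points)))

(defn kmeans-step
  "Assigns each point to its nearest centroid, then recomputes each centroid
  as the mean of its assigned points. Empty clusters keep their old centroid."
  [centroids points]
  (let [clusters (group-by #(nearest-centroid centroids %) points)]
    (vec (map-indexed (fn [i c]
                        (if-let [ps (get clusters i)] (mean-vector ps) c))
                      centroids))))

;; Usage: two well-separated groups move their centroids to the group means.
(kmeans-step [[0 0] [10 10]] [[0 0] [0 2] [10 10] [10 12]])
```

Repeating kmeans-step until the centroids stop changing gives the full algorithm.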

This namespace contains functions for creating and manipulating data sets and instances.
The format of a data set and its class attribute can be modified, and class values can be
assigned to instances. Finally, data sets can be converted into Clojure sequences and
processed with the usual Clojure functions, such as map and reduce.
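As a hedged illustration of the last point, suppose each instance has already been converted to a Clojure map keyed by attribute name (the conversion step, and whichever clj-ml function performs it, is not shown and is an assumption here). Ordinary sequence functions then apply directly. The rows reuse the values of the make-dataset example in the filters section; rows, sums, total-a-g and c-counts are hypothetical names.

```clojure
;; Hypothetical: instances already converted to maps (attributes :a, :b and
;; the nominal :c), mirroring the make-dataset example in the filters section.
(def rows [{:a 1 :b 2 :c :g}
           {:a 2 :b 3 :c :m}
           {:a 4 :b 5 :c :g}])

;; map: derive a new value per instance, here :a + :b
(def sums (map #(+ (:a %) (:b %)) rows))

;; filter + reduce: total of :a over the instances whose :c is :g
(def total-a-g (reduce + (map :a (filter #(= :g (:c %)) rows))))

;; frequencies: how often each value of the nominal attribute occurs
(def c-counts (frequencies (map :c rows)))
```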


This namespace generates distance functions that can be passed as parameters to certain
classifiers and clusterers, such as k-means.

Euclidean, Manhattan and Chebyshev distance functions are supported.
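For reference, the three metrics differ only in how they combine per-attribute differences: square root of the sum of squares, sum of absolute values, and maximum absolute value. The plain-Clojure sketch below writes out these textbook definitions. It is not clj-ml's implementation, which wraps Weka's distance classes (and those normalize attribute ranges by default); the function names are hypothetical.

```clojure
;; Hypothetical plain-Clojure definitions of the three metrics on numeric vectors.
(defn euclidean
  "Square root of the sum of squared component differences."
  [a b]
  (Math/sqrt (reduce + (map #(let [d (- %1 %2)] (* d d)) a b))))

(defn manhattan
  "Sum of absolute component differences."
  [a b]
  (reduce + (map #(Math/abs (double (- %1 %2))) a b)))

(defn chebyshev
  "Largest absolute component difference."
  [a b]
  (reduce max (map #(Math/abs (double (- %1 %2))) a b)))

;; Usage: for [1 2 3] and [4 6 3] the differences are 3, 4 and 0.
[(euclidean [1 2 3] [4 6 3])
 (manhattan [1 2 3] [4 6 3])
 (chebyshev [1 2 3] [4 6 3])]
```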



This namespace defines functions that can be applied to data sets to modify them, for
example by transforming nominal attributes into binary attributes or by removing
attributes.
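To show what transforming a nominal attribute into binary attributes means, here is a hedged plain-Clojure sketch of one-hot encoding on rows represented as maps. It illustrates the idea only; it is not the filter's implementation, and the function name nominal->binary and the generated column names (such as :c_g) are hypothetical.

```clojure
;; Hypothetical one-hot encoding of a nominal attribute (not clj-ml's filter).
(defn nominal->binary
  "Replaces nominal attribute k in each row with one 0/1 column per label,
  named <attribute>_<label>."
  [k labels rows]
  (for [row rows]
    (reduce (fn [r label]
              (assoc r (keyword (str (name k) "_" (name label)))
                     (if (= label (get row k)) 1 0)))
            (dissoc row k)
            labels)))

;; Usage: the nominal :c with labels :g and :m becomes :c_g and :c_m.
(nominal->binary :c [:g :m] [{:a 1 :c :g} {:a 2 :c :m}])
```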

There are a number of ways to use the filtering API. The most straightforward and
idiomatic Clojure way is to use the provided filter fns:

  ;; ds is the dataset
  (def ds (make-dataset :test [:a :b {:c [:g :m]}]
                                  [ [1 2 :g]
                                    [2 3 :m]
                                    [4 5 :g]]))
  (def filtered-ds
     (-> ds
         (add-attribute {:type :nominal, :column 1, :name "pet", :labels ["dog" "cat"]})
         (remove-attributes {:attributes [:a :c]})))

The above functions rely on lower-level fns that create and apply the filters; you can also
use these directly if you need more control over the filter objects:

  ;; Named remove-filter rather than filter to avoid shadowing clojure.core/filter
  (def remove-filter (make-filter :remove-attributes {:dataset-format ds :attributes [:a :c]}))

  ;; We apply the filter to the original data set and obtain the new one
  (def filtered-ds (filter-apply remove-filter ds))

The previous code can be rewritten with the make-apply-filter function:

  (def filtered-ds (make-apply-filter :remove-attributes {:attributes [:a :c]} ds))

Functions for reading datasets, classifiers and clusterers from, and saving them to, files
and other persistence mechanisms.



Kernel functions that can be passed as parameters to support vector machine classifiers.

Polynomial, radial basis function (RBF) and string kernels are supported.
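For reference, the polynomial and RBF kernels are simple functions of two vectors. The plain-Clojure sketch below writes out the standard textbook forms, K(x, y) = (x . y + c)^p and K(x, y) = exp(-gamma * ||x - y||^2). It is not clj-ml's or Weka's implementation, the parameter names p, c and gamma are assumptions for illustration, and the string kernel is omitted.

```clojure
;; Hypothetical plain-Clojure versions of two standard kernels.
(defn dot
  "Dot product of two numeric vectors."
  [x y]
  (reduce + (map * x y)))

(defn polynomial-kernel
  "(x . y + c)^p; c = 0 gives the homogeneous form, c = 1 adds lower-order terms."
  [p c x y]
  (Math/pow (+ (dot x y) c) p))

(defn rbf-kernel
  "exp(-gamma * ||x - y||^2); equals 1.0 when x = y and decays with distance."
  [gamma x y]
  (let [d (map - x y)]
    (Math/exp (- (* gamma (dot d d))))))

;; Usage: [1 2] . [3 4] = 11, so the degree-2 kernel with c = 1 is 12^2.
(polynomial-kernel 2 1 [1 2] [3 4])
(rbf-kernel 0.5 [1 2] [3 4])
```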


