This page describes how to use the machine learning (aka mining) bundle.
This operator clusters a set of tuples.
This example uses the weka clusterer with the "simplekmeans" algorithm. The argument string passed to weka-simplekmeans is "-N 3".
clustered = CLUSTERING({
    attributes = ['age', 'income'],
    clusterer = 'weka',
    algorithm = ['model' = 'SimplekMeans', 'arguments' = '-N 3']
  }, inputoperator)
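To make the effect of such a clustering step concrete, the following is a minimal sketch of plain k-means in Python, outside of Odysseus. It assumes that "-N 3" asks Weka for three clusters; the sample (age, income) tuples and the function name `kmeans` are invented for illustration, and the seeding (first k points) is a simplification, not what Weka or the operator does internally.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (assumption)."""
    centroids = [tuple(float(v) for v in p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids

# Hypothetical (age, income) tuples, invented for illustration.
tuples = [(23, 21000), (41, 52000), (63, 90000),
          (25, 24000), (45, 55000), (66, 94000)]
assignment, centroids = kmeans(tuples, k=3)
for t, cluster_id in zip(tuples, assignment):
    print(t, "-> cluster", cluster_id)
```

The sketch does not scale the attributes, so income dominates the distance here; with real data the attributes would usually be normalized first.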
For weka, there are currently the following algorithms that can be used as the "model". Further details and possible arguments can be found in the Weka Docs
This operator creates a classifier. Its output is therefore a stream of classifiers (a dedicated datatype).
This example uses the weka learner with the "J48" algorithm. The class attribute is "attack", and its nominal values are declared as "back", "smurf" and "spy". No further arguments are passed to the algorithm.
learned = CLASSIFICATION_LEARN({
    class = 'attack',
    nominals = ['attack' = ['back', 'smurf', 'spy']],
    learner = 'weka',
    algorithm = ['model' = 'J48']
  }, inputoperator)
For weka, there are currently the following algorithms that can be used as the "model". Further details and possible arguments can be found in the Weka Docs
Classification (nominal values):
Regression (continuous values):
This operator classifies a tuple by using a classifier. The operator needs two inputs: a stream of tuples that should be classified and a stream of classifiers (which normally comes from a CLASSIFICATION_LEARN operator).
It appends a new attribute called "clazz", which contains either the nominal class value or the continuous value from a regression.
For the classify operator, the type of the classifier (tree, list, Bayes net, ...) doesn't matter. You may even mix them to classify the same tuple with different classifiers (see Ensembles). The left port is the input for the tuples that should be classified, and the right port is the input for the classifiers.
classified = CLASSIFY({classifier='classifier'}, test, learned)
This operator creates frequent item sets from a given stream.
Each tuple in the result stream has 3 attributes:
This example uses FP-Growth for finding frequent item sets. It does not need any parameters in "algorithm".
/// support is 3 out of 1000 transactions
fpm = FREQUENTITEMSET({support=3.0, transactions=1000, learner = 'fpgrowth'}, inputoperator)

/// support is 60% out of 1000 transactions, so it is equal to a support of 600.0
fpm = FREQUENTITEMSET({support=0.6, transactions=1000, learner = 'fpgrowth'}, inputoperator)
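The two ways of stating the support can be illustrated with a small Python sketch outside of Odysseus. It assumes that a support below 1 is read as a fraction of the transaction count, as the second comment above suggests; how the operator treats a value of exactly 1.0 is not stated here. The helper names and the sample baskets are hypothetical, and the enumeration is a brute-force count, not FP-Growth.

```python
from itertools import combinations

def absolute_support(support, transactions):
    """Read a support below 1 as a fraction of the transactions (assumption)."""
    return support * transactions if support < 1 else support

def frequent_itemsets(baskets, min_count):
    """Brute-force count of every item set found in at least min_count baskets."""
    items = sorted(set().union(*baskets))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for basket in baskets if set(candidate) <= basket)
            if count >= min_count:
                result[candidate] = count
    return result

print(absolute_support(3.0, 1000))  # already an absolute count
print(absolute_support(0.6, 1000))  # fraction of 1000 transactions

# Hypothetical transactions, invented for illustration.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
frequent = frequent_itemsets(baskets, min_count=2)
for itemset, count in frequent.items():
    print(itemset, count)
```

FP-Growth arrives at the same frequent item sets without enumerating every candidate, which matters once the number of distinct items grows.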
This operator uses a list of tuples and creates rules like "x => y".
A rule is a special datatype called "AssociationRule", which is essentially a tuple of two patterns (one for the premise and one for the consequence of the rule).
This example generates only rules with a confidence of 60% or more.
rule = GENERATERULES({ itemset='set', confidence=0.6, support='support' }, fpm)
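As a rough illustration of what a confidence threshold of 0.6 means, the sketch below derives rules from item-set support counts in Python, using the usual definition confidence(X => Y) = support(X and Y together) / support(X). The function name `generate_rules` and the support counts are hypothetical; this is not the operator's implementation.

```python
from itertools import combinations

def generate_rules(supports, min_confidence):
    """Derive (premise, consequence, confidence) triples from support counts.

    Assumes every subset of a frequent item set is also present in `supports`,
    which holds for the output of a frequent item set miner.
    """
    rules = []
    for itemset, count in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty premise and consequence
        for size in range(1, len(itemset)):
            for premise_items in combinations(sorted(itemset), size):
                premise = frozenset(premise_items)
                confidence = count / supports[premise]
                if confidence >= min_confidence:
                    rules.append((premise, itemset - premise, confidence))
    return rules

# Hypothetical support counts, invented for illustration.
supports = {
    frozenset({"bread"}): 3,
    frozenset({"milk"}): 4,
    frozenset({"butter"}): 2,
    frozenset({"bread", "butter"}): 2,
    frozenset({"bread", "milk"}): 2,
}
rules = generate_rules(supports, min_confidence=0.6)
for premise, consequence, confidence in rules:
    print(sorted(premise), "=>", sorted(consequence), round(confidence, 2))
```

With these counts the rule "milk => bread" has a confidence of 2/4 = 0.5 and is dropped, while "butter => bread" (2/2) and the two rules with "bread" as premise (2/3 each) are kept.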
You can simply combine all operators with other operators in Odysseus to create ensembles.
Suppose, for example, that we have a stream with windspeed and power (speedandpower) and another one containing only the windspeed (e.g. a forecast).
Then it is possible to create different regression functions, apply them, and weight the regression results by an aggregation.
One example:
/// create the first classifier by using SMO
smo = CLASSIFICATION_LEARN({
    class = 'power',
    learner = 'weka',
    algorithm = ['model' = 'SMO-REGRESSION']
  }, speedandpower)

/// create the second classifier by using gaussian processes
gaussian = CLASSIFICATION_LEARN({
    class = 'power',
    learner = 'weka',
    algorithm = ['model' = 'GAUSSIAN-PROCESSES']
  }, speedandpower)

/// create the third classifier by using a linear regression
linear = CLASSIFICATION_LEARN({
    class = 'power',
    learner = 'weka',
    algorithm = ['model' = 'LINEAR-REGRESSION']
  }, speedandpower)

/// union them all into one stream
unioned = UNION(smo, gaussian, linear)

/// then, classify them - each tuple will be classified by using all three classifiers
ensemble = CLASSIFY(speed, unioned)

/// aggregate the clazz using average - which allows a weighted kind of voting
agg = AGGREGATE({
    aggregations = [
      ['AVG', 'clazz', 'powerForecast', 'DOUBLE']
    ]
  }, ensemble)
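The data flow of this ensemble can be mimicked in a few lines of Python, outside of Odysseus. The three regression functions below are hypothetical stand-ins for the learned models, and grouping by windspeed is an assumption made so the sketch has something to average over; in the query above, the grouping follows from how AGGREGATE handles the stream.

```python
from statistics import mean

# Hypothetical stand-ins for the three learned regression functions.
def model_a(speed):
    return 2.0 * speed

def model_b(speed):
    return 2.0 * speed + 3.0

def model_c(speed):
    return 2.0 * speed - 3.0

def classify(speeds, classifiers):
    """Apply every classifier to every tuple and append the result as 'clazz'."""
    for speed in speeds:
        for classifier in classifiers:
            yield {"windspeed": speed, "clazz": classifier(speed)}

def aggregate_avg(classified):
    """Average 'clazz' per windspeed value (grouping is an assumption)."""
    grouped = {}
    for row in classified:
        grouped.setdefault(row["windspeed"], []).append(row["clazz"])
    return {speed: mean(values) for speed, values in grouped.items()}

unioned = [model_a, model_b, model_c]           # corresponds to UNION
ensemble = classify([5.0, 10.0], unioned)       # corresponds to CLASSIFY
power_forecast = aggregate_avg(ensemble)        # corresponds to AGGREGATE with AVG
print(power_forecast)
```

Each input tuple yields one "clazz" value per classifier, and the average over these values is the combined forecast.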
to be added