In addition to the existing Machine Learning Feature, we provide a new (experimental) feature that focuses on classification and utilizes the Aggregation operator. To use this feature, you need to install the Classification Feature.
Learner
A learner operator creates classifiers as output. These classifiers can be used in the Classify operator to classify elements.
NonIncrementalClassificationLearner
classifier = AGGREGATION({AGGREGATIONS = [ ['FUNCTION' = 'NonIncrementalClassificationLearner', 'LABEL_ATTRIBUTE' = 'label', 'ALGORITHM' = 'J48', 'WEKA_OPTIONS' = '-U'] ], EVAL_AT_NEW_ELEMENT = false, EVAL_BEFORE_REMOVE_OUTDATING = true}, windowed)
The first version, the NonIncrementalClassificationLearner, is a wrapper for WEKA classifier learners (the currently supported WEKA version is 3.8) and needs the following parameters:
- LABEL_ATTRIBUTE: The attribute in the input data that contains the label that should be learned
- ALGORITHM: The WEKA algorithm that should be used. At the moment, the following algorithms are available (see the WEKA documentation for more information):
- BayesNet
- NaiveBayes
- NaiveBayesMultinomial
- NaiveBayesUpdateable
- GaussianProcesses
- Logistic
- MultilayerPerceptron
- SimpleLogistic
- SMO
- IBk
- KStar
- LWL
- DecisionTable
- JRip
- OneR
- PART
- DecisionStump
- HoeffdingTree
- J48
- LMT
- RandomForest
- RandomTree
- REPTree
- WEKA_OPTIONS: The options that should be passed to the algorithm (see https://weka.sourceforge.io/doc.stable/weka/classifiers/Classifier.html for information about the available parameters)
Important: EVAL_AT_NEW_ELEMENT = false and EVAL_BEFORE_REMOVE_OUTDATING = true must be set exactly this way. Currently, there is no check for this, and the output may be wrong otherwise.
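As a further example, the same learner could be configured with a different WEKA algorithm from the list above, e.g., RandomForest. This snippet is a sketch, not tested output; it assumes the same input stream windowed as above, and the WEKA option -I (number of trees) as documented by WEKA:

```
classifier = AGGREGATION({AGGREGATIONS = [ ['FUNCTION' = 'NonIncrementalClassificationLearner', 'LABEL_ATTRIBUTE' = 'label', 'ALGORITHM' = 'RandomForest', 'WEKA_OPTIONS' = '-I 100'] ], EVAL_AT_NEW_ELEMENT = false, EVAL_BEFORE_REMOVE_OUTDATING = true}, windowed)
```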
IncrementalClassificationLearner
classifier = AGGREGATION({AGGREGATIONS = [ ['FUNCTION' = 'IncrementalClassificationLearner', 'LABEL_ATTRIBUTE' = 'label', 'ALGORITHM' = 'HATT', 'BATCH_SIZE' = '100', 'CONFIDENCE' = '0.01'] ]}, trainingdata)
This is an incremental learner ('FUNCTION' = 'IncrementalClassificationLearner'). At the moment, only the Hoeffding Anytime Tree (HATT) is supported ('ALGORITHM' = 'HATT'). The operator needs the following parameters:
- BATCH_SIZE: The number of elements that should be processed before a new classifier is created
- CONFIDENCE: A HATT-specific parameter for the attribute selection
Classify
This operator has two inputs:
- The first input is the source with the data that should be classified (note: it must have the same schema as the learner's input, without the label attribute, of course)
- The second input is the classifier to use. It can be retrieved from a learner operator or read from an external source.
classified = CLASSIFICATION(testdata, classifier)
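Putting learner and classifier together, a complete query could look like the following sketch. The source names trainingdata and testdata as well as the window definition are assumptions and must be adapted to your setup:

```
#PARSER PQL
#ADDQUERY
windowed = TIMEWINDOW({SIZE = [5, 'MINUTES']}, trainingdata)
classifier = AGGREGATION({AGGREGATIONS = [ ['FUNCTION' = 'NonIncrementalClassificationLearner', 'LABEL_ATTRIBUTE' = 'label', 'ALGORITHM' = 'J48', 'WEKA_OPTIONS' = '-U'] ], EVAL_AT_NEW_ELEMENT = false, EVAL_BEFORE_REMOVE_OUTDATING = true}, windowed)
classified = CLASSIFICATION(testdata, classifier)
```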
Reading and Writing Classifiers
As it is not always feasible to create a new classifier for each new query, Odysseus provides an experimental approach to store and load classifiers. To avoid problems with non-printable characters, use the MAP operator and convert the classifier to base64. This classifier can then be written to a database or, as in the following example, to a CSV file:
map = MAP({EXPRESSIONS = [['base64encode(classifier)','encoded']]}, classifier)
out = CSVFILESINK({SINK = 'output', WRITEMETADATA = false, FILENAME = '${PROJECTPATH}/out/classifierOut.csv'}, map)
Classifiers can be read back as in the following example and fed into a classification operator.
#PARSER PQL
#ADDQUERY
classIn = CSVFILESOURCE({SCHEMA = [['classifierBASE64', 'String']], FILENAME = '${PROJECTPATH}/out/classifierOut.csv', SOURCE = 'classifierSource'})
classifier = MAP({EXPRESSIONS = [['base64decode(classifierBASE64)','classifier']]}, classIn)
classified = CLASSIFICATION(testdata, classifier)
Remark: This work is experimental. Please provide a bug report (How to report a bug) if you find any problems.