In addition to the already existing Machine Learning Feature, we provide a new, experimental feature that focuses on classification and builds on the Aggregation operator. To use this feature, you need to install the Classification Feature.
The CLASSIFICATION_LEARNER operator produces classifiers as output. These classifiers can be used in the CLASSIFICATION operator to classify elements.
classifier = CLASSIFICATION_LEARNER( {LABELATTRIBUTE = 'label', ALGORITHM = 'WekaGeneric', SUBALGORITHM = 'J48', CLASSIFIEROPTIONS = '-U'}, windowed)
Remark: Internally, this is translated to the AGGREGATION query shown below; see the explanation of the parameters there. The parameter names map as follows: LABELATTRIBUTE = LABEL_ATTRIBUTE, SUBALGORITHM = WEKA_ALGORITHM, CLASSIFIEROPTIONS = WEKA_OPTIONS.
classifier = AGGREGATION({ aggregations = [ ['FUNCTION' = 'NonIncrementalClassificationLearner', 'LABEL_ATTRIBUTE' = 'label', 'ALGORITHM' = 'WekaGeneric', 'WEKA_ALGORITHM' = 'J48', 'WEKA_OPTIONS' = '-U'] ], eval_at_new_element = false, eval_before_remove_outdating = true }, windowed )
This first version, the NonIncrementalClassificationLearner, is a wrapper for WEKA classification learners (currently supported version: 3.8) and needs the following parameters:
Important: EVAL_AT_NEW_ELEMENT = false and EVAL_BEFORE_REMOVE_OUTDATING = true must be provided exactly this way. Currently, there is no check for this, and the output may be wrong otherwise.
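The document does not spell out what these two flags do, so the following is a minimal Python sketch of the intended window semantics as we read them (a conceptual illustration, not Odysseus code). It assumes an element-based sliding window; MajorityLearner is a hypothetical stand-in for the wrapped Weka learner:

```python
from collections import Counter, deque

class MajorityLearner:
    """Toy stand-in for the non-incremental Weka learner: it 'trains' a
    classifier by taking the majority label over all tuples in the window."""
    def __init__(self):
        self.trainings = 0

    def train(self, window):
        self.trainings += 1
        return Counter(label for _value, label in window).most_common(1)[0][0]

def run(stream, window_size):
    """Simulate an element window feeding the aggregation-based learner."""
    window = deque()
    learner = MajorityLearner()
    classifiers = []
    for element in stream:
        if len(window) == window_size:
            # EVAL_BEFORE_REMOVE_OUTDATING = true: train on the complete
            # window just before the oldest element is removed from it.
            classifiers.append(learner.train(window))
            window.popleft()
        # EVAL_AT_NEW_ELEMENT = false: nothing is trained on arrival.
        window.append(element)
    return classifiers, learner.trainings

classifiers, trainings = run([(1, 'a'), (2, 'a'), (3, 'b'), (4, 'b'), (5, 'b')], 3)
# trainings == 2: the learner runs once per outdating element, not per arrival.
```

With the opposite flag settings, the (expensive) non-incremental learner would be retrained on every incoming element, which is presumably why the combination above is required.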
classifier = AGGREGATION({AGGREGATIONS = [ ['FUNCTION' = 'IncrementalClassificationLearner', 'LABEL_ATTRIBUTE' = 'label', 'ALGORITHM' = 'HATT', 'BATCH_SIZE' = '100', 'CONFIDENCE' = '0.01'] ]}, trainingdata)
This is an incremental learner ('FUNCTION' = 'IncrementalClassificationLearner'). At the moment, only the Hoeffding Anytime Tree (HATT) is supported ('ALGORITHM' = 'HATT'). The operator needs the following parameters:
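The source does not define CONFIDENCE; our assumption is that it corresponds to the delta of the Hoeffding bound that Hoeffding (Anytime) Trees use to decide when enough tuples have been observed to commit to a split. For reference, the bound itself is easy to compute:

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound epsilon: with probability 1 - delta, the observed
    mean of n samples lies within epsilon of the true mean, for a random
    variable whose values span the given range."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# With CONFIDENCE = 0.01: uncertainty shrinks as more tuples are seen,
# so larger batches allow the tree to commit to splits with more certainty.
eps_100 = hoeffding_bound(1.0, 0.01, 100)      # after 100 tuples
eps_10000 = hoeffding_bound(1.0, 0.01, 10000)  # after 10000 tuples
```

How BATCH_SIZE interacts with this internally is not documented here; the sketch only illustrates the statistical rationale behind the confidence parameter.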
This operator has two inputs:
The operator can be used with the following parameters:
classified = CLASSIFICATION(testdata, classifier)
As it is not always feasible to create a new classifier for each new query, Odysseus provides an experimental approach to store and load classifiers. To avoid problems with non-printable characters, use the MAP operator and convert the classifier to base64. This classifier can then be written to a database or, as in the following example, to a CSV file:
map = MAP({EXPRESSIONS = [['base64encode(classifier)','encoded']]}, classifier)
out = CSVFILESINK({SINK = 'output', WRITEMETADATA = false, FILENAME = '${PROJECTPATH}/out/classifierOut.csv'}, map)
Classifiers can be read back as follows and fed into a CLASSIFICATION operator.
#PARSER PQL
#ADDQUERY
classIn = CSVFILESOURCE({SCHEMA = [['classifierBASE64', 'String']], FILENAME = '${PROJECTPATH}/out/classifierOut.csv', SOURCE = 'classifierSource'})
classifier = MAP({EXPRESSIONS = [['base64decode(classifierBASE64)','classifier']]}, classIn)
classified = CLASSIFICATION(testdata, classifier)
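The reason for the base64 step can be illustrated outside Odysseus: a serialized classifier is an arbitrary byte sequence, and base64 turns it into a printable string that survives CSV and database round-trips. A minimal Python sketch, with pickle as a stand-in for Odysseus's own classifier serialization:

```python
import base64
import pickle

# A stand-in for a trained classifier (any serializable object would do).
classifier = {"algorithm": "J48", "options": ["-U"], "tree": [0, 1, 2]}

# Serialization yields raw bytes, which may contain non-printable characters.
raw = pickle.dumps(classifier)

# base64encode: bytes -> printable ASCII string, safe for a CSV file or database.
encoded = base64.b64encode(raw).decode("ascii")
assert encoded.isprintable()

# base64decode: string -> original bytes -> restored classifier object.
restored = pickle.loads(base64.b64decode(encoded))
assert restored == classifier
```

The MAP expressions base64encode/base64decode in the PQL queries above play exactly these two roles.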
Remark: This work is experimental. Please provide a bug report (see How to report a bug) if you find any problems.
It is also possible to use trained Weka classifiers that were not stored with Odysseus. In this case, you need to use a FilteredClassifier as a meta classifier. In its options, add the desired classifier. If the data are string-based, add a StringToNominal filter; otherwise, add a NumericToNominal filter. Learn the model and store it. See the following activity diagram (in German).
This operator has the following optional parameter:
isWekaModel: This parameter makes it possible to use a model trained in Weka (outside of Odysseus) as input on port 1. The following example shows a use case where the Weka model is first loaded from a database and then used in the CLASSIFICATION operator:
timer = TIMER({PERIOD = 1000000000, SOURCE = 'testdata'})
wekaModel = dbenrich({connection='connection3', query='SELECT id, model_name, labels, model_content, output_attributes FROM trained_models where id=5', multiTupleOutput='false', attributes=[]}, timer)
classified = CLASSIFICATION({isWekaModel='true'}, testdata, wekaModel)