Data mining has a broad field of application that is hard to delimit, and the representation of the mining process is very abstract. The Predictive Model Markup Language (PMML) is a standard exchange format defined by the DMG (Data Mining Group) for the process itself and for the structure of the required data, so that compatibility between the software of different vendors is maximized.

PMML follows an XML Schema, which can be found on the DMG homepage: http://dmg.org/pmml/v4-3/GeneralStructure.html


...

The standard (currently v4.3, as of October 2017) comprises about 700 language elements.

PMML can be extended at nearly all of its elements. Such extensions are added as Extension child elements of the element concerned. Further information can be found here.

...

General extensions should be defined in the header. For example, a KafkaTopics extension encapsulates the data sources for each data schema. To bind to more than one topic, you could create unions with an arbitrary number of topic elements, which serve as data sources.
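
As a rough illustration of such a header extension, the following Python sketch reads the Extension children of a PMML Header. The attributes extender, name and value are part of the PMML Extension element; the extender "odysseus", the value "sensor-data" and the flat layout of the KafkaTopics extension are assumptions made for this example and are not taken from the Odysseus implementation.

```python
import xml.etree.ElementTree as ET

# Namespace of PMML 4.3 documents.
NS = {"pmml": "http://www.dmg.org/PMML-4_3"}

# Minimal document. The extender and value are hypothetical, and the
# single-attribute layout of "KafkaTopics" is an assumption for illustration.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header description="example">
    <Extension extender="odysseus" name="KafkaTopics" value="sensor-data"/>
  </Header>
  <DataDictionary numberOfFields="0"/>
</PMML>"""


def header_extensions(pmml_text):
    """Return (name, value) pairs of all Extension children of the Header."""
    root = ET.fromstring(pmml_text)
    header = root.find("pmml:Header", NS)
    if header is None:
        return []
    return [(ext.get("name"), ext.get("value"))
            for ext in header.findall("pmml:Extension", NS)]


print(header_extensions(PMML_DOC))
```

A consumer that does not know an extension can simply skip it, which is why extensions are a convenient place for tool-specific settings such as topic bindings.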

...

The PMML document is loaded by an ACCESS operator (e.g. from a Kafka topic) and runs through all necessary extensions to collect all essential information. A mining operator, which receives the model as an input, can then switch to the current model (when a new PMML document is sent over a topic) without the query having to be restarted. If a UNION operator precedes a mining operator, further topics could in the future be accessed via a COMMAND operator, as long as they do not change during runtime (this type of overload can be defined in extensions).

Execute PMML in Odysseus

Run the correct product:

...

...

Run the product from de.pgestream.odysseus.studio so that all necessary bundles are loaded automatically.


PMML Operators

ProtocolHandler:

To read a PMML document, set 'PMML' as the protocol handler. No further parameters are required, and the PMML model is emitted on output port 0. You can use, for example, Kafka as the transport handler to reload PMML documents while queries are running.

Parameters:

  • fireOnce: If set to 'true', only the very first document is read by the TransportHandler, and the state of the operators is set to 'done' afterwards. If the InputStream ends in any other way, the query is terminated.

...

A PMML document can contain more than one mining model; model selection is therefore handled outside of the TransportHandler. Once the models have been received, you can select and run a model by name with the PMML_EVAL operator.
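
Selecting a model by name can be sketched in Python as follows, with 'default' picking the first model in XML order as the modelName parameter describes. This is only a sketch of the idea, not the Odysseus code: the function select_model, the deliberately incomplete sample document and the model name Tree_Model are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Top-level PMML children that are not models.
NON_MODEL_TAGS = {"Header", "MiningBuildTask", "DataDictionary",
                  "TransformationDictionary", "Extension"}

# Deliberately incomplete (not schema-valid) document with two models.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header/>
  <DataDictionary/>
  <ClusteringModel modelName="KMeans_Model"/>
  <TreeModel modelName="Tree_Model"/>
</PMML>"""


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree puts before tag names."""
    return element.tag.rsplit("}", 1)[-1]


def select_model(pmml_text, model_name="default"):
    """Return the model element with the given name, or None if it is absent.

    'default' returns the first model in document (XML) order.
    """
    root = ET.fromstring(pmml_text)
    models = [child for child in root
              if local_name(child) not in NON_MODEL_TAGS]
    if model_name == "default":
        return models[0] if models else None
    for model in models:
        if model.get("modelName") == model_name:
            return model
    return None


print(local_name(select_model(PMML_DOC)))
print(local_name(select_model(PMML_DOC, "Tree_Model")))
```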

The input on port 1 (data) should contain data that matches the schema required by the PMML document. Additional attributes are ignored. Different models can therefore be executed on the same data without inserting further operators for filtering in between, and without carrying additional metadata along with the analyzed data.
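
The way additional attributes are ignored can be pictured with the following Python sketch. It assumes that the required schema is taken from the DataField entries of the DataDictionary and that a missing attribute is an error; both are assumptions for illustration, since the operator might instead rely on the MiningSchema of the selected model or handle missing values differently.

```python
import xml.etree.ElementTree as ET

NS = {"pmml": "http://www.dmg.org/PMML-4_3"}

# Sample document with two hypothetical fields, x and y.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
</PMML>"""


def required_fields(pmml_text):
    """Names of all DataField entries, in document order."""
    root = ET.fromstring(pmml_text)
    fields = root.findall("pmml:DataDictionary/pmml:DataField", NS)
    return [field.get("name") for field in fields]


def project(record, fields):
    """Keep only the requested attributes; any extra attributes are dropped."""
    missing = [name for name in fields if name not in record]
    if missing:
        # Assumption: an incomplete record is rejected rather than padded.
        raise KeyError(f"missing attributes: {missing}")
    return {name: record[name] for name in fields}


record = {"x": 1.0, "y": 2.0, "sensor": "a"}
print(project(record, required_fields(PMML_DOC)))
```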

Parameters:

  • modelName: Every model within a PMML document has a unique name. If 'default' is chosen, the first model (in XML order) is executed.
  • output (optional, default: NONE): Describes the output mode of the operator; you can choose between NONE, MODEL and INPUT.
    • NONE: Output is only the result; the attribute is then called 'prediction'.
    • MODEL: All fields used by the model are output in addition to 'prediction'. 'prediction' is appended as the last column.
    • INPUT: All values from the data are forwarded to the output.
  • stackSize (optional, default: 100): Size of the FIFO buffer (referred to as a stack) that is used when data elements arrive but no EvaluationModel has been received yet. As soon as a model is received, all elements in the stack are evaluated before the next arriving element (up to the maximum size of the stack). After each new model, the stack is emptied.
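
The buffering behaviour described for stackSize can be sketched in Python as follows. The class PendingBuffer and its method names are hypothetical, and the sketch assumes that the oldest element is dropped once the buffer is full; the text above does not state what happens on overflow.

```python
from collections import deque


class PendingBuffer:
    """Bounded FIFO buffer for elements that arrive before any model is known."""

    def __init__(self, stack_size=100):
        # Assumption: a full buffer silently drops its oldest element.
        self._pending = deque(maxlen=stack_size)
        self._model = None

    def on_model(self, model):
        """Install a new model, evaluate buffered elements in arrival order,
        then empty the buffer."""
        self._model = model
        results = [model(element) for element in self._pending]
        self._pending.clear()
        return results

    def on_data(self, element):
        """Evaluate the element at once, or buffer it if no model exists yet."""
        if self._model is None:
            self._pending.append(element)
            return []
        return [self._model(element)]


buffer = PendingBuffer(stack_size=2)
for value in (1, 2, 3):          # no model yet, so the values are buffered
    buffer.on_data(value)
print(buffer.on_model(lambda v: v * 10))  # evaluates what is still buffered
print(buffer.on_data(4))                  # evaluated immediately
```

Here a plain callable stands in for the evaluation model, so the sketch shows only the ordering and size limit, not the PMML evaluation itself.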

The following example shows the usage of the PMML_EVAL operator:

results = PMML_EVAL({
    modelName='KMeans_Model',
    output='input',
    stackSize=5
}, models, data)

Complete example:

Examples with matching data sets can be found at http://dmg.org/pmml/pmml_examples/index.html:

...