PMML can be extended at nearly all elements. Extensions are added as Extension child elements of the element they apply to. Further information can be found at http://dmg.org/pmml/v4-3/GeneralStructure.html#xsdElement_Extension.

The XSD defines the Extension element as follows. Its content may consist of arbitrary custom elements:

<xs:element name="Extension">
  <xs:complexType>
    <xs:complexContent mixed="true">
      <xs:restriction base="xs:anyType">
        <xs:sequence>
          <xs:any processContents="skip" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence> 
        <xs:attribute name="extender" type="xs:string" use="optional"/>
        <xs:attribute name="name" type="xs:string" use="optional"/>
        <xs:attribute name="value" type="xs:string" use="optional"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>


An example of an extension:

<DataField name="foo" dataType="double" optype="continuous">
  <Extension>
    <DataFieldSource sourceKnown="yes">
      <Source>derivedFromInput</Source>
    </DataFieldSource>
  </Extension>
</DataField>
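
As a rough illustration of how a consumer might read such an extension, the following Python sketch parses the DataField example above with the standard library. The helper name read_data_field_source is hypothetical, and the fragment carries no XML namespace; a complete PMML document declares a default namespace that would have to be included in the element paths.

```python
import xml.etree.ElementTree as ET

# The DataField example from above, embedded as a string for illustration.
PMML_FRAGMENT = """
<DataField name="foo" dataType="double" optype="continuous">
  <Extension>
    <DataFieldSource sourceKnown="yes">
      <Source>derivedFromInput</Source>
    </DataFieldSource>
  </Extension>
</DataField>
"""


def read_data_field_source(data_field):
    """Return (sourceKnown, source text) from the field's Extension, or None.

    Hypothetical helper: DataFieldSource and Source are custom elements
    from the example, not part of the PMML standard.
    """
    source_element = data_field.find("./Extension/DataFieldSource")
    if source_element is None:
        return None
    return source_element.get("sourceKnown"), source_element.findtext("Source")


field = ET.fromstring(PMML_FRAGMENT)
print(field.get("name"), read_data_field_source(field))
```

A field without an Extension child simply yields None, so standard PMML consumers and extension-aware consumers can share the same document.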


General extensions should be defined in the header. For example, Kafka topics can encapsulate the data sources for each data schema. To bind a model to more than one topic, you could define a union over an arbitrary number of Topic elements, each of which serves as a data source.


<Header>
   <Extension>
      <KafkaTopics model="modelA">
         <Topic name="exampleTopic" />
      </KafkaTopics>
      <KafkaTopics model="modelB">
         <Topic name="exampleTopic2" />
         <Topic name="exampleTopicFoo" />
      </KafkaTopics>
   </Extension>
<!-- other Elements -->
</Header>
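
One possible way to evaluate the header extension above is sketched below in Python. It collects the topic names per model so that a caller could decide which sources to union. The function name topics_by_model is an assumption made for this example, and the fragment again omits the PMML namespace.

```python
import xml.etree.ElementTree as ET

# The Header example from above, embedded as a string for illustration.
HEADER = """
<Header>
   <Extension>
      <KafkaTopics model="modelA">
         <Topic name="exampleTopic" />
      </KafkaTopics>
      <KafkaTopics model="modelB">
         <Topic name="exampleTopic2" />
         <Topic name="exampleTopicFoo" />
      </KafkaTopics>
   </Extension>
</Header>
"""


def topics_by_model(header):
    """Map each model name to the list of Kafka topic names bound to it."""
    bindings = {}
    for kafka_topics in header.iterfind("./Extension/KafkaTopics"):
        names = [topic.get("name") for topic in kafka_topics.iterfind("Topic")]
        # Several KafkaTopics elements for the same model are merged.
        bindings.setdefault(kafka_topics.get("model"), []).extend(names)
    return bindings


bindings = topics_by_model(ET.fromstring(HEADER))
print(bindings)
```

A model with more than one entry in its list (here modelB) is the case in which a union over several topics would be needed.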

Structure

A possible access architecture might look like this:

The PMML document is loaded by an ACCESS operator (e.g. from a Kafka topic), which processes all necessary extensions to obtain the essential information. A mining operator that receives the model as an input can then switch to the current model whenever a new PMML document is sent over a topic, without restarting the query. If a UNION operator precedes a mining operator, further topics could in the future be accessed through a COMMAND operator, as long as they don't change during runtime (this kind of overload can be defined in extensions).
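
The idea of exchanging the model without restarting the query can be sketched as follows. This is not Odysseus code: the class MiningOperator and its methods are hypothetical stand-ins, a model is represented by a plain Python callable, and dropping records that arrive before the first model is an assumption of this sketch.

```python
class MiningOperator:
    """Hypothetical stand-in for a mining operator with two input ports."""

    def __init__(self):
        self.model = None  # no model received yet

    def on_model(self, model):
        # Model input: a newly received model replaces the current one,
        # while the operator (and therefore the query) keeps running.
        self.model = model

    def on_data(self, record):
        # Data input: evaluate the record with whichever model is current.
        if self.model is None:
            return None  # assumption: records before the first model are dropped
        return self.model(record)


operator = MiningOperator()
results = [operator.on_data({"x": 1})]              # no model yet
operator.on_model(lambda record: record["x"] * 2)   # first model arrives
results.append(operator.on_data({"x": 2}))
operator.on_model(lambda record: record["x"] + 100)  # updated model arrives
results.append(operator.on_data({"x": 3}))
print(results)
```

Because only the reference to the current model changes, data that arrives after an update is evaluated with the new model while the data stream itself is never interrupted.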


Query of Mining Models (PMML)


Name

Query of Mining Models

Actors

System PG EStream, System PG DAvE

Trigger

System PG EStream is started

Description

Existing PMML models can be requested with an HTTP query. If the query was successful and the Big Data Archive contains at least one PMML model matching the given PMML specification, the HTTP response contains the latest matching model. If no model for this specification exists yet, the HTTP response states this instead.

Pre-Requirements
  1. If the HTTP interface is protected by authentication, a PG EStream user with sufficient rights to run the needed queries exists.
  2. Default PMML models exist and don't have to be commissioned.
  3. A query must be available that lets the user request all accessible processes/models.
Essential Steps
  1. A PMML model is requested for a given PMML specification.
  2. A hash is computed from the PMML specification and used to search the Big Data Archive for the latest matching PMML model.
  3. If the request was successful and at least one PMML model matches the given PMML specification, that model is returned as the response.
Exceptions

The model doesn't exist because the request doesn't refer to an agreed standard PMML model:

  1. If the requested PMML model is not (or no longer) supported, a corresponding error message should be returned.
  2. If the requested PMML model is supported, the response should state that no matching model currently exists but that a new one can be commissioned.
Post-Requirements

Latest PMML model is available in the HTTP response.

Time Behaviour

Immediately

UML
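
The essential steps and exceptions of this use case can be sketched in Python as shown below. Everything concrete here is an assumption made for illustration: the specification is modelled as a dictionary, SHA-256 over its canonical JSON form serves as the hash, an in-memory dictionary stands in for the Big Data Archive, and the status codes and messages are placeholders rather than the actual interface.

```python
import hashlib
import json


def specification_hash(specification):
    """Hash a specification; sorted keys make equal specifications hash equally."""
    canonical = json.dumps(specification, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def query_model(archive, supported, specification):
    """Return (status, body) following the essential steps and exceptions."""
    if specification.get("model") not in supported:
        # Exception 1: the requested model is not (or no longer) supported.
        return 404, "model is not (or no longer) supported"
    versions = archive.get(specification_hash(specification), [])
    if not versions:
        # Exception 2: supported, but nothing has been archived yet.
        return 404, "no matching model exists yet; a new one can be commissioned"
    return 200, versions[-1]  # assumption: versions are stored oldest first


# Hypothetical specification and archive content.
spec = {"model": "KMeans_Model", "fields": ["Age", "Income"]}
archive = {specification_hash(spec): ["model-v1", "model-v2"]}
supported = {"KMeans_Model", "Tree_Model"}

print(query_model(archive, supported, spec))
print(query_model(archive, supported, {"model": "Tree_Model"}))
print(query_model(archive, supported, {"model": "Unknown"}))
```

Hashing a canonical form of the specification means that two requests with the same content but a different key order resolve to the same archive entry.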

Execute PMML in Odysseus

Run the correct product:

  1. Check out either the Odysseus or the EStream branch.

  2. Run the product from de.pgestream.odysseus.studio, so that all necessary bundles are loaded automatically.

PMML Operators


ProtocolHandler:

To access a PMML document, you can assign 'PMML' as the protocol handler. No further parameters are necessary, and the PMML model is emitted on output port 0. You can use e.g. Kafka as the transport handler to reload PMML documents while queries are running.

Parameters:

The following example shows access to a PMML document provided by DMG.

models = ACCESS({
    source='pmml-data',
    wrapper='GenericPull',
    transport='HTTP',
    protocol='PMML',
    dataHandler='Tuple',
    options=[
        ['uri', 'http://dmg.org/pmml/pmml_examples/rattle_pmml_examples/AuditKMeans.xml'], 
        ['method', 'get'],
        ['scheduler.delay', '100000'],
        ['fireOnce', 'true']
    ],
    schema=[
        ['models', 'Object']
    ]
})

Operator:

A PMML document can contain more than one mining model, so model selection takes place outside of the transport handler. Once the models have been loaded, you can select and run a model by name with the PMML_EVAL operator.

The input on port 1 (data) should contain data that corresponds to the schema the PMML document expects. Additional attributes are ignored. Different models can therefore be executed on the same data without inserting further operators for filtering in between, and without transporting additional metadata along with the analyzed data.
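
What selecting a model by name and ignoring additional attributes could look like is sketched below in Python. This does not show how PMML_EVAL is implemented. The document is a stripped-down fragment that is not schema-valid, the namespace is the one for PMML 4.3 (other versions use a different one), the record values are made up, and taking the expected attributes from the DataDictionary is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

PMML_NS = {"pmml": "http://www.dmg.org/PMML-4_3"}

# Stripped-down document with two models; not schema-valid, illustration only.
DOCUMENT = """
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <DataDictionary>
    <DataField name="Age" dataType="integer" optype="continuous"/>
    <DataField name="Income" dataType="double" optype="continuous"/>
  </DataDictionary>
  <ClusteringModel modelName="KMeans_Model"/>
  <TreeModel modelName="Tree_Model"/>
</PMML>
"""


def find_model(root, model_name):
    """Return the first child of <PMML> whose modelName attribute matches."""
    for child in root:
        if child.get("modelName") == model_name:
            return child
    return None


def project(record, root):
    """Keep only the attributes declared in the DataDictionary."""
    fields = root.iterfind("pmml:DataDictionary/pmml:DataField", PMML_NS)
    declared = {field.get("name") for field in fields}
    return {key: value for key, value in record.items() if key in declared}


root = ET.fromstring(DOCUMENT)
model = find_model(root, "KMeans_Model")
record = project({"ID": 1, "Age": 38, "Income": 81838.0}, root)
print(model.tag, record)
```

Since the projection is derived from the document itself, the same input record can be passed to every model in the document without an explicit filter step in between.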

Parameters:

The following example shows usage of the PMML_EVAL operator:

results = PMML_EVAL({
    modelName='KMeans_Model',
    output='input',
    stackSize=5
}, models, data)

Complete example:

At http://dmg.org/pmml/pmml_examples/index.html you can find examples with corresponding data sets:

#PARSER PQL
#RUNQUERY

models = ACCESS({
    source='pmml-data',
    wrapper='GenericPull',
    transport='HTTP',
    protocol='PMML',
    dataHandler='Tuple',
    options=[
        ['uri', 'http://dmg.org/pmml/pmml_examples/rattle_pmml_examples/AuditKMeans.xml'], 
        ['method', 'get'],
        ['scheduler.delay', '100000'],
        ['fireOnce', 'true']
    ],
    schema=[
        ['meta', 'Object'],
        ['models', 'Object'],
        ['dictionary', 'Object']
    ]
})

/// data to be tested against
data = ACCESS({
    source='audit-data',
    wrapper='GenericPull',
    transport='HTTP',
    protocol='CSV',
    dataHandler='Tuple',
    options=[
        ['uri', 'http://dmg.org/pmml/pmml_examples/Audit.csv'], 
        ['method', 'get'],
        ['readfirstline', 'false'],
        ['delimiter',','],
        ['textDelimiter','"']
    ],
    schema=[
        ['ID', 'Integer'],
        ['Age', 'Integer'],
        ['Employment', 'String'],
        ['Education', 'String'],
        ['Marital', 'String'],
        ['Occupation', 'String'],
        ['Income', 'Double'],
        ['Gender', 'String'],
        ['Deductions', 'Double'],
        ['Hours', 'Double'],
        ['IGNORE_Accounts', 'String'],
        ['RISK_Adjustment', 'Double'],
        ['TARGET_Adjusted', 'Double']
    ]
})

results = PMML_EVAL({
    modelName='KMeans_Model',
    output='input',
    stackSize=5
}, models, data)