PMML can be extended at nearly all elements. Extensions are added as Extension child elements of the element they apply to. Further information can be found at http://dmg.org/pmml/v4-3/GeneralStructure.html#xsdElement_Extension.

The XSD for the Extension element allows arbitrary custom child elements:

<xs:element name="Extension">
  <xs:complexType>
    <xs:complexContent mixed="true">
      <xs:restriction base="xs:anyType">
        <xs:sequence>
          <xs:any processContents="skip" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence> 
        <xs:attribute name="extender" type="xs:string" use="optional"/>
        <xs:attribute name="name" type="xs:string" use="optional"/>
        <xs:attribute name="value" type="xs:string" use="optional"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:element>


An example of an extension:

<DataField name="foo" dataType="double" optype="continuous">
  <Extension>
    <DataFieldSource sourceKnown="yes">
      <Source>derivedFromInput</Source>
    </DataFieldSource>
  </Extension>
</DataField>
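A consumer can read such an extension with any XML parser. The following is a minimal sketch in Python using only the standard library. The helper name read_field_source is hypothetical, and the snippet leaves out the PMML namespace for brevity (real PMML documents usually declare one, in which case the paths need a namespace prefix).

```python
import xml.etree.ElementTree as ET

# The DataField example from above, without a namespace declaration.
DATA_FIELD_XML = """
<DataField name="foo" dataType="double" optype="continuous">
  <Extension>
    <DataFieldSource sourceKnown="yes">
      <Source>derivedFromInput</Source>
    </DataFieldSource>
  </Extension>
</DataField>
"""


def read_field_source(data_field_xml):
    """Return (sourceKnown, source) from a DataField extension, or None if absent."""
    field = ET.fromstring(data_field_xml)
    source = field.find("./Extension/DataFieldSource")
    if source is None:
        return None
    return source.get("sourceKnown"), source.findtext("Source")


print(read_field_source(DATA_FIELD_XML))
```

Because the Extension content is declared with processContents="skip", a PMML validator does not check it; the reading side has to handle missing or unexpected children itself, which is why the helper returns None when the element is absent.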


General extensions should be defined in the Header. For example, Kafka topics encapsulate the data sources for each data schema. To bind a model to more than one topic, you could create a union over any number of Topic elements, each of which serves as a data source.


<Header>
   <Extension>
      <KafkaTopics model="modelA">
         <Topic name="exampleTopic" />
      </KafkaTopics>
      <KafkaTopics model="modelB">
         <Topic name="exampleTopic2" />
         <Topic name="exampleTopicFoo" />
      </KafkaTopics>
   </Extension>
<!-- other Elements -->
</Header>
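To illustrate how such a header extension could be consumed, the following sketch collects the topic names per model. It is written in Python with the standard library only; the function name topics_by_model is hypothetical, and the PMML namespace is again omitted for brevity.

```python
import xml.etree.ElementTree as ET

# The Header example from above, without a namespace declaration.
HEADER_XML = """
<Header>
   <Extension>
      <KafkaTopics model="modelA">
         <Topic name="exampleTopic" />
      </KafkaTopics>
      <KafkaTopics model="modelB">
         <Topic name="exampleTopic2" />
         <Topic name="exampleTopicFoo" />
      </KafkaTopics>
   </Extension>
</Header>
"""


def topics_by_model(header_xml):
    """Map each model name to the list of Kafka topic names bound to it."""
    header = ET.fromstring(header_xml)
    result = {}
    for group in header.findall("./Extension/KafkaTopics"):
        names = [topic.get("name") for topic in group.findall("Topic")]
        # Several KafkaTopics elements for the same model are merged.
        result.setdefault(group.get("model"), []).extend(names)
    return result


print(topics_by_model(HEADER_XML))
```

A model with more than one topic (modelB here) is the case where a union over the topic sources would be needed.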

Structure

A possible access architecture might look like this:

The PMML document is loaded by an ACCESS operator (e.g. from a Kafka topic), which processes all necessary extensions to obtain the essential information. A mining operator that receives the model as an input can then switch to newer models (when a new PMML document is sent over a topic) without restarting the query. If a UNION operator precedes the mining operator, further topics could in the future be attached through a COMMAND operator, as long as they do not change during runtime (this kind of overload can be defined in extensions).


Query of Mining Models (PMML)


Name

Query of Mining Models

Actors

System PG EStream, System PG DAvE

Trigger

System PG EStream is started

Description

Existing PMML models can be requested with an HTTP query. If the query is successful and at least one PMML model matching the specification exists in the Big Data Archive, the HTTP response contains the latest matching PMML model. If no model with this specification exists yet, this is reported in the HTTP response instead.

Pre-Requirements
  1. If the HTTP interface is protected by authentication, a PG EStream user with sufficient rights to run the needed queries exists.
  2. Default PMML models exist and don't have to be commissioned.
  3. A query must be available that lets the user request all accessible processes/models.
Essential Steps
  1. A PMML model is requested for a given PMML specification.
  2. A hash is computed from the PMML specification and used to search the Big Data Archive for the latest matching PMML model.
  3. If the request was successful and at least one PMML model matches the given PMML specification, that model is returned as the response.
Exceptions

The model doesn't exist because the requested model is not one of the agreed standard PMML models:

  1. If the PMML model is not (or no longer) supported, a matching error message should be returned.
  2. If the requested model is supported, the response should state that no matching model currently exists but that a new one can be commissioned.
Post-Requirements

Latest PMML model is available in the HTTP response.

Time Behaviour

immediately

UML
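The essential steps and exceptions of this use case can be sketched as follows. This is only an illustration under several assumptions that are not part of the use case itself: the specification is represented as a plain dictionary, the hash is SHA-256 over a canonical JSON form, the Big Data Archive is replaced by an in-memory dictionary, and the names spec_hash, query_model and the status strings are hypothetical.

```python
import hashlib
import json


def spec_hash(spec):
    """Hash a specification deterministically (key order does not matter)."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def query_model(spec, archive, supported):
    """Return (status, body) for a model request.

    archive:   dict mapping spec hash -> list of models, oldest first (assumption)
    supported: set of spec hashes that are currently supported (assumption)
    """
    key = spec_hash(spec)
    if key not in supported:
        # Exception 1: model is not (or no longer) supported.
        return "error", "PMML model is not (or no longer) supported"
    versions = archive.get(key, [])
    if not versions:
        # Exception 2: supported, but nothing has been commissioned yet.
        return "no_model", "no matching model exists yet; a new one can be commissioned"
    # Essential step 3: return the latest matching model.
    return "ok", versions[-1]


spec = {"algorithm": "kmeans", "fields": ["Age", "Income"]}
key = spec_hash(spec)
archive = {key: ["<PMML v1/>", "<PMML v2/>"]}
print(query_model(spec, archive, supported={key}))
```

Hashing a canonical form rather than the raw request means that two requests describing the same specification in a different key order map to the same archive entry.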

Execute PMML in Odysseus

Run the correct product:

  1. Check out either the Odysseus or the EStream branch.

  2. Run the product from de.pgestream.odysseus.studio, so that all necessary bundles are loaded automatically.

PMML Operators


ProtocolHandler:

To access a PMML document, you can set 'PMML' as the protocol handler. No other parameters are required, and the PMML model is emitted on output port 0. You can use e.g. Kafka as the transport handler to reload PMML documents while queries are running.

Parameters:

  • fireOnce: If set to 'true', only the very first document is read by the TransportHandler, and the state of the operator is set to 'done' afterwards. If the InputStream ends somewhere else, the query is terminated.

The following example shows access to a PMML document provided by the DMG.

ProtocolHandler
models = ACCESS({
    source='pmml-data',
    wrapper='GenericPull',
    transport='HTTP',
    protocol='PMML',
    dataHandler='Tuple',
    options=[
        ['uri', 'http://dmg.org/pmml/pmml_examples/rattle_pmml_examples/AuditKMeans.xml'], 
        ['method', 'get'],
        ['scheduler.delay', '100000'],
        ['fireOnce', 'true']
    ],
    schema=[
        ['models', 'Object']
    ]
})

Operator:

A PMML document can contain more than one mining model, so selecting a model is handled outside of the TransportHandler. Once the models have been retrieved, you can select and run a model by name with the PMML_EVAL operator.

The input on port 1 (data) should contain data that matches the schema required by the PMML document. Additional attributes are ignored. Different models can therefore be executed on the same data without inserting further operators for filtering, and without transporting additional metadata with the analyzed data.

Parameters:

  • modelName: Every model within a PMML document has a unique name. If 'default' is chosen, the first model (in XML order) is executed.
  • output (optional, default: NONE): Describes the output mode of the operator, which is one of NONE, MODEL or INPUT:
    • NONE: Output is only the result, in an attribute called 'prediction'.
    • MODEL: All fields used by the model are output in addition to 'prediction'. 'prediction' is appended as the last attribute.
    • INPUT: All values from the data are forwarded to the output.
  • stackSize (optional, default: 100): Size of the FIFO buffer that is used when the data input delivers elements but no EvaluationModel has been received yet. As soon as a model is received, all buffered elements are evaluated before the next arriving element (the buffer holds at most stackSize elements). After each new model, the buffer is emptied.
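The buffering behaviour behind stackSize can be sketched as follows. This is not the operator's implementation, only a minimal Python illustration: the class name BufferedEvaluator is hypothetical, a model is represented by a plain callable, and the sketch assumes that the oldest element is dropped once the buffer is full (the text above does not say which element is discarded).

```python
from collections import deque


class BufferedEvaluator:
    """Buffers data elements until a model arrives, then evaluates them in FIFO order."""

    def __init__(self, stack_size=100):
        # Assumption: when the buffer is full, the oldest element is discarded.
        self._buffer = deque(maxlen=stack_size)
        self._model = None

    def on_model(self, model):
        """Install a new model, evaluate everything buffered so far, then empty the buffer."""
        self._model = model
        results = [model(element) for element in self._buffer]
        self._buffer.clear()
        return results

    def on_data(self, element):
        """Evaluate the element if a model is present, otherwise buffer it."""
        if self._model is None:
            self._buffer.append(element)
            return []
        return [self._model(element)]


evaluator = BufferedEvaluator(stack_size=2)
for value in (1, 2, 3):
    evaluator.on_data(value)  # no model yet: buffered, 1 is dropped when 3 arrives
print(evaluator.on_model(lambda x: x * 10))
print(evaluator.on_data(4))
```

A small stackSize keeps memory bounded if the model arrives late, at the cost of losing early data elements.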

The following example shows usage of the PMML_EVAL operator:

PMML_EVAL Operator
results = PMML_EVAL({
    modelName='KMeans_Model',
    output='input',
    stackSize=5
}, models, data)

Complete example:

At http://dmg.org/pmml/pmml_examples/index.html you can find examples with corresponding data sets:

PMML Example
#PARSER PQL
#RUNQUERY

models = ACCESS({
    source='pmml-data',
    wrapper='GenericPull',
    transport='HTTP',
    protocol='PMML',
    dataHandler='Tuple',
    options=[
        ['uri', 'http://dmg.org/pmml/pmml_examples/rattle_pmml_examples/AuditKMeans.xml'], 
        ['method', 'get'],
        ['scheduler.delay', '100000'],
        ['fireOnce', 'true']
    ],
    schema=[
        ['meta', 'Object'],
        ['models', 'Object'],
        ['dictionary', 'Object']
    ]
})

/// data to be tested against
data = ACCESS({
    source='audit-data',
    wrapper='GenericPull',
    transport='HTTP',
    protocol='CSV',
    dataHandler='Tuple',
    options=[
        ['uri', 'http://dmg.org/pmml/pmml_examples/Audit.csv'], 
        ['method', 'get'],
        ['readfirstline', 'false'],
        ['delimiter',','],
        ['textDelimiter','"']
    ],
    schema=[
        ['ID', 'Integer'],
        ['Age', 'Integer'],
        ['Employment', 'String'],
        ['Education', 'String'],
        ['Marital', 'String'],
        ['Occupation', 'String'],
        ['Income', 'Double'],
        ['Gender', 'String'],
        ['Deductions', 'Double'],
        ['Hours', 'Double'],
        ['IGNORE_Accounts', 'String'],
        ['RISK_Adjustment', 'Double'],
        ['TARGET_Adjusted', 'Double']
    ]
})

results = PMML_EVAL({
    modelName='KMeans_Model',
    output='input',
    stackSize=5
}, models, data)