


This operator clusters a set of tuples.

Parameters

  • ATTRIBUTES: The attributes of the incoming tuples that the distance/similarity function should take into account for clustering. Note that not all attribute types are supported.
  • CLUSTERER: The clustering algorithm to use.
    • Currently implemented: kMeans and Weka (which in turn provides further algorithms)
  • ALGORITHM: A set of key/value options that configures the selected algorithm.
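To illustrate what ATTRIBUTES means for a distance-based clusterer, here is a minimal Python sketch (not the Odysseus API). It assumes a tuple is a plain dict, keeps only the listed attributes, and compares tuples with a Euclidean distance. The helper names `project` and `euclidean` are hypothetical. The type check shows why some attribute types cannot be used: a numeric distance has no meaningful value for, say, a string.

```python
from math import sqrt
from numbers import Real


def project(tup, attributes):
    """Keep only the clustering attributes of a tuple (modelled here as a dict)."""
    values = []
    for name in attributes:
        value = tup[name]
        # A numeric distance needs numeric inputs; strings and similar types are rejected.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"attribute {name!r} is not numeric: {value!r}")
        values.append(float(value))
    return values


def euclidean(a, b):
    """Euclidean distance between two projected tuples of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Attributes not listed (here 'name') are ignored for clustering.
row = {'age': 30, 'income': 4000, 'name': 'Ann'}
print(project(row, ['age', 'income']))
```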

Example

This example uses the Weka clusterer with the SimpleKMeans algorithm. The argument string "-N 3" is passed to SimpleKMeans and sets the number of clusters to three.

Operator
clustered = CLUSTERING({
                attributes=['age', 'income'],
                clusterer='weka',
                algorithm=[
                  'model'='SimplekMeans',
                  'arguments'='-N 3'
                ]
              }, inputoperator)
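The query above asks for three clusters over the (age, income) attributes. The following minimal Python sketch shows the underlying idea with Lloyd's k-means for k = 3. It is an illustration, not Odysseus or Weka code: the function name `kmeans` and the sample tuples are hypothetical. For determinism, it assumes the first k points serve as initial centroids. Weka's own SimpleKMeans differs in details such as how it initializes the centroids.

```python
from math import dist


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; the first k points are used as initial centroids (assumption)."""
    if len(points) < k:
        raise ValueError("need at least k points")
    centroids = [list(p) for p in points[:k]]
    assignment = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed its cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical (age, income) tuples, clustered into three groups as with '-N 3'.
tuples = [(25, 2000), (27, 2100), (45, 5000), (47, 5200), (65, 1500), (63, 1400)]
labels, centers = kmeans(tuples, 3)
print(labels, centers)
```

Because the sketch does not scale the attributes, income dominates the distance in this example. Normalizing attributes before clustering is a common choice when their ranges differ this much.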

For Weka, the following algorithms can currently be used as the "model". Further details and possible arguments can be found in the Weka documentation.
