This operator clusters a set of tuples. Available mining or machine learning operators are described here: Machine Learning
Parameters
- ATTRIBUTES: The attributes of the incoming tuples that the distance/similarity function should take into account for clustering. Note that not all attribute types are supported here
- CLUSTERER: The clustering algorithm that should be used
  - Currently implemented: kMeans, Weka (which in turn provides further algorithms)
- ALGORITHM: A set of options that configure the chosen algorithm, for example 'model' and 'arguments' for the Weka clusterer
Example
This example uses the Weka clusterer with the "simplekmeans" algorithm. The argument passed to Weka's SimpleKMeans is "-N 3", which sets the number of clusters to 3.
    clustered = CLUSTERING({
        attributes=['age', 'income'],
        clusterer='weka',
        algorithm = [
            'model'='SimplekMeans',
            'arguments'='-N 3'
        ]
    }, inputoperator)
For the Weka clusterer, the following algorithms can currently be used as the "model". Further details and the possible arguments for each can be found in the Weka Docs
- SIMPLEKMEANS (http://weka.sourceforge.net/doc.dev/weka/clusterers/SimpleKMeans.html)
- EM (http://weka.sourceforge.net/doc.dev/weka/clusterers/EM.html)
- COBWEB (http://weka.sourceforge.net/doc.dev/weka/clusterers/Cobweb.html)
- FARTHESTFIRST (http://weka.sourceforge.net/doc.dev/weka/clusterers/FarthestFirst.html)
- DENSITY_KMEANS (http://weka.sourceforge.net/doc.dev/weka/clusterers/MakeDensityBasedClusterer.html)
- HIERARCHICAL (http://weka.sourceforge.net/doc.dev/weka/clusterers/HierarchicalClusterer.html)
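Any other model from this list can presumably be selected in the same way. The following is a minimal sketch, not a tested query: it reuses the attributes and the input name `inputoperator` from the example above, and it assumes that the 'arguments' string is handed to Weka unchanged. In Weka's EM clusterer, "-N" sets the number of clusters.

```
/// Sketch only: same operator as above, with EM as the Weka model.
/// Assumption: 'arguments' is passed through to Weka as given.
clusteredEM = CLUSTERING({
    attributes=['age', 'income'],
    clusterer='weka',
    algorithm = [
        'model'='EM',
        'arguments'='-N 3'
    ]
}, inputoperator)
```

The arguments each model accepts differ, so check the linked Weka page for a model before reusing an option string from another one.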