This operator calculates medians faster than an aggregation, for the special case in which every element delivered to the operator has an end timestamp that is greater than or equal to the end timestamp of the previous element. If your stream does not fulfill this condition, the operator will produce wrong results.
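
The ordering requirement can be checked offline before a stream is connected to the operator. The following Python sketch is not part of Odysseus: it assumes the elements are available as (start, end) timestamp pairs, and the function name end_timestamps_non_decreasing is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def end_timestamps_non_decreasing(elements: Iterable[Tuple[int, int]]) -> bool:
    """Return True if no element ends earlier than the element before it."""
    last_end: Optional[int] = None
    for _start, end in elements:
        if last_end is not None and end < last_end:
            # This element ends before its predecessor: precondition violated.
            return False
        last_end = end
    return True
```

A recorded sample of the stream that makes this function return False indicates that the stream does not meet the requirement stated above.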

Parameters

  • Attribute: The attribute for which the median should be calculated
  • Group_by: A set of attributes. A median is calculated for each distinct combination of their values, as with group by in aggregations.
  • appendGlobalMedian (boolean): If GROUP_BY is given, the global median (i.e. the median computed without respecting groups) is additionally annotated to each element.
  • numerical (boolean): Whether the input is numeric. If the set of elements used to calculate the median has an even size, the average of the two middle elements is used for numeric input; otherwise the left middle element is used.
  • percentiles: A list of double values. If given, the operator calculates not only the 0.5 percentile (the median) for the attribute but also each given percentile x, with 0 < x < 1.
  • histogram (boolean): Different algorithms are implemented to calculate the median. If the set of possible values contains many equal values, the histogram version should perform much better.
  • roundingFactor (long): When using the histogram version of the operator, this factor can be used to create more equal elements by rounding the attribute. The factor determines the number of digits kept after the decimal point: 1 means none, 10 means one, 100 means two, and so on.
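
The effect of the numerical and roundingFactor parameters can be illustrated with a short Python sketch. It is not the operator's implementation: the function names are hypothetical, and it assumes that values are rounded to the nearest step (the description above does not say whether the operator rounds or truncates).

```python
from typing import Sequence


def median_of_sorted(values: Sequence, numerical: bool):
    """Median of an already sorted, non-empty sequence.

    For an even number of elements, numerical=True averages the two
    middle elements; otherwise the left middle element is returned.
    """
    n = len(values)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    if numerical:
        return (values[mid - 1] + values[mid]) / 2
    return values[mid - 1]


def round_with_factor(value: float, rounding_factor: int) -> float:
    """Keep 0, 1, 2, ... decimals for factors 1, 10, 100, ...

    Assumption: rounding to nearest, not truncation.
    """
    return round(value * rounding_factor) / rounding_factor
```

With the sorted input [1, 2, 3, 4], the numeric variant yields 2.5 and the non-numeric variant yields 2. A rounding factor of 100 maps 3.14159 to 3.14, so nearby readings collapse into the same value.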

Example

FASTMEDIAN({name="PlugMedian",
            attribute='value',
            numerical=true,
            histogram = true,
            group_by=['house_id', 'household_id', 'plug_id'],
            appendglobalmedian = false,
            roundingfactor=100
           },WINDOW)                         
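
Why a histogram helps when many values are equal can be sketched in Python. This is an illustration only and assumes nothing about how Odysseus implements the histogram variant: it counts occurrences per distinct value and walks the distinct values in order, so the sorting cost depends on the number of distinct values rather than on the number of elements. The function name histogram_median is hypothetical, and for an even number of elements the sketch returns the left middle element.

```python
from collections import Counter
from typing import Iterable


def histogram_median(values: Iterable[float]) -> float:
    """Left-middle median computed from per-value counts."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("median of an empty input is undefined")
    # 1-based rank of the (left) middle element.
    target = (total + 1) // 2
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if seen >= target:
            return value
    raise AssertionError("unreachable: target rank is always covered")
```

Rounding the attribute first, as roundingfactor=100 does in the example above, reduces the number of distinct values and therefore the size of the histogram.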