This operator is used to aggregate and group the input stream.
Parameter
group_by:
An optional list of attributes over which the grouping should occur.aggregations:
A list if elements where each element contains up to four string:- The name of the aggregate function, e.g. MAX
- The input attribute over which the aggregation should be done
- The name of the output attribute for this aggregation
- The optional type of the output
dumpAtValueCount:
This parameter is used in the interval approach. Here a result is generated, each time an aggregation cannot be changed anymore. This leads to fewer results with a larger validity. With this parameter the production rate can be raised. A value of 1 means, that for every new element the aggregation operator receives new output elements are generated. This leads to more results with a shorter validity.outputPA
: This parameter allow to dump partial aggregates instead of evaluted values. The partial aggregates can be send to other aggregation operators and do a final aggregation (e.g. in case of distribution). The input schema of an aggregate operator that read partial aggregates must state a datatype that is a partial aggregated (see example below). Remark: Aggregate has one input and requires ordered input. To combine different parital aggregations e.g. a union operator is needed to reorder the input elements.
Aggregation Functions
The set of aggregate functions is extensible. The following list is in the core Odysseus:
MAX
: The maximum elementMIN
: The minimum elementAVG
: The average elementSUM
: The sum of all elementsCOUNT
: The number of elementsSTDDEV
: The standard deviation
Some nonstandard aggregations: These should only be used, if you a familiar with them:
FIRST
: The first elementLAST
: The last elementNTH
: The nth elementRATE
: The number of elements per time unit- NEST: Nest the attribute values in a list
- COMPLETENESS: Ratio of NULL-value elements to number of elements
Example:
PQL
Code Block | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
output = AGGREGATE({
group_by = ['bidder'],
aggregations=[ ['MAX', 'price', 'max_price', 'double'] ]
}, input)
// Parital Aggregate example
pa = AGGREGATE({
name='PRE_AGG',
aggregations=[
['count', 'id', 'count', 'PartialAggregate'],
['sum', 'id', 'avgsum', 'PartialAggregate'],
['min', 'id', 'min', 'PartialAggregate'],
['max', 'id', 'max', 'PartialAggregate']
],
outputpa='true'
},
nexmark:person
)
out = AGGREGATE({
name='AGG',
aggregations=[
['count', 'count', 'count', 'Integer'],
['sum', 'avgsum', 'sum', 'Double'],
['avg', 'avgsum', 'avg', 'Double'],
['min', 'min', 'min', 'Integer'],
['max', 'max', 'max', 'Integer']
]
},
pa
) |
CQL
Code Block | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
SELECT MAX(price) AS max_price FROM input GROUP BY bidder |