The probabilistic feature provides functions and operators to process discrete and continuous probabilistic values in a data stream. Continuous probabilistic values are represented using Gaussian Mixtures.

To enable probabilistic processing, include the probabilistic feature and declare the probabilistic metadata (#METADATA Probabilistic) in your Odysseus script.

Estimating probabilistic values

ToDo:

Expectation Maximization

The EM operator fits a Gaussian mixture model (GMM) with a predefined number of mixture components to the values of a data stream.
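To illustrate the idea behind fitting a mixture with a fixed number of components, here is a minimal sketch of EM for one-dimensional values in plain Python. It is not the operator's implementation: the function names `normal_pdf` and `fit_gmm_1d`, the deterministic initialisation, and the fixed iteration count are all assumptions made for this example.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(values, k, iterations=50, min_var=1e-6):
    """Fit a k-component 1D Gaussian mixture with plain EM (illustrative only).

    Returns a list of (weight, mean, variance) tuples sorted by mean.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Assumed initialisation: spread the initial means over the sorted data.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(ordered) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in ordered) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var, min_var)  # keep variances strictly positive
    return sorted(zip(weights, means, variances), key=lambda c: c[1])

components = fit_gmm_1d([0.9, 1.0, 1.1, 1.0, 4.9, 5.0, 5.1, 5.0], k=2)
```

For the two well-separated clusters in the usage line, the fitted components should end up with means close to 1.0 and 5.0 and roughly equal weights.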

Kalman Filter

The Kalman Filter operator can be used if the variance of the values in the data stream is known, for example from a datasheet.
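The role of a known variance can be sketched with a scalar Kalman filter in Python. This is only an illustration under simplifying assumptions (a nearly constant value, scalar state), not the operator's implementation; `kalman_1d` and its parameters are hypothetical names chosen for this example.

```python
def kalman_1d(measurements, measurement_var, process_var=0.0,
              initial_mean=0.0, initial_var=1e6):
    """Scalar Kalman filter for a (nearly) constant value (illustrative only).

    measurement_var is the known sensor variance, e.g. taken from a datasheet.
    Returns a list of (mean, variance) estimates, one per measurement.
    """
    mean, var = initial_mean, initial_var
    estimates = []
    for z in measurements:
        # Predict: the state is assumed constant, only the uncertainty grows.
        var += process_var
        # Update: weight prediction and measurement by their variances.
        gain = var / (var + measurement_var)
        mean += gain * (z - mean)
        var *= (1.0 - gain)
        estimates.append((mean, var))
    return estimates

estimates = kalman_1d([10.2, 9.8, 10.1, 9.9], measurement_var=0.04)
```

Each estimate is a Gaussian (mean and variance), which is how a deterministic measurement becomes a continuous probabilistic value. With a very uncertain prior, the variance shrinks roughly as `measurement_var / n` after `n` measurements.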

Filtering probabilistic values

To filter probabilistic values, you can use the same syntax that you already use for deterministic values. However, the results of the operators differ. For discrete probabilistic values, the Select operator returns a tuple with a lower tuple existence probability.

...

Code Block

theme	Eclipse
language	pql
title	Probabilistic discrete select
linenumbers	true

output = SELECT({predicate = ProbabilisticRelationalPredicate('x > 1.0 AND x < 3.0')}, input)
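The effect of a select on a discrete probabilistic value can be sketched in Python as follows. The semantics shown here are an assumption made for illustration: outcomes that violate the predicate are dropped, the remaining outcomes are renormalised, and the tuple existence is scaled by the probability mass that satisfied the predicate. `select_discrete` is a hypothetical helper, not an Odysseus function.

```python
def select_discrete(value, existence, predicate):
    """Filter a discrete probabilistic value (assumed semantics, illustrative only).

    value: dict mapping each possible outcome to its probability.
    Returns (filtered value, new existence), or None if no outcome matches.
    """
    kept = {x: p for x, p in value.items() if predicate(x)}
    mass = sum(kept.values())
    if mass == 0.0:
        return None  # the tuple cannot satisfy the predicate
    # Renormalise the surviving outcomes and scale the tuple existence.
    filtered = {x: p / mass for x, p in kept.items()}
    return filtered, existence * mass

x = {0.5: 0.25, 2.0: 0.5, 3.5: 0.25}
filtered, existence = select_discrete(x, 0.8, lambda v: 1.0 < v < 3.0)
```

In the usage lines, only the outcome 2.0 lies between 1.0 and 3.0, so half of the probability mass survives and the existence drops from 0.8 to 0.4.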

...

Code Block

theme	Eclipse
language	pql
title	Probabilistic continuous select
linenumbers	true

output = SELECT({predicate = ProbabilisticRelationalPredicate('x > 1.0 AND x < 4.0')}, input)

...

Code Block

theme	Eclipse
language	cql
title	Probabilistic Join
linenumbers	true

SELECT * FROM input1, input2 WHERE input1.x = input2.y;

As you can see, probabilistic processing is not limited to PQL. You can use the same CQL syntax you already use for deterministic values.

Working with probabilistic values

Now that you know how to filter and join probabilistic values, you probably want to perform mathematical operations on them. To do so, you can apply the algebraic operators (+, *, -, /, ^) to probabilistic values, e.g. in a Map operator. Note that when you multiply or divide continuous probabilistic values, the result is estimated by fitting a Gaussian mixture model to the resulting distribution.
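Addition is the simplest case to reason about. For two independent one-dimensional Gaussian mixtures, the sum is again a Gaussian mixture: every pair of components contributes one component whose weight is the product of the weights and whose mean and variance are the sums of the means and variances. The Python sketch below shows this; the representation as `(weight, mean, variance)` tuples, the independence assumption, and the name `add_mixtures` are choices made for this example, not part of the feature.

```python
def add_mixtures(a, b):
    """Sum of two independent 1D Gaussian mixtures (illustrative only).

    Each mixture is a list of (weight, mean, variance) components.
    """
    return [(wa * wb, ma + mb, va + vb)
            for wa, ma, va in a
            for wb, mb, vb in b]

x = [(0.5, 0.0, 1.0), (0.5, 2.0, 1.0)]   # bimodal value
y = [(1.0, 10.0, 0.5)]                   # single Gaussian
z = add_mixtures(x, y)
```

No such closed form exists for the product or quotient of Gaussians, which is consistent with those results being approximated by a fitted mixture.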

Mathematical Functions

Int(Distribution, Lower Limit, Upper Limit)

Estimates the probability of a multivariate normal distribution between the lower and upper integration limits.
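For intuition, the univariate case has a closed form based on the error function. The following Python sketch computes the probability mass of a one-dimensional Gaussian mixture between two limits. It covers only the one-dimensional case and is not how Int is implemented; `normal_cdf` and `interval_probability` are hypothetical helper names.

```python
import math

def normal_cdf(x, mean, var):
    """Cumulative distribution function of a univariate normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def interval_probability(mixture, lower, upper):
    """P(lower <= X <= upper) for a 1D Gaussian mixture (illustrative only).

    mixture is a list of (weight, mean, variance) components.
    """
    return sum(w * (normal_cdf(upper, m, v) - normal_cdf(lower, m, v))
               for w, m, v in mixture)

p = interval_probability([(1.0, 0.0, 1.0)], -1.0, 1.0)
```

For a standard normal distribution and the limits -1 and 1, this yields the familiar value of about 0.683.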

Statistical Functions

Similarity(Distribution, Distribution)

Calculates the Bhattacharyya distance between two distributions.

Code Block

language	cql
title	Example

SELECT similarity(as2DVector(x1,y1), as2DVector(x2,y2)) FROM stream
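As a reference for what the distance measures, the Bhattacharyya distance between two univariate normal distributions has a simple closed form. The Python sketch below covers only that one-dimensional case, with a hypothetical function name; it is not the implementation behind Similarity.

```python
import math

def bhattacharyya_1d(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two univariate normals (illustrative only)."""
    # Contribution of the difference between the means.
    term_mean = (mean1 - mean2) ** 2 / (4.0 * (var1 + var2))
    # Contribution of the difference between the variances.
    term_var = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return term_mean + term_var

d_same = bhattacharyya_1d(0.0, 1.0, 0.0, 1.0)
d_shifted = bhattacharyya_1d(0.0, 1.0, 2.0, 1.0)
```

Identical distributions have distance 0, and the distance grows as the means move apart or the variances diverge.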

Distance(Distribution, Value)

Calculates the Mahalanobis distance between the distribution and the value. The value can be a scalar value or a vector.

Code Block

language	cql
title	Example

SELECT distance(as3DVector(x, y, z), [1.0;2.0;3.0]) FROM stream
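For a single Gaussian with mean vector m and covariance matrix S, the Mahalanobis distance of a value v is sqrt((v - m)^T S^-1 (v - m)). The Python sketch below computes it with a small linear solver instead of an explicit matrix inverse. The helper names `solve` and `mahalanobis` are hypothetical, and the sketch handles a single Gaussian only, not a mixture.

```python
import math

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n + 1):
                a[row][k] -= factor * a[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (a[row][n] - tail) / a[row][row]
    return x

def mahalanobis(mean, covariance, value):
    """Mahalanobis distance between a Gaussian and a vector (illustrative only)."""
    diff = [v - m for v, m in zip(value, mean)]
    weighted = solve(covariance, diff)  # equals covariance^-1 * diff
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
d = mahalanobis([0.0, 0.0, 0.0], identity, [1.0, 2.0, 2.0])
```

With an identity covariance the result reduces to the Euclidean distance, which is 3.0 for the usage line above.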

Datatype Functions

as2DVector(Object, Object)

Converts the two objects into a 2D vector.

as3DVector(Object, Object, Object)

Similar to the as2DVector function, this function creates a 3D vector with the given objects.

Access to tuple existence

...

Code Block

theme	Eclipse
language	pql
title	Probabilistic continuous select
linenumbers	true

output = ExistenceToPayload(SELECT({predicate = ProbabilisticRelationalPredicate('x > 1.0 AND x < 4.0')}, input))
