The probabilistic feature provides functions and operators to process discrete and continuous probabilistic values in a data stream. Continuous probabilistic values are represented using Gaussian Mixtures.

To enable probabilistic processing, include the probabilistic feature and issue the StandardProbabilistic transformation configuration (#TRANSCFG StandardProbabilistic) in your Odysseus script.

Filtering probabilistic values

To filter probabilistic values, you can use the same syntax that you already use for deterministic values. However, the results of the operators differ. For discrete probabilistic values, the Select operator returns a tuple with a lower tuple existence probability.

Let's assume you have an attribute x that is 1.0 with probability 0.25, 2.0 with probability 0.25, and 3.0 with probability 0.5. The following Select operation filters the attribute value such that the resulting attribute value can only be instantiated to 2.0 and the resulting tuple existence is reduced to 0.25.

Probabilistic discrete select (PQL):
output = SELECT({predicate = RelationalPredicate('x > 1.0 AND x < 3.0')}, input)
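The arithmetic behind this example can be reproduced outside Odysseus. The following Python sketch is not Odysseus code; the function name `select_discrete` and the dict-based representation are hypothetical, and it assumes that the surviving values are renormalised while the removed probability mass is moved into the tuple existence.

```python
def select_discrete(distribution, predicate):
    """Filter a discrete probabilistic value given as {value: probability}.

    Returns the distribution of the surviving values and the factor by
    which the tuple existence is reduced.
    """
    surviving = {v: p for v, p in distribution.items() if predicate(v)}
    existence = sum(surviving.values())
    if existence == 0.0:
        # No possible world satisfies the predicate.
        return {}, 0.0
    # Assumption: the surviving values are renormalised to sum to 1.
    conditional = {v: p / existence for v, p in surviving.items()}
    return conditional, existence


# Attribute x from the text: 1.0 (0.25), 2.0 (0.25), 3.0 (0.5).
x = {1.0: 0.25, 2.0: 0.25, 3.0: 0.5}
values, existence = select_discrete(x, lambda v: 1.0 < v < 3.0)
print(values, existence)
```

Only the value 2.0 satisfies the predicate, so the existence factor equals its original probability of 0.25.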

Filtering continuous probabilistic distributions is similar to filtering discrete probabilistic values in that it may reduce the tuple existence probability.

Let's assume you have a random variable x with mean 0.0 and variance σ² = 1.0. The following query sets the tuple existence to the probability that x takes a value between the lower and the upper bound, which is ~0.1586235826896239.

Probabilistic continuous select (PQL):
output = SELECT({predicate = RelationalPredicate('x > 1.0 AND x < 4.0')}, input)
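The quoted probability can be checked independently: for a single Gaussian it is the difference of the cumulative distribution function at the two bounds. The Python sketch below is only a verification aid, not the Odysseus implementation; the helper names are hypothetical and it covers one Gaussian component rather than a full mixture.

```python
import math


def normal_cdf(x, mean=0.0, variance=1.0):
    """Cumulative distribution function of a univariate Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * variance)))


def interval_probability(lower, upper, mean=0.0, variance=1.0):
    """Probability that the random variable falls between lower and upper."""
    return normal_cdf(upper, mean, variance) - normal_cdf(lower, mean, variance)


# Bounds of the predicate 'x > 1.0 AND x < 4.0' for mean 0.0, variance 1.0.
p = interval_probability(1.0, 4.0)
print(p)
```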

Joining probabilistic values

A join with a predicate based on probabilistic discrete values uses the same syntax as for deterministic values. Although the syntax looks the same, the result differs: the Join operator performs a join of the input streams in each possible world, and as a result it may produce more tuples.

Probabilistic join (CQL):
Select * From input1,input2 WHERE input1.x=input2.y;
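To illustrate the possible-worlds reading of this join, the following Python sketch enumerates every combination of values of two discrete probabilistic attributes and sums the probability of the worlds in which the join predicate holds. It is a conceptual illustration only: the function name and the example distributions are hypothetical, and it assumes that x and y are independent.

```python
from itertools import product


def join_match_probability(x_dist, y_dist):
    """Probability that x == y, summed over all possible worlds.

    Both arguments map value -> probability. Independence of the two
    attributes is an assumption of this sketch.
    """
    total = 0.0
    for (xv, xp), (yv, yp) in product(x_dist.items(), y_dist.items()):
        if xv == yv:
            # This world satisfies the predicate input1.x = input2.y.
            total += xp * yp
    return total


# Hypothetical example values, not taken from the Odysseus documentation.
x = {1.0: 0.5, 2.0: 0.5}
y = {2.0: 0.25, 3.0: 0.75}
match = join_match_probability(x, y)
print(match)
```

A deterministic join would either pair the two tuples or not. Here the pair exists only in the worlds where both attributes are 2.0, so it would carry a reduced existence probability.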

As you can see, probabilistic processing is not limited to PQL. You can use the same CQL syntax that you already use for deterministic values.

Working with probabilistic values

Now that you know how to filter and join probabilistic values, you probably want to do something with them, such as performing mathematical operations. To do so, you can use the algebraic operators (+, *, -, /, ^) on probabilistic values in, e.g., a Map operator. Note that when you use multiplication or division on continuous probabilistic values, the result is estimated by fitting a Gaussian mixture model to the resulting distribution.

Algebraic operators on probabilistic discrete values (PQL):
output = MAP({expressions = ['x + 2.0', 'x * 2.0', 'x * toProbabilisticDouble([1.0,0.5;2,0.5])', 'x * toProbabilisticDouble([1.0,0.5])']}, input)
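For discrete values, the effect of these expressions can be sketched in Python by applying the operator to every combination of values and multiplying the probabilities. This is an illustration of the idea, not the Odysseus implementation; the helper names are hypothetical and the two operands are assumed to be independent.

```python
import operator
from collections import defaultdict
from itertools import product


def apply_scalar(dist, op, scalar):
    """Apply op(value, scalar) to every value of a discrete distribution."""
    result = defaultdict(float)
    for value, prob in dist.items():
        result[op(value, scalar)] += prob
    return dict(result)


def combine(left, right, op):
    """Combine two discrete distributions value by value.

    Assumption: the operands are independent, so joint probabilities
    are products of the individual probabilities.
    """
    result = defaultdict(float)
    for (lv, lp), (rv, rp) in product(left.items(), right.items()):
        result[op(lv, rv)] += lp * rp
    return dict(result)


x = {1.0: 0.25, 2.0: 0.25, 3.0: 0.5}
shifted = apply_scalar(x, operator.add, 2.0)                # like 'x + 2.0'
scaled = combine(x, {1.0: 0.5, 2.0: 0.5}, operator.mul)     # like 'x * toProbabilisticDouble([1.0,0.5;2,0.5])'
print(shifted)
print(scaled)
```

Different value combinations can produce the same result (here 1.0 * 2.0 and 2.0 * 1.0), which is why the probabilities are accumulated per result value.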

Mathematical Functions

SQRT(Probabilistic Value)

Computes the probabilistic square root of the given probabilistic value.

Int(Distribution, Lower Limit, Upper Limit)

Estimates the probability of a multivariate normal distribution between the given lower and upper integration limits.

Statistical Functions

Similarity(Distribution, Distribution)

Calculates the Bhattacharyya distance between two distributions.

Example (CQL):
SELECT similarity(as2DVector(x1,y1), as2DVector(x2,y2)) FROM stream
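As a reminder of what the Bhattacharyya distance measures, the following Python sketch evaluates its closed form for two univariate Gaussians. It is a simplified illustration with a hypothetical function name: the Similarity function operates on distributions in the stream, possibly multivariate mixtures, which this sketch does not cover. In the multivariate case the variances are replaced by covariance matrices.

```python
import math


def bhattacharyya_distance(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two univariate Gaussians."""
    # Contribution of the difference between the means.
    mean_term = 0.25 * (mean1 - mean2) ** 2 / (var1 + var2)
    # Contribution of the difference between the variances.
    var_term = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + var_term


same = bhattacharyya_distance(0.0, 1.0, 0.0, 1.0)
shifted = bhattacharyya_distance(0.0, 1.0, 1.0, 1.0)
print(same, shifted)
```

Identical distributions have distance 0, and the distance grows as the means or the variances move apart.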

Distance(Distribution, Value)

Calculates the Mahalanobis distance between the distribution and the value. The value can be a scalar value or a vector.

Example (CQL):
SELECT distance(as3DVector(x, y, z), [1.0;2.0;3.0]) FROM stream
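The Mahalanobis distance scales each deviation from the mean by the spread of the distribution in that direction. The Python sketch below shows the computation for the simplified case of a single Gaussian with a diagonal covariance matrix; the function name and the example numbers are hypothetical, and a general covariance matrix would require its inverse instead of a per-dimension division.

```python
import math


def mahalanobis_diagonal(mean, variances, point):
    """Mahalanobis distance for a Gaussian with diagonal covariance.

    mean, variances and point are sequences of equal length.
    """
    squared = sum(
        (x - m) ** 2 / v for x, m, v in zip(point, mean, variances)
    )
    return math.sqrt(squared)


# Hypothetical 3D distribution, evaluated at the point used in the example query.
d = mahalanobis_diagonal([0.0, 0.0, 0.0], [1.0, 4.0, 9.0], [1.0, 2.0, 3.0])
print(d)
```

In this example each coordinate lies exactly one standard deviation from the mean, so the distance is the square root of 3.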

Datatype Functions

ToProbabilisticDouble(Matrix)

Constructs a discrete probabilistic value using the first column of the given matrix for the values and the second column of the matrix for the probabilities for each value.
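The row layout of the matrix can be pictured with a small Python sketch that turns rows of (value, probability) into a mapping. The function name is a hypothetical stand-in and the sketch only mirrors the description above; it does not validate the probabilities or interpret a total below 1.

```python
def to_probabilistic_double(matrix):
    """Build {value: probability} from rows of [value, probability]."""
    distribution = {}
    for value, probability in matrix:
        key = float(value)
        # Accumulate in case the same value appears in several rows.
        distribution[key] = distribution.get(key, 0.0) + float(probability)
    return distribution


# Corresponds to the matrix literal [1.0,0.5;2,0.5]: two rows, two columns.
dist = to_probabilistic_double([[1.0, 0.5], [2, 0.5]])
print(dist)
```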

DoubleToShort(Probabilistic Value)

Converts the given probabilistic double value to a probabilistic short value.

DoubleToByte(Probabilistic Value)

Converts the given probabilistic double value to a probabilistic byte value.

DoubleToInteger(Probabilistic Value)

Converts the given probabilistic double value to a probabilistic integer value.

DoubleToFloat(Probabilistic Value)

Converts the given probabilistic double value to a probabilistic float value.

DoubleToLong(Probabilistic Value)

Converts the given probabilistic double value to a probabilistic long value.

as2DVector(Object, Object)

Converts the two objects into a 2D vector.

as3DVector(Object, Object, Object)

Similar to the as2DVector function, this function creates a 3D vector with the given objects.

Access to tuple existence

To access the tuple existence during processing, you can use the ExistenceToPayload operator. It copies the tuple existence to the payload, where you can access it with the attribute name "meta_existence".

Probabilistic continuous select with ExistenceToPayload (PQL):
output = ExistenceToPayload(SELECT({predicate = RelationalPredicate('x > 1.0 AND x < 4.0')}, input))