The probabilistic feature provides functions and operators to process discrete and continuous probabilistic values in a data stream. Continuous probabilistic values are represented using Gaussian mixtures.

To enable probabilistic processing, you have to include the probabilistic feature and use the probabilistic metadata *(#METADATA Probabilistic)* in your Odysseus script.

ToDo:

The EM operator fits a Gaussian mixture model (GMM) with a predefined number of mixtures to the values of a data stream.
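To give an intuition for what fitting a GMM with a fixed number of mixtures involves, here is a minimal sketch of textbook expectation-maximization for one-dimensional data in Python. It is not the Odysseus implementation; the function names, the deterministic initialization, and the fixed iteration count are assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a univariate normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def fit_gmm_1d(values, components, iterations=50):
    """Fit a 1D Gaussian mixture with a fixed number of components via EM."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic initialization (an assumption): spread the initial means
    # evenly over the sorted data and start with the overall variance.
    means = [ordered[(2 * k + 1) * n // (2 * components)] for k in range(components)]
    overall_mean = sum(values) / n
    overall_var = max(sum((v - overall_mean) ** 2 for v in values) / n, 1e-6)
    variances = [overall_var] * components
    weights = [1.0 / components] * components

    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        responsibilities = []
        for v in values:
            dens = [
                w * gaussian_pdf(v, m, s)
                for w, m, s in zip(weights, means, variances)
            ]
            total = sum(dens)
            responsibilities.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(components):
            nk = sum(r[k] for r in responsibilities)
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(responsibilities, values)) / nk
            spread = sum(
                r[k] * (v - means[k]) ** 2 for r, v in zip(responsibilities, values)
            )
            variances[k] = max(spread / nk, 1e-6)  # floor avoids a zero variance
    return weights, means, variances


# Two well-separated clusters around 0 and 10.
data = [-0.2, -0.1, 0.0, 0.1, 0.2, 9.8, 9.9, 10.0, 10.1, 10.2]
weights, means, variances = fit_gmm_1d(data, components=2)
print(weights, means, variances)
```

The sketch has no safeguards for degenerate input such as a component that ends up with no responsibility; a production implementation would need them.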

For filtering probabilistic values you can use the same syntax that you already use for deterministic values. However, the results of the operators differ. In the case of discrete probabilistic values, the Select operator returns a tuple with a lower tuple existence probability.

Let's assume you have an attribute x that is 1.0 with probability 0.25, 2.0 with probability 0.25, and 3.0 with probability 0.5. The following Select operation filters the attribute such that the resulting attribute value can only be instantiated to 2.0 and the resulting tuple existence is reduced to 0.25.

output = SELECT({predicate = ProbabilisticRelationalPredicate('x > 1.0 AND x < 3.0')}, input)
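The arithmetic behind this example can be reproduced with a small Python sketch. It only mirrors the semantics described above and is not Odysseus code; the function `select_discrete` is a hypothetical name, and whether the engine renormalizes the surviving values is not stated here, so the sketch leaves them unnormalized.

```python
def select_discrete(distribution, predicate, existence=1.0):
    """Filter a discrete probabilistic value.

    distribution maps each possible value to its probability. The result is
    the surviving values (with their original probabilities) and the tuple
    existence scaled by the probability mass that satisfies the predicate.
    """
    kept = {value: p for value, p in distribution.items() if predicate(value)}
    return kept, existence * sum(kept.values())


# Attribute x from the example above.
x = {1.0: 0.25, 2.0: 0.25, 3.0: 0.5}
kept, existence = select_discrete(x, lambda v: 1.0 < v < 3.0)
print(kept, existence)  # {2.0: 0.25} 0.25
```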

The filtering of continuous probabilistic distributions is similar to the processing of discrete probabilistic values in that it may reduce the tuple existence probability.

Let's assume you have a random variable x with mean 0.0 and variance σ² = 1.0. The following query sets the tuple existence to the cumulative probability that this random variable takes a value between the lower and upper bound, which is ~0.1586235826896239.

output = SELECT({predicate = ProbabilisticRelationalPredicate('x > 1.0 AND x < 4.0')}, input)
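The quoted probability can be checked independently of Odysseus. The following Python sketch evaluates the standard normal cumulative distribution function at both bounds using the error function; the helper names are chosen for illustration only.

```python
import math


def normal_cdf(value, mean=0.0, variance=1.0):
    """Cumulative distribution function of a univariate normal distribution."""
    return 0.5 * (1.0 + math.erf((value - mean) / math.sqrt(2.0 * variance)))


def interval_probability(lower, upper, mean=0.0, variance=1.0):
    """Probability that the random variable falls between lower and upper."""
    return normal_cdf(upper, mean, variance) - normal_cdf(lower, mean, variance)


# x with mean 0.0 and variance 1.0, predicate x > 1.0 AND x < 4.0
existence = interval_probability(1.0, 4.0)
print(existence)  # approximately 0.158624
```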

A join with a predicate on discrete probabilistic values uses the same syntax as for deterministic values. Although the query looks the same, the result differs: the Join operator joins the input streams in each possible world, and as such it may produce more tuples.

Select * From input1,input2 WHERE input1.x=input2.y;
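To illustrate what joining in each possible world means, the following Python sketch enumerates every combination of values for two discrete probabilistic attributes and keeps the worlds in which the equality predicate holds. It assumes the two attributes are independent and is a conceptual model only, not the Join operator's actual algorithm; the function name is hypothetical.

```python
from itertools import product


def possible_world_join(dist_x, dist_y):
    """Enumerate the possible worlds of two independent discrete values.

    Returns, for each value on which x and y agree, the probability of the
    worlds in which the equi-join predicate x = y is satisfied.
    """
    matches = {}
    for (vx, px), (vy, py) in product(dist_x.items(), dist_y.items()):
        if vx == vy:
            matches[vx] = matches.get(vx, 0.0) + px * py
    return matches


x = {1.0: 0.5, 2.0: 0.5}
y = {2.0: 0.25, 3.0: 0.75}
matches = possible_world_join(x, y)
print(matches)  # {2.0: 0.125}
```

Only one of the four possible worlds satisfies the predicate here, so the joined tuple would carry an existence of 0.125 under these assumptions.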

As you can see, probabilistic processing is not limited to PQL. You can use the same CQL syntax you already use for deterministic values.

Now that you know how to filter and join probabilistic values, you probably want to do something with the values, such as performing mathematical operations on them. To do so, you can use the algebraic operators (+, *, -, /, ^) on probabilistic values, for example in a Map operator. Note that when using multiplication or division on continuous probabilistic values, the result is estimated by fitting Gaussian mixture models to the resulting distribution.
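Addition and subtraction do not need such an estimate when the operands are independent Gaussians, because the result is again Gaussian: the means add (or subtract) and the variances add. The Python sketch below shows this closed form for single Gaussians. The `Gaussian` class is hypothetical, and independence of the operands is an assumption; how Odysseus handles correlated attributes or mixtures with several components is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    """A univariate normal distribution (hypothetical helper type)."""

    mean: float
    variance: float

    def __add__(self, other):
        # Sum of independent Gaussians: means add, variances add.
        return Gaussian(self.mean + other.mean, self.variance + other.variance)

    def __sub__(self, other):
        # Difference of independent Gaussians: means subtract, variances add.
        return Gaussian(self.mean - other.mean, self.variance + other.variance)


a = Gaussian(mean=1.0, variance=0.5)
b = Gaussian(mean=2.0, variance=1.5)
total = a + b
difference = a - b
print(total)       # Gaussian(mean=3.0, variance=2.0)
print(difference)  # Gaussian(mean=-1.0, variance=2.0)
```

The product or quotient of two Gaussians is in general not Gaussian, which is why those operations require an approximation.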

Estimates the multivariate normal distribution probability with lower and upper integration limits.

The similarity function calculates the Bhattacharyya distance between two distributions.

SELECT similarity(as2DVector(x1,y1), as2DVector(x2,y2)) FROM stream
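For reference, the Bhattacharyya distance has a closed form for two normal distributions. The Python sketch below implements the univariate case as an illustration of the measure itself; it is not the Odysseus function, which operates on vectors of distributions, and the function name is chosen for this example.

```python
import math


def bhattacharyya_distance(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two univariate normal distributions."""
    # Term driven by the difference of the means.
    mean_term = 0.25 * (mean1 - mean2) ** 2 / (var1 + var2)
    # Term driven by the difference of the variances.
    var_term = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + var_term


same = bhattacharyya_distance(0.0, 1.0, 0.0, 1.0)
shifted = bhattacharyya_distance(0.0, 1.0, 2.0, 1.0)
print(same, shifted)  # 0.0 0.5
```

Identical distributions have distance 0, and the distance grows as the means move apart or the variances diverge.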

The distance function calculates the Mahalanobis distance between the distribution and the value. The value can be a scalar value or a vector.

SELECT distance(as3DVector(x, y, z), [1.0;2.0;3.0]) FROM stream
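As an illustration of the measure, the Python sketch below computes the Mahalanobis distance for the special case of a diagonal covariance matrix, where each dimension is scaled by its own variance. The diagonal covariance, the example variances, and the function name are assumptions made to keep the example short; the general case uses the inverse of the full covariance matrix.

```python
import math


def mahalanobis_diagonal(mean, variances, point):
    """Mahalanobis distance of point from a distribution with the given mean
    and a diagonal covariance matrix (one variance per dimension)."""
    return math.sqrt(
        sum((p - m) ** 2 / v for m, v, p in zip(mean, variances, point))
    )


# Hypothetical 3D distribution compared with the vector [1.0;2.0;3.0].
d = mahalanobis_diagonal([0.0, 0.0, 0.0], [1.0, 4.0, 9.0], [1.0, 2.0, 3.0])
print(d)  # sqrt(3), approximately 1.732
```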

The as2DVector function converts the two objects into a 2D vector.

Similar to the as2DVector function, the as3DVector function creates a 3D vector with the given objects.

To access the tuple existence during processing, you can use the ExistenceToPayload operator, which copies the tuple existence to the payload, where you can access it with the attribute name "meta_existence".

output = ExistenceToPayload(SELECT({predicate = ProbabilisticRelationalPredicate('x > 1.0 AND x < 4.0')}, input))

ProbabilisticRelationalPredicate