
# Probabilistic Feature

The probabilistic feature provides functions and operators to process discrete and continuous probabilistic values in a data stream. Continuous probabilistic values are represented using Gaussian mixtures.

To enable probabilistic processing, you have to include the probabilistic feature and use the probabilistic metadata (#METADATA Probabilistic) in your Odysseus script.

## Estimating probabilistic values

ToDo:

### Expectation Maximization

The EM operator fits a Gaussian mixture model (GMM) with a predefined number of mixtures to the values of a data stream.
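
To illustrate what fitting a GMM with a predefined number of mixtures means, here is a minimal one-dimensional sketch in Python. It is not the Odysseus implementation; the function names, the deterministic initialization, and the variance floor are assumptions made for this example.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a 1D normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


def fit_gmm_1d(data, k, iterations=50, min_variance=1e-6):
    """Fit k Gaussian components to 1D data with plain EM (illustrative only)."""
    n = len(data)
    lo, hi = min(data), max(data)
    data_mean = sum(data) / n
    # Assumption: spread the initial means evenly over the data range.
    means = [lo + (hi - lo) * i / (k - 1) for i in range(k)] if k > 1 else [data_mean]
    start_var = max(sum((x - data_mean) ** 2 for x in data) / n, min_variance)
    variances = [start_var] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in data:
            dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0.0 else [1.0 / k] * k)
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_variance)  # floor avoids collapse to zero
    return list(zip(weights, means, variances))


# Two well-separated clusters, fitted with two mixtures.
components = fit_gmm_1d([0.9, 1.0, 1.1, 4.9, 5.0, 5.1], k=2)
for weight, mean, variance in components:
    print(weight, mean, variance)
```

Each returned triple is one mixture component (weight, mean, variance). On the sample data the two means settle near 1.0 and 5.0 with roughly equal weights.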

## Filtering probabilistic values

For filtering probabilistic values you can use the same syntax that you already use for deterministic values. However, the results of the operators differ. In the case of discrete probabilistic values, the Select operator returns a tuple with a lower tuple existence probability.

Let's assume you have an attribute x that is 1.0 with probability 0.25, 2.0 with probability 0.25, and 3.0 with probability 0.5. The following Select operation filters the attribute value such that the resulting attribute value can only be instantiated to 2.0 and the resulting tuple existence is reduced to 0.25.

Probabilistic discrete select

```
output = SELECT({predicate = ProbabilisticRelationalPredicate('x > 1.0 AND x < 3.0')}, input)
```
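
The arithmetic behind this example can be checked with a small Python sketch. It only mirrors the numbers given above; the function name `select_discrete` and the dictionary representation are assumptions for illustration and say nothing about how Odysseus stores discrete values internally.

```python
def select_discrete(distribution, predicate):
    """Keep the outcomes that satisfy the predicate.

    distribution maps each possible value to its probability.
    Returns the remaining (renormalized) distribution and the factor by
    which the tuple existence is reduced.
    """
    kept = {value: p for value, p in distribution.items() if predicate(value)}
    existence = sum(kept.values())
    if existence == 0.0:
        return {}, 0.0
    # Assumption: the surviving outcomes are renormalized to sum to 1.0.
    return {value: p / existence for value, p in kept.items()}, existence


x = {1.0: 0.25, 2.0: 0.25, 3.0: 0.5}
values, existence = select_discrete(x, lambda v: 1.0 < v < 3.0)
print(values, existence)  # {2.0: 1.0} 0.25
```

Only the outcome 2.0 satisfies the predicate, so the attribute can only be instantiated to 2.0 and the tuple existence drops to 0.25.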

Filtering continuous probabilistic distributions is similar to processing discrete probabilistic values in that it may reduce the tuple existence probability.

Let's assume you have a random variable x with mean 0.0 and variance σ² = 1.0. The following query sets the tuple existence to the cumulative probability that this random variable takes a value between the lower and upper bound, which is ~0.1586235826896239.

Probabilistic continuous select

```
output = SELECT({predicate = ProbabilisticRelationalPredicate('x > 1.0 AND x < 4.0')}, input)
```
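
The value quoted above is the probability mass of a standard normal distribution between 1.0 and 4.0. A minimal Python sketch to reproduce it, using only the standard library (the helper names are assumptions for this example):

```python
import math


def normal_cdf(x, mean=0.0, variance=1.0):
    """Cumulative distribution function of a univariate normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * variance)))


def interval_probability(lower, upper, mean=0.0, variance=1.0):
    """Probability that the random variable falls between lower and upper."""
    return normal_cdf(upper, mean, variance) - normal_cdf(lower, mean, variance)


# Predicate 'x > 1.0 AND x < 4.0' on x ~ N(0.0, 1.0)
p_select = interval_probability(1.0, 4.0)
print(p_select)  # approximately 0.15862358
```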

## Joining probabilistic values

A join with a predicate based on probabilistic discrete values uses the same syntax as for deterministic values. Although it looks similar, the result differs: the Join operator joins the input streams in each possible world, and as such the operator may produce more tuples.

Probabilistic Join

```
Select * From input1,input2 WHERE input1.x=input2.y;
```
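
To see why a join over possible worlds may produce more tuples, consider the following Python sketch. It assumes that the two inputs are independent and that one result is produced per matching value; both points, as well as the function name, are assumptions for illustration and not a description of the Odysseus Join operator.

```python
from itertools import product


def join_discrete(x_dist, y_dist):
    """Evaluate x == y in every possible world of two independent attributes.

    Returns a dict that maps each matching value to the probability of the
    worlds in which both attributes take that value.
    """
    matches = {}
    for (x_value, x_prob), (y_value, y_prob) in product(x_dist.items(), y_dist.items()):
        if x_value == y_value:
            matches[x_value] = matches.get(x_value, 0.0) + x_prob * y_prob
    return matches


x = {1.0: 0.5, 2.0: 0.5}    # possible values of input1.x
y = {1.0: 0.25, 2.0: 0.75}  # possible values of input2.y
joined = join_discrete(x, y)
print(joined)  # {1.0: 0.125, 2.0: 0.375}
```

A deterministic pair of tuples matches at most once. Here the same pair matches in two different worlds, each with its own existence probability.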

As you can see, probabilistic processing is not limited to PQL. You can use the same CQL syntax you already use for deterministic values.

## Working with probabilistic values

Now that you know how to filter and join probabilistic values, you probably want to do something with the values, such as performing mathematical operations on them. To do so, you can use the algebraic operators (+, *, -, /, ^) on probabilistic values, for example in a Map operator. Note that when using multiplication or division on continuous probabilistic values, the result is estimated by fitting Gaussian mixture models to the resulting distribution.
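
In contrast to multiplication and division, the sum of two independent Gaussian mixtures is again a Gaussian mixture and can be written down exactly. The following Python sketch shows this for the one-dimensional case. Whether Odysseus computes the + operator this way is an assumption; the function name and the tuple layout are chosen for this example only.

```python
from itertools import product


def add_mixtures(a, b):
    """Sum of two independent 1D Gaussian mixtures.

    Each mixture is a list of (weight, mean, variance) components. Every
    pair of components yields one component of the result: weights
    multiply, means add, and variances add.
    """
    return [
        (wa * wb, ma + mb, va + vb)
        for (wa, ma, va), (wb, mb, vb) in product(a, b)
    ]


a = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]  # bimodal value
b = [(1.0, 1.0, 0.5)]                   # single Gaussian
total = add_mixtures(a, b)
print(total)  # [(0.5, 1.0, 1.5), (0.5, 5.0, 1.5)]
```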

### Mathematical Functions

#### Int(Distribution, Lower Limit, Upper Limit)

Estimates the multivariate normal distribution probability with lower and upper integration limits.

### Statistical Functions

#### Similarity(Distribution, Distribution)

Calculates the Bhattacharyya distance between two distributions.

Example

```
SELECT similarity(as2DVector(x1,y1), as2DVector(x2,y2)) FROM stream
```
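
For reference, the Bhattacharyya distance between two univariate normal distributions has a closed form. The Python sketch below implements this one-dimensional special case only. The function name and signature are assumptions for illustration and do not reflect the Odysseus function, which operates on distributions such as the 2D vectors above.

```python
import math


def bhattacharyya_distance(mean1, var1, mean2, var2):
    """Bhattacharyya distance between N(mean1, var1) and N(mean2, var2)."""
    # Contribution of the difference between the means.
    term_mean = 0.25 * (mean1 - mean2) ** 2 / (var1 + var2)
    # Contribution of the difference between the variances.
    term_var = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return term_mean + term_var


same = bhattacharyya_distance(0.0, 1.0, 0.0, 1.0)
shifted = bhattacharyya_distance(0.0, 1.0, 2.0, 1.0)
print(same, shifted)  # 0.0 0.5
```

Identical distributions have distance 0.0, and the distance grows as the means or variances move apart.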

#### Distance(Distribution, Value)

Calculates the Mahalanobis distance between the distribution and the value. The value can be a scalar value or a vector.

Example

```
SELECT distance(as3DVector(x, y, z), [1.0;2.0;3.0]) FROM stream
```
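
As a rough illustration of the Mahalanobis distance, the following Python sketch handles the special case of a diagonal covariance matrix, where the distance reduces to a variance-weighted Euclidean distance. The restriction to a diagonal covariance and the function name are assumptions for this example; the general case uses the inverse of the full covariance matrix.

```python
import math


def mahalanobis_diagonal(mean, variances, value):
    """Mahalanobis distance for a distribution with diagonal covariance.

    mean, variances, and value are sequences of equal length.
    """
    return math.sqrt(
        sum((v - m) ** 2 / s for m, s, v in zip(mean, variances, value))
    )


# Distance of the point [1.0, 2.0, 3.0] from a 3D distribution centered
# at the origin with variances 1.0, 4.0, and 9.0.
d = mahalanobis_diagonal([0.0, 0.0, 0.0], [1.0, 4.0, 9.0], [1.0, 2.0, 3.0])
print(d)
```

Each coordinate lies exactly one standard deviation from the mean, so the result is the square root of 3.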

### Datatype Functions

#### as2DVector(Object, Object)

Converts the two objects into a 2D vector.

#### as3DVector(Object, Object, Object)

Similar to the as2DVector function, this function creates a 3D vector with the given objects.

To access the tuple existence during processing, you can use the ExistenceToPayload operator, which copies the tuple existence to the payload, where you can access it under the attribute name "meta_existence".

Probabilistic continuous select

```
output = Exi
```
`ProbabilisticRelationalPredicate`