MEP functions can be used to perform arbitrary operations on your data (e.g., mathematical operations, string operations, etc.). These functions can be used in different operators such as Map, Select, or Join. To implement your own MEP function, extend the AbstractFunction class, implement the getValue function that calculates the return value, and call the super constructor with the configuration of your MEP function. The configuration contains at least the symbol, the number of parameters, the accepted data types for the parameters, and the data type of the return value. In addition, the configuration can contain a flag that indicates whether the MEP function should be evaluated each time or can be treated as a constant, as well as the time and space complexity of the MEP function.
public class MyFunction extends AbstractFunction<Double> {

    // One row per parameter; each row lists the data types accepted at that position.
    public static final SDFDatatype[][] accTypes = new SDFDatatype[][] {
        { SDFDatatype.DOUBLE }, { SDFDatatype.DOUBLE } };

    public MyFunction() {
        // symbol, number of parameters, accepted types, return type,
        // constant-optimization flag, time complexity, space complexity
        super("myFunction", 2, accTypes, SDFDatatype.DOUBLE, true, 3, 5);
    }

    @Override
    public Double getValue() {
        double a = (double) this.getInputValue(0);
        double b = this.getNumericalInputValue(1);
        return a + b;
    }
}
In this example, a MEP function is defined with the symbol myFunction. It can be used in a predicate or map expression with two parameters of type Double and returns a value of type Double, namely the sum of the two input parameters as defined by the implementation of the getValue() function. Further, the MEP function can be optimized if the two input parameters are constant: the result is then calculated only once. In addition, the MEP function provides a time complexity score of 3 and a space complexity score of 5, which means that during optimization the optimizer will try to place the call of myFunction before any other MEP function with a higher time or space complexity and behind any MEP function with a lower time or space complexity.
To access the attributes of the function, you can use the getInputValue or the getNumericalInputValue methods. While the first method returns an object, the second already casts the input value to a double value. Both methods take the position index of the attribute as an argument. The name of the function, the total number of attributes, and the accepted data types of the attributes are set in the constructor. Because the accepted data types are given as a list per attribute, a MEP function can handle multiple data types for each attribute.
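The per-attribute list of accepted data types can be pictured as a small lookup table with one row per position. The following is only a minimal, self-contained sketch of that idea: ParamType and AcceptedTypes are hypothetical stand-ins invented for illustration, not Odysseus classes, and the check shown here is not the framework's actual validation code.

```java
import java.util.Arrays;

// Hypothetical stand-in for SDFDatatype; not part of Odysseus.
enum ParamType { DOUBLE, INTEGER, STRING }

// Illustrative helper: one row of accepted types per attribute position.
final class AcceptedTypes {
    private final ParamType[][] accepted;

    AcceptedTypes(ParamType[][] accepted) {
        this.accepted = accepted;
    }

    // Number of attributes the function takes (one row per attribute).
    int arity() {
        return accepted.length;
    }

    // True if the attribute at the given position may have the given type.
    boolean accepts(int position, ParamType type) {
        return Arrays.asList(accepted[position]).contains(type);
    }
}
```

With a table such as {{DOUBLE, INTEGER}, {DOUBLE}}, the first attribute would accept either numeric type while the second accepts only DOUBLE.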
To access the meta attributes of an incoming streaming object, you can use the getMetaAttribute function.
To access the additional content of an incoming streaming object, you can use the getAdditionalContents method to access all contents. If you only want to access a specific field, you can use the getAdditionalContent(fieldName) method.
The MEP optimizer tries to determine whether an expression is a constant and therefore does not need to be evaluated each time. For this, the getValue method is called. This behavior can be changed by setting the fifth parameter in the constructor to false. To support the optimization of predicates, the time and space complexity can be set in the constructor as the last two parameters. Both values should be in the range from 0 to 9, depending on their average expected complexity. Depending on the value, the MEP function will be placed differently in the resulting optimized predicate.
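The effect of the constant flag can be illustrated with a small, self-contained sketch. FoldableCall is a hypothetical class invented for this illustration and does not mirror the optimizer's real implementation; it only assumes that a call whose inputs are all constant and whose flag is true is evaluated once and its result reused.

```java
import java.util.function.Supplier;

// Hypothetical model of a function call inside an expression; not an Odysseus class.
final class FoldableCall {
    private final Supplier<Double> body;     // stands in for getValue()
    private final boolean inputsConstant;    // assumed to be known to the optimizer
    private final boolean optimizeConstant;  // models the fifth constructor parameter
    private Double cached;                   // result of the single evaluation, if folded
    private int evaluations;                 // how often the body actually ran

    FoldableCall(Supplier<Double> body, boolean inputsConstant, boolean optimizeConstant) {
        this.body = body;
        this.inputsConstant = inputsConstant;
        this.optimizeConstant = optimizeConstant;
    }

    // Evaluates the body once if the call may be folded, otherwise on every call.
    double value() {
        boolean foldable = inputsConstant && optimizeConstant;
        if (foldable && cached != null) {
            return cached;
        }
        evaluations++;
        double result = body.get();
        if (foldable) {
            cached = result;
        }
        return result;
    }

    int evaluations() {
        return evaluations;
    }
}
```

Setting the flag to false would be the natural choice for a function whose result can differ between calls even when its inputs stay the same.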
Imagine the following scenario with a predicate expression that checks whether an attribute x holds a value higher than the return value of a function called lastPrimeNumber(x) (which might estimate the last prime number less than or equal to x), a value higher than the result of the function defined above, or a value higher than 0:
(x > lastPrimeNumber(x)) || (x > myFunction(y, z)) || (x > 0)
During predicate optimization, the optimizer checks the complexity values and reorders the terms according to their values, resulting in an optimized predicate as follows:
(x > 0) || (x > myFunction(y, z)) || (x > lastPrimeNumber(x))
Here, the last term, which compares the value of x with 0, is moved to the front, and the term with the more expensive function is moved to the tail.
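The reordering above can be reproduced with a short, self-contained sketch. Term and PredicateReorder are hypothetical names, not part of the MEP optimizer, and several details are assumptions made only for illustration: terms are compared by time complexity first and space complexity second, a plain comparison without a function call gets the score 0, and lastPrimeNumber is given a time score of 8 and a space score of 2. Only the scores 3 and 5 for myFunction come from the example above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical predicate term: its text plus the complexity scores of the function it calls.
final class Term {
    private final String text;
    private final int timeComplexity;
    private final int spaceComplexity;

    Term(String text, int timeComplexity, int spaceComplexity) {
        this.text = text;
        this.timeComplexity = timeComplexity;
        this.spaceComplexity = spaceComplexity;
    }

    String text() { return text; }
    int timeComplexity() { return timeComplexity; }
    int spaceComplexity() { return spaceComplexity; }
}

final class PredicateReorder {
    // Returns a copy with the cheapest terms first (time, then space: an assumption).
    static List<Term> cheapestFirst(List<Term> terms) {
        List<Term> sorted = new ArrayList<>(terms);
        sorted.sort(Comparator.comparingInt(Term::timeComplexity)
                .thenComparingInt(Term::spaceComplexity));
        return sorted;
    }

    // Renders the terms as a disjunction, e.g. "(a) || (b)".
    static String render(List<Term> terms) {
        StringBuilder out = new StringBuilder();
        for (Term term : terms) {
            if (out.length() > 0) {
                out.append(" || ");
            }
            out.append('(').append(term.text()).append(')');
        }
        return out.toString();
    }
}
```

Because the terms are joined with ||, evaluation can stop at the first term that is true, so putting cheap terms first means the expensive functions are only called when the cheaper checks fail.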
As a rule of thumb, MEP functions with logarithmic complexity should have a value between 0 and 3, those with linear complexity a value between 4 and 6, and those with exponential complexity a value between 7 and 9. However, this is just a first draft. The basic idea is to evaluate cheap functions first and avoid expensive functions if possible.