MEP functions can be used to perform arbitrary operations on your data (e.g., mathematical operations, string operations, etc.). These functions can be used in different operators like Map, Select, or Join. To implement a MEP function, one has to extend the AbstractFunction class. To implement your own MEP function, you basically have to implement the following methods: getSymbol, which returns the unique name of the function; getArity, getAcceptedTypes, and getReturnType, which describe the parameters and the result of your function; and getValue, which contains the processing of your function.

Remark: The MEP optimizer tries to determine if an expression is a constant and should not be evaluated each time. For this, the getValue method is called. This behaviour can be changed by implementing the method optimizeConstantParameter and returning false.

...

the getValue function that calculates the return value and call the super constructor with the configuration of your MEP function. The configuration contains at least the symbol, the number of parameters, the accepted data types for the parameters, and the data type of the return value. In addition, the configuration can contain a flag that indicates whether the MEP function should be evaluated each time or can be treated as a constant, as well as the time and space complexity of the MEP function.

MEP function stub
public class MyFunction extends AbstractFunction<Double> {

    // One row per parameter; each row lists the data types accepted at that position.
    public static final SDFDatatype[][] accTypes = new SDFDatatype[][] {{ SDFDatatype.DOUBLE },
                                                                        { SDFDatatype.DOUBLE }};

    public MyFunction() {
        // symbol, number of parameters, accepted types, return type,
        // constant optimization flag, time complexity, space complexity
        super("myFunction", 2, accTypes, SDFDatatype.DOUBLE, true, 3, 5);
    }

    @Override
    public Double getValue() {
        double a = (double) this.getInputValue(0);
        double b = this.getNumericalInputValue(1);

        return a + b;
    }

}

In this example, a MEP function with the symbol myFunction is defined. It can be used in a predicate or map expression with two parameters of type Double and returns a value of type Double, namely the sum of the two input parameters as defined by the implementation of the getValue() function. Further, the MEP function can be optimized if the two input parameters are constant: the result is then calculated only once. In addition, the MEP function provides a time complexity score of 3 and a space complexity score of 5. During optimization, the optimizer will therefore try to place the call of myFunction before any other MEP function with a higher time or space complexity and behind any MEP function with a lower time or space complexity.

Access to function attributes

To access the attributes of the function, you can use the getInputValue or the getNumericalInputValue method. While the first method returns an object, the second already casts the input value to a double value. Both methods take the position index of the attribute as an argument. The name of the function, the total number of attributes, and the data types of the accepted attributes are set in the constructor. A MEP function can handle multiple data types for each attribute.
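The idea of accepting several data types per attribute can be sketched independently of Odysseus. The following is only an illustration under stated assumptions: the Datatype enum is a hypothetical stand-in for SDFDatatype, and the accepts method is not part of the MEP API. It merely shows how a two-dimensional array with one row per attribute position could be read.

```java
import java.util.Arrays;

final class AcceptedTypesSketch {

    // Hypothetical stand-in for SDFDatatype; not the Odysseus class.
    enum Datatype { INTEGER, DOUBLE, STRING }

    // One row per attribute position; each row lists the types accepted there.
    static final Datatype[][] ACC_TYPES = {
        { Datatype.INTEGER, Datatype.DOUBLE }, // attribute 0 accepts two types
        { Datatype.DOUBLE }                    // attribute 1 accepts one type
    };

    // Returns true if the number of arguments matches and every argument type
    // is listed in the row of its position.
    static boolean accepts(Datatype... argTypes) {
        if (argTypes.length != ACC_TYPES.length) {
            return false;
        }
        for (int i = 0; i < argTypes.length; i++) {
            if (!Arrays.asList(ACC_TYPES[i]).contains(argTypes[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(accepts(Datatype.INTEGER, Datatype.DOUBLE));
        System.out.println(accepts(Datatype.STRING, Datatype.DOUBLE));
    }
}
```

In this sketch, the first attribute may be an INTEGER or a DOUBLE, while the second must be a DOUBLE.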

...

 

Support for optimization

The MEP optimizer tries to determine if an expression is a constant and should not be evaluated each time. For this, the getValue method is called. This behavior can be changed by setting the fifth parameter in the constructor to false. To support the optimization of predicates, the time and space complexity can be set in the constructor as the last two parameters. Both values should be in the range from 0 to 9, depending on the average expected complexity of the function. Depending on these values, the MEP function will be placed differently in the resulting optimized predicate.
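The constant optimization described here could look roughly like the following sketch. It is not the Odysseus optimizer: the Expr interface, the node classes, and the optimize method are hypothetical and only illustrate the idea that a function whose inputs are all constant is evaluated once and replaced by its result, unless the flag is set to false.

```java
import java.util.function.DoubleBinaryOperator;

final class ConstantFoldingSketch {

    // Hypothetical expression node; not part of the MEP API.
    interface Expr {
        double getValue();
        boolean isConstant();
    }

    static final class Constant implements Expr {
        private final double value;
        Constant(double value) { this.value = value; }
        public double getValue() { return value; }
        public boolean isConstant() { return true; }
    }

    static final class Variable implements Expr {
        double value; // changes with every processed element
        public double getValue() { return value; }
        public boolean isConstant() { return false; }
    }

    static final class BinaryFunction implements Expr {
        private final DoubleBinaryOperator op;
        private final Expr left;
        private final Expr right;
        private final boolean optimizeConstant; // mirrors the constructor flag
        int evaluations = 0;                    // counts calls, for demonstration only

        BinaryFunction(DoubleBinaryOperator op, Expr left, Expr right, boolean optimizeConstant) {
            this.op = op;
            this.left = left;
            this.right = right;
            this.optimizeConstant = optimizeConstant;
        }

        public double getValue() {
            evaluations++;
            return op.applyAsDouble(left.getValue(), right.getValue());
        }

        // Constant only if the flag allows it and all inputs are constant.
        public boolean isConstant() {
            return optimizeConstant && left.isConstant() && right.isConstant();
        }
    }

    // Calls getValue once for constant expressions and keeps only the result.
    static Expr optimize(Expr e) {
        return e.isConstant() ? new Constant(e.getValue()) : e;
    }

    public static void main(String[] args) {
        BinaryFunction sum = new BinaryFunction(Double::sum, new Constant(1), new Constant(2), true);
        Expr optimized = optimize(sum);
        optimized.getValue();
        optimized.getValue();
        System.out.println(sum.evaluations);
    }
}
```

With the flag set to false, optimize returns the function unchanged in this sketch, so getValue is called for every element.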

Imagine the following scenario with a predicate expression that checks whether an attribute x holds a value greater than the return value of a function called lastPrimeNumber(x) (which might estimate the largest prime number less than or equal to x), greater than the result of the function defined above, or greater than 0:

(x > lastPrimeNumber(x)) || (x > myFunction(y, z)) || (x > 0)

During predicate optimization, the optimizer checks the complexity values and reorders the terms according to these values, resulting in the following optimized predicate:

(x > 0) || (x > myFunction(y, z)) || (x > lastPrimeNumber(x))

Here, the last term comparing the value of x with 0 is moved to the front and the more expensive function is moved to the tail.
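The reordering in this scenario can be reproduced with a small sketch. It is not the actual optimizer: the Term class is hypothetical, sorting by time complexity first and space complexity second is an assumption (the text does not say how the two scores are combined), and the scores for lastPrimeNumber and for the comparison with 0 are made up for illustration. Only the scores 3 and 5 for myFunction come from the stub above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class PredicateReorderSketch {

    // Hypothetical term of a disjunction with its complexity scores (0 to 9).
    static final class Term {
        final String expression;
        final int timeComplexity;
        final int spaceComplexity;

        Term(String expression, int timeComplexity, int spaceComplexity) {
            this.expression = expression;
            this.timeComplexity = timeComplexity;
            this.spaceComplexity = spaceComplexity;
        }
    }

    // Returns a new list with cheap terms first. Assumption: time complexity is
    // compared first, space complexity breaks ties. The sort is stable.
    static List<Term> reorder(List<Term> terms) {
        List<Term> sorted = new ArrayList<>(terms);
        sorted.sort(Comparator.comparingInt((Term t) -> t.timeComplexity)
                              .thenComparingInt(t -> t.spaceComplexity));
        return sorted;
    }

    // Joins the terms into a disjunction string.
    static String render(List<Term> terms) {
        return terms.stream().map(t -> t.expression).collect(Collectors.joining(" || "));
    }

    public static void main(String[] args) {
        List<Term> terms = Arrays.asList(
            new Term("(x > lastPrimeNumber(x))", 8, 2), // assumed scores
            new Term("(x > myFunction(y, z))", 3, 5),   // scores from the stub
            new Term("(x > 0)", 0, 0));                 // assumed scores
        System.out.println(render(reorder(terms)));
        // prints: (x > 0) || (x > myFunction(y, z)) || (x > lastPrimeNumber(x))
    }
}
```

Because a disjunction can stop at the first term that evaluates to true, placing cheap terms first means the expensive function is only called when the cheaper terms are false.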

As a rule of thumb, MEP functions with logarithmic complexity should have a value between 0 and 3, functions with linear complexity a value between 4 and 6, and functions with exponential complexity a value between 7 and 9. However, this is just a first draft. The basic idea is to evaluate cheap functions first and avoid expensive functions if possible.

...