...
```
#CONFIG DISTRIBUTE true
```
This statement in Odysseus Script indicates that the queries that follow should be distributed in the network.
**Info:** If you want to run several nodes on one machine, ensure that you set the configuration option `net.querydistribute.randomport=true`; otherwise, port conflicts may occur. A better alternative is to use docker-compose.
General process of query distribution
...
The strategies are explained in the following sections. Partition and Allocation allow only one selected strategy; the other phases can use multiple strategies per query. If the user has not specified a strategy for a phase, OdysseusNet uses a default strategy (configurable):
Phase | Default strategy |
---|---|
Preprocess | <none> |
Partition | querycloud |
Modification | <none> |
Allocation | querycount |
Postprocess | merge |
The Transmission phase cannot be altered.
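Putting the phases together, a distributed query script selects one strategy per phase before the query itself. The directive names below (`#NODE_PARTITION` etc.) are assumptions for illustration; check the strategy-selection syntax of your OdysseusNet version:

```
#CONFIG DISTRIBUTE true
/// assumed directive names for per-phase strategy selection
#NODE_PARTITION operatorcloud
#NODE_ALLOCATION roundrobin
#NODE_POSTPROCESSOR merge
#PARSER PQL
#RUNQUERY
timer = TIMER({PERIOD = 1000, SOURCE = 'source'})
map = MAP({EXPRESSIONS = ['toString("marco")'], KEEPINPUT = true}, timer)
```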
Preprocess
Strategy name | Description |
---|---|
source | StreamAOs are replaced with their logical operators. |
Partition
Strategy name | Description |
---|---|
querycloud | The entire query is one query part (no partitioning). |
operatorcloud | Each logical operator is its own query part (maximum partitioning). |
Modification
Strategy name | Description |
---|---|
replication <degree> | Query parts are replicated with the given replication degree and executed multiple times. See Replication. |
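For example, each query part could be replicated on two nodes. The `#NODE_MODIFICATION` directive name is an assumption here; the exact keyword depends on your OdysseusNet version:

```
#CONFIG DISTRIBUTE true
/// assumed directive name; replicate each query part twice
#NODE_MODIFICATION replication 2
```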
Allocation
Strategy name | Description |
---|---|
direct | All query parts are assigned to the specified OdysseusNode. |
querycount | Query parts are assigned to the nodes with the lowest number of queries. |
roundrobin | Query parts are assigned to the nodes in order. |
user | Query parts are assigned to user-specified nodes. For this, each logical operator has a DESTINATION parameter in PQL. Two operators with the same DESTINATION value are assigned to the same node. |
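With the user strategy, the target node is given per operator via the DESTINATION parameter. A minimal PQL sketch; the node names `node_a` and `node_b` are placeholders, and `#NODE_ALLOCATION` is an assumed directive name:

```
#CONFIG DISTRIBUTE true
#NODE_ALLOCATION user
#PARSER PQL
#RUNQUERY
/// both operators carry a DESTINATION parameter; equal values land on the same node
timer = TIMER({PERIOD = 1000, SOURCE = 'source', DESTINATION = 'node_a'})
map = MAP({EXPRESSIONS = ['toString("marco")'], DESTINATION = 'node_b'}, timer)
```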
Postprocess
Strategy name | Description |
---|---|
localsink | Logical sink operators stay local (on the distributing node), overriding allocation. Useful if the user wants the last operator locally to check the data stream results. |
localsource | Logical source operators stay local (on the distributing node), overriding allocation. Useful if the user does not want to share source operators with other nodes. |
merge | Two adjacent query parts which are assigned to the same node are merged to omit unneeded network-transmission operators. |
discardedreplicates | Inserts a sender for each ReplicationMergeAO. The sender is inserted at output port 1, which is normally unused; all discarded replicates are sent to this port. The sender writes the data into a CSV file (one file per merger). The only argument for this post-processor is the path for the CSV files; the file names are determined by the ReplicationMergeAO (name and hashCode). |
CalcLatency | Inserts a CalcLatencyAO before every real sink. Note that the ILatency metadata will be added to the sources automatically. This postprocessor needs the feature "Calculation Postprocessor Feature". |
CalcDatarate | Inserts a DatarateAO after every source. Note that the IDatarate metadata will be added to the sources automatically. This postprocessor needs the feature "Calculation Postprocessor Feature". |
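Since the Postprocess phase allows multiple strategies per query, post-processors can be stacked, e.g. (again assuming a `#NODE_POSTPROCESSOR` directive; the keyword may differ in your version):

```
/// keep sinks local, then merge query parts that ended up on the same node
#NODE_POSTPROCESSOR localsink
#NODE_POSTPROCESSOR merge
```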
RemoteQuery
A simple way to distribute whole queries (including Odysseus Script parts) to other nodes is the #REMOTEQUERY command:
```
#REMOTEQUERY (name=worker_1)
#PARSER PQL
#RUNQUERY
timer = TIMER({PERIOD = 1000, SOURCE = 'source'})
map = MAP({EXPRESSIONS = ['toString("marco")'], KEEPINPUT = true}, timer)

#REMOTEQUERY (name=worker_2)
#PARSER PQL
#RUNQUERY
timer = TIMER({PERIOD = 1000, SOURCE = 'source'})
map = MAP({EXPRESSIONS = ['toString("marco")'], KEEPINPUT = true}, timer)
```
Here, everything between two #REMOTEQUERY commands (or between the last #REMOTEQUERY command and the end of the document) is copied and sent as a whole to the named node, where it is processed. Note that this differs from the "direct" allocation strategy above, because the query is not translated locally. This way, you could, for example, have multiple master nodes that receive whole queries from another (super-master) node.