The following parameters further describe the GenericPush and GenericPull wrappers. GenericPull is needed when the data must be extracted from the source (e.g. from a file); GenericPush is needed when the source actively sends the data. Pull requires scheduling (done automatically); push does not.
Each parameter typically needs further configuration parameters (e.g. a file name for a file wrapper). These additional parameters are set in the Options parameter, which consists of key-value pairs:
Options = [['key1', 'value1'], ['key2', 'value2'], … , ['keyN', 'valueN']]
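
As an illustration of the key-value notation above, the following Python sketch renders a dictionary in that form. The helper name `format_options` and the example values are hypothetical and not part of Odysseus; the sketch also assumes that keys and values contain no quote characters.

```python
def format_options(options):
    """Render key-value pairs in the Options = [[...], ...] notation."""
    pairs = ", ".join(f"['{key}', '{value}']" for key, value in options.items())
    return f"Options = [{pairs}]"


# Hypothetical file-wrapper configuration; the path is only a placeholder.
file_options = format_options({"Filename": "/tmp/input.csv", "append": "false"})
print(file_options)
```

Keeping the pairs in a dictionary before rendering makes it harder to set the same key twice by accident.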

Transport

See Transport Handler for current information.

This parameter selects the input type of the wrapper. The following values are currently supported for the GenericPull wrapper:

  • File: A reference to a local file (local to where the query is executed, i.e. in a client/server scenario the file must be located on the server).
    • The following options are available:
      • Filename: The path to the file.
      • append: Append values to the end of the file. If set to false (default), the existing values are overwritten.
  • HDFS: A reference to a file in a Hadoop file system (the wrapper feature must be installed).
    • The following options are available:
      • fs.default.name: The name of the Hadoop file system.
      • Filename: The path to the file.
      • append: Append values to the end of the file. If set to false (default), the existing values are overwritten.
  • TcpSocket: Defines a TCP socket connection to a server from which the input data must be retrieved. The connection blocks until data is available.
    • The following options are available:
      • Host: The name or IP of the server
      • Port: The port number of the server

If the source requires a login and password:

      • User: The login
      • Password: The password
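
The blocking behavior described for TcpSocket can be sketched with a plain Python socket. This is only an illustration of a blocking read, not the Odysseus implementation (which is written in Java); `socket.socketpair()` stands in for a real Host/Port connection, and the function name `read_blocking` is an assumption.

```python
import socket


def read_blocking(sock, size=1024):
    """Block until data is available on the socket, then return it."""
    return sock.recv(size)


# socketpair() replaces a real server connection for this sketch.
source, wrapper = socket.socketpair()
source.sendall(b"1;2;3\n")
data = read_blocking(wrapper)
source.close()
wrapper.close()
```

Because the read only returns once data has arrived, the caller decides when to ask for the next element, which is why a pull wrapper needs scheduling.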

The following values are currently supported for the GenericPush wrapper:

  • NonBlockingTcp: Defines a TCP socket connection to a server where the communication does not block. Each time new data is available, it is sent to the system automatically (Java NIO).
    • The following options are available:
      • Host: The name or IP of the server
      • Port: The port number of the server
      • Autoconnect: A boolean indicating whether the connection should be re-established after an access failure (currently not supported!).

If the source requires a login and password:

      • User: The login
      • Password: The password
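
The non-blocking, event-driven style of NonBlockingTcp can be sketched with Python's `selectors` module, which plays a role comparable to a Java NIO selector. This is a minimal sketch under stated assumptions, not the Odysseus code: `collect_pushed` and its parameters are hypothetical, and `socket.socketpair()` replaces a real Host/Port connection.

```python
import selectors
import socket


def collect_pushed(sock, expected_bytes, timeout=1.0):
    """Gather data through readiness events instead of blocking reads."""
    sel = selectors.DefaultSelector()
    sock.setblocking(False)
    sel.register(sock, selectors.EVENT_READ)
    received = b""
    try:
        while len(received) < expected_bytes:
            events = sel.select(timeout)
            if not events:
                break  # nothing arrived within the timeout
            for key, _ in events:
                chunk = key.fileobj.recv(4096)
                if not chunk:
                    return received  # peer closed the connection
                received += chunk
    finally:
        sel.unregister(sock)
        sel.close()
    return received


source, wrapper = socket.socketpair()
source.sendall(b"a,b\n")
source.sendall(b"c,d\n")
pushed = collect_pushed(wrapper, expected_bytes=8)
source.close()
wrapper.close()
```

The receiver reacts only when the selector reports readable data, so no scheduled polling of the source is required.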

Protocol

See Protocol Handler for current information.

This parameter determines how the input from the transport is processed. The main task of this component is to identify objects in the input and to prepare them for the data handler (see next parameter).
The following protocols are currently available in Odysseus:

GenericPull

  • Line: This simple handler reads one line from the input and sends the text to the data handler.
    • Delay: A delay in ms that reduces the data rate.
  • SimpleCSV: This handler is similar to Line. Additionally, it splits the line at a delimiter that must be set in the options. This handler does not handle escaping of the delimiter (e.g. by quotes or backslash). A string array is sent to the data handler.
    • Delimiter: The delimiter that separates the elements from one another.
  • CSV: Same as SimpleCSV, but handles quotes. Because this version is slower, use SimpleCSV if the source contains no quoted elements.
  • Text: This handler can be used to identify elements in a character stream where a distinct delimiter separates the objects. The whole object is sent to the data handler.
    • Delimiter: The delimiter that separates the objects.
    • KeepDelimiter: A flag that indicates whether the delimiter should be part of the result sent to the data handler.
    • Charset: The Java char set that should be used to decode the input (e.g. "utf-8").
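
The difference between the handlers above can be sketched in Python. The function names below are hypothetical and the code only mirrors the described behavior; it is not the Odysseus implementation. Python's `csv` module stands in for the quote-aware parsing of the CSV handler.

```python
import csv
import io


def simple_csv(line, delimiter):
    """SimpleCSV-style: split at the delimiter and ignore quotes."""
    return line.split(delimiter)


def quoted_csv(line, delimiter):
    """CSV-style: keep quoted fields together even if they contain the delimiter."""
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))


def text_objects(raw, delimiter, keep_delimiter=False, charset="utf-8"):
    """Text-style: decode the bytes and cut the stream at each delimiter.

    Returns the complete objects and the unfinished remainder.
    """
    parts = raw.decode(charset).split(delimiter)
    tail = parts.pop()  # text after the last delimiter is not yet complete
    objects = [part + delimiter if keep_delimiter else part for part in parts]
    return objects, tail


line = '1,"a,b",3'
print(simple_csv(line, ","))
print(quoted_csv(line, ","))
print(text_objects(b"x|y|z", "|"))
```

On the sample line, the plain split cuts the quoted field in two, while the quote-aware variant returns it as a single element; this is the case where CSV is needed instead of SimpleCSV.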

 

GenericPush

...

See Protocol Handler for current information.

DataHandler

See Data Handler for current information.

...