The Tika protocol handler uses the Apache Tika framework to parse arbitrary documents.

...

To access the individual attribute values, use the KeyValueToTuple operator, which transforms the required attributes into a relational tuple.

...

Example (PQL):
input = ACCESS({source='source',
                wrapper='GenericPush',
                transport='directory',
                protocol='tika',
                dataHandler='KeyValueObject',
                options=[['directory','/var/log']]
})

out = KEYVALUETOTUPLE({
          schema=[['content', 'String']],
          type='Document',
          keepinput='false' 
        },
        input
      )
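
When more than one attribute is needed, the schema list can presumably carry one entry per attribute. The following is a minimal sketch, not a confirmed configuration: it assumes the parsed document also exposes an attribute named 'title', which is a hypothetical key that does not appear in the example above. Substitute the keys that are actually present in the key-value object. The operator parameters (schema, type, keepinput) are the same ones used above.

```pql
/// 'title' is a hypothetical attribute name used only for illustration
multi = KEYVALUETOTUPLE({
          schema=[['content', 'String'], ['title', 'String']],
          type='Document',
          keepinput='false'
        },
        input
      )
```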