
The HTML protocol handler parses HTML documents (web pages) and extracts attribute values using XPath expressions. Because it uses the CyberNeko HTML parser, it can also parse unbalanced HTML documents, for example pages with unclosed tags.
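The following sketch illustrates the general idea of lenient parsing followed by XPath selection. It is not the handler's implementation, which uses CyberNeko in Java. It is a Python approximation built only on the standard library. LenientTreeBuilder, parse_html and select_text are hypothetical names. The tag-closing rules cover only a few common cases: unclosed td, th, tr, li and p elements, void elements, and stray end tags. Python's ElementTree supports only a subset of XPath, but that subset includes positional predicates such as last()-1.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have content or an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}

# Opening one of these tags implicitly closes an open element of the listed kinds.
IMPLIED_END = {
    "td": {"td", "th"},
    "th": {"td", "th"},
    "tr": {"tr", "td", "th"},
    "li": {"li"},
    "p": {"p"},
}


class LenientTreeBuilder(HTMLParser):
    """Builds an ElementTree from HTML that may contain unclosed or stray tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Pop open elements that this start tag implicitly ends.
        closes = IMPLIED_END.get(tag, set())
        while len(self.stack) > 1 and self.stack[-1].tag in closes:
            self.stack.pop()
        elem = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        # Text goes into the current element, or into the tail of its last child.
        parent = self.stack[-1]
        if len(parent):
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def parse_html(markup):
    builder = LenientTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


def select_text(root, xpath):
    # ElementTree only accepts relative paths, so "//td" becomes ".//td".
    path = "." + xpath if xpath.startswith("/") else xpath
    return ["".join(e.itertext()).strip() for e in root.findall(path)]


# Unbalanced input: there are no </th>, </td> or </tr> end tags.
page = """<html><body><table>
<tr><th>Symbol<th>Points
<tr><td>ABC<td>1.5
<tr><td>XYZ<td>2.25
</table></body></html>"""

root = parse_html(page)
cells = select_text(root, "//td")
```

After parsing, cells holds ['ABC', '1.5', 'XYZ', '2.25'] even though no cell or row is explicitly closed.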

Options

  • delay: Reading delay in milliseconds (default: 0).
  • nanodelay: Reading delay in nanoseconds (default: 0).
  • <attribute name>: XPath expression that selects the values of that schema attribute in the document (one option per attribute, as in the example below).
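The options can be read as a mapping from schema attribute names to XPath expressions. The sketch below shows one plausible way to turn them into tuples. It is an illustration, not the Odysseus code, and it rests on several assumptions: the i-th match of each expression belongs to the i-th tuple, delay and nanodelay are applied as a pause before each tuple is emitted, and the schema types String and Double map to Python str and float. extract_tuples and CONVERTERS are hypothetical names. For brevity, the input is well-formed markup parsed with ElementTree. The XPath expressions and schema are taken from the example below.

```python
import time
import xml.etree.ElementTree as ET

# Hypothetical mapping from the schema type names used in the example to Python types.
CONVERTERS = {"String": str, "Double": float}


def extract_tuples(root, options, schema, delay=0, nanodelay=0):
    """Yield one tuple per row, combining the i-th match of every attribute's XPath."""
    xpaths = dict(options)
    columns = []
    for name, type_name in schema:
        xpath = xpaths[name]
        # ElementTree only accepts relative paths, so "//td" becomes ".//td".
        path = "." + xpath if xpath.startswith("/") else xpath
        convert = CONVERTERS[type_name]
        columns.append([convert("".join(e.itertext()).strip()) for e in root.findall(path)])
    # Assumed semantics: pause before each tuple. time.sleep cannot honour
    # nanosecond precision, so nanodelay is only approximated.
    pause = delay / 1e3 + nanodelay / 1e9
    # zip stops at the shortest column, so unmatched values are dropped.
    for row in zip(*columns):
        if pause > 0:
            time.sleep(pause)
        yield row


page = ET.fromstring(
    "<table>"
    "<tr><td>1</td><td>ABC</td><td>1.5</td></tr>"
    "<tr><td>2</td><td>XYZ</td><td>2.25</td></tr>"
    "</table>"
)
options = [["symbol", "//td[last()-1]"], ["points", "//td[last()]"]]
schema = [["symbol", "String"], ["points", "Double"]]
rows = list(extract_tuples(page, options, schema))
```

Here rows is [('ABC', 1.5), ('XYZ', 2.25)]. The expression //td[last()-1] selects the second-to-last cell of each row, and //td[last()] selects the last cell.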

Example

PQL

input = ACCESS({source='Web', wrapper='GenericPush',
                transport='File', protocol='HTML',
                dataHandler='Tuple',
                options=[['symbol', '//td[last()-1]'], ['points', '//td[last()]']],
                schema=[['symbol', 'String'], ['points', 'Double']]
              })