The HTML protocol handler parses HTML documents (for example, web pages) and extracts attribute values from them using XPath expressions. Because it uses the CyberNeko HTML parser, it can also parse unbalanced HTML documents, i.e. documents whose tags are not properly closed or nested.

Options

  • delay: Read delay in milliseconds (default: 0).
  • nanodelay: Read delay in nanoseconds (default: 0).
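  • <attribute name>: XPath expression that selects the value of this attribute in the document. Add one such option per attribute, e.g. ['symbol','//td[last()-1]'].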

Example

PQL

HTML Protocol Handler
input = ACCESS({
  source='Web', wrapper='GenericPush',
  transport='File', protocol='HTML', dataHandler='Tuple',
  options=[
    ['symbol','//td[last()-1]'],
    ['points','//td[last()]']
  ],
  schema=[
    ['symbol','String'],
    ['points','Double']
  ]
})
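
To illustrate what the two XPath expressions in the options select, here is a minimal Python sketch. It is not how the handler itself is implemented. The sample page and the `extract` helper are hypothetical, and Python's `xml.etree.ElementTree` stands in for the real parser. It supports only a subset of XPath and, unlike CyberNeko, requires well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (not taken from the original documentation).
SAMPLE = """
<html><body><table>
  <tr><td>1</td><td>ACME</td><td>42.5</td></tr>
</table></body></html>
"""

# Same attribute-to-XPath mapping as in the PQL options.
OPTIONS = [("symbol", "//td[last()-1]"), ("points", "//td[last()]")]


def extract(document, options):
    """Return a dict with the text of the first node matched by each XPath."""
    root = ET.fromstring(document)
    record = {}
    for name, xpath in options:
        # ElementTree only accepts relative paths, so '//' becomes './/'.
        node = root.find("." + xpath)
        record[name] = node.text if node is not None else None
    return record


record = extract(SAMPLE, OPTIONS)
# The schema declares 'points' as Double, so convert the extracted text.
record["points"] = float(record["points"])
print(record)
```

For the sample row, `//td[last()-1]` matches the second-to-last cell (`ACME`) and `//td[last()]` matches the last cell (`42.5`), which correspond to the `symbol` and `points` attributes of the schema.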