
The HTML protocol handler parses HTML documents (web pages) and extracts the values of stream attributes using XPath expressions. Because it uses the CyberNeko HTML parser, it can also parse malformed HTML documents, for example documents with unbalanced tags.

Options

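delay: Delay between reads in milliseconds (default 0).
nanodelay: Delay between reads in nanoseconds (default 0).
<attribute name>: XPath expression that selects the value of this attribute in the document. Add one such option per attribute, as done for 'symbol' and 'points' in the example.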

Example

PQL

input = ACCESS({
  source='Web',
  wrapper='GenericPush',
  transport='File',
  protocol='HTML',
  dataHandler='Tuple',
  options=[['symbol','//td[last()-1]'],['points','//td[last()]']],
  schema=[
    ['symbol','String'],
    ['points','Double']
  ]
})
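
To illustrate what the two XPath expressions in the example select, the following Python sketch evaluates them against a small table. It is not the handler's implementation: the sample page, the helper names select_texts and extract_rows, and the position-by-position pairing of matches into tuples are all assumptions made for illustration. It also uses Python's strict XML parser instead of CyberNeko, so the sample page has to be well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page (not from the Odysseus documentation).
# It is deliberately well-formed, because ElementTree is a strict XML parser.
SAMPLE_PAGE = """<html><body><table>
  <tr><td>1</td><td>ACME</td><td>12.5</td></tr>
  <tr><td>2</td><td>INIT</td><td>7.25</td></tr>
</table></body></html>"""

# Attribute name and XPath, copied from the options in the PQL example.
OPTIONS = [("symbol", "//td[last()-1]"), ("points", "//td[last()]")]


def select_texts(root, xpath):
    """Return the text of every element matched by an absolute '//...' XPath."""
    # ElementTree only accepts relative paths, so '//td' becomes './/td'.
    return [element.text for element in root.findall("." + xpath)]


def extract_rows(page, options):
    """Evaluate each XPath and pair the matches by position (an assumption)."""
    root = ET.fromstring(page)
    columns = [select_texts(root, xpath) for _, xpath in options]
    return list(zip(*columns))


# Convert 'points' to a float, mirroring the 'Double' type in the schema.
rows = [(symbol, float(points))
        for symbol, points in extract_rows(SAMPLE_PAGE, OPTIONS)]
print(rows)  # [('ACME', 12.5), ('INIT', 7.25)]
```

In each table row, //td[last()] matches the last cell and //td[last()-1] the cell before it, so every row of the sample yields one (symbol, points) pair.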