The HTML protocol handler parses HTML documents (web pages) and extracts attribute values using XPath expressions. Because it uses the CyberNeko HTML parser, it can also parse unbalanced HTML documents, i.e. documents whose tags are not properly closed.
Options
- delay: Read delay in milliseconds (default 0).
- nanodelay: Read delay in nanoseconds (default 0).
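- <attribute name>: XPath expression that selects the value of this attribute in the document. Add one such option per schema attribute (for example 'symbol' and 'points' in the query below).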
Example
PQL
HTML Protocol Handler
input = ACCESS({
  source='Web',
  wrapper='GenericPush',
  transport='File',
  protocol='HTML',
  dataHandler='Tuple',
  options=[['symbol','//td[last()-1]'],['points','//td[last()]']],
  schema=[
    ['symbol','String'],
    ['points','Double']
  ]
})
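
To see what the two XPath options in the query above select, the following Python sketch evaluates them against a small table. It is only an illustration and not the handler's implementation: the sample page and the function name `extract_rows` are hypothetical, the page is deliberately well-formed because Python's standard library parser cannot handle unbalanced HTML the way CyberNeko does, and pairing the i-th match of each attribute into one tuple is an assumption about how the results are combined.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (assumption: real pages may be
# unbalanced HTML, which this standard-library parser would reject).
PAGE = """
<html><body><table>
  <tr><td>1</td><td>ABC</td><td>12.5</td></tr>
  <tr><td>2</td><td>XYZ</td><td>7.25</td></tr>
</table></body></html>
"""

# Same attribute-to-XPath pairs as in the PQL options.
OPTIONS = [("symbol", "//td[last()-1]"), ("points", "//td[last()]")]


def extract_rows(markup, options):
    """Evaluate each XPath and pair the matches up by position."""
    root = ET.fromstring(markup)
    columns = []
    for _name, xpath in options:
        # ElementTree only accepts relative paths, so "//td" becomes ".//td".
        matches = root.findall("." + xpath)
        columns.append([(m.text or "").strip() for m in matches])
    # Assumption: the i-th match of every attribute belongs to the i-th tuple.
    return list(zip(*columns))


# Apply the schema types from the query: symbol is a String, points a Double.
rows = [(symbol, float(points)) for symbol, points in extract_rows(PAGE, OPTIONS)]
print(rows)
```

In each table row, `//td[last()-1]` matches the second-to-last cell and `//td[last()]` the last cell, so every row contributes one symbol and one points value.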