The HTML protocol handler parses HTML documents (web pages) and extracts attribute values using XPath expressions. Because it uses the CyberNeko HTML parser, it can also parse unbalanced HTML documents, i.e. documents with unclosed or mismatched tags.
Options
- delay: Delay of reading in milliseconds (default 0).
- nanodelay: Delay of reading in nanoseconds (default 0).
- Attribute name: XPath expression that selects the value of that attribute in the document, e.g. ['symbol','//td[last()-1]'].
Example
PQL
input = ACCESS({source='Web', wrapper='GenericPush',
  transport='File', protocol='HTML', dataHandler='Tuple',
  options=[['symbol','//td[last()-1]'],['points','//td[last()]']],
  schema=[
    ['symbol','String'],
    ['points','Double']
  ]
})
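
To illustrate what the two XPath expressions in the query above select, here is a minimal sketch using Python's standard library. It is not the handler's implementation: the sample document and the `extract` helper are hypothetical, the input is kept well-formed because `xml.etree.ElementTree` (unlike CyberNeko) does not repair unbalanced HTML, and pairing the extracted values row by row is an assumption about how tuples are formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; well-formed so the XML parser accepts it.
SAMPLE = """
<html><body><table>
  <tr><td>1</td><td>ACME</td><td>12.5</td></tr>
  <tr><td>2</td><td>INIT</td><td>7.25</td></tr>
</table></body></html>
"""

# Attribute name -> XPath expression, copied from the options of the query above.
OPTIONS = [("symbol", "//td[last()-1]"), ("points", "//td[last()]")]


def extract(document, options):
    """Return a dict mapping each attribute name to the list of matched cell texts."""
    root = ET.fromstring(document.strip())
    columns = {}
    for name, xpath in options:
        # ElementTree only accepts relative paths, so "//td" is written as ".//td".
        columns[name] = [cell.text for cell in root.findall("." + xpath)]
    return columns


columns = extract(SAMPLE, OPTIONS)

# Assumption: values are combined position by position into (symbol, points) tuples,
# with 'points' converted to a float to match the 'Double' type in the schema.
rows = list(zip(columns["symbol"], (float(p) for p in columns["points"])))
print(rows)
```

In each table row, `//td[last()-1]` matches the second-to-last cell and `//td[last()]` matches the last cell, so the leading index column is ignored.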