Solved: Getting a specific tag from HTML file - Qlik Community

Anonymous · ‎2018-11-16

Hi everyone,

I'm trying to get an href link from an HTML file obtained from an HTTP GET request, but it seems that I cannot iterate over the correct XML tags to get the data.

The XPath that I'm trying to follow is: "//*[@id="node-24615"]/div/div/div/div/center/div[3]/div/table/tbody/tr[2]/td[2]/div/a"

The link that I have to get is: "http://obieebr.banrep.gov.co/analytics/saw.dll?Download&Format=excel2007&Extension=.xlsx&BypassCache..."

thanks for your help!!

Anonymous · ‎2018-11-16

HTML is not XML, so this will only work in rare cases. A better solution is to use something like jsoup: https://jsoup.org/

It will require a bit of Java, but it is entirely possible.
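
To illustrate the point about HTML not being XML, here is a minimal sketch that uses only the JDK's built-in XML parser and XPath engine. The class name HtmlIsNotXml, the helper firstHref, and both sample strings are made up for this example; they are not from the original job. The helper returns the href when the markup happens to be well-formed XML, and null when the XML parser rejects it, which is what tends to happen with real-world HTML (unclosed tags, unquoted attributes).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class HtmlIsNotXml {

    /**
     * Hypothetical helper: returns the href of the first <a> element found via XPath,
     * or null when the markup is not well-formed XML and the parser rejects it.
     */
    static String firstHref(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler keeps the parser from printing to stderr; fatal errors are still thrown.
        builder.setErrorHandler(new DefaultHandler());
        try {
            Document doc = builder.parse(new InputSource(new StringReader(markup)));
            return XPathFactory.newInstance().newXPath().evaluate("//a/@href", doc);
        } catch (SAXException e) {
            // The XML parser gave up: the input is HTML, but not valid XML.
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: well-formed markup, every tag closed and every attribute quoted.
        String wellFormed = "<html><body><div><a href=\"report.xlsx\">Download</a></div></body></html>";
        // Made-up sample: ordinary HTML with an unclosed <br> and an unquoted attribute.
        String typicalHtml = "<html><body><div><br><a href=report.xlsx>Download</a></div></body></html>";

        System.out.println("well-formed markup: " + firstHref(wellFormed));
        System.out.println("typical HTML:       " + firstHref(typicalHtml));
    }
}
```

An HTML-aware parser such as jsoup is built to tolerate this kind of markup, which is why it is the more reliable choice here.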

Anonymous · ‎2018-11-16

Thanks for your help. In the end I was able to import the jsoup library and write a short piece of Java code to extract the link.

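try {
    // Fetch and parse the page, waiting up to 20 seconds.
    Document doc = Jsoup.connect(context.webURI).timeout(20000).get();
    // Select the elements that match the CSS selector held in the context.
    Elements tds = doc.select(context.elementSelector);
    // first() returns null when nothing matches, so check before reading the attribute.
    if (!tds.isEmpty()) {
        context.webURIExcel = tds.first().attr(context.hrefLabel);
    }
} catch (IOException e) {
    e.printStackTrace();
}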

Anonymous · ‎2018-11-17

Nice work!
