Anonymous
Not applicable

Parsing XML/HTML

Hello everyone,
First of all, thank you for your time and help.
I want to parse XML/HTML from the site https://www.cert.ssi.gouv.fr/.
I'm expecting to get a table like this:

[Screenshot: expected output table]

That is, I want to parse the HTML and extract every CERTFR advisory with its title and publication date, along with all the CVEs referenced in each advisory.
I don't know which component to use, or how to configure it, to extract exactly this table.
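As a rough illustration of the data shape described above, the one-to-many relation (one advisory, many CVEs) can be flattened into one table row per advisory/CVE pair. This is a minimal sketch, assuming the target table repeats the advisory columns for each CVE; the `Advisory` and `Row` types and the sample values are hypothetical placeholders, not taken from the site. It needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: one advisory with its title, publication date and CVE list.
record Advisory(String id, String title, String date, List<String> cves) {}

// Hypothetical output row: one per (advisory, CVE) pair, the flat table shape assumed here.
record Row(String id, String title, String date, String cve) {}

class AdvisoryTable {
    // Repeats the advisory columns once for every CVE it references.
    static List<Row> flatten(List<Advisory> advisories) {
        List<Row> rows = new ArrayList<>();
        for (Advisory a : advisories) {
            if (a.cves().isEmpty()) {
                // Keep advisories without CVEs, with an empty CVE column.
                rows.add(new Row(a.id(), a.title(), a.date(), ""));
            }
            for (String cve : a.cves()) {
                rows.add(new Row(a.id(), a.title(), a.date(), cve));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up placeholder data for illustration only.
        List<Advisory> in = List.of(
            new Advisory("CERTFR-0000-AVI-001", "Example title", "0000-01-01",
                List.of("CVE-0000-0001", "CVE-0000-0002")));
        flatten(in).forEach(System.out::println);
    }
}
```

Keeping advisories that list no CVE as a single row with an empty CVE column is a design choice; dropping them instead is equally valid depending on what the table is for.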

Thank you for your help.

fdenis
Master

Hi,
There is no dedicated component for that, but you can open HTML pages as XML and parse them with the XML components.
Be advised that many sites today fill in their content with JavaScript, so you cannot always access the data directly!
Is there a way to export the data as XLS or CSV? If so, that's the best way.
Another possibility is to use RPA (Robotic Process Automation) to extract data from the web.
Good luck
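If the page happens to be well-formed XHTML, the "open HTML as XML" idea can be tried in plain Java with the JDK's DOM parser and XPath, similar in spirit to the XPath queries the XML components are configured with. A minimal sketch, assuming the markup has no default XML namespace; the `div class="item"` structure is hypothetical and the real page will differ. HTML that is not well-formed (an unclosed `<br>`, a bare `&`) makes the parser fail, which the sketch reports as an exception.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XhtmlSelect {
    // Parses well-formed XHTML and returns the trimmed text of every node matched by the XPath.
    static List<String> select(String xhtml, String xpathExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(xpathExpr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent().trim());
            }
            return out;
        } catch (Exception e) {
            // Typical cause: the HTML is not well-formed XML.
            throw new IllegalArgumentException("Could not parse page as XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical markup standing in for the real page.
        String page = "<html><body>"
            + "<div class=\"item\"><a>Advisory A</a><span class=\"date\">0000-01-01</span></div>"
            + "<div class=\"item\"><a>Advisory B</a><span class=\"date\">0000-01-02</span></div>"
            + "</body></html>";
        System.out.println(select(page, "//div[@class='item']/a"));
        System.out.println(select(page, "//div[@class='item']/span[@class='date']"));
    }
}
```

Because the data sits in nested `div`s rather than tables, the XPath expression (not the parser) is the part that has to be adapted to the actual page.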
Anonymous
Not applicable
Author

First, thank you for your answer. No, I can't export the site as CSV or XLS; could you take a look at the site?
Maybe I just don't know how.
Anyway, I created the following job, but I'm having trouble writing the code:

[Screenshot: the Talend job]

I searched the community questions and found this: https://community.talend.com/t5/Design-and-Development/Extract-Multiple-table-using-tHTTPTableInput-...

But I don't know how to apply that approach to my project, because the site is made of several divs, PDFs, and links, and the data isn't in specific tables.

Thanks,

Regards

fdenis
Master

tHttpRequest allows you to get an HTTP response, such as REST, HTML, or SOAP.
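Outside Talend, fetching the raw page the way tHttpRequest does can be sketched with the JDK's `java.net.http.HttpClient` (Java 11+). This is a plain-Java stand-in, not tHttpRequest's implementation; the 30-second timeout and the `Accept` header are arbitrary choices. The body returned is the HTML as served, so any content the site injects with JavaScript will not be in it.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class PageFetcher {
    // Builds a plain GET request for the page; the timeout value is an arbitrary choice.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
            .timeout(Duration.ofSeconds(30))
            .header("Accept", "text/html")
            .GET()
            .build();
    }

    // Sends the request and returns the raw response body (HTML before any JavaScript runs).
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();
        HttpResponse<String> response =
            client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String html = fetch("https://www.cert.ssi.gouv.fr/");
        System.out.println(html.length() + " characters received");
    }
}
```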
tJavaFlex is a free-form Java code component. I think the data should be extracted in this component.
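The extraction step itself could be as simple as pattern matching on the page text, which sidesteps the div/link structure entirely. A minimal sketch, assuming CVE IDs follow the standard `CVE-YYYY-NNNN` format (four or more trailing digits) and, as a labelled assumption, that advisory references look like `CERTFR-YYYY-AVI-NNN`; the class and method names are hypothetical. In Talend, such a method would typically live in a user routine and be called from tJavaFlex's Main code; that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdExtractor {
    // Standard CVE identifier: "CVE-", a four-digit year, then four or more digits.
    static final Pattern CVE = Pattern.compile("CVE-\\d{4}-\\d{4,}");
    // ASSUMPTION: CERT-FR references look like CERTFR-<year>-<3 letters>-<number>.
    static final Pattern CERTFR = Pattern.compile("CERTFR-\\d{4}-[A-Z]{3}-\\d+");

    // Returns the distinct matches in order of first appearance.
    static List<String> extract(Pattern pattern, String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        // Made-up snippet standing in for one advisory page body.
        String body = "<h1>CERTFR-0000-AVI-001</h1>"
            + "<li>CVE-0000-1234</li><li>CVE-0000-56789</li><p>see CVE-0000-1234</p>";
        System.out.println(extract(CERTFR, body)); // [CERTFR-0000-AVI-001]
        System.out.println(extract(CVE, body));    // [CVE-0000-1234, CVE-0000-56789]
    }
}
```

A `LinkedHashSet` is used so that a CVE mentioned several times on one page yields a single row while keeping the order in which the IDs appear.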
Regards,
Good luck