Hi all,
I saw the other topic posted. Unfortunately the solution does not fit my needs.
I have several PDFs from the site https://www.cert.ssi.gouv.fr/. How can I extract information from each PDF so that I can build a database and load it into a table?
Thank you very much,
First of all, thank you for your answer, but I did not fully understand it. I am using the PDFs from the following site, https://www.cert.ssi.gouv.fr/, and I want to extract the data table that exists in each PDF and then load the results into a table.
regards
I found a tpdftotext component that other users have published on Talend Exchange, but I need to extract the table inside each PDF, so it does not work for me.
Hi,
If you are using a custom component, I would suggest contacting the author of the component directly. Reading from PDFs is not a good strategy, because a PDF is laid out for human reading rather than for data extraction. If you do have to read the data in a PDF, why not go to the source system that feeds the PDF and pick the data up from there?
That is the ideal approach in an enterprise environment.
Tail note: Amazon is building a new feature called Textract to read PDFs, but it is currently in preview mode. Once it is ready, you can make API calls from Talend to get the result set. Many third-party companies also offer API calls to fetch the data. You can try that route.
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
hi @nthampi
First of all, thank you for your time and your answer. I am new to Talend, so let me ask my question with a simple example. I would like to know how to get the data from the DOCUMENT MANAGEMENT section of the page https://www.cert.ssi.gouv.fr/alerte/CERTFR-2019-ALE-008/
into a table like this:
Reference: CERTFR-2019-ALE-008
Title: Vulnerability in Microsoft SharePoint Server
Date of first version: 29 May 2019
Date of last version: 29 May 2019
Source(s): Microsoft Security Bulletin CVE-2019-0604 dated February 12, 2019
thanks
regards
Hi,
The simple answer is that there are no standard components in the Talend palette for this requirement. There might be components created by Talend community members on exchange.talend.com.
The other option is to write custom Java code that reads the data, using the routines feature in Talend, or to call a third-party API through REST calls from Talend.
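As a rough illustration of the custom Java route, here is a minimal sketch that turns the labeled lines of the DOCUMENT MANAGEMENT block into one row of label/value pairs. It assumes the text has already been extracted from the PDF or page by some other means (that step is not shown), and that the labels read exactly as in the example earlier in this thread; the live documents may use different wording. The class name `AdvisoryFieldParser` and its `parse` method are hypothetical names chosen for this example, not part of Talend.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AdvisoryFieldParser {

    // Labels taken from the example in this thread (an assumption:
    // the real documents may label these fields differently).
    static final List<String> LABELS = Arrays.asList(
            "Reference", "Title", "Date of first version",
            "Date of last version", "Source(s)");

    /** Maps each known label to the text that follows it on the same line. */
    public static Map<String, String> parse(String text) {
        Map<String, String> row = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            for (String label : LABELS) {
                if (trimmed.startsWith(label)) {
                    String value = trimmed.substring(label.length()).trim();
                    // The separator colon is optional in the source text.
                    if (value.startsWith(":")) {
                        value = value.substring(1).trim();
                    }
                    // Keep the first occurrence if a label appears twice.
                    row.putIfAbsent(label, value);
                    break;
                }
            }
        }
        return row;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "Reference: CERTFR-2019-ALE-008",
                "Title: Vulnerability in Microsoft SharePoint Server",
                "Date of first version 29 May 2019",
                "Date of last version 29 May 2019",
                "Source(s) Microsoft Security Bulletin CVE-2019-0604 dated February 12, 2019");
        // Print one "label | value" pair per line.
        parse(text).forEach((label, value) -> System.out.println(label + " | " + value));
    }
}
```

Because `parse` is a plain static method, it could in principle be pasted into a Talend routine and called once per document, with each returned map becoming one row of the output table. Lines that do not start with a known label are ignored.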
Warm Regards,
Nikhil Thampi