Anonymous
Not applicable

OCR (optical character recognition) scanner for Talend

Hi,
I have been scouring the internet for about an hour and a half looking for a way to scan a PDF or DOC file into Talend, but I have not found any answers or components that can help.
I was wondering if anyone has made, or knows of, a component that can scan documents and pass the extracted text into Talend for processing. At the moment I am using Free-OCR, which is not really the way I want to go: I have to run the program manually before each Talend process, which is not very efficient.
I'm really hoping someone has a solution to this.
Thanks in advance.
Dean Wake
P.S. I wasn't quite sure where to post this.
6 Replies
Anonymous
Not applicable
Author

Hi,
We don't have a component for scanning a PDF or DOC file. Talend is a code-generating ETL tool that uses Java as its underlying technology: it generates Java code to perform data extraction, transformation and loading.
Best regards
Sabrina
Anonymous
Not applicable
Author

Is it possible for me to request such a component? I know it should be possible to do through Talend, as there are many OCR SDKs based on Java.
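To illustrate the idea, here is a minimal sketch of how Java code could launch an OCR program itself, so that it does not have to be run by hand before each job. It is not an existing Talend component. It assumes the open-source Tesseract command-line tool is installed and on the PATH (Tesseract is not mentioned elsewhere in this thread, it is only used as an example), and the class and method names are hypothetical. Tesseract reads image files, so a PDF would first have to be converted to images by some other tool.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Talend: runs an external OCR program
// and returns whatever text it prints to standard output.
final class OcrRunner {

    // Builds the command line for the Tesseract CLI (assumed to be installed).
    // "stdout" as the output base tells Tesseract to print the recognized text
    // instead of writing a .txt file; "-l" selects the language, e.g. "eng".
    public static List<String> tesseractCommand(String imagePath, String language) {
        return Arrays.asList("tesseract", imagePath, "stdout", "-l", language);
    }

    // Runs the given command, waits for it to finish, and returns its standard
    // output decoded as UTF-8. Throws IOException on a non-zero exit code.
    public static String runAndCapture(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Let diagnostics go to this process's stderr so they do not mix
        // with the recognized text.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        // Read all of stdout before waiting, so the child cannot block
        // on a full pipe.
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream()) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                captured.write(buffer, 0, read);
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command failed with exit code " + exitCode + ": " + command);
        }
        return new String(captured.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

A call such as `OcrRunner.runAndCapture(OcrRunner.tesseractCommand("page.png", "eng"))` would then return the recognized text as a String (the file name here is only a placeholder). In a Talend job, code like this could presumably live in a custom routine or custom component and feed the text to the rest of the flow, but that wiring is an assumption and is not shown here.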
Anonymous
Not applicable
Author

Hi,
You can open a JIRA issue in the Talend DI project of the JIRA bug tracker to request this new feature. Our component developers will see whether it can be made available in a future version.
You can certainly also create a custom component yourself.
Please see the reference: componentCreation
Best regards
Sabrina
Anonymous
Not applicable
Author

Hi, you can use an OCR SDK to extract text or images from a PDF. The one linked below supports full-page OCR and automatic or manual zonal OCR, and it can also do some simple image processing such as deskew and despeckle.
http://www.rasteredge.com/how-to/csharp-imaging/ocr-sdk/
Anonymous
Not applicable
Author

I have seen it; it looked wonderful.
Anonymous
Not applicable
Author

If you want to use free OCR, you can try this free online OCR service. It supports 40+ languages and can save the converted text to an editable TXT file and a searchable PDF document.