datanibbler
Champion

Reading from a PDF file?


Hi,

can someone please advise me on how to read data from a PDF file into QlikView, or tell me where to find information on it?

=> The scenario is that I have one or several "presence sheets" from trainings, and all I need is the number of signatures on each sheet (those who were present) compared to the number of names on it (those who were supposed to be there).

Thanks a lot!

Best regards,

DataNibbler

9 Replies
swuehl
MVP

So you are talking about scanned documents, i.e. bitmaps embedded in a PDF?

Not applicable

First you need to convert these PDF files into text files with a third-party tool. After that, load the newly created text files into QlikView.

datanibbler
Champion
Author

Hi dathu.qv,

that is easier said than done - yes, I am talking about scanned docs. The list starts out in Excel, but then it is printed, signed by the attendees and scanned - and the scanner automatically produces a PDF.

I was thinking of then looking over the lists - there are equivalent lists in Excel - and simply ticking off the people who have attended. That would be very easy to display then - but if it can be done automatically, all the better.
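Once the names have been ticked off in the Excel list, the figure asked for in the original question (signatures present versus names listed) is a simple count. A minimal Python sketch, assuming the list is exported to CSV with hypothetical columns `Name` and `Attended`, where a tick is recorded as `x`; the column names, the marker and the sample names are illustrative assumptions, not part of the real lists.

```python
import csv
import io

# Hypothetical export of one attendance list: one row per invited person.
# "Attended" holds "x" when the name was ticked off against the signed sheet.
SAMPLE = """Name,Attended
Anna Berger,x
Ben Koch,
Clara Wolf,x
"""

def attendance(csv_text: str, tick: str = "x") -> tuple[int, int, float]:
    """Return (present, expected, ratio) for one attendance list."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    expected = len(rows)  # every listed name was supposed to attend
    present = sum(
        1
        for r in rows
        # Treat a missing or blank cell as "not ticked"; compare case-insensitively.
        if (r.get("Attended") or "").strip().lower() == tick.lower()
    )
    ratio = present / expected if expected else 0.0
    return present, expected, ratio

if __name__ == "__main__":
    present, expected, ratio = attendance(SAMPLE)
    print(f"{present} of {expected} attended ({ratio:.0%})")
```

Run as a script, this prints `2 of 3 attended (67%)` for the sample list.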

If I need a third-party tool to convert them first, I'll have to talk to IT. I cannot simply install anything on my laptop (it is technically blocked), but there might already be something suitable available.

Let's see...

Thanks a lot already!

Best regards,

DataNibbler

P.S.: Aha - I just found out - well, someone explained it to me - that it is actually possible to select other formats to scan to: JPG, TIFF, OOXML or XPS. But I'm sceptical: the documents are distributed to the departments, who hold the trainings, have the docs signed and scan them - explaining to all of them how to do it would be difficult at best. Well, that shall be my worry. Can QlikView read any of those other formats without conversion?

Peter_Cammaert
Partner - Champion III

To get the sheet with signatures back into a data format, you'll need to OCR it as the PDF only contains a bitmap-representation of the sheet. AFAIK QlikView is not capable of performing this conversion by itself.
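Whether a particular PDF contains only a bitmap (and therefore needs OCR) can be guessed roughly before sending it anywhere. A heuristic Python sketch, assuming the PDF's object dictionaries are stored uncompressed: scanned pages usually carry image XObjects (`/Subtype /Image`) but no font resources (`/Font`), while PDFs with real or OCR'd text need fonts. This is a raw byte search, not a PDF parser; font dictionaries held in compressed object streams (PDF 1.5 and later) are invisible to it, so the result is only a hint.

```python
import re
import sys

# Raw byte patterns; a heuristic only, not a real PDF parser.
IMAGE_PATTERN = re.compile(rb"/Subtype\s*/Image")
# \b stops "/FontFile", "/FontDescriptor" etc. from counting as a font resource.
FONT_PATTERN = re.compile(rb"/Font\b")

def looks_image_only(pdf_bytes: bytes) -> bool:
    """Guess whether a PDF holds scanned images but no text layer."""
    has_image = IMAGE_PATTERN.search(pdf_bytes) is not None
    has_font = FONT_PATTERN.search(pdf_bytes) is not None
    return has_image and not has_font

if __name__ == "__main__":
    # Usage: python check_pdf.py sheet1.pdf sheet2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            verdict = "probably needs OCR" if looks_image_only(fh.read()) else "may contain text"
        print(f"{path}: {verdict}")
```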

Have a look at the main OCR packages, OmniPage Pro and ABBYY FineReader. Their advanced editions often offer automation tools for integrating OCR into a workflow.

As a future enhancement, you could consider putting your Excel sheet on a tablet and letting the attendees sign directly in the worksheet. That would probably make things a bit easier to handle.

Best,

Peter

Peter_Cammaert
Partner - Champion III

JPG and TIFF: same problem as your original PDF, these are just bitmap file formats. OCR needed.

OOXML: Microsoft Office's native format (since Office 2007). QlikView can read this format if it contains Excel data. Probably not what your scanning package generates.
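To check which kind of OOXML file a scanner actually produces, it helps to know that OOXML files are ZIP packages: a spreadsheet keeps its main part at `xl/workbook.xml`, a Word document at `word/document.xml` and a presentation at `ppt/presentation.xml`. A minimal Python sketch that inspects the package contents; it assumes these conventional part names, which Office writes by default but which the standard does not strictly require (the authoritative pointer is the relationship stored in `_rels/.rels`).

```python
import io
import sys
import zipfile

# Conventional main-part names written by Microsoft Office.
MAIN_PARTS = {
    "xl/workbook.xml": "spreadsheet",
    "word/document.xml": "document",
    "ppt/presentation.xml": "presentation",
}

def ooxml_kind(data: bytes) -> str:
    """Classify an OOXML package by the main part it contains."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = set(zf.namelist())
    except zipfile.BadZipFile:
        return "not a zip"
    for part, kind in MAIN_PARTS.items():
        if part in names:
            return kind
    return "unknown"

if __name__ == "__main__":
    # Only a "spreadsheet" result is the Excel-type content QlikView can load.
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(f"{path}: {ooxml_kind(fh.read())}")
```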

XPS: just another format like PDF, so same problem.

Does the scanning software they use for those attendee sheets contain any OCR functionality? That would be a big help!

Peter

datanibbler
Champion
Author

Hi Peter,

this is solved - of sorts 😉 The team just told me they are going to make it easier for me and do just what I posted: => They are going to read through the lists (returned to them on paper) and just tick off the names on their Excel lists.

So there's no problem anymore and I'll close this thread.

Yes, a tablet would be a good idea - but we don't have many, and I'm afraid that if we distributed them to the departments, they would somehow grow wings... We had some iPads at one time - well, there are cheaper alternatives, too - but IT won't support them for anything but email, so they were never adopted on a larger scale 😉

Best regards,

DataNibbler

Peter_Cammaert
Partner - Champion III

Great. "Keep it simple" is always preferable.

Good luck,

Peter

datanibbler
Champion
Author

That's right.

QlikView is a very powerful piece of software - but then, so is Excel - and if you overdo it, it can become very complex for others to understand. Getting the departments to adjust the underlying data, if possible, is preferable in many cases.

Best regards,

DataNibbler

Not applicable

First, scan the PDF document with OCR technology, and then read the PDF data with professional PDF software. I think this method will let us read PDF data. But which PDF software should I use? I heard QlikView has scanning software - is it good?