9 Replies. Latest reply: Mar 11, 2014 3:52 PM by fds fdsf

    Reading from a pdf file?

    Friedrich Hofmann


      Hi,

       

      can someone please advise me on (where to find info on) how to read data from a pdf file into QlikView?

      => The scenario is that I have one or several "presence sheets" from trainings and all I need is the nr. of signatures on the sheet (from those who were present) compared to the nr. of names on the sheet (those who were supposed to be there).

       

      Thanks a lot!

      Best regards,

       

      DataNibbler

        • Re: Reading from a pdf file?
          Stefan Wühl

          So you are talking about scanned documents, i.e. bitmaps embedded in pdf?

          • Re: Reading from a pdf file?
            Srikanth P

            First you need to convert these PDF files into text files with a third-party tool. After that, load the newly created text files into QlikView.

              • Re: Reading from a pdf file?
                Friedrich Hofmann

                Hi dathu.qv,

                 

                that is easier said than done - yes, I am talking about scanned docs. The list starts out in Excel, is then printed, signed by the attendees and scanned - and the scanner automatically saves to pdf.

                I was thinking of then looking over the lists - there are equivalent lists in Excel - and simply ticking off the people who have attended. That would be very easy to display then - but if it can be done automatically, all the better.

                If I need a third-party-tool to convert them first, I'll have to talk to IT. I cannot simply install anything on my laptop, it is technically blocked, but there might be something in that respect around.

                Let's see...

                Thanks a lot already!

                Best regards,

                 

                DataNibbler

                 

                P.S.: Aha - I just found out - well, I had it explained to me - that it is actually possible to select other formats to scan to: there is JPG, TIFF, OOXML or XPS. But I'm sceptical: the documents are distributed to the departments, those hold the trainings, have the docs signed and scan them. Explaining to all of them how to do it would be difficult at best - well, that shall be my worry. Can QlikView read any of those other formats without conversion?

                  • Re: Reading from a pdf file?
                    Peter Cammaert

                    To get the sheet with signatures back into a data format, you'll need to OCR it as the PDF only contains a bitmap-representation of the sheet. AFAIK QlikView is not capable of performing this conversion by itself.

                     

                    Have a look at the main OCR packages, OmniPage Pro and ABBYY FineReader. The advanced editions often offer automation tools for integrating OCR into a workflow.

                     

                    As a future enhancement, you could consider putting your Excel sheet on a tablet and letting the attendees sign in right in the worksheet. That would probably make things a bit easier to handle.

                     

                    Best,

                     

                    Peter

                      • Re: Reading from a pdf file?
                        Friedrich Hofmann

                        Hi Peter,

                         

                        this is solved - of sorts ;-) The team just told me they are going to make it easier for me and do just what I posted: they are going to read through the lists (returned to them on paper) and just tick off the names in their Excel lists.

                        So there's no problem anymore and I'll close this thread.
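
For anyone who wants to check the numbers outside QlikView first, the "tick off the names" approach can be sketched roughly like this in Python. This is only an illustration under assumptions: the column names `Name` and `Attended`, the tick marks that count as "present", and the sample names are all made up here, and the Excel list is assumed to have been exported to CSV.

```python
import csv
import io


def attendance_summary(csv_text):
    """Count invited vs. attended people in a ticked-off list.

    Assumes (hypothetically) a 'Name' column and an 'Attended' column
    in which present people are marked with x / yes / y / 1.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    invited = 0
    attended = 0
    for row in reader:
        name = (row.get("Name") or "").strip()
        if not name:
            continue  # ignore blank rows
        invited += 1
        tick = (row.get("Attended") or "").strip().lower()
        if tick in {"x", "yes", "y", "1"}:
            attended += 1
    return {"invited": invited, "attended": attended, "missing": invited - attended}


# Made-up sample data standing in for one exported presence sheet.
sample = """Name,Attended
Alice Example,x
Bob Example,
Carol Example,x
"""

print(attendance_summary(sample))
# prints {'invited': 3, 'attended': 2, 'missing': 1}
```

The same two counts (names on the sheet vs. names ticked off) are what would then be compared in QlikView once the Excel lists are loaded.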

                         

                        Yes, a tablet would be a good idea - but we don't have many and I'm afraid if we distributed them to the departments, they would somehow grow wings... We had some iPads at one time - well, there are cheaper alternatives, too - but IT won't support them for anything but email, so they were never adopted on a larger scale ;-)

                         

                        Best regards,

                         

                        DataNibbler

                      • Re: Reading from a pdf file?
                        Peter Cammaert

                        JPG and TIFF: same problem as your original PDF, these are just bitmap file formats. OCR needed.

                        OOXML: Microsoft Office native format (since Office 2007). QlikView can read this format if it contains Excel information. Probably not what your scanning package generates.

                        XPS: just another format like PDF, so same problem.
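
If you want to check what the scanner actually produces, the format list above can be turned into a small test. The following is a rough Python sketch, not a definitive detector: it looks only at the well-known leading bytes of PDF, JPEG and TIFF files, and for ZIP-based packages (OOXML and XPS are both ZIP containers) it checks for the `xl/workbook.xml` part that an Excel workbook contains. The function name and the returned labels are made up for this example.

```python
import io
import zipfile


def classify_scan(data: bytes) -> str:
    """Guess the container format of a scanned file from its raw bytes."""
    if data.startswith(b"%PDF-"):
        return "pdf"  # may still hold only a bitmap of the page
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"  # bitmap, OCR needed
    if data[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"  # bitmap, OCR needed
    if data.startswith(b"PK\x03\x04"):
        # ZIP-based package: list its parts to see whether it is a workbook.
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as package:
                names = set(package.namelist())
        except zipfile.BadZipFile:
            return "unknown"
        if "xl/workbook.xml" in names:
            return "excel-ooxml"  # the case that contains Excel information
        return "other-zip-package"  # e.g. XPS or a non-Excel OOXML file
    return "unknown"


# Usage with a file on disk (the path is a placeholder):
# with open("presence_sheet.pdf", "rb") as handle:
#     print(classify_scan(handle.read()))
```

Only the "excel-ooxml" result points to something that can be loaded as table data directly. Every other result means the names and signatures are still locked inside an image.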

                         

                        Does the scanning software they use for those attendee sheets contain any OCR functionality? That would be a big help!

                         

                        Peter