Anonymous
Not applicable

Read data from a pdf file

Hi All,

Is there any way, or any extension available, to read data from a PDF file, just as we read from other sources such as Excel or a database?

10 Replies
Anonymous
Not applicable
Author

No, but there are converters from PDF to Excel.

rbecher
MVP

Interesting. Could you post an example PDF file so we can understand the use case?

- Ralf

Astrato.io Head of R&D
brijesh1991
Partner - Specialist

Interesting. Could you post an example PDF file so we can understand the use case?
-Brijesh

Anonymous
Not applicable
Author

Please find attached a sample.

rbecher
MVP

Hi Nitin,

this is quite a long road, but you can do it with a file conversion using pdftohtml.exe from SourceForge:

// Set path of source file

Set vPath = C:\Projekte\QVPDF\;

// Set number of columns

Set vCols = 2;

// convert PDF file to XML

EXECUTE cmd.exe /C pdftohtml.exe -xml $(vPath)sample.pdf;

// Load from XML (this depends heavily on the PDF layout!)

RawData:

LOAD text%Table as value

FROM [$(vPath)sample.xml] (XmlSimple, Table is [pdf2xml/page/text]);

// Load field names from header for later renaming

HeaderMap:

Mapping First $(vCols) LOAD '@' & RecNo() as x,  value as y

Resident RawData;

// build a proper input table

InputTable:

LOAD ceil(RecNo()/$(vCols))-1 as %key, '@' & (Mod(RecNo()-1, $(vCols))+1) as attribute, value

Resident RawData

Where RecNo()>$(vCols);

// generic load from input table

GenTable:

Generic LOAD * Resident InputTable;

// consolidation of tables created by generic load

ResultTable:

LOAD Distinct %key Resident InputTable;

FOR i = 0 to NoOfTables()-1

TableList:

LOAD TableName($(i)) as Tablename AUTOGENERATE 1

WHERE WildMatch(TableName($(i)), 'GenTable.*');

NEXT i

FOR i = 1 to FieldValueCount('Tablename')

LET vTable = FieldValue('Tablename', $(i));

LEFT JOIN (ResultTable) LOAD * RESIDENT [$(vTable)];

DROP TABLE [$(vTable)];

NEXT i

DROP TABLES RawData, TableList, InputTable;

RENAME Fields Using HeaderMap;
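
To try the reshaping steps without a PDF, the XML load can be replaced by inline data. This is only a sketch with made-up values: it assumes two columns (vCols = 2), so the first two values act as headers and the rest are cell values read row by row.

```qlik
// Stand-in for the RawData load from sample.xml (illustrative values only)
RawData:
LOAD * INLINE [
value
Name
City
Anna
Berlin
Bob
Paris
];
```

With this input, the rest of the script should end with a ResultTable holding %key 1 (Name = Anna, City = Berlin) and %key 2 (Name = Bob, City = Paris).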

To run an external command, you have to enable these settings:

[Screenshot: Settings02.png]

Open the dialog with Shift-Ctrl-M:

[Screenshot: Settings01.png]
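
If the conversion does not run for any reason, the later XML load fails with a less obvious error. A small guard after the EXECUTE step can make that easier to spot. This is a sketch, not part of the tested script: it assumes vPath is set as above, that pdftohtml.exe writes sample.xml next to the PDF, and it quotes the PDF path in case the folder name contains spaces.

```qlik
// Sketch: run the conversion, then stop the reload if no XML file appeared.
// Assumes vPath is set as above and pdftohtml.exe can be found by cmd.exe.
EXECUTE cmd.exe /C pdftohtml.exe -xml "$(vPath)sample.pdf";

// Alt() maps a non-numeric result (e.g. NULL for a missing file) to 0
IF Alt(FileSize('$(vPath)sample.xml'), 0) = 0 THEN
    TRACE Conversion failed: $(vPath)sample.xml not found or empty;
    EXIT Script;
END IF
```

The TRACE message shows up in the script execution progress window and in the reload log, so the cause is visible before any LOAD statement is reached.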

- Ralf

Astrato.io Head of R&D
Not applicable
Author

Hi,

I did this to fetch data from a voter list published on a government site.

As the files were in PDF format, I used a Weeny free Excel converter to convert them to Excel format.

Then you can easily load the Excel files in QV.
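
For illustration, a converted workbook can be loaded with a standard Excel load statement. A minimal sketch, assuming a hypothetical file C:\Data\voters.xlsx with the data on a sheet named Sheet1 and the column names in the first row:

```qlik
// Hypothetical path and sheet name; adjust both to the converted file.
Voters:
LOAD *
FROM [C:\Data\voters.xlsx]
(ooxml, embedded labels, table is Sheet1);
```

For an older .xls file the format spec would be (biff, embedded labels, table is Sheet1$) instead.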

Regards

Arun

Not applicable
Author

Hi Ralf,

this is really interesting. It's working.

Great!

jerrysvensson
Partner - Specialist II

I happened to download this file.

DON'T

Win32/Vigram.A virus

nurettinsahin
Contributor II

Hi Ralf,

I ran the sample you posted, but I got the error below. Could there be a problem with the version? I am on QlikView version 12.20.

Error text:

The top level of the document is invalid.

On line number: 2. On column number: 11. System ID: sample.xml.

RawData:
LOAD text%Table as value
FROM [C:\PDFtoQVD\sample.xml] (XmlSimple, Table is [pdf2xml/page/text])