Anonymous
Not applicable

Read data from a pdf file

Hi All,

Is there any way, or any extension available, to read data from a PDF file, just as we read from other sources such as Excel or a database?

10 Replies
Anonymous
Not applicable
Author

No, but there are converters from PDF to Excel.

rbecher
MVP

Interesting. Could you post an example PDF file so we can understand the use case?

- Ralf

Astrato.io Head of R&D
brijesh1991
Partner - Specialist

Interesting. Could you post an example PDF file so we can understand the use case?
- Brijesh

Anonymous
Not applicable
Author

Please find attached a sample.

rbecher
MVP

Hi Nitin,

This is quite a long road. You can do it by converting the file with pdftohtml.exe from SourceForge:

// Set path of source file
Set vPath = C:\Projekte\QVPDF\;

// Set number of columns
Set vCols = 2;

// Convert PDF file to XML (writes sample.xml next to sample.pdf)
EXECUTE cmd.exe /C pdftohtml.exe -xml $(vPath)sample.pdf;

// Load from XML (this depends heavily on the PDF layout!)
RawData:
LOAD text%Table as value
FROM [$(vPath)sample.xml] (XmlSimple, Table is [pdf2xml/page/text]);

// Load field names from header for later renaming
HeaderMap:
Mapping First $(vCols) LOAD '@' & RecNo() as x, value as y
Resident RawData;

// Build a proper input table: one record per cell, keyed by row number
InputTable:
LOAD ceil(RecNo()/$(vCols))-1 as %key, '@' & (Mod(RecNo()-1, $(vCols))+1) as attribute, value
Resident RawData
Where RecNo()>$(vCols);

// Generic load from input table (creates one table per attribute)
GenTable:
Generic LOAD * Resident InputTable;

// Consolidation of tables created by generic load
ResultTable:
LOAD Distinct %key Resident InputTable;

// Collect the names of the GenTable.* tables
FOR i = 0 to NoOfTables()-1
TableList:
LOAD TableName($(i)) as Tablename AUTOGENERATE 1
WHERE WildMatch(TableName($(i)), 'GenTable.*');
NEXT i

// Join each generic table onto the result, then drop it
FOR i = 1 to FieldValueCount('Tablename')
LET vTable = FieldValue('Tablename', $(i));
LEFT JOIN (ResultTable) LOAD * RESIDENT [$(vTable)];
DROP TABLE [$(vTable)];
NEXT i

DROP TABLES RawData, TableList, InputTable;

// Rename @1, @2, ... to the header names from the PDF
RENAME Fields Using HeaderMap;

To run an external command you have to enable these settings:

Settings02.png

Open the dialog with Ctrl+Shift+M:

Settings01.png

- Ralf

Astrato.io Head of R&D
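
A minimal sketch for trying out the reshaping part of Ralf's script above without pdftohtml.exe: it replaces the XML load of RawData with a small inline table. The sample values (Name, City, Alice, ...) are made up for illustration and assume a two-column PDF table whose first two text elements are the column headers.

```qlikview
// Hypothetical sample data standing in for the XML load:
// one text element per record, headers first, then the cells row by row
RawData:
LOAD * INLINE [
value
Name
City
Alice
Berlin
Bob
Paris
];
```

With vCols set to 2 and the rest of the script unchanged, ResultTable should end up with the fields %key, Name and City, and one record per row of the PDF table.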
Not applicable
Author

Hi,

I have done this to fetch data from a voter list provided on a government site.

As the files are in PDF format, I used a Weeny free Excel converter to convert them to Excel format.

After that you can easily load the Excel files in QV.

Regards

Arun
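
For reference, loading a converted workbook might look like the sketch below. The table label, path, file name and sheet name are placeholders, not taken from the post. For an older .xls file the format specifier would be biff instead of ooxml, with the sheet written as Sheet1$.

```qlikview
// Hypothetical path and sheet name; adjust to the converted file
VoterList:
LOAD *
FROM [C:\Data\voterlist.xlsx]
(ooxml, embedded labels, table is Sheet1);
```

The embedded labels option assumes the first row of the sheet holds the column names; if the converter does not produce a header row, no labels would be the option to use instead.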

Not applicable
Author

Hi Ralf,

This is really interesting. It's working.

Great!

jerrysvensson
Partner - Specialist II

I happened to download this file.

DON'T

Win32/Vigram.A virus

nurettinsahin
Contributor II

Hi Ralf,

I ran the sample you posted, but I got the error below. Could there be a problem with the version? I am using QlikView 12.20.

Error text:

The top level of the document is invalid.

On line number: 2. On column number: 11. System ID: sample.xml.

RawData:
LOAD text%Table as value
FROM [C:\PDFtoQVD\sample.xml] (XmlSimple, Table is [pdf2xml/page/text])