Qlik Community

QlikView App Dev

Discussion Board for collaboration related to QlikView App Development.

gupta_n8
Specialist II

Read data from a pdf file

Hi all,

Is there any way, or any extension available, to read data from a PDF file, just as we read from other sources such as Excel or a database?

10 Replies
mov
Champion III

No, but there are converters from PDF to Excel.

rbecher
MVP

Interesting. Could you post an example PDF file so we can understand the use case?

- Ralf

Vizlib Head of R&D
brijesh1991
Partner

Interesting. Could you post an example PDF file so we can understand the use case?
-Brijesh

gupta_n8
Specialist II
Author

Please find attached a sample.

rbecher
MVP

Hi Nitin,

This is quite a long road, but you can do it with a file conversion using pdftohtml.exe from SourceForge:

// Set path of source file
SET vPath = C:\Projekte\QVPDF\;

// Set number of columns
SET vCols = 2;

// Convert the PDF file to XML (the next step expects sample.xml in the same folder)
EXECUTE cmd.exe /C pdftohtml.exe -xml $(vPath)sample.pdf;

// Load from XML (this depends heavily on the PDF layout!)
RawData:
LOAD text%Table as value
FROM [$(vPath)sample.xml] (XmlSimple, Table is [pdf2xml/page/text]);

// Load field names from the header row for later renaming
HeaderMap:
Mapping First $(vCols) LOAD '@' & RecNo() as x, value as y
Resident RawData;

// Build a proper input table: one row per cell, keyed by row number.
// Uses vCols instead of a hard-coded 2 so the column count is set in one place.
InputTable:
LOAD ceil(RecNo()/$(vCols))-1 as %key,
     '@' & (Mod(RecNo()-1, $(vCols))+1) as attribute,
     value
Resident RawData
Where RecNo()>$(vCols);

// Generic load from input table (creates one table per attribute)
GenTable:
Generic LOAD * Resident InputTable;

// Consolidate the tables created by the generic load
ResultTable:
LOAD Distinct %key Resident InputTable;

// Collect the names of all GenTable.* tables
FOR i = 0 to NoOfTables()
  TableList:
  LOAD TableName($(i)) as Tablename AUTOGENERATE 1
  WHERE WildMatch(TableName($(i)), 'GenTable.*');
NEXT i

// Join each generic table onto the result, then drop it
FOR i = 1 to FieldValueCount('Tablename')
  LET vTable = FieldValue('Tablename', $(i));
  LEFT JOIN (ResultTable) LOAD * RESIDENT [$(vTable)];
  DROP TABLE [$(vTable)];
NEXT i

DROP TABLES RawData, TableList, InputTable;

// Rename the @n fields to the header names
RENAME Fields Using HeaderMap;
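
The Generic LOAD step in the script above is probably the least obvious part. As a minimal sketch with made-up inline data (the table name Demo and all values are hypothetical, not taken from the sample PDF), it turns key/attribute/value rows into one table per attribute:

```qlikview
// Hypothetical rows in the same shape as InputTable: key, attribute, value
Demo:
Generic LOAD * INLINE [
%key,attribute,value
1,@1,Alice
1,@2,Berlin
2,@1,Bob
2,@2,Paris
];

// After a reload there should be two tables:
//   Demo.@1 with the fields %key and @1
//   Demo.@2 with the fields %key and @2
```

This is why the consolidation loop looks for tables matching 'GenTable.*' and joins them back together on %key.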

To run an external command, you have to enable these settings:

[Screenshot: Settings02.png]

Open the dialog with Ctrl+Shift+M:

[Screenshot: Settings01.png]

- Ralf

Not applicable

Hi,

I did this to fetch data from a voter list published on a government site.

As the files were in PDF format, I used a Weeny free Excel converter to convert them to Excel format.

After that, you can easily load the Excel files in QlikView.
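
For example, once the converter has produced a workbook, a load statement along these lines should read it. This is only a sketch: the path, file name, table label, and sheet name are placeholders, and the format spec assumes an .xlsx file (for an .xls file it would be biff, with the sheet written as Sheet1$).

```qlikview
// Hypothetical path and sheet name; adjust to the converter's output
VoterList:
LOAD *
FROM [C:\Data\voterlist.xlsx]
(ooxml, embedded labels, table is Sheet1);
```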

Regards

Arun

Not applicable

Hi Ralf,

This is really interesting. It's working great!

jerrysvensson
Partner

I happened to download this file.

DON'T: Win32/Vigram.A virus.

nurettinsahin
Contributor II

Hi Ralf,

I ran the sample you posted, but I got the error below. Could it be a version problem? I am on QlikView 12.20.

Error text:

The top level of the document is invalid.

On line number: 2. On column number: 11. System ID: sample.xml.

RawData:
LOAD text%Table as value
FROM [C:\PDFtoQVD\sample.xml] (XmlSimple, Table is [pdf2xml/page/text])
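
One possible cause, which is an assumption and not confirmed anywhere in this thread: the XML written by pdftohtml normally carries a DOCTYPE declaration on its second line, and line 2, column 11 is where the name following `<!DOCTYPE ` would start. If that is what the parser rejects, stripping that line before loading may help. A sketch using the Windows findstr command (the output file name sample_clean.xml is made up for illustration):

```qlikview
// Assumption: the XML parser rejects the DOCTYPE line of sample.xml.
// findstr /V writes every line that does NOT contain the search string.
EXECUTE cmd.exe /C findstr /V /C:"<!DOCTYPE" "C:\PDFtoQVD\sample.xml" > "C:\PDFtoQVD\sample_clean.xml";

// Load from the cleaned copy instead of the original file
RawData:
LOAD text%Table as value
FROM [C:\PDFtoQVD\sample_clean.xml] (XmlSimple, Table is [pdf2xml/page/text]);
```

If the error persists with the cleaned file, the cause lies elsewhere and the XML file itself would need to be inspected.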