Skip to main content
Announcements
NEW: Seamless Public Data Sharing with Qlik's New Anonymous Access Capability: TELL ME MORE!
cancel
Showing results for 
Search instead for 
Did you mean: 
SRT10
Contributor III
Contributor III

Append a rownumber to a table quickly in qlikview

Have a data of about ~85,177,324 records in a qlik memory table. I need to assign a rownumber to each record. How can i do it quickly. The above data set is concatenated into a table from various qvds and now I need to have a rownumber to each record entry. Rowno() is taking lot of time to do this and exract takes about one hour to get loaded.

Is there any optimised way to do this..

Data as of now..

Name Address pincode
abc aaa 123
xyz sss 3432
pqr fff 4324
. . .
. . .
. . .

 

Output

Rowno Name Address pincode
1 abc aaa 123
2 xyz sss 3432
3 pqr fff 4324
. . . .
. . . .
85,177,324 . . .

 

If i am using rowno() while reading data from each qvd then it is making load unoptimised. Is there any optimised way to do this considering large data set

24 Replies
SRT10
Contributor III
Contributor III
Author

I can't do this while creating the qvd.. reason being there ar various qvd from various source and I will be reading data accordingly and then appending rowno or recno. so it has to be here so that number can be in sequence for each different qvd..

Ex - 1 to 100 for qvd 1 

101 to 200 for qvd 2 and so on.

 

Can't do this at the creation of qvds as we won't be sure how many records will be there in 1st and 2nd file.

 

So plan is to have rec no or row no during this stage where we are aggregating all the 5 qvds

 

 

marcus_sommer

Especially in this case won't your intended approach lead you to a sensible tracking of the records. You will get a sequentially number but you couldn't identify with it where it comes from - at least not if you had only this information. 

This means you need more and other information like some hard-coded: 'xyz' as SOURCE and/or filebasename()  as FILE to track the file-name and/or subfield(filename(), '\', -2) as FOLDER to fetch more hierarchically details and/or a recno/rowno - everything what you might need should be done by creating the qvd's.

- Marcus

SRT10
Contributor III
Contributor III
Author

Source detail information is already mentioned in each respective qvd. So we can track that way to identify from were the data is coming.

++ @marcus_sommer 

 

@rwunderlich Could you please post your code which is loading million of records in a second. Am I missing something here.

 

My Code..

Data2:
NoConcatenate
LOAD *,RecNo() AS Sequence Resident Data;
Drop Table Data;
Rename table Data2 to Data;

STORE Data INTO D:\Qlikview\Output\Data.QVD (QVD);

DROP TABLE Data;

Call RecordLoadTime('Data');

 

This loads only thousands of records per second.. 12 min for each block, at the end entire extract is taking approx. 60 min to load data from 5 different sources

 

Honestly it will be a great help for us 🙂

 

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

Here is my test code for 10M rows.

data:
LOAD
'A' & div(RecNo(),2) as A
,'A' & div(RecNo(),5) as B
,'A' & div(RecNo(),10) as C
,'A' & div(RecNo(),100) as D
,'A' & div(RecNo(),500) as E
,'A' & div(RecNo(),1000) as F
AutoGenerate 1e7
;
Store data into data.qvd (qvd);
Drop Table data;
data:

Load * From data.qvd (qvd);

data2:
NoConcatenate
LOAD *, RecNo() as RecId Resident data;
Drop Table data;
Rename table data2 to data;

On a 4 core 32GB laptop here's the timing. 

1st  Load:  9 seconds
2nd Load: 2 seconds
3rd Load (recno): 5 seconds

I'm curious to hear results for this same script on your system. 

-Rob

marcus_sommer

If you could track the records what is the purpose of having another record-id?

Beside this I could confirm the load-times from Rob's example within a similar environment of 4 cores and 16 GB RAM whereby each statement took 1 second more (only one trial without looking on other processes/resources).

- Marcus

cwolf
Creator III
Creator III

For million of rows and a lot of columns a resident load ist not every time a good option, because it needs much more memory.

Let's have two QVDs, each has 48.7M rows and 157 columns.

Example 1:

A:
Load
*
from "QVD 1.qvd" (qvd);

Concatenate(A)
Load
*
from "QVD 2.qvd" (qvd);

B:
load
RecNo() as RowNo,
*
Resident A;

drop table A;

Runtime: 8:33
Peak working set: 44.7GB


Example 2:

A:
Load
RecNo() as RowNo,
*
from "QVD 1.qvd" (qvd);

let nor=NoOfRows('A');

Concatenate(A)
Load
$(nor)+RecNo() as RowNo,
*
from "QVD 2.qvd" (qvd);

Runtime: 6:44
Peak working set: 26.4GB


So it depends on how much memory is available and how fast is the memory.

SRT10
Contributor III
Contributor III
Author

Hey

Scenario 1:

I tried your code in new application and it got completed within few seconds. This was awesome..
But when I replicate similar pattern of code in my application it was time consuming.

What can be the issue ? 

Scenario 2:
have also observed that if I am reading my final qvd in your code then it is working fine for me. but in my application it is not working in optimized fashion.
😞

Scenario 3:

If I create new tab and add your code in new tab and read data from my final qvd instead of data qvd then again it is taking time to append recno

 

Server Configuration :

20 Core , 512 Gb Ram

@rwunderlich 

marcus_sommer

Only if you load a single table within a script you will load THIS table. If there are more loadings the loads may effect on each other.

This is related to the way how the data are stored - each field with its distinct values is stored within an own system-table and a data-table with a bit-stuffed pointer to the system-tables. This means each further load will check if a field already exists and if it runs through this table to check if the value already exists and if not it makes a new entry there and adds the pointer to the data-table, too but if the value exists it takes the existing pointer-value and adds it to the data-table.

Therefore it's not meaningless in which order the loads are executed. Within the most scenarios it won't be relevant and not be noticed but if you hit any timely or resource-depended threshold it may be worth to look on the matter from this point of view.

Further related to the above mentioned data-handling are also measures like loading distinct or setting null-variables or system-variables like verbatim or similar things which may force a processing of the data.

Therefore I suggest to review your code in this way and also if it could be divided into multiple tasks/applications. Of course the last measure increased the overhead but the smaller tasks could be easier developed/maintained and they could in general run in parallel and/or within other time-frames.

Beside this all I recommend again to consider if this recno is really expedient - a quite good explanation to what it leads could you find here:

The Importance Of Being Distinct - Qlik Community - 1466796

which means you should have a good purpose to add such fields to your data.

- Marcus

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

This is very interesting. My first guess is that your application is using a lot of RAM during the reload and you are paging during the resident load.  Although your server has a lot of RAM, can you observe how much the load is using either by watching the task manager or checking the logs?  I'm guessing that server is also used for QVS (Qlikview users).  What is the total memory usage on the server like?

Can you post a screenshot of the data model in your app?

-Rob

SRT10
Contributor III
Contributor III
Author

Server is dedicated to Qlikview reload tasks/QMC. There are many extracts which runs on daily basis.

Usage is at bit higher side.

I have managed to use your code and it has worked for me.
What I have done -
In my extract I have loaded data from various sources in optimised format without generating RecNo() as RecId  column.

There was a comment by @marcus_sommer  where he said that "Only if you load a single table within a script you will load THIS table. If there are more loadings the loads may effect on each other."

Further while refreshing dashboard which is following the above one. Performed a binary load and then have added block of code suggested by you and derived RecId column as advised by you.

This block runs only once each time while reloading dashboard. Its pretty quick and gets completed in few mins not in seconds though.

One more question, are there any chance to miss any record if we are using RecNo() instead of RowNo(). I observed that RowNo() is very slow..

 

Does RecId guarantees to return sequential integers .

 

My Code - 

Binary [Report.qvw];

Data2:
NoConcatenate
LOAD *, RecNo() as Sequence Resident Data; // Data table is from binary load
Drop Table Data2;
Rename table Data2 to Data;

What do you think @rwunderlich