Solved: Anybody knows a faster/optimized way to do this? (... - Qlik Community

marcel_olmo · ‎2010-12-14

Hey guys, I have to do a demo with millions of records of a "selling table".

My code works perfectly. But it does autogenerates, and they are very slow if you have to generate a big bunch of data.

I have to do this script for 10 million records, and if I trust my computer, it takes 3 days of generating data. 😞

Here is what I have, if anybody knows a way to do it faster, I'm gonna thank you a lot :

set nClientes = '1000000';

set nProductos = '5000000';

set nCabeceras = '10000000';

set nLineas = '10';

FOR k = 0 TO ($(nCabeceras) -1)

FOR i = 0 to ($(nLineas)-1)

LineasVentas:

load

mod(2000 + ( $(k) ),$(nCabeceras)) + 2000 as [Nº documento],
10000 + ($(i)) as [Nº línea],
(mod(2000 + ( $(k) ),$(nCabeceras)) + 2000) &'-'&( 10000 + (($(i))) ) as [Nº documento-Nº línea],
mod(10000 + ( $(k) ),$(nClientes)) + 10000 as [Venta a-Nº cliente],
date( today() - mod($(k),500)) as [Fecha registro],
mod(1000+ (rowno()-1),$(nProductos))+ 1000 as [Nº producto]

autogenerate 1;

NEXT i

NEXT k

store LineasVentas into LineasVentas.qvd;

Report Inappropriate Content · ‎2010-12-14

Hi Marcel,

You can create a table temporary to generate the values of variables "i", "k". After use this table to generate your data.


set nClientes = '1000000';
set nProductos = '5000000';
tabTemp:
LOAD (100 - RowNo()) AS k
AUTOGENERATE 10000000;
Lineas:
JOIN(tabTemp)
LOAD (10 - RowNo()) AS i
AUTOGENERATE 10;
LineasVentas:
LOAD
 mod(2000 + (k),$(nCabeceras)) + 2000 as [Nº documento],
 10000 + (i) as [Nº línea],
 (mod(2000 + (k),$(nCabeceras)) + 2000) &'-'&( 10000 + (($(i))) ) as [Nº documento-Nº línea],
 mod(10000 + (k),$(nClientes)) + 10000 as [Venta a-Nº cliente],
 date( today() - mod(k,500)) as [Fecha registro],
 mod(1000+ (rowno()-1),$(nProductos))+ 1000 as [Nº producto]
RESIDENT tabTemp;
DROP TABLE tabTemp;
store LineasVentas into LineasVentas.qvd;

View solution in original post

maneshkhottcpl · ‎2010-12-14

What is the use of this script?

or u can use incremental load.

suniljain · ‎2010-12-14

is this code for increamental load or anything else ?. not getting your code .

marcel_olmo · ‎2010-12-14

Thank you Manesh for your quick reply.

What is the use of this script? I don't understand you. Are you asking me why I'm doing it like this way? Because I don't know a better way to do it, of course, haha.

It works fine for few thousands of records, but this time I have to do it for millions of records, and it spends a lot of time to do it, and I'm looking for better / faster ways to do that.

nCabeceras is the "selling header", and I have 10 million, and nLineas is "selling lines", and with 10 is enough.

Could you be more specific about using incremental load in this case?

Thank you very much!!

sathishkumar_go · ‎2010-12-14

Hi Marcel,

It depends up on data Maintaining. What I suggest, Create 1 Qvd for Historical data and Make 1 qvd for last 1 or 2 year data.

In final App Call all Qvds either concatenate or use *.

Regards

Sathish

marcel_olmo · ‎2010-12-14

Thank you Sathish for your suggestion.

Unfortunately, I don't have historical data. The point is to get a big bunch of "invented" data to test between all the versions of QV9 and QV10.

I know that when I get the qvd file of "selling lines", everything is gonna get more easy. But by now, I have the problem of the time, I don't want to wait 3 or 4 days of "creating autogenerate data".

Many thanks anyway.

marcel_olmo · ‎2010-12-14

Hello Sunil, thanks for your reply. It's not for an incremental load, is for testing "invented selling data" to compare between all versions of QV.

(is a request from mycompany).

sathishkumar_go · ‎2010-12-14

HI Marel,

May I know y u r using nested for loop in script?

Because this nested for loop obviously take time (Because it ll read line by line also)

In 1st loop will read almost 1 crore records.

try to avoid the nested loops or explain y u need to use this logic it will help to find some other logic

-Sathish

marcel_olmo · ‎2010-12-14

Yes, I know I'm using nested for loop and that's why this is slow. But by now I don't know a better way to do it, and until now it was enough for my requirements, but not for this case. 😞

This is a picture of my desired result :

As you can see, there is :

1-Id Selling Header.

2- 10 associated selling lines for each selling header

3- The same Id Costumer for each Id Selling Header

4-The same Register date for each Id Selling Header

5- An incremental number of Id products for each line of each selling header.

Hope it helps about what I want to get.

Many thanks in advance!!!

Miguel_Angel_Baeyens · ‎2010-12-14

Hello Marcel,

I understand that you want to create some "dummy" sales information. If my assuption is correct, try something based on the following

Set nClientes = '1000000';Set nProductos = '5000000';Set nCabeceras = '1000000';Set nLineas = '10'; LineasVentas:LOAD Ceil(Rand() * $(nCabeceras)) + 2000 as [Nº documento], Ceil(Rand() * $(nLineas)) + 10000 as [Nº línea], Ceil(Rand() * $(nCabeceras)) + 2000 & '/' & Ceil(Rand() * $(nLineas)) + 10000 as [Nº documento-Nº línea], Ceil(Rand() * $(nClientes)) as [Venta a-Nº cliente], Date(Today() - Ceil(Rand() * 500)) as [Fecha registro], Ceil(Rand() * $(nProductos)) + 1000 as [Nº producto]AUTOGENERATE $(nCabeceras); store LineasVentas into LineasVentas.qvd;

The performance problem in your script is not in the autogenerated table rather than the amount of loops it has to do. In my modest computer, those 10M have taken few minutes to load. Playing with the rand(), ceil() and floor() and chr() functions, you can get some coherent information.

Hope this helps

Anybody knows a faster/optimized way to do this? (Autogenerate stuff)