Skip to main content
Announcements
See why Qlik is a Leader in the 2024 Gartner® Magic Quadrant™ for Analytics & BI Platforms. Download Now
cancel
Showing results for 
Search instead for 
Did you mean: 
rubenmarin

Split groups of columns in rows

Hello,

I have a table with 30 values reported in a single line, and each value is composed of 5 fields, ex .:

Key1, Key2, Field101, Field201, Field301, Field401, Field501, Field102, Field202, Field302, Field402, Field502, ...Field530

I would have to distribute in rows each group of 5 fields, it is a qvd of several million records, for now I have it done with a loop but it has to read 30 times and it takes a lot. Do you know if there is a way to avoid reading it 30 times?

FOR i = 1 to 30

    LET vNum = Num ($(i), '00');

    Concatenate (TableName)

    LOAD Key1,

          Key2,

          Field1$(vNum) as Field1,

          Field2$(vNum) as Field2,

          Field3$(vNum) as Field3,

          Field4$(vNum) as Field4,

          Field5$(vNum) as Field5

    From

        [$(vPath)FileName.qvd]

        (qvd)

    Where

        Field1$(vNum) <> 0

    ;

NEXT

I'm not looking for delta/incremental loads solutions.

Thanks,

Regards.

25 Replies
rubenmarin
Author

Thanks Marco, this one has become my first option. I'll make some tests and return with the results.

I will also use Rob's code, this table will be exported to qvd, so I think I can avoid the discussion this time.

rubenmarin
Author

I made some tests, I have created a 1,2Gb qvd with10M rows and all 150 fields (30 groups of 5 fields) with 2 keys (generator attached).

I couldn't make optimized load with a where Exists clause to work but doing an optimized load in bucle (without 'where') and a resident of this one was the fastest.

Optimized bucle and resident where: bit less than 3 minutes.

Bucle with Where: 6 minutes

Subfield: 45 minutes. It proccess all the 300M rows.

Generic: about 3 hours (cancelled after 2 hours). It creates a row for each field (10M*30*5 fields), making at least 2 turns of the 1500M rows.

I didn't worried to make clean tests, the computer was doing other non very consming tasks.

Attached the different tests made.

Edit: Just to comment that I tried to simulate real data giving less probability for '1' in Field1 as goups are higher (group 30 has less probability than group 29), so in my data only 28,8M had Field1<>0.

marcus_sommer

Many thanks Ruben that you made a comparing of the various approaches. I think it will be very useful in similar cases.

- Marcus

MarcoWedel

one example of combining the loop with a generic field name extraction:

tabData:

LOAD * FROM (qvd);

tabFields:

CrossTable (Field,Value)

LOAD 1, * Resident tabData Where RecNo()=1;

tabLoads:

LOAD Concat(Field&' as '&Left(Field,2),', ') as Fields,

    Mid(Field,3) as Group

Resident tabFields

Where RecNo()>2

Group By Mid(Field,3);

Join

LOAD Concat(Field,', ') as Keys

Resident tabFields

Where RecNo()<=2;

tabResult:

LOAD 1 as tempField AutoGenerate 0;

FOR Each vGroup in FieldValueList('Group')

    LET vLoad = Lookup('Keys','Group',vGroup,'tabLoads')&','&Lookup('Fields','Group',vGroup,'tabLoads');

Concatenate (tabResult)

LOAD $(vLoad),

    '$(vGroup)' as Group

Resident tabData;

NEXT

DROP Tables tabData, tabFields, tabLoads;

DROP Field tempField;

Marco

MarcoWedel

... and another one looping per row

tabData:  

LOAD K1,  

     K2,  

     Pick(IterNo(),F101,F102,F103,F104,F105,F106,F107,F108,F109,F110,F111,F112,F113,F114,F115,F116,F117,F118,F119,F120,F121,F122,F123,F124,F125,F126,F127,F128,F129,F130) as Field1,  

     Pick(IterNo(),F201,F202,F203,F204,F205,F206,F207,F208,F209,F210,F211,F212,F213,F214,F215,F216,F217,F218,F219,F220,F221,F222,F223,F224,F225,F226,F227,F228,F229,F230) as Field2,  

     Pick(IterNo(),F301,F302,F303,F304,F305,F306,F307,F308,F309,F310,F311,F312,F313,F314,F315,F316,F317,F318,F319,F320,F321,F322,F323,F324,F325,F326,F327,F328,F329,F330) as Field3,  

     Pick(IterNo(),F401,F402,F403,F404,F405,F406,F407,F408,F409,F410,F411,F412,F413,F414,F415,F416,F417,F418,F419,F420,F421,F422,F423,F424,F425,F426,F427,F428,F429,F430) as Field4,  

     Pick(IterNo(),F501,F502,F503,F504,F505,F506,F507,F508,F509,F510,F511,F512,F513,F514,F515,F516,F517,F518,F519,F520,F521,F522,F523,F524,F525,F526,F527,F528,F529,F530) as Field5  

FROM (qvd)

While IterNo()<=30;

still not very fast though ...

Marco

rubenmarin
Author

Thanks Marco, as you say, the optimized bucle version is still the fastest and, in my opinion, the crosstable-generic is still the cutest.