Qlik Community

Qlik Design Blog

All about product and Qlik solutions: scripting, data modeling, visual design, extensions, best practices, etc.

There are a number of prefixes in QlikView that help you load and transform data. One of them is the Generic prefix.

Whenever you have a generic database, the Generic prefix can be used to transform the data and create the desired fields. A generic database is basically a table where the second-to-last column is an arbitrary attribute and the very last holds the value of that attribute. In the input table below you have a three-column generic database.

 

[Image: Generic transformation4.png – the three-column generic input table]

 

But if you want to analyze this data, it is much easier to have all attributes in separate fields so that you can make the appropriate selections. It is not very practical to have one single field for all attribute values, since you may want to make selections using different attributes at the same time.

 

Enter the Generic prefix.

 

It converts the data to a structure where each attribute is placed in a field of its own. Another way to express it is to say that it takes field values and converts these to field names. If you compare it to the Crosstable prefix, you will find that they are, in principle, each other's inverses.

 

The syntax is

 

   Generic Load Key, Attribute, Value From … ;

 

There are, however, a couple of things worth noting:

  • Usually the input data has three columns: one qualifier field (Key in the above example), an Attribute and a Value. But you may also have several qualifying fields. If you have four or more columns, all columns except the last two will be treated as qualifying fields.
  • The Generic prefix will create several tables; one table per attribute. This is normally not a problem. Rather, it is an advantage: It is the least memory-consuming way to store data if you have many attributes.
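To make this concrete, here is a minimal sketch in Python (not Qlik script; the sample data is made up for illustration) of how the Generic prefix conceptually splits a key-attribute-value table into one two-column table per attribute:

```python
from collections import defaultdict

# Key-attribute-value rows, as in a three-column generic database
# (hypothetical sample data).
rows = [
    ("Bob",   "Color",  "Blue"),
    ("Bob",   "Weight", 180),
    ("Alice", "Color",  "Red"),
    ("Alice", "Length", 165),
]

# One table per attribute: attribute name -> {key: value},
# which is what a Generic Load produces conceptually.
tables = defaultdict(dict)
for key, attribute, value in rows:
    tables[attribute][key] = value

print(dict(tables["Color"]))   # {'Bob': 'Blue', 'Alice': 'Red'}
print(dict(tables["Weight"]))  # {'Bob': 180}
```

Note how the attribute values Color, Weight and Length have become table (and, in Qlik, field) names, and how each per-attribute table only stores the keys that actually have a value.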

 

If you have more than one key, this means that you will get a composite key – a synthetic key – in the data model:

 

[Image: Multiple keys3.png – data model with a synthetic composite key]

 

Although it looks ugly, this synthetic key is completely harmless. But it may still be a good idea to replace it with a manually created concatenated key:

 

   Autonumber(Key1 & '|' & Key2 & '|' & Key3) as Key,
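As a sketch of what that Autonumber call achieves (Python for illustration; the real implementation is Qlik-internal): each distinct concatenated key string is mapped to a small sequential integer, so the composite key is stored as one compact number:

```python
# Hypothetical stand-in for Qlik's Autonumber(): each distinct composite
# string gets the next sequential integer; repeats get the same number.
_seen = {}

def autonumber(*parts):
    composite = "|".join(str(p) for p in parts)
    return _seen.setdefault(composite, len(_seen) + 1)

print(autonumber("A", 1, "x"))  # 1
print(autonumber("B", 2, "y"))  # 2
print(autonumber("A", 1, "x"))  # 1 (same composite key, same number)
```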

 

Finally, I have seen many examples on QlikCommunity where a For-Next loop is used to join together all tables created by the Generic prefix, e.g.:

 

   Set vListOfTables = ;
   For vTableNo = 0 to NoOfTables()
      Let vTableName = TableName($(vTableNo)) ;
      If Subfield(vTableName,'.',1)='GenericLabel' Then
         Let vListOfTables = vListOfTables & If(Len(vListOfTables)>0,',') & Chr(39) & vTableName & Chr(39) ;
      End If
   Next vTableNo

   CombinedGenericTable:
   Load distinct Key From GenericDB;

   For each vTableName in $(vListOfTables)
      Left Join (CombinedGenericTable) Load * Resident [$(vTableName)];
      Drop Table [$(vTableName)];
   Next vTableName

 

The result is one big table that contains all attributes; a table that is often sparse (containing many NULL values) and much larger than the initial tables. And no performance has been gained… So I can only say:

 

You should not do this - unless you have a specific reason to.

 

The Generic prefix creates a set of tables that store the data in an optimal way. In most cases you should not change this. I realize, however, that there are cases where you want to transform the data further and need the data in one, unified table. Then the above scriptlet can be used.
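For those cases where a single table really is needed, the effect of the joining loop can be sketched in Python (illustrative data, not Qlik script): left-joining every per-attribute table onto the list of distinct keys produces one wide table, with a NULL for every key/attribute pair that has no value:

```python
# Per-attribute tables, as the Generic prefix would create them
# (hypothetical sample data).
tables = {
    "Color":  {"Bob": "Blue", "Alice": "Red"},
    "Weight": {"Bob": 180},
    "Length": {"Alice": 165},
}
keys = ["Bob", "Alice"]  # the distinct keys

# Left join every attribute table onto the keys; missing values become None.
wide = {k: {attr: tab.get(k) for attr, tab in tables.items()} for k in keys}

print(wide["Bob"])    # {'Color': 'Blue', 'Weight': 180, 'Length': None}
print(wide["Alice"])  # {'Color': 'Red', 'Weight': None, 'Length': 165}
```

Two of the six cells are already None in this tiny example; with hundreds of attributes the joined table typically becomes far sparser, which is why the per-attribute storage is usually preferable.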

 

HIC

41 Comments
MVP

(Edit: Looks like I also should have read all the comments on the linked blog post before posting this. The storage of nulls is explained there in more detail. I've still not read it all.)

Thank you so much for the link to the symbol tables and bit-stuffed pointers! I'll give that a thorough read and try to commit it to memory. The symbol tables on first glance are exactly what I expected. I thought they might connect in some more complicated way than the data table, but the data table makes sense.

Let me calculate how sparse my data is so we have that as reference. 16,225,873 rows in key-attribute-value form / (559 fields * 143,640 keys) = 20% of fields are non-null, so 80% are null.

You say that null is stored in each symbol table with pointers from the main table. OK. I have 558 attributes. For each I count the number of distinct values and add 1 for null. From that I can calculate the bits required for the symbol table key. The symbol table row is that number of bits plus the number of bits required to store the value. Well, that gets iffy, because I don't know how many are text values vs. numeric values, and I believe QlikView will try to turn numeric text into numbers in the symbol table. So it breaks down a bit, but let's just say len()*8 as the bits. Add the key bits, and that's that row length in the symbol table. Add up the rows for that attribute. Then add up the totals for the attributes. I get 9,070,157 bits = 1107 KB.

Then we have the symbol table for the key, which needs 18 bits for the pointer and 72 bits for the value: 90 bits * 143,640 keys = 12,927,600 bits = 1578 KB for the key's symbol table. 1107 + 1578 = 2685 KB for the symbol tables.

Then we have the main table with the pointers. I've already calculated the bits required for the pointer for each field, so I can total those up, and I get 2625 bits per row * 143,640 rows = 377,055,000 bits = 46,027 KB, for a total of 48,712 KB. Plus overhead, whatever that might come to. That's dramatically higher than the 13,143 KB that was observed.

Now for the sake of argument, let's say that nulls are not stored in the symbol tables, and in fact take zero bytes everywhere. My symbol tables are slightly smaller, though I'll always assume the pointer needs at least one bit even when there's only one value. Anyway, there are fewer bits, but not enough to matter, and I get 2685 KB for the symbol tables again. For my main table, I have all the keys once, so 18 bits * 143,640 keys = 316 KB. Then for each field, it's the number of keys that have a non-null value for the field, times the length of the pointer to the symbol table. Add all that up, and I get another 14,006 KB. So my total is 14,006 + 316 + 2685 = 17,007 KB. Much closer but still higher than the 13,143 KB that was observed.

What about the generic load results? The symbol tables shouldn't need nulls, but either way, 2685 KB for those. For each table, I already calculated how many bits are needed for the pointer to the attribute field. Add another 18 bits for the pointer to the key field. Multiply by the number of rows in that table. Add up all the tables. I get 49,659 KB. So 52,344 KB total, plus overhead. This is again somewhat higher than the 38,008 KB that was observed.
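The pointer sizes used in these calculations can be sketched as follows (Python; this mirrors the bit-stuffed-pointer assumption made in the comment, not a documented Qlik formula): a field with n distinct symbol-table entries needs ceil(log2(n)) bits per pointer, with a floor of one bit:

```python
import math

def pointer_bits(distinct_values):
    """Bits needed to address `distinct_values` symbol-table entries,
    assuming at least one bit even for a single value."""
    return max(1, math.ceil(math.log2(distinct_values)))

print(pointer_bits(2))       # 1
print(pointer_bits(143640))  # 18 - matches the 18-bit key pointer above
```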

This suggests a couple of things to me. First, that nulls are likely being stored in a more efficient way than as regular old entries in our symbol tables with pointers from our main tables. And that makes sense to me. Sparse tables are very common. An efficient handling of nulls would seem an important requirement for an in-memory data model. Second, that there's some additional compression going on in both the generic tables and the joined table. There's the compression of numerics, but that won't account for much of it because that's just the symbol tables, which are small in comparison to the main table(s).

Or I've simply miscalculated. Certainly possible.

MVP

Clever! And it looked very promising, and I'd love for my load to run faster and more efficiently, so I gave it a shot on my real data. My script was a little simpler because I knew that I only had one value per key per attribute. Here's the core of it. Before we get to this point, I've loaded my raw data into a table called Generic with fields Key, Field, and Value.

// Autogenerate faster than load distinct
[Fields]:
LOAD text(fieldvalue('Field',recno())) as [Field]
AUTOGENERATE fieldvaluecount('Field');

SET vList = ;

FOR I = 0 TO noofrows('Fields') - 1
    LET vField = peek('Field',I,'Fields');
    LET vList = vList & chr(13) & ',only(distinct if([Field]=' & chr(39) & vField & chr(39) & ',[Value])) as [' & vField & ']';
NEXT

DROP TABLE [Fields];

[Flat]:
LOAD
    [Key]
    $(vList)
RESIDENT [Generic]
GROUP BY [Key];

DROP TABLE [Generic];

Technically, it works. But my version at least is extremely hard on memory. Somewhere in the middle of the load of Flat, memory use spikes. Running on my 16 million row table with 558 attributes, my machine with 8 GB eventually froze and had to be turned off and on to recover. So I ran for 1% of rows. Memory still spiked fast and hard, and again my machine had to be turned off and on to recover. So I dropped it to 0.1% of my rows. Only at that point did it work. It still took about 3x more memory at its peak than the generic load and left join approach, but it did complete faster, in 30ish seconds instead of 40ish seconds.

Maybe someone can spot a mistake in my script, or maybe that's just the nature of the beast for some reason, that it takes huge amounts of memory during the load.

MVP & Luminary

John,

Re the QVW sizes you are showing, are those from disk? And if so, are you storing the QVW uncompressed to get a number that reflects the actual size?

-Rob

http://masterssummit.com

http://qlikviewcookbook.com

New Contributor III

Very useful!

Thanks HIC

MVP

Sorry for the delay. Yes, those were from disk. Embarrassingly, no, I was storing with High compression. Let's see what happens at None.

             Original   Generic   Joined
   QVW KB      97,697    62,347   52,849

Much closer: over, but not hugely over, what I calculated. And I didn't know what sort of overhead I'd have, and my calculations were surely at least a little inaccurate. Looks reasonable based on the calculations, then. And thus the simple explanation for why I was seeing nulls take very little space was that they'd simply been compressed out when I saved.

I'm not sure what to make of the much larger in-memory difference. I was measuring that by closing QlikView to kill caching, then opening by double clicking on the file, then looking at task manager and reporting how much memory it said QlikView was using. There's a lot more in memory than just the data model at that point, but I wasn't sure how to parse that out, and had guessed that other things in memory would be approximately equal between the two. But perhaps I can't trust Task Manager for measuring memory consumption. I know caching makes Task Manager a "lie", which is why I went in fresh, but there could be other things I'm unaware of.

MVP & Luminary

John,

You can confirm your memory calculations by generating a .mem file (Document Properties, Mem File) and loading it up in QlikView. You can also use Document Analyzer to get the same numbers, or process the .mem file with QvOptimizer.

-Rob

New Contributor

Hi,

I am creating a data model using the generic load statement, but I have a complication. Let me explain.

The DB source is very simple: it has one fact table and two dimensional tables.

The first dimensional table is related to the production sites:

Plant ID  PlantName  Plant Type  Country  Region
1         Plant A    Type 1      Italy    Europe
2         Plant B    Type 1      France   Europe
3         Plant C    Type 2      Belgium  Europe

The second dimensional table is related to the metrics calculated per site:

Metric ID  MetricName  Metric Type  Metric Units  Plant ID
1          Metric A    Type 1       Units A       1
2          Metric B    Type 2       Unit B        1
3          Metric B    Type 2       Unit B        2

The fact table contains the daily value for each metric for all production sites

Metric ID  MetricValue  Plant ID  Date
1          Metric A     1         01-01-17
2          Metric B     1         01-01-17
3          Metric B     2         01-01-17

In the load script all the mapping is done, so the ID columns are replaced by the names.

Unfortunately the fact table cannot be used as it is, for several reasons. For example, the aggregation rules can vary by metric.

Therefore I use the generic statement to get a table per metric. When doing so, the system creates a synthetic key composed of Date+PlantName+MetricName. This key works well to create apps, as the value tables and dimension tables (production sites table and metric dimensional table) are linked, but I cannot create any calculated measure (e.g. Metric A + Metric B) in an app, as they do not share any common key. I can overcome the problem by creating a shorter key, Date+PlantName, which will allow me to create calculated measures (e.g. Metric A + Metric B). Unfortunately, then the link between the value tables and the metric dimensional table is nonexistent.

Is there a way to have all functionalities available?

Thanks a lot

Paco

Contributor II

Hello, I am trying to accomplish the part where "The result is one big table that contains all attributes" so it can be used to join to other tables.  The values for SOURCE_SYSTEM are PLS, MP and ORCA.  I'm having trouble with the FOR-NEXT loop part.  Here's what I have, but it gives me the error below.  Any help would be appreciated.  Thanks!

step1:

generic LOAD

#POL_N as POL_N,

Autonumber(SOURCE_SYSTEM & '|' & SOURCE_SYSTEM & '|' & SOURCE_SYSTEM) as Key,

trim(SOURCE_SYSTEM) as SOURCE_SYSTEM,

DEVICE_VALUE

FROM

(qvd)

WHERE(DEVICE_TYPE = 'E-MAIL ADDRESS');

Set vListOfTables = 3;

  For vTableNo = 0 to NoOfTables()

  Let vTableName = TableName($(vTableNo)) ;

  If Subfield(vTableName,'.',1)='GenericLabel' Then

  Let vListOfTables = vListOfTables & If(Len(vListOfTables)>0,',') & Chr(39) & vTableName & Chr(39) ;

  End If

  Next vTableNo

CombinedGenericTable:

Load distinct Key From step1;

  For each vTableName in $(vListOfTables)

  Left Join (CombinedGenericTable) Load * Resident [$(vTableName)];

  Drop Table [$(vTableName)];

  Next vTableName

Contributor

Hello Henric,

I appreciate your blog entry. It motivated me to reproduce your example, simply and without a loop. I hope it is okay that I share my work here. If not, please let me know.

[Image: GenericDB_Dashboard.JPG]

Greets,

gino2780

Partner

I'm trying to attach an Excel sheet, but I cannot.

I have a similar situation of loading a crosstable with multiple header rows.

Please help.

tony

       

[Pasted Excel data: a crosstable with multi-row headers (Baseline Qty, Baseline Price and Baseline Spend per Plant1–Plant10, keyed by MB ID with product/supplier columns), followed by the desired key-attribute-value output per plant; the table formatting was lost in the paste.]
New Contributor

I know this is an old post, but I'm trying to follow the above and I'm getting confused, and would like some help.  Here is my script.  The first part of the script creates all my bespoke tables, but I want to consolidate them into a single table as per Henric's post, and I can't get it to work.  The first part works and creates multiple tables for each of my BespokeDefinition values, but then it says the table was not found.  Does anyone know what I'm doing wrong?

SET v.GenericTableName = 'BespokeItem';
//
[$(v.GenericTableName)]:
Generic
LOAD
CommonRef,
ReferralRef,
BespokeDefinition,
BespokeItem,
SchemeName,
ServiceDesc,
BespokeFieldCaption,
BespokeFieldControlType
FROM [Q:\Production\Data\Source\CQ QVDs\BespokeFieldItems.QVD] (qvd);


Set vListOfTables = ;
For vTableNo = 0 to NoOfTables()
Let vTableName = TableName($(vTableNo)) ;
If Subfield(vTableName,'.',1)='$(v.GenericTableName)' Then
Let vListOfTables = vListOfTables & If(Len(vListOfTables)>0,',') & Chr(39) & vTableName & Chr(39) ;
End If
Next vTableNo

CombinedGenericTable:

Load distinct BespokeDefinition resident [$(vTableName)];
For each vTableName in $(vListOfTables)
Left Join (CombinedGenericTable) Load * Resident [$(vTableName)];
Drop Table [$(vTableName)];
Next vTableName
