Is it best to load then manipulate? - Qlik Community

derekjones · ‎2011-07-07

I have a question for any of you experienced Qlikview users...

I have a sales extract with around 4 million records that I need to load into my app.

Is it best to..

1) Load it into the app first as a temp table then manipulate any fields e.g. Using Mid() to strip out month and year from date or If(...) on fields that need data changing on certain conditions by creating a new table based on the temp table then dropping that temp table

2) Load it into an app first and create a QVD, then use that QVD to load it again and do the manipulations on that load.

3) Do any manipulations in the load script from the original sales extract

Common sense would say option 3 would be slow as you are trying to manipulate data as you are loading it from a text file. Option 2 would be more efficient, but you are having to save data to a qvd which slows it down and uses up unecessary server space and option 1 would be the most efficient as you are loading the raw data into the app, then using this "more accessible" table to then do your manipulations on.

I've tested 1 and 3 and there appears to be very little difference in load times, can anyone advise if this is the case. Which option in an experienced Qlikviewer mind would be best way to go?

Miguel_Angel_Baeyens · ‎2011-07-07

Hello Derek,

What I usually do is to do all the transformations required by your analyses and charts in the load that precedes the SQL statement that actually retrieves from the database, so when you store into the QVD file all data is clean and with proper formats.

This way you can load the optimized QVD files in different applications that is the faster way to load data into QlikView.

It may take longer but the code is cleaner and you will always know that the issue is always in the first step (loading from the SQL) and easier to correct, to add new fields or anything.

Hope that makes sense.

Miguel Angel Baeyens

BI Consultant

Comex Grupo Ibérica

nagaiank · ‎2011-07-07

I always try to do all manipulations, calculations which are not dependent on the user input in the load script. This improves response time.

derekjones · ‎2011-07-07

Thank you all. It appears I still have to load the table twice anyway as I have to manipulate one field's data on load to then use that "cleansed data" against a mapping table to load in an additional field. Do you have to load the whole lot again on a mapping load on a field that's being manipulated? A simplified example is below..

My Mapping table is grouping Teams (TeamName) into groups of Teams (TopTeam) which I have in Excel which I load in:

Top_Team_Map:

MAPPING LOAD

TeamName,

TopTeam

FROM $(vAppDataDir)TOP_TEAM_MAP.xls (biff, embedded labels, table is Sheet1$);

I load my sales data in and manipulate TeamName to get rid of any inconsistencies...

TmpSales:

LOAD
customer_id as CustomerID,

if(team_now = 'TeamDead’, team_then , team_now) as TeamName

FROM

$(vDataDir)SALES.txt
(txt, codepage is 1252, embedded labels, delimiter is '|');

I then have TeamName which I can use in my mapping, but I have to laod the whole table again...

Sales:

LOAD
CustomerID,

TeamName,

CustomerID &'~'& applymap('Top_Team_Map', TeamName, Null()) As TopTeamKey

Resident TmpSales;

Drop table TmpSales;

I can't easily put the if statement in the mapping as this is just a simplified example, my if's are quite a string in reality.

Any ideas?

My thanks in advance

Miguel_Angel_Baeyens · ‎2011-07-07

Hi Derek,

It sometimes happens, that you have to load several times a table until it gets completely clean an prepared to be used in charts. Usually, this process performs better and your dashboards are rendered faster and your expressions simpler.

In regards to your question, as in any other LOAD you can use an If() within the mapping, or you can use two mapping tables. Heres hwo it will work with a very dummy example

CodeNameMap:

MAPPING LOAD Code,

Name

FROM CodeSource;

NameCountryMap:

MAPPING LOAD Name,

Country

FROM CountrySource;

EmployeeWithCountry:

LOAD *,

ApplyMap('CodeNameMap', EmployeeID, 'Employee not Found') AS EmployeeName,

ApplyMap('NameCountryMap', ApplyMap('CodeNameMap', EmployeeID, 'Employee not Found'), 'Country not Found') AS EmployeeCountry

FROM EmployeesSource;

The first ApplyMap can be used as the parameter to the second ApplyMap. Nesting ApplyMap() functions may affect your load time, but it's sometimes easier thatn doing that further in the expression. And sometimes is faster to to a LEFT JOIN LOAD but it will depend on your data model and the analyses you want to carry out with those records.

Hope that helps.

Miguel Angel Baeyens

BI Consultant

Comex Grupo Ibérica