Skip to main content
Announcements
Introducing Qlik Answers: A plug-and-play, Generative AI powered RAG solution. READ ALL ABOUT IT!
cancel
Showing results for 
Search instead for 
Did you mean: 
derekjones
Creator III
Creator III

Is it best to load then manipulate?

I have a question for any of you experienced Qlikview users...

I have a sales extract with around 4 million records that I need to load into my app.

Is it best to..

1) Load it into the app first as a temp table then manipulate any fields e.g. Using Mid() to strip out month and year from date or If(...) on fields that need data changing on certain conditions by creating a new table based on the temp table then dropping that temp table

2) Load it into an app first and create a QVD, then use that QVD to load it again and do the manipulations on that load.

3) Do any manipulations in the load script from the original sales extract 

Common sense would say option 3 would be slow as you are trying to manipulate data as you are loading it from a text file. Option 2 would be more efficient, but you are having to save data to a qvd which slows it down and uses up unecessary server space and option 1 would be the most efficient as you are loading the raw data into the app, then using this "more accessible" table to then do your manipulations on.

I've tested 1 and 3 and there appears to be very little difference in load times, can anyone advise if this is the case. Which option in an experienced Qlikviewer mind would be best way to go?

4 Replies
Miguel_Angel_Baeyens

Hello Derek,

What I usually do is to do all the transformations required by your analyses and charts in the load that precedes the SQL statement that actually retrieves from the database, so when you store into the QVD file all data is clean and with proper formats.

This way you can load the optimized QVD files in different applications that is the faster way to load data into QlikView.

It may take longer but the code is cleaner and you will always know that the issue is always in the first step (loading from the SQL) and easier to correct, to add new fields or anything.

Hope that makes sense.

Miguel Angel Baeyens

BI Consultant

Comex Grupo Ibérica

nagaiank
Specialist III
Specialist III

I always try to do all manipulations, calculations which are not dependent on the user input in the load script. This improves response time.

derekjones
Creator III
Creator III
Author

Thank you all. It appears I still have to load the table twice anyway as I have to manipulate one field's data on load to then use that "cleansed data" against a mapping table to load in an additional field. Do you have to load the whole lot again on a mapping load on a field that's being manipulated? A simplified example is below..

My Mapping table is grouping Teams (TeamName) into groups of Teams (TopTeam) which I have in Excel which I load in:

Top_Team_Map:

MAPPING LOAD

TeamName,

TopTeam

FROM $(vAppDataDir)TOP_TEAM_MAP.xls (biff, embedded labels, table is Sheet1$); 

I load my sales data in and manipulate TeamName to get rid of any inconsistencies...

TmpSales:

LOAD
     
customer_id as CustomerID,

if(team_now = 'TeamDead’, team_then , team_now) as TeamName

FROM

$(vDataDir)SALES.txt
(
txt, codepage is 1252, embedded labels, delimiter is '|');

I then have TeamName which I can use in my mapping, but I have to laod the whole table again...

Sales:

LOAD
     
CustomerID,

      TeamName,

      CustomerID &'~'& applymap('Top_Team_Map', TeamName, Null()) As TopTeamKey

Resident TmpSales;

Drop table TmpSales;

I can't easily put the if statement in the mapping as this is just a simplified example, my if's are quite a string in reality.

Any ideas?

My thanks in advance 
Miguel_Angel_Baeyens

Hi Derek,

It sometimes happens, that you have to load several times a table until it gets completely clean an prepared to be used in charts. Usually, this process performs better and your dashboards are rendered faster and your expressions simpler.

In regards to your question, as in any other LOAD you can use an If() within the mapping, or you can use two mapping tables. Heres hwo it will work with a very dummy example

CodeNameMap:

MAPPING LOAD Code,

     Name

FROM CodeSource;

NameCountryMap:

MAPPING LOAD Name,

     Country

FROM CountrySource;

EmployeeWithCountry:

LOAD *,

     ApplyMap('CodeNameMap', EmployeeID, 'Employee not Found') AS EmployeeName,

     ApplyMap('NameCountryMap', ApplyMap('CodeNameMap', EmployeeID, 'Employee not Found'), 'Country not Found') AS EmployeeCountry

FROM EmployeesSource;

The first ApplyMap can be used as the parameter to the second ApplyMap. Nesting ApplyMap() functions may affect your load time, but it's sometimes easier thatn doing that further in the expression. And sometimes is faster to to a LEFT JOIN LOAD but it will depend on your data model and the analyses you want to carry out with those records.

Hope that helps.

Miguel Angel Baeyens

BI Consultant

Comex Grupo Ibérica