Not applicable

Time to Extract

We are a daughter company that gets SAP tables nightly. We are being told that it is taking 12 hours to extract the data (perhaps including other daughter companies' data too).

This sounds way too long. Can this really be? Any ideas on what may be going wrong?

Thanks,
Stephen

4 Replies
Not applicable
Author

It depends on what you are allowed to do. If you have access to the QlikView applications that are extracting the data, you can run the script in debug mode: each step is displayed on the screen and you can see which ones are taking a long time.
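
If debug mode is impractical for an overnight batch run, another option is to write timestamps into the script log around each step so you can see where the time goes. A minimal sketch, with placeholder step names:

LET vStepStart = Now();

// ... one extract step here, e.g. a SQL SELECT or a QVD load ...

TRACE Step finished: started $(vStepStart), ended $(=Now());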

You may be able to reduce this time by creating QVD files for the initial extract, then taking the data from these.
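
For example, a separate extract script can read each source table once and store it raw, and the application then loads from the QVD instead of the database. A minimal sketch with placeholder names:

// Extract script: read once from the source, store raw to QVD
[Customers]:
SQL SELECT * FROM Customers;
STORE [Customers] INTO Customers.qvd (qvd);
DROP TABLE [Customers];

// Application script: load from the QVD, not the database
[Customers]:
LOAD * FROM Customers.qvd (qvd);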

You will have to review and analyse what you are reading and make sure you are doing this as efficiently as possible.

It might be useful to ask how your parent company is running the data extract, as this might be something you can improve.

johnw
Champion III

By "data extract" do you mean "running the reload script"?

I have at least one script that takes over an hour to run. If we had ten times the data volume (we're a relatively small company in the grand scheme of things), then it could easily take ten times as long. On the other hand, if we had ten times the data volume, we'd probably have ten times the processing power, and it would still take about an hour to extract.

I don't know much about SAP, but generally speaking, I would suggest that you only pull data from any given table or system once, and store it in a QVD. Make sure that you're reading using the right indexes, or whatever the fastest way is to grab the source data. In some cases, you may want to add new indexes to the source tables to support faster data extraction.

Then the actual user applications would load only from QVDs, and not from SAP itself.
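
In your situation the extract layer would be a small QlikView document per SAP table that does nothing but pull and store. A sketch, assuming the SAP SQL connector (the connection string details and the BKPF table are placeholders, and my recollection is that this connector expects space-separated field lists):

CUSTOM CONNECT TO "Provider=QvSAPConnector;ASHOST=sapserver;SYSNR=00;CLIENT=800;XUserId=xxx;XPassword=xxx;";

[BKPF]:
LOAD *; // preceding load so the fields land in a named QlikView table
SQL SELECT MANDT BUKRS BELNR GJAHR BUDAT FROM BKPF;

STORE [BKPF] INTO BKPF.qvd (qvd);
DROP TABLE [BKPF];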

Then it's a matter of tuning all of the script in your applications. For large tables, you want to load them only once in the script, and to load using an optimized load. That puts some serious restrictions on the initial load. A technique I tend to use is this:

[Some Huge Table]:
LOAD * INLINE [
Some Field with the MOST Restrictive Values
Value 1
Value 2
Value 3
];
// Optimized QVD load: a single-parameter EXISTS() is the only WHERE that keeps it optimized
INNER JOIN ([Some Huge Table])
LOAD
"Some Field with the MOST Restrictive Values"
,other fields
FROM SomeHugeTable.qvd (QVD)
WHERE EXISTS("Some Field with the MOST Restrictive Values")
;
// Restrict further on a second field (one of the "other fields" loaded above)
INNER JOIN ([Some Huge Table])
LOAD * INLINE [
Some Field with the Next Most Restrictive Values
Value A
Value B
Value C
];
// Final resident pass: remaining conditions and derived fields
INNER JOIN ([Some Huge Table])
LOAD
"Unique Key Field"
,some expression as "Derived Field 1"
,some other expression as "Derived Field 2"
RESIDENT [Some Huge Table]
WHERE some final conditions
;

In my experience, this has usually been faster than putting the conditions and derived fields in the load from the QVD, because it allows for an optimized QVD load. The join at the end ends up being the slowest part, but it often seems faster to handle the extra conditions and derived fields as a separate step. But definitely experiment, as what is fastest in one case isn't always fastest in another. It may also differ between QlikView versions for all I know.
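
For comparison, the single-pass version that the staged loads above are competing against would look something like this (same placeholders as before); the WHERE clause and expressions make it a non-optimized QVD load, which has to unpack every row:

[Some Huge Table]:
LOAD
"Unique Key Field"
,some expression as "Derived Field 1"
,some other expression as "Derived Field 2"
FROM SomeHugeTable.qvd (QVD)
WHERE Match("Some Field with the MOST Restrictive Values", 'Value 1', 'Value 2', 'Value 3')
AND some final conditions
;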

Not applicable
Author

John,

I am only referring to the SAP connector and the extraction of SAP tables (which are put into QVDs).

Stephen

disqr_rm
Partner - Specialist III

It depends on how the data is being extracted. If the query has a WHERE clause on a HUGE table, and the table in SAP does not have an index on it, sure, it can take longer. But no table should take 12 hours to extract unless that table has trillions of records. Which particular table are you extracting from SAP?
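
If the culprit really is one huge table, an incremental extract is often the way out: pull only the rows created or changed since the last run, filtered on an indexed date field, and merge them with the QVD from the previous run. A rough sketch with placeholder table and field names (vLastExtractDate would be set from the previous run):

// Delta: only rows changed since the last successful extract
[SomeTable]:
LOAD *;
SQL SELECT * FROM SOMETABLE WHERE CHANGEDATE >= '$(vLastExtractDate)';

// Add back the untouched history from the previous QVD
CONCATENATE ([SomeTable])
LOAD * FROM SomeTable.qvd (qvd)
WHERE NOT Exists("Unique Key Field"); // drops old versions of changed rows

STORE [SomeTable] INTO SomeTable.qvd (qvd);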