johnw
Champion III

Binary Load then Reduce Data

What is the cleanest way of doing a binary load, then automatically making selections and reducing the data?

I have an application that needs the complicated data model from another application, but only a small fraction of the data based on some simple selections.  I solved the problem by reducing my data with a sequence of inner joins, like so:

INNER JOIN ([Products])        LOAD 'TFS' as "Product Group"                        AUTOGENERATE 1;

INNER JOIN ([Date Types])      LOAD 'Int Yields Plus Open TFS Batch' as "Date Type" AUTOGENERATE 1;

INNER JOIN ([Show Weights In]) LOAD dual('Tons',2000) as "Show Weights In"          AUTOGENERATE 1;

INNER JOIN ([Yields])          LOAD DISTINCT "Product"          RESIDENT [Products];

INNER JOIN ([Defect])          LOAD DISTINCT "Loss Reason Code" RESIDENT [Yields];

INNER JOIN ([Locations])       LOAD DISTINCT "Location"         RESIDENT [Yields];

INNER JOIN ([Dates by Type])   LOAD DISTINCT "Recno"            RESIDENT [Yields];

INNER JOIN ([Dates by Type])   LOAD DISTINCT "Date Type Number" RESIDENT [Date Types];

INNER JOIN ([Calendar])        LOAD DISTINCT "Date"             RESIDENT [Dates by Type];

This works fine, but it seems like it shouldn't be this complicated, given the various ways QlikView has of reducing data based on selections.

Written as a macro, I believe the equivalent would be this simpler and probably faster code:

sub ReduceData()
    ' Select the values to keep
    activedocument.fields("Product Group").select "TFS"
    activedocument.fields("Date Type").select "Int Yields Plus Open TFS Batch"
    activedocument.fields("Show Weights In").select "Tons"
    ' Drop everything excluded by those selections
    activedocument.reducedata
end sub


But I can't run a macro OnPostReload in a server environment.  Supposedly I can execute a macro function during script execution, but that didn't work.  I assume that's because none of the logic in the macro above makes sense until the final data model has been built, which isn't until after script execution.
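For reference, the script-side call I tried looks like this (ReduceData being the macro above; the call runs, but the selections simply have no effect at that point):

LET dummy = ReduceData();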

I can do a reduction using Publisher, but only the distributed file would be reduced.  Explaining why this isn't good enough will take a little more detail.  The full set of data is huge, reloads daily, and takes 5-10 minutes to binary load.  My final user application requires additional QVDs that are refreshed hourly, so it must reload hourly.  I can't afford to binary load for 5-10 minutes every hour.  Instead, I have a middle QVW that loads daily after the huge data model, and does the reduction.  Then the user application is only binary loading a small file hourly.  It could not (?) binary load from the distributed file.
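So, roughly, the chain looks like this (the file names here are just placeholders for the real applications):

// Yields.qvw - the huge data model, reloads daily
// Middle.qvw - reloads daily, right after Yields.qvw:
BINARY ..\Yields\Yields.qvw;    // the slow 5-10 minute load, once a day
// ...data reduction goes here...
// UserApp.qvw - reloads hourly:
BINARY ..\Middle\Middle.qvw;    // fast, because the middle file is small
// ...plus the hourly QVD loads...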

I could write all the original data model's tables out as QVDs, then do a bunch of WHERE EXISTS() loads (sketched below), but that seems at least as complicated as the inner joins, plus it keeps an extra copy of some very large tables of data.
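For what it's worth, that alternative would look something like this (table and field names are just examples from my model):

// In the big application, after the data model is built:
STORE [Products] INTO ..\QVD\Products.qvd (qvd);
STORE [Yields]   INTO ..\QVD\Yields.qvd (qvd);
// ...one STORE per table...

// In this application, load the driving table with a WHERE clause,
// then each related table with WHERE EXISTS() against keys already loaded:
[Products]:
LOAD * FROM ..\QVD\Products.qvd (qvd) WHERE "Product Group" = 'TFS';
[Yields]:
LOAD * FROM ..\QVD\Yields.qvd (qvd) WHERE EXISTS("Product");
// ...and so on down the chain...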

Is there a cleaner way to reduce my data?  Or are inner joins as good as it gets?


6 Replies
Oleg_Troyansky
Partner Ambassador/MVP

John,

I've never tried it in the server environment, so it's more of a guess... I believe that if you replace the macro with Actions triggered by the OnPostReload event, you might be able to make your selections. Then maybe you can fire another trigger by making a selection in a certain field, with a macro triggered off that selection to actually perform the data reduction...

again, never tried it...

good luck!

Oleg

johnw
Champion III
Author

According to the reference manual:

The following triggers are not to be used in the QlikView Server environment, since they lack meaning or may cause unexpected results:

OnPostReduceData
OnPostReload

On the other hand, I've been using OnOpen for years in the server environment despite a similar caution, and they only recently (?) updated the manual to indicate that OnOpen is now OK.  So maybe OnPostReload actually works.

I'll give the trigger a shot.

hectorgarcia
Partner - Creator III

John is correct.

The workaround I use a lot: when I need to run macros on post-reload, I use the Windows Task Scheduler to run a command-line reload, and it works perfectly! I hope it helps.
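Something along these lines in the scheduled task, I believe (the install and document paths are just examples):

"C:\Program Files\QlikView\qv.exe" /r "C:\QlikView\MyDocument.qvw"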

Not applicable

Bump.

John, I am dealing with a very similar issue now. What did you end up doing in this regard? Are you still using OnOpen? Or are inner joins my best bet?

johnw
Champion III
Author

I can't say for sure what your best bet will be, but I ultimately stuck with the inner joins.  I worked on a script to use an inline table to control the joins, but just writing them out ended up being simpler.  Here's the whole actual script, comments included:

BINARY ..\Yields\Yields.qvw;

// This application only exists to support TFSYieldsAndInventory, which will binary load from it.
// The Yields binary load and then trimming it down for TFS integrated yields takes several minutes
// of time, which isn't something we want to do every hour.  So instead, we'll load TFSYields nightly,
// and then the real application can do a binary load from there hourly.  Since all the time is in the binary load,
// the normal duration calculation would be inaccurate.

DROP TABLES
[Show Coating In]
,[Threads]
,[Loss Reason Group]
,[Field View]
,[Fields on Report]
,[Dimensions]
,[Expressions]
,[Production]
,[Adjustment]
,[Non Prime]
,[Loss Analysis]
;
//LET dummy=ReduceData(); // NEED TO GET THIS WORKING BECAUSE IT IS *MUCH* SIMPLER
// probably not working because fields and selection don't work until data model is complete

// Reduce data to just TFS batch runs
INNER JOIN ([Products])        LOAD 'TFS' as "Sheet / Tin / TFS / HRP"              AUTOGENERATE 1;
INNER JOIN ([Date Types])      LOAD 'Int Yields Plus Open TFS Batch' as "Date Type" AUTOGENERATE 1;
INNER JOIN ([Show Weights In]) LOAD dual('Tons',2000) as "Show Weights In"          AUTOGENERATE 1;
INNER JOIN ([Yields])          LOAD DISTINCT "Product"          RESIDENT [Products];
INNER JOIN ([Defect])          LOAD DISTINCT "Loss Reason Code" RESIDENT [Yields];
INNER JOIN ([Locations])       LOAD DISTINCT "Location"         RESIDENT [Yields];
INNER JOIN ([Dates by Type])   LOAD DISTINCT "Recno"            RESIDENT [Yields];
INNER JOIN ([Dates by Type])   LOAD DISTINCT "Date Type Number" RESIDENT [Date Types];
INNER JOIN ([Calendar])        LOAD DISTINCT "Date"             RESIDENT [Dates by Type];

// On second thought, this whole loop approach is significantly more complicated than just writing them out.
//[Links]:
//LOAD concat(chr(39) & "From" & ',' & "To" & ',' & "Key" & chr(39),',',"Sequence") as "Links"
//;
//LOAD *
//,recno() as Sequence
//INLINE [
//From, To, Key
//Products, Yields, Product
//Yields, Defect, Loss Reason Code
//Yields, Locations, Location
//Yields, Dates by Type, Recno
//Date Types, Dates by Type, Date Type Number
//Dates by Type, Calendar, Date
//];
//LET vLinks = peek('Links');
//DROP TABLE [Links];
//
//FOR EACH vLink IN $(vLinks)
//    LET vFrom = subfield(vLink,',',1);
//    LET vTo   = subfield(vLink,',',2);
//    LET vKey  = subfield(vLink,',',3);
//    INNER JOIN ([$(vTo)])
//    LOAD DISTINCT "$(vKey)"
//    RESIDENT [$(vFrom)];
//NEXT vLink
//
// Drop unnecessary fields
DROP FIELDS
"A1003?"
,"Activity Date"
//,"Activity ID"
,"Activity Type"
,"Actual Gauge"
,"Actual Width"
,"Aim Gauge"
,"Aim Width"
,"Bill Payer"
,"Catalog"
,"Charged Type"
,"Coating Weight"
,"Coil"
,"Coil Thread"
,"Consumption of Coating"
,"Cost Class"
,"Customer"
,"Customer Spec"
,"Customer Spec Desc"
,"Diverts Prime Reapplies"
,"Diverts Scrapped"
,"Diverts Secondary"
,"Diverts Secondary Reapplies"
,"Division"
,"Downgraded Coil Delay Minutes"
,"Downgraded Coil Feet"
,"Downgraded Coil Minutes"
,"Downgraded Coil Minutes with Delays"
,"Downgraded Coil Weight"
,"Downgraded Coils"
,"Line Consumption Feet"
,"Line Consumption of Coating"
,"Line Loss Line Scrap"
,"Line Loss Non Prime"
,"Line Loss Scrap"
,"Line Loss Scrapped Coils"
,"Line Production Feet"
,"Location Total"
,"Mill Source"
,"Minutes w/o Delays"
,"Non Prime Input Feet"
,"Non Prime Input of Coating"
,"Order Item"
,"Ordered Gauge"
,"Ordered Width"
,"Product Total"
,"Production Date"
,"Secondary Diverts Scrapped"
,"Secondary Diverts Upgrade"
,"Secondary Service Charge"
,"Secondary Service Charge Coating"
,"Secondary Write Offs"
,"Service Charge Coating"
,"Service Charge Divert to Non Prime"
,"Service Charge Line Scrap"
,"Service Charge Scrap Loss"
,"Service Charge Scrapped Coils"
,"Service Input of Coating"
,"Service Location"
,"Steel Grade"
,"Steel Type"
,"Temper"
,"Turns"
,"Write Offs Adds"
,"Write Offs Entry Rejects"
,"Write Offs Inactive"
,"Write Offs Weight Changes"
;

And here's the macro I tried but couldn't get working:

sub ReduceData()
    activedocument.fields("Product Group").select "TFS"
    activedocument.fields("Date Type").select "Int Yields Plus Open TFS Batch"
    activedocument.fields("Show Weights In").select "Tons"
    activedocument.reducedata
end sub

victa001
Partner - Contributor III

Simply put in a Section Access right after the binary load.

Section Access;
LOAD * INLINE [
    ACCESS, USERID, PASSWORD, NTNAME, YEAR
    ADMIN, *, *, *, 2013
];
Section Application;

P.S. Don't forget to check Initial Data Reduction in the document properties.

Carlo Vruwink