I am also trying to understand what Direct Discovery does under the bonnet.
Empirically I reckon that this from your script :
Actually executes this SQL against your database :
Would you be able to execute that directly against your database, i.e. not via QlikView, and advise if that also takes a loooooong time like 25-30 minutes and also how many rows it returns.
Best Regards, Bill
I spoke to some folks from QlikView. They explained that indeed the Direct Discovery statement I have in my QVW will generate a select distinct SQL call to the DB for the dimension items I've defined. That said, I talked to our DBA to determine how the DB was setup for this large fact table (19B row). As per the available documentation, QlikView performance is dependent on the performance of the DB.
The DBA explained the DB table is partioned based on the REVENUE_YR_MNTH_NBR (it's an Oracle DB) with sub-partitions on another dimension (which I wasn't using) and that I should make sure whatever query I come up with leverages availble partions. The DBA went into some detail about the built-in parallelism Oracle has for performance/optimization/etc.
After talking with the DBA, I added an additional dimension and filter condition to my Direct Query statement in an attempt to leverage the sub-partitions when QV actually runs a query. I also constrained the data to a (much) smaller date rage. It is still slow (in my opinion), but after running queries manually against the source DB, I think this is as fast as I can go.
I plan to gradually widen the filter critera an determine how far I can offload data association to Direct Discovery and if it's worth using Direct Discovery or if I should find an alternate approach using aggregated data and document chaining (perhaps in combination with Direct Discovery) or something like that.
I don't have anything working yet, but I wanted to post an update based on what I know thus far. I plan to update this thread as I learn more.