Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
nicolas_martin
Partner - Creator II
Partner - Creator II

Perf problem with data discovery

Hello,

I have a VERY huge table stored in a Vertica instance (really fast for SELECT queries).

Basicly, I have the following fields:

- date

- store

- product

- price

Where I have for each store for each date for each product, a price.

This table is approximatively 150 GB.

I've loaded this table in QV (12.1) with a script like

my_table:

DIRECT QUERY

DIMENSION

year,

date,

store,

product

MEASURE

price

FROM my_table;

In QlikView, I've made a chart with dimension "Store" and "Product", and expression:

avg(price)

If I filter on "year = 2017", this charts takes minutess to compute.

In another chart, with the expression:

avg(

aggr(

if(product = 'My Favotite', avg(price))

, Store, Product)

)

to view only the prices of "My Favorite" product (something I would have done with a Set Analysis).

This chart also takes minutes (maybe 10) to compute.

In a query tool, when I submit the following query:

SELECT

store,

product,

avg(price)

FROM my_table

WHERE year = 2017

GROUP BY store, product;

it takes 2 seconds to give me the result.

And

SELECT

store,

product,

avg(price)

FROM my_table

WHERE year = 2017

AND product = 'My Favorite'

GROUP BY store, product;

less than 1 second.

I would like to understand why there is such a difference.

I didn't find where to set the parameter that creates a SQL.log and view the exact query.

1 Reply
Not applicable

Hi Nicolas,

I tried using direct discovery a few years back (on 11.2).  I found so many limitations that it was basically unusable in all but the most basic of scenarios.  It's a real shame, as it would be an exceptionally useful capability to have (had it been implemented properly). 

I can't remember the specifics (sorry, it was quite a few years ago now), but I found an enormous overhead similar to what you're experiencing in many cases.  I ran some traces at the time and I believe QV was running some "select distinct" queries on all the dimension fields on the script reload (which due to the underlying queries we had this was very inefficient for the specific use case I was prototyping).   From memory, I think it would run a select distinct using the entire query core, but then just using a select distinct for every single field marked as a dimension. 

So if you have

select

     dima

     ,dimb

     ,dimc

     ,metric

from

     [some really complex query]

qv would run 3 queries at reload time

select distinct dima

from

     [some really complex query]

select distinct dimb

from

     [some really complex query]

select distinct dimc

from

     [some really complex query]

There was no way to tell QV to avoid the complex query and just pick up the distinct list of dimension values from another more performant table/query.  This is just one example of the types of things I found.  Not specifically related to your problem by the sound of it, but just for background.

For your specific problems, I think your best bet would be to run some DB traces to understand where the issue is - you might be able to work around it depending on the specifics of your environment and use case, but expect it to be an uphill journey.

Regards,

Graeme