Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
gvasti
Partner - Contributor III
Partner - Contributor III

ETL with NOT IN

Hello guys,

I'm translating many queries and I have some doubts on etl strategy.

There are cases with nested subqueries where I must do the selections on fields on all used tables.

Here's an example:

SELECT

  table01.C_PDS,

  table01.VOLUME

FROM

  table01

WHERE

  (

  table01.C_CARRIER  =  @Carrier

  AND  table01.V_ANNO =    year(@StartDate)

  AND  table01.C_PDS  IN 

    (SELECT

        table02.PDS FROM

        table02

    WHERE

  (

  table02.DT_INI  <=  @StartDate

  AND  table02.DT_FINE  >=  @EndDate

  AND  table02.COD_TRASP  =  @Carrier

  AND  table02.PDS  NOT IN 

        (SELECT

            table03.C_PDR FROM

            table03

        WHERE

      (

      table03.V_ANNO = year(@StartDate)

      AND  table03  =  @Carrier

  )

)))))

@Carrier, @StartDate and @EndDate are the values that i'll use on the front end with set analysis, lists etc.

Please note that I'm working with large volumes of data and each table corresponds to a QVD file.

What's the best approach that you think?

Thanks in advance

10 Replies
hic
Former Employee
Former Employee

One way is to use temporary tables and the Exists() function, for example:

tmp3:

Load C_PDR FROM table03.qvd ;

tmp2:

Load PDS FROM table02.qvd

WHERE not Exists(C_PDR,PDS);

RealData:

Load ... FROM table01.qvd

WHERE Exists(PDS,C_PDS);

Drop Tables tmp3, tmp2;

// HIC

gvasti
Partner - Contributor III
Partner - Contributor III
Author

Thanks Henric.

I'm not all sure of this solution because there are the selections on table2 and table3 fields.

For example, the NOT IN on table3 probably requires a LEFT JOIN, but my qvd files with 500 milions of record slow very down.

hic
Former Employee
Former Employee

I don't see a join in your SQL statement...

gvasti
Partner - Contributor III
Partner - Contributor III
Author

But with NOT EXISTS how can I make selections on the fields in table02 and table03 ?

hic
Former Employee
Former Employee

If you want to make selections (i.e. clicking in list boxes in the UI) on the fields in table02 and table03, then you need to load these tables. But then I don't understand your question: Your SQL statement doesn't load these fields.

However, if you just want to set filters during the load sequence, you can do this just by adding conditions to the where clauses in my script.

HIC

gvasti
Partner - Contributor III
Partner - Contributor III
Author

Pardon, in my first post I wrote the word "values" in a wrong way:

@Carrier, @StartDate and @EndDate are just the fields that I must use in UI (list box ect).

My problem is: how do I load table01, table02 and table03 ?

How set the filters on UI will be another step.

hic
Former Employee
Former Employee

I would just load the three tables without any filters, making sure that the keys are OK:

table01:

Load

   C_PDS & '|' & V_ANNO as Key,

   C_PDS as PDS,

   V_ANNO,

   C_CARRIER,

   VOLUME,

   RecNo() as Table01_ID

From table01 ;

table02:

Load

   PDS,

   DT_INI,

   DT_FINE,

   COD_TRASP,

   RecNo() as Table02_ID

From table02 ;

table03:

Load

   C_PDR & '|' & V_ANNO as Key,

   RecNo() as Table03_ID

From table03 ;

I assume that the keys are PDS and V_ANNO.

Then you can take care of filtering in the UI.

HIC

gvasti
Partner - Contributor III
Partner - Contributor III
Author

Ok, I will load all the table with the correct keys PDS and C_PDR.

Then, how do I write the sum of VOLUME as expression in a pivot?

For example:

SELECT

  table01.C_PDS,

  sum(table01.VOLUME)

FROM

  table01

WHERE

  table01.C_CARRIER  =  @Carrier

  AND  table01.V_ANNO =    year(@StartDate)

  AND  table01.C_PDS  IN 

    (SELECT

        table02.PDS FROM

        table02

    WHERE

  table02.DT_INI  <=  @StartDate

  AND  table02.DT_FINE  >=  @EndDate

  AND  table02.COD_TRASP  =  @Carrier

)

group by table01.C_PDS

hic
Former Employee
Former Employee

You don't sum it in the script. Just use "Sum(VOLUME)" as expression in your pivot table.

HIC