Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello guys,
I'm translating many queries and I have some doubts on etl strategy.
There are cases with nested subqueries where I must do the selections on fields on all used tables.
Here's an example:
SELECT
table01.C_PDS,
table01.VOLUME
FROM
table01
WHERE
(
table01.C_CARRIER = @Carrier
AND table01.V_ANNO = year(@StartDate)
AND table01.C_PDS IN
(SELECT
table02.PDS FROM
table02
WHERE
(
table02.DT_INI <= @StartDate
AND table02.DT_FINE >= @EndDate
AND table02.COD_TRASP = @Carrier
AND table02.PDS NOT IN
(SELECT
table03.C_PDR FROM
table03
WHERE
(
table03.V_ANNO = year(@StartDate)
AND table03 = @Carrier
)
)))))
@Carrier, @StartDate and @EndDate are the values that i'll use on the front end with set analysis, lists etc.
Please note that I'm working with large volumes of data and each table corresponds to a QVD file.
What's the best approach that you think?
Thanks in advance
One way is to use temporary tables and the Exists() function, for example:
tmp3:
Load C_PDR FROM table03.qvd ;
tmp2:
Load PDS FROM table02.qvd
WHERE not Exists(C_PDR,PDS);
RealData:
Load ... FROM table01.qvd
WHERE Exists(PDS,C_PDS);
Drop Tables tmp3, tmp2;
// HIC
Thanks Henric.
I'm not all sure of this solution because there are the selections on table2 and table3 fields.
For example, the NOT IN on table3 probably requires a LEFT JOIN, but my qvd files with 500 milions of record slow very down.
I don't see a join in your SQL statement...
But with NOT EXISTS how can I make selections on the fields in table02 and table03 ?
If you want to make selections (i.e. clicking in list boxes in the UI) on the fields in table02 and table03, then you need to load these tables. But then I don't understand your question: Your SQL statement doesn't load these fields.
However, if you just want to set filters during the load sequence, you can do this just by adding conditions to the where clauses in my script.
HIC
Pardon, in my first post I wrote the word "values" in a wrong way:
@Carrier, @StartDate and @EndDate are just the fields that I must use in UI (list box ect).
My problem is: how do I load table01, table02 and table03 ?
How set the filters on UI will be another step.
I would just load the three tables without any filters, making sure that the keys are OK:
table01:
Load
C_PDS & '|' & V_ANNO as Key,
C_PDS as PDS,
V_ANNO,
C_CARRIER,
VOLUME,
RecNo() as Table01_ID
From table01 ;
table02:
Load
PDS,
DT_INI,
DT_FINE,
COD_TRASP,
RecNo() as Table02_ID
From table02 ;
table03:
Load
C_PDR & '|' & V_ANNO as Key,
RecNo() as Table03_ID
From table03 ;
I assume that the keys are PDS and V_ANNO.
Then you can take care of filtering in the UI.
HIC
Ok, I will load all the table with the correct keys PDS and C_PDR.
Then, how do I write the sum of VOLUME as expression in a pivot?
For example:
SELECT
table01.C_PDS,
sum(table01.VOLUME)
FROM
table01
WHERE
table01.C_CARRIER = @Carrier
AND table01.V_ANNO = year(@StartDate)
AND table01.C_PDS IN
(SELECT
table02.PDS FROM
table02
WHERE
table02.DT_INI <= @StartDate
AND table02.DT_FINE >= @EndDate
AND table02.COD_TRASP = @Carrier
)
group by table01.C_PDS
You don't sum it in the script. Just use "Sum(VOLUME)" as expression in your pivot table.
HIC