Consider this and where the KEEP statement is. This produces a table with 2.2million rows which "seems" correct.
If however I move the keep statement to the final load statement I get circa 750k rows.
IN the UI this results in slightly different claim_counters.
For the life of me I cannot think why this would be different. As you would expect placing the keep on the first load statement would simply mean that all rows would be dropped from T_CLAIM2 before the calculated fields were derived, whereas placing it last would mean the calculations would simply be done on all rows and then subsequently dropped.
IF([Actual Date] = 'NA','$(vNaDesc)',IF(LEN([Actual Date])>2,'$(vClaimDesc)',IF(LEN([Measure Date])=0 OR [Measure Date] >=TODAY(),'$(vExpDesc)','$(vOsDesc)'))) AS [Measure Type]
IF(LEN([Actual Date])>=2,[Actual Date],IF(PTYPE = 'IS Delivery' AND (msname = 'pre-production complete' OR msname = 'feasibility study required'),[Forecast Date],[Expected Date])) AS [Measure Date]
AUTONUMBER(msname,'%_MSID') AS %_MSID,
1 AS #_CLAIM_COUNTER
DROP TABLE T_CLAIMS;
Does the D_PROJECT table have [Measure Type] or [Measure Date] as field? If it does, then the keep will also take place based on whichever of the two or both fields which may cause more rows to drop.
Hi Sunny thanks, I should have mentioned the shared key is %_PROJECTID none of the other fields exist in D_PROJECT so it should only associate on that field.
Maybe worth mentionioning that their is a one to many relationship bettern D_PROJECT and T_CLAIMS. But that shouldn't make any difference in my SQL brain anyway.
If that is the case, then it is strange that your are seeing this issue... I would have expected to see same number of rows regardless of where you put the Inner Keep. Also, 1-N shouldn't matter
How many rows do you get for this table if you remove the KEEP completely?
I would assume that you should get a correct result with the KEEP placed in front of the topmost LOAD, and that it is ignored when you put it to the bottom LOAD.
Maybe hic can comment on what is expected here better than anyone of us.
Stefan thanks for the suggestion and the results are interesting. Seems having no Keep produces the same results as the lowest KEEP, so in other words the lowest KEEP is not doing anything.
But what this also tells me is that there are 8 PROJECT_IDSwhich are not shared in both table, i.e when i use the highest KEEP I lose 8 project. However this loss of 8 projects results in a reduction of 1.5million rows in T_CLAIMS. Which makes even less sence as the ratio between project in D_PROJECT and T_CLAIMS cannot be more than about 1/200. So I would expect to lose no more than 1600 rows from T_CLAIMS.
Still confused I know it doesnt matter if I have erroneous rows but ascitically its horrible having orphaned data.
Thanks for confirming. But that still doesnt explain why I am loosing more rows from the fact table in relationship to the dim table when I apply it in the correct place.
Think I need to back to source and check my expected results.