Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik Open Lakehouse is Now Generally Available! Discover the key highlights and partner resources here.
cancel
Showing results for 
Search instead for 
Did you mean: 
bclstu
Contributor
Contributor

tMap Inner Join with Date Range in Expr. Filter

I am having difficulty with the following use case.  In tMap an inner join with an expression filter on a date range on the input side and Catch lookup inner join reject = true on the output side is not getting picked up as an inner join reject.

 

Can someone recommend a work-around or alternative solution?

 

The use case is trying capture Late Arriving Dimensions on an Slowly Changing Type-2 dimension table. This requires a date range check to return the correct dim_id at the time of the transaction.

 

Below is a simplified version of what I need to do.  

 

Inputs:

 

trx.csv

trx_num

trx_dim_key

trx_date

101

aaa

2015-03-01

100

aaa

2017-02-01

 

dim.csv

dim_id

dim_key

eff_date

end_date

group_name

87

aaa

2016-01-01

2016-12-31

domestic-1

88

aaa

2017-01-01

2017-12-31

intl-a

89

aaa

2018-01-01

2018-12-31

domestic-2

 

I want the output file to contain each row from trx.csv where there is no row in dim.csv with a matching trx_dim_key = dim.dim_key AND the trx.trx_date is between dim.eff_date and dim.end_date.

 

Using the sample inputs above, the expected output from Catch lookup inner join reject is true should be:

trx_num

trx_dim_key

trx_date

101

aaa

2015-03-01

 

Instead, Iam getting this:

trx_num

trx_dim_key

trx_dt

101

aaa

2015-03-01

101

aaa

2015-03-01

101

aaa

2015-03-01

100

aaa

2017-02-01

100

aaa

2017-02-01

 

 

I have tried many ways to acomplish this in tMap without success.  What I expected to work is the following:

 

See Screenshots attached.

 

tMap:

 

Left (input) side:

 

Main = trx.csv (trx)

Lookup = dim.csv (dim)

 

Match Model: All matches

Join Model: Inner Join

 

Expression Key

Column

trx.trx_dim_key

dim_key

 

Expression Filter:

TalendDate.compareDate(trx.trx_date,dim_lookup.eff_date) >= 0 &&

TalendDate.compareDate(trx.trx_date,dim_lookup.end_date) < 0

 

Middle (variables): empty

 

Right (output) side:

Catch output reject: false

Catch lookup inner join reject: true

 

Note that the Match Model is All matches, as the trx_date can be an historic transaction or a recent transaction, so a match model of first or unique will not work in this scenario.

 

The expected output using the sample inputs above is:

trx_num

trx_dim_key

trx_date

101

aaa

2015-03-01

 

 

It seems to be ignoring the input Expression Filter as part of the INNER JOIN and that is why I believe this is a bug.

 

To prove to myself that the input Expression Filter is considered part of the inner join, I modified the above  test as follows:

 

Added another row to trx.csv with trx_dim_key = 'bbb' (which is not in dim.csv)

Removed the join criteria from Expr. Key / Column

Replaced the input Expr. filter with the join criteria:          trx.trx_dim_key.equals(dim_lookup.dim_key)

 

If the input Expr Filter is not considered part of the inner join, then no rows should have been caught by the inner join reject, as there is no join criteria (except what is in the Expr. Filter). But, in this case Talend did consider the Expr. Filter in the inner join logic and rejects the non-matching rows, but when I used a more complex Expr. Filter with the date range logic, it does not reject the non-matching row.

 

Any thoughts are appreciated.

 

Thanks.

 

 

 

 

Labels (3)
8 Replies
billimmer
Creator III
Creator III

the results are correct for your current tmap settings.   You have a one to many relationship for your join condition.  You don't want this.  So in tmap change the match model from "unique match" to "first match".

 

bclstu
Contributor
Contributor
Author

Thank you for the reply.
If it is using the date compare logic in the input side Expr. Filter, then it is a one-to-one (or one-zero) relationship. In the example input files I provided, you can see that only one row should be returned based on the dim_key and the trx_date between the dim effective and end dates, as coded in the Expr Filter.
My understanding (and testing) has shown that the only difference between the Unique Match and the First Match is that the Unique match only looks at the last matching row (from the lookup) and first Match only looks at the first matching row. Running this same test with First Match returns the same incorrect result set (below).
Only the row with trx_num = 101 should be returned as the trx date of is not within the eff/end dates in the dim file. Trx Num 100 has a trx date between the eff/end of dim_id 88, so this should not have been returned as it meets the join criteria (and should not be an inner join reject.)
Note that the example was carefully chosen so that the trx date of trx_num 100 is in the middle row in the dim file. Therefore, it is neither the First or Last (Unique) row. This is a real-world scenario, as the transactions can be historic so the trx_date may not fall within the first or last eff/end range for the Dim
key.

Output from running the test with First Match. Still not correct.
trx_num,trx_dim_key,trx_dt
101,aaa,2015-03-01
100,aaa,2017-02-01

Again, this is why I feel it is a bug in the tMap Expr. Filter. With the filter expression provided, one row from the transaction (main) should be a 1:1 or 1:0 relationship with lookup file (dim). When 1:0, it should be picked up by the Catch Lookup Inner Joint Reject = true


bclstu
Contributor
Contributor
Author

removed duplicate post.

bclstu
Contributor
Contributor
Author

removing duplicate post.

ThWabi
Creator II
Creator II

Hello bclstu,

 

I had some difficulties with dates in join conditions myself, similar to your question also with date ranges. I had tried to use the expression filter on the input side (lookup) as well and it had not worked as expected. (I suspect that this expression filter is applied only once when creating the lookup cache data and not for each main input row. I use "load once" for lookups, not "load at each row".)

So I have moved such expression filters or comparisons to the "var" panel in the middle. I filter their results on the output side on the right.

On the input side, keep the (inner) join on the dim key and the all matches setting. On the output side, do not use "inner join reject", instead filter according to the expression results from the "var" panel.

Hope this is helpful or gives you some ideas to try.

 

Best regards,

 

Thomas

 

bclstu
Contributor
Contributor
Author

Hi Thomas,

 

Thanks for the reply.  I tried submitting the issue to Talend Support as a bug, but after about 1 week the response was that they do not think it is a bug, and, as you said, the input filter is applied to the results of the inner join.  Talend's tMap documentation is ambiguous as to how the input filter is applied.

 

I have tried filtering on the output side, but no luck. I also tried playing around with the 'catch output reject' (true/false) setting,  without success.

 

It's frustrating - if both input sources (trx and dim) are relational tables, it is a fairly basic sql query, something like this:

 

select * from trx where not EXISTS
(select 1 from dim where trx.trx_dim_key = dim.dim_key
and trx.trx_date between dim.eff_date and dim.end_date);

 

I did come up with a work-around but it is convoluted and and a drag on performance. 

1. Capture the 'good' transactions (key match and in date range)  in a temp file via the tMap and and output filter

2. Re-process the trx file and inner join with the saved temp file as the lookup, with capture inner join reject = true.  

 

So the large trx file has to be processed twice.  If you have many Type 2 Slowly Changing Dimensions, and need to capture Late Arriving Dimension members for them, this gets unwieldy quickly.

 

I have asked Talend support to provide an alternate solution.  I will post back if I get a good solution.

 

ThWabi
Creator II
Creator II

Hi bclstu,

please see the attached job for an example of how I would solve it. Feel free to adapt it to your needs.

Best regards,

Thomas

 


Join_and_Date_Range.zip
bclstu
Contributor
Contributor
Author

Hi Thomas,

 

Thanks for putting that together.  I appreciate it!

 

First glance, I'm thinking that I can get the unique set of trx_dim_key and trx_date rows from the date_not_ok output and run it through another tMap inner join against the row_ok output and capture inner join rejects.  This along with the rejected_dim_key  output provides the set of data needed.  Still not a clean process (similar to my current work-around solution) .  I'm still considering if it will perform better than my current work-around, which re-joins the date_not_ok output against the initial trx file.  

 

Thanks again.

bclstu