Join with wildcards

Mithril1991 · ‎2021-07-22

Hello,

I struggle with joining two tables containing wildcards. Here is sample table:

ClientID	SubID	Name	Category	Stream	CombinedID
2346	String1	Name1	Category1	Stream1	2346String1
2346	String2	Name1	Category2	Stream2	2346String2
2378	String3	Name2	Category2	Stream2	2378String3
2378	String4	Name2	Category3	Stream3	2378String4
2470	*	Name3	Category2	Stream3	2470*
2646	*	TBD	Category3	Stream3	2646*
*	*	TBD	Other	Other	**

What I need basically:

- If there is defined ClientID and SubID, use this as unique ID when joining towards final data table

- If there is Defined ClientID only, join only based on ID (There are multiple SubIDs that e.g. 2470 can get, but it will always be linked to same Name, Category and Stream)

- For other, not defined matches in the table (No hard linked Client ID and SubID) use Values "TBD; Other; Other".

I need this to implement into the load script.

The sample table is taken from Excel, so it can be dynamically changed in case some new relationship needs to be defined. Full dataset is growing steadily and currently has over 160k rows with at least 30 fields. Each month there is approximately 2000 more rows. This dataset contains the ClientID and SubID (which combination I've wanted to use as a primary key shown in the table as CombinedID). With this approach I got stuck because of the wildcard.

I've also tried pure joining (with "*" or with empty cells), but that ended in a lot of empty values.

I've also tried to create Cartesian product for each combination of ClientID and SubID, but that ended in over 18 million rows.

I've also tried Rob Wunderlich's approach, but that is pretty limited for my data set (as expression length can be only over 8k characters).

Thank you for any help or pointing towards direction

General Question