Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
I struggle with joining two tables containing wildcards. Here is sample table:
ClientID | SubID | Name | Category | Stream | CombinedID |
2346 | String1 | Name1 | Category1 | Stream1 | 2346String1 |
2346 | String2 | Name1 | Category2 | Stream2 | 2346String2 |
2378 | String3 | Name2 | Category2 | Stream2 | 2378String3 |
2378 | String4 | Name2 | Category3 | Stream3 | 2378String4 |
2470 | * | Name3 | Category2 | Stream3 | 2470* |
2646 | * | TBD | Category3 | Stream3 | 2646* |
* | * | TBD | Other | Other | ** |
What I need basically:
- If there is defined ClientID and SubID, use this as unique ID when joining towards final data table
- If there is Defined ClientID only, join only based on ID (There are multiple SubIDs that e.g. 2470 can get, but it will always be linked to same Name, Category and Stream)
- For other, not defined matches in the table (No hard linked Client ID and SubID) use Values "TBD; Other; Other".
I need this to implement into the load script.
The sample table is taken from Excel, so it can be dynamically changed in case some new relationship needs to be defined. Full dataset is growing steadily and currently has over 160k rows with at least 30 fields. Each month there is approximately 2000 more rows. This dataset contains the ClientID and SubID (which combination I've wanted to use as a primary key shown in the table as CombinedID). With this approach I got stuck because of the wildcard.
I've also tried pure joining (with "*" or with empty cells), but that ended in a lot of empty values.
I've also tried to create Cartesian product for each combination of ClientID and SubID, but that ended in over 18 million rows.
I've also tried Rob Wunderlich's approach, but that is pretty limited for my data set (as expression length can be only over 8k characters).
Thank you for any help or pointing towards direction