Hello All,
I have a quick data model question that should be rather easy, but I can't think of an optimal solution. There are only three tables involved:
Accounts:
LOAD
AccountID,
ESFA_ID,
[Golf_Account?],
OtherAttributeDataFields
from AcctBLAH.qvd (qvd)
WHERE [Golf_Account?] = 'TrueDat'
;
GolfDataSnapshot:
LOAD
AccountID,
SnapShotDate,
GolfDataFields
from GolfBLAH.qvd (qvd);
ESFA_History:
LOAD
ESFA_ID,
SnapShotDate,
ESFADataFields
from ESFA_BLAH.qvd (qvd);
Part of the problem is that some of the attribute data in ESFA_History can be found in the other two tables as well. Accounts should have most of the attribute data of the other two tables in it. SnapShotDate holds completely different values in GolfDataSnapshot and ESFA_History.
I was thinking the best approach would be to concatenate GolfDataSnapshot and ESFA_History, but that doesn't allow for a good relationship between the AccountID field and the ESFA_ID field. However, both of these fields can be found in Accounts. I'm afraid of using joins, as I don't want to cut out any snapshot dates (GolfDataSnapshot currently has one snapshot, ESFA_History has several). Any thoughts? The delta load for GolfDataSnapshot literally just concatenates onto itself every month, with Date(Today()) as the definition of the SnapShotDate field.
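For clarity, that monthly delta step looks roughly like this (the delta QVD name here is made up):

// Monthly delta: stamp today's date on the new rows and append them
Concatenate (GolfDataSnapshot)
LOAD
AccountID,
Date(Today()) AS SnapShotDate,
GolfDataFields
from GolfDelta.qvd (qvd);

// Persist the grown snapshot table back to disk
STORE GolfDataSnapshot INTO GolfBLAH.qvd (qvd);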
Thanks again,
Ron
Ron,
what I would do is exactly what you suggested yourself initially: massage all the data so that it fits in a single table. You can do this by distilling a mapping table from Accounts that translates each ESFA_ID into a single AccountID. Since AccountID is the index into the currently used CRM system, we'll keep that one and eliminate ESFA_ID (or turn it into a non-key field, just for reference).
Upon loading the ESFA_History table, translate the ESFA_ID into an AccountID using the mapping table and concatenate the rows to the GolfDataSnapShot, thereby renaming all corresponding ESFA_History fields.
The few exceptions (ESFA_IDs having two AccountIDs) should be analyzed manually; you can then decide what to do about them: drop one (again using a mapping table that contains a list of "illegal" ESFA_IDs), merge information into a single row, or something else. This is just a data quality phase and should be manageable.
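In script terms, the idea is roughly this (field lists are abbreviated, and the exact rename list depends on your actual ESFA field names):

// Mapping table distilled from Accounts: ESFA_ID -> AccountID
MapESFAtoAccount:
MAPPING LOAD
ESFA_ID,
AccountID
from AcctBLAH.qvd (qvd);

GolfDataSnapshot:
LOAD
AccountID,
SnapShotDate,
GolfDataFields
from GolfBLAH.qvd (qvd);

// Translate ESFA_ID into AccountID and append the history rows,
// renaming the ESFA fields to their GolfDataSnapshot equivalents
Concatenate (GolfDataSnapshot)
LOAD
ApplyMap('MapESFAtoAccount', ESFA_ID) AS AccountID,
SnapShotDate,
ESFADataFields AS GolfDataFields
from ESFA_BLAH.qvd (qvd);

The result is a single snapshot table keyed by AccountID, with ESFA_ID gone from the data model entirely.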
Does this sound useful to you?
Best,
Peter
Is there a 1:1 relationship between AccountID and ESFA_ID in Accounts? What is the nature of the data in ESFA_History, and how does it relate to GolfDataSnapShot?
It's pretty difficult to assemble a workable data model without a single clue as to what you're aiming for.
Peter
Yes, there is a 1:1 relationship between AccountID and ESFA_ID in Accounts. Sorry for the lack of clarity. Essentially, ESFA_History is a decommissioned CRM system table and GolfDataSnapShot is a new, similar (probably not exact, but that's not anybody's problem but mine) table from the SalesForce CRM. Some of the fields in ESFA_History, upon renaming, would contain values found in the other two tables. But ESFA_History should be a historical version of GolfDataSnapShot.
Correction: there are a few ESFA_IDs with two AccountIDs. I think they should be 1:1, though.
Hi Peter,
This is a good solution and I think the best way to handle this situation properly. Upon further investigation, the data is in great need of cleansing; more so than I expected. I'm dropping the duplicates and bad account names with WHERE clauses (things like Not WildMatch(Name, '*USE ACCOUNT BLAH*') or Not WildMatch(Name, '*DELETED ACCOUNT*'), etc. <-- this is the kind of stuff I'm battling with). There are too many to pick individual accounts out for a mapping table. By the way, would you please go further into the mapping table containing "illegal" ESFA_IDs? I can't envision how to script for that. It sounds useful for future endeavors, if you wouldn't mind sharing with us.
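For the record, the kind of WHERE clause I mean looks like this; since WildMatch accepts any number of patterns, the bad-name patterns can be combined in one call (the patterns here are just examples):

Accounts:
LOAD
AccountID,
ESFA_ID,
[Golf_Account?],
Name
from AcctBLAH.qvd (qvd)
WHERE [Golf_Account?] = 'TrueDat'
// Drop rows whose Name matches any of the known junk patterns
AND Not WildMatch(Name, '*USE ACCOUNT BLAH*', '*DELETED ACCOUNT*');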
-Ron
For example, imagine that an ESFA_ID should take the form ABCD1234; then something like this could do the job:
MapIDsToEliminate:
MAPPING LOAD *, 1 AS Flag INLINE [
ID
2015chck
NONSENSE
99999999
??Wait??
];
Transactions:
LOAD *
FROM ... (qvd)
WHERE Not applymap('MapIDsToEliminate', ESFA_ID, false());
The main advantage of a mapping table vs. (Wild)Match is that an INLINE source is easier to maintain and can easily be converted into an external source (if the illegal-ID table keeps growing).
I recently used something similar to correct absurd transaction dates that had obviously been the result of typing errors (like Jun 6, 3012). This trick can be used when there aren't too many corrections to make.
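A sketch of that date correction, assuming the mistyped dates are known in advance (the values and field names here are invented):

// Map each known bad date to its corrected value
MapDateFixes:
MAPPING LOAD * INLINE [
BadDate, GoodDate
06/06/3012, 06/06/2012
01/01/2102, 01/01/2012
];

Transactions:
LOAD
TransactionID,
// ApplyMap without a default returns the input unchanged
// when it isn't in the mapping table, so only the bad
// dates are touched
ApplyMap('MapDateFixes', TransactionDate) AS TransactionDate
from Transactions.qvd (qvd);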
Peter
Thanks!