Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Data Model Scripting Question

Hello All,

I have a quick data model question that should be rather easy. I can't think of an optimal solution. There is only 3 tables involved:

Accounts:

LOAD 

AccountID,

ESFA_ID,

[Golf_Account?]

OtherAttributeDataFields

from AcctBLAH.qvd (qvd)

WHERE [Golf_Account?] = 'TrueDat'

;

GolfDataSnapshot:

LOAD

AccountID,

SnapShotDate,

GolfDataFields

from GolfBLAH.qvd (qvd);

ESFA_History:

LOAD

ESFA_ID,

SnapShotDate,

ESFADataFields

from ESFA_BLAH.qvd (qvd);

Part of the problem is that some of the attribute data that's in ESFA_History can be found in the other two tables as well. Accounts should have most of the attribute data for the other two tables in it. SnapShotDate is totally different values between GolfDataSnapshot and ESFA_History.

I was thinking the best approach would be to Concatenate GolfDataSnapshot and ESFA_History, but this doesn't allow for a good relationship between AccountID field and ESFA_ID field. However, both of these fields can be found in accounts. I'm afraid of using joins, as i don't want to cut out any snapshot dates (GolfDataSnapshot table currently has 1 snapshot, ESFA_History has several). Any Thoughts? The Delta Load for GolfDataSnapshot literally just concatenates onto itself every month with Date(Today()) as the definition of SnapShotDate field.

Thanks again,

Ron

1 Solution

Accepted Solutions
Peter_Cammaert
Partner - Champion III
Partner - Champion III

Ron,

what I would do is exactly the same as you suggested yourself initially: masssage all the data so that it fits in a single table. You can do this by distilling a Mapping Table from Accounts that translates ESFA_ID into a single AccountID. Since AccountID is an index into the currently used CRM system, we'll keep that one and eliminate ESFA_ID (or turn it into a non-key field, just for reference).

Upon loading the ESFA_History table, translate the ESFA_ID into an AccountID using the mapping table and concatenate the rows to the GolfDataSnapShot, thereby renaming all corresponding ESFA_History fields.

The few exceptions (having two AccountID's) should be analyzed manually; you can then decide what to do about them: drop one (again using a mapping table that contains a list of "illegal" ESFA_ID's), merge information into a single row or something else. This is just a data quality phase and should be manageable.

Does this sound useful to you?

Best,

Peter

View solution in original post

7 Replies
Peter_Cammaert
Partner - Champion III
Partner - Champion III

If there is there a 1:1 relationship between AccountID and ESFA_ID in Accounts? What is the nature of the data in ESFA_History and how does it relate to GolfDataSnapShot?

It's pretty difficult to assemble a workable data model without a single clue as to where you would like to get at.

Peter

Anonymous
Not applicable
Author

Yes, there is a 1:1 relationship between AccountID and ESFA_ID in Accounts. Sorry for the lack of clarity. Essentially, ESFA_History is a decommissioned CRM system table and GolfDataSnapShot is a new, similar (probably not exact, but that's not anybodies problem but mine) table from SalesForce CRM. Some of the fields in ESFA_History, upon renaming, would contain values found in the other two tables. But, ESFA_History should be a historical version of GolfDataSnapShot.

Anonymous
Not applicable
Author

Correction, there's a few ESFA_IDs with two AccountID's. I think they should be 1:1, though.

Peter_Cammaert
Partner - Champion III
Partner - Champion III

Ron,

what I would do is exactly the same as you suggested yourself initially: masssage all the data so that it fits in a single table. You can do this by distilling a Mapping Table from Accounts that translates ESFA_ID into a single AccountID. Since AccountID is an index into the currently used CRM system, we'll keep that one and eliminate ESFA_ID (or turn it into a non-key field, just for reference).

Upon loading the ESFA_History table, translate the ESFA_ID into an AccountID using the mapping table and concatenate the rows to the GolfDataSnapShot, thereby renaming all corresponding ESFA_History fields.

The few exceptions (having two AccountID's) should be analyzed manually; you can then decide what to do about them: drop one (again using a mapping table that contains a list of "illegal" ESFA_ID's), merge information into a single row or something else. This is just a data quality phase and should be manageable.

Does this sound useful to you?

Best,

Peter

Anonymous
Not applicable
Author

Hi Peter,

This is a good solution and i think the best way to handle this situation properly. Upon further investigation, the data is in a great need of cleansing; more so than i expected. I'm dropping the duplicates and bad account names with WHERE clauses (where not wildmatch(Name, '*USE ACCOUNT BLAH*') or something like not wildmatch(Name, '*DELETED ACCOUNT*') etc. <-- this is the kind stuff i'm battling with) . There are too many to pick individual accounts out for a mapping table. By the way, would you please go further into the mapping table containing "illegal" ESFA_ID's? I can't envision how to script for that. It sounds useful for future endeavors if you wouldn't mind sharing with us.

-Ron

Peter_Cammaert
Partner - Champion III
Partner - Champion III

For example, imagine that an ESFA_ID should take the form ABCD1234, then something like this could do the job.

MapIDsToEliminate:

MAPPING LOAD *, 1 AS Flag INLINE [

ID

2015chck

NONSENSE

99999999

??Wait??

];

Transactions:

LOAD *

FROM ... (qvd)

WHERE Not applymap('MapIDsToEliminate', ESFA_ID, false());

The main advantage of Mapping Table vs (wild)match is that an INLINE source is easier to maintain and can be easily converted into an external source (if the illegal-ID-table keeps growing)

I recently used something similar to correct absurd transaction dates that had obviously been the result of typing errors (like Jun 6, 3012). This trick can be used when there aren't too many corrections to make.

Peter

Anonymous
Not applicable
Author

Thanks!