Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Dear All,
Hope all is well with everyone.
We are working on a Telecom Project and have to combine and save distinct Customer Information from the Customer Detail Files.
Each Day we receive the Customer Details files in csv format. CustomerDetail_YYYYMMDD.csv
ACCOUNTID, NAME, MSISDN, IMSI, STATUS
We filter the YYYYMMDD from the file and make a dimension FILE_DATE MM/DD/YYYY
Each day we get the the Customer Details along with any additions. What we want to do is to keep the earliest Customer Details.
For example -
We receive a file CustomerDetail_20160707.csv with the following information.
FILE_DATE,ACCOUNTID,NAME,MSISDN,IMSI,STATUS
7/7/2016,123,DAVE SMITH,123456789,8900987654321,ACTIVE
next day we receive CustomerDetail_20160708.csv with the following information.
FILE_DATE,ACCOUNTID,NAME,MSISDN,IMSI,STATUS
7/8/2016,123,DAVE SMITH,123456789,8900987654321,ACTIVE
Now on 7/8/2016 we want to have just one record of DAVE SMITH with the FILE_DATE being 7/7/2016
7/7/2016,123,DAVE SMITH,123456789,8900987654321,ACTIVE
Then on 7/9 we receive the following file
FILE_DATE,ACCOUNTID,NAME,MSISDN,IMSI,STATUS
7/9/2016,123,JOHN,123456789,8900987654321,ACTIVE
Notice that the name changed so on 7/9 we would like to have two records
7/7/2016,123,DAVE SMITH,123456789,8900987654321,ACTIVE
7/9/2016,123,JOHN,123456789,8900987654321,ACTIVE
The way we approach it is that everyday we receive the file and then concatenate with the previous file.
We make a KEY of ACCOUNTID & NAME & MSISDN & IMSI & STATUS and then group it to find the count. If the count is 1 then we know it is unique and we add the new date to it.
Want to find out if there is another way we could approach it? because the customer details is growing to more than 2million records and the grouping is taking long time.
Will appreciate if someone can assist.
Thanks.
Maybe create a KEY made of ACCOUNTID & NAME in your script and use a WHERE NOT EXISTS(KEY, ACCOUNTID & NAME) clause when appending your new records.
Dear Stephen
Thanks for the prompt response.
1. How would i add the new date for each new record?
2. How would i use the where not exists clause? After concatenating thw old and previous files?
Thanks
1. Isn't the date part of your input data / file name?
2. Assuming you first load your historic data from QVD, then concatenate your new data, then store all into the QVD, something like
Data:
LOAD * FROM Historic.qvd (qvd);
Let vToday = Date(Today(1), 'YYYYMMDD');
CONCATENATE (Data)
LOAD FILE_DATE,ACCOUNTID,NAME,MSISDN,IMSI,STATUS, ACCOUNTID & NAME as KEY
FROM CustomerDetail_$(vToday).csv
(...)
WHERE NOT EXISTS(KEY, ACCOUNTID & NAME);
STORE Data INTO Historic.qvd (qvd);
Stephen
If i use where not exists on key and accountid&name then will it filter the distinct dates as per the requirement. Meaning if all the fields are the same between two dates then it will take only the min date?
Also can you tell me a bit about where not exists. We have two values in it - key and accountid&name. Howbdoes it work? It looks for the key in the first table (before concat) and then compares it with accountid&name in the second (after concat) ?
Thanks.
Exists() basically checks the values already loaded into the specified field / symbol table so far (including the current LOAD statement).
So LOAD order of your records is important, you need to load the dates / files in chronological order so that only the first / earliest date is kept.
Exists() can be used with one argument, a field name, then it checks the current field value against the loaded field values so far.
If you use Exists() with two arguments, first denotes the field / symbol table to check, second denotes an expression that creates the values you want to check. You would need this when you are building a key field that does not exists in your input table, only in your output table.