Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have a dashboard that identifies duplicate records for data integrity cleanup.
The source data does not currently have a date stamp but I would like to show trending over time. Therefore, I want additional files to be appended to the data already there. I DO NOT want duplicates removed. This doesn't seem to work though because After the second load, I only see today's date and the 5/15/17 records are removed. Not sure what's wrong.
On the first load, my script looks like:
DUPE_PROV_RECORDS:
LOAD Npi_of_providers as NPI,
Lastname_of_providers,
Firstname_of_providers,
Middlename_of_providers,
Lastname_of_providers & ', ' & Firstname_of_providers & ' ' & Middlename_of_providers as FULL_NAME,
Npi_of_providers & ' ' & Lastname_of_providers & ', ' & Firstname_of_providers & ' ' & Middlename_of_providers & ' ' & Specialty_rtk_of_providerspecialties as PROVIDER,
Displaydegrees_short_of_providers as CREDENTIALS,
Specialty_rtk_of_providerspecialties as SPECIALTIES,
Entity_k_of_entityassociations as FACILITY,
date('5/15/2017') as ETL_DATE
// date(Now()) as ETL_DATE
FROM
[..\External_Data\Duplicate Specialty Listing w NPI.xlsx]
(ooxml, embedded labels, table is Sheet1);
Subsequent loads would use:
DUPE_PROV_RECORDS:
ADD LOAD Npi_of_providers as NPI,
Lastname_of_providers,
Firstname_of_providers,
Middlename_of_providers,
Lastname_of_providers & ', ' & Firstname_of_providers & ' ' & Middlename_of_providers as FULL_NAME,
Npi_of_providers & ' ' & Lastname_of_providers & ', ' & Firstname_of_providers & ' ' & Middlename_of_providers & ' ' & Specialty_rtk_of_providerspecialties as PROVIDER,
Displaydegrees_short_of_providers as CREDENTIALS,
Specialty_rtk_of_providerspecialties as SPECIALTIES,
Entity_k_of_entityassociations as FACILITY,
//date('5/15/2017') as ETL_DATE
date(Now()) as ETL_DATE
FROM
[..\External_Data\Duplicate Specialty Listing w NPI.xlsx]
(ooxml, embedded labels, table is Sheet1);
DUPE_PROV_RECORDS:
LOAD Npi_of_providers as NPI,
Lastname_of_providers,
Firstname_of_providers,
Middlename_of_providers,
Lastname_of_providers & ', ' & Firstname_of_providers & ' ' & Middlename_of_providers as FULL_NAME,
Npi_of_providers & ' ' & Lastname_of_providers & ', ' & Firstname_of_providers & ' ' & Middlename_of_providers & ' ' & Specialty_rtk_of_providerspecialties as PROVIDER,
Displaydegrees_short_of_providers as CREDENTIALS,
Specialty_rtk_of_providerspecialties as SPECIALTIES,
Entity_k_of_entityassociations as FACILITY,
date('5/15/2017') as ETL_DATE
// date(Now()) as ETL_DATE
FROM
[..\External_Data\Duplicate Specialty Listing w NPI.xlsx]
(ooxml, embedded labels, table is Sheet1);
concatenate
DUPE_PROV_RECORDS:
ADD LOAD Npi_of_providers as NPI,
Lastname_of_providers,
Firstname_of_providers,
Middlename_of_providers,
Lastname_of_providers & ', ' & Firstname_of_providers & ' ' & Middlename_of_providers as FULL_NAME,
Npi_of_providers & ' ' & Lastname_of_providers & ', ' & Firstname_of_providers & ' ' & Middlename_of_providers & ' ' & Specialty_rtk_of_providerspecialties as PROVIDER,
Displaydegrees_short_of_providers as CREDENTIALS,
Specialty_rtk_of_providerspecialties as SPECIALTIES,
Entity_k_of_entityassociations as FACILITY,
//date('5/15/2017') as ETL_DATE
date(Now()) as ETL_DATE
FROM
[..\External_Data\Duplicate Specialty Listing w NPI.xlsx]
(ooxml, embedded labels, table is Sheet1);
That fails for me. Here is the app.
I can't reload it without sample input files, don't understand, why you use twice:
[..\External_Data\Duplicate Specialty Listing w NPI.xlsx] ?!?
Shouldn't that be 2 different inputs?
I'm not loading it twice. The first time I hard code the date. After that, the date is based on the current date.
could you provide your excel file with some dummy data?
So your plan is you load it everyday with a partial reload with the current date?
I am trying to add trending and since I don't have a date field in the file, I need to create one during load. Loads will be every week or so. I am trying to add all new rows from the new files to the current tables to see the trend on changes to the number of issues.
OK now I understand...
Step one you have to save an initial version:
DUPE_PROV_RECORDS:
LOAD Npi_of_providers as NPI,
Lastname_of_providers,
Firstname_of_providers,
Middlename_of_providers,
Lastname_of_providers & ', ' & Firstname_of_providers & ' ' & Middlename_of_providers as FULL_NAME,
Npi_of_providers & ' ' & Lastname_of_providers & ', ' & Firstname_of_providers & ' ' & Middlename_of_providers & ' ' & Specialty_rtk_of_providerspecialties as PROVIDER,
Displaydegrees_short_of_providers as CREDENTIALS,
Specialty_rtk_of_providerspecialties as SPECIALTIES,
Entity_k_of_entityassociations as FACILITY,
date('5/15/2017') as ETL_DATE
FROM
[.\Duplicate Specialty Listing w NPI.xlsx] (ooxml, embedded labels, table is Sheet1);
STORE DUPE_PROV_RECORDS into DUPE_PROV_RECORDS.qvd (qvd) ;
Then you created an initial qvd, after that you change your script and add data day by day like that:
DUPE_PROV_RECORDS:
LOAD * from DUPE_PROV_RECORDS.qvd (qvd);
Concatenate
LOAD Npi_of_providers as NPI,
Lastname_of_providers,
Firstname_of_providers,
Middlename_of_providers,
Lastname_of_providers & ', ' & Firstname_of_providers & ' ' & Middlename_of_providers as FULL_NAME,
Npi_of_providers & ' ' & Lastname_of_providers & ', ' & Firstname_of_providers & ' ' & Middlename_of_providers & ' ' & Specialty_rtk_of_providerspecialties as PROVIDER,
Displaydegrees_short_of_providers as CREDENTIALS,
Specialty_rtk_of_providerspecialties as SPECIALTIES,
Entity_k_of_entityassociations as FACILITY,
Date(Today(1)) as ETL_DATE
FROM [.\Duplicate Specialty Listing w NPI.xlsx] (ooxml, embedded labels, table is Sheet1);
STORE DUPE_PROV_RECORDS into DUPE_PROV_RECORDS.qvd (qvd);
The date is still overwriting the original date. Not sure why. I want it to add, not overwrite.
I just tried it, for me it works...
you may use Step1.qvw only once.