Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
kelsie_muller
Partner - Contributor II
Partner - Contributor II

Loading only Duplicate values in a Load Script

Hello,

I am attempting to load only duplicate values in a large data set with the following query in Qlik Sense. The data load source is a Snowflake integration, so I have some limitations on the formatting of the query.

I'd like to only return rows where there is more than one value  in the SI_Master_S3_Key value. 

 

LIB CONNECT TO 'Snowflake hda76102';
select * from (

select clip_id
, max(supplier_name) as Owner
, max(clip_name) as Resource_Name
, max(concat(year(clip_ingested),'-',month(clip_ingested),'-',day(clip_ingested))) as Ingest_Date
, max(case when CLIP_DATA_NAME = 'SI.Master.S3.Bucket' then CLIP_DATA_VALUE else null end) as SI_Master_S3_Bucket
, max(case when CLIP_DATA_NAME = 'SI.Master.S3.Key' then CLIP_DATA_VALUE else null end) as SI_Master_S3_Key
, max(case when CLIP_DATA_NAME = 'SI.Master.Filename' then CLIP_DATA_VALUE else null end) as SI_Master_Filename
, max(case when CLIP_DATA_NAME = 'Format.FileSize' then CLIP_DATA_VALUE else null end) as Format_FileSize
, max(case when CLIP_DATA_NAME = 'SI.Master.Extension' then CLIP_DATA_VALUE else null end) as SI_Master_Extension
, max(case when CLIP_DATA_NAME = 'TE.OriginalName' then CLIP_DATA_VALUE else null end) as TE_OriginalName
from ANALYTICS.REPORTING.DMH_CLIP_DATA
where supplier_name = 'SPS'
group by clip_id
order by SI_Master_S3_Key
)xx;

Can you please help with this query? Alternatively if there is a way to filter a table to only display duplicate values on the whole data set - that would work as well. 

Labels (2)
4 Replies
kelsie_muller
Partner - Contributor II
Partner - Contributor II
Author

To clarify - I do not want to only load where there is more than one value - I would like to load only where there are more than one of the same value in the SI_Master_S3_Key value. 

steeefan
Luminary
Luminary

My suggestion is to leave the query as-is and to load the data into a QS table. Based on that table, you could do:

 

Duplicate:
NOCONCATENATE LOAD
  *
WHERE
  SI_Master_S3_Key_Count > 1;
LOAD
  SI_Master_S3_Key,
  COUNT(SI_Master_S3_Key) AS SI_Master_S3_Key_Count
RESIDENT
  S3Table
GROUP BY
  SI_Master_S3_Key;

 

This gives you a table containing all the values of SI_Master_S3_Key for which there is more than one entrie in your S3 table.

kelsie_muller
Partner - Contributor II
Partner - Contributor II
Author

Thank you for assisting, Steeefan!

Can you please provide the full load script? When I add that piece either before or after the existing portions of my load script I get an error:

The following error occurred:
Table 'S3Table' not found
steeefan
Luminary
Luminary

Sure thing: You have to first load the data you are getting from the SQL query into a Qlik Sense table, the one I named S3Table in my script.

S3Table:
NOCONCATENATE LOAD
  *;
SQL SELECT
  //..
)xx;

 Replace SELECT //..)xx; with your statement you posted above. After this block, place the script I posted earlier.