deep2021
Creator III

Alternative way to avoid duplicate records in Qlik script

Hi,

Can you please suggest an alternative to the where not exists() clause (used with concatenation) to avoid duplicate records?

I am doing the calculation as below:

 

Document:
Load distinct
    autonumber(CP_Key, 'CP_Key') as CP_Key,
    autonumber(CP_Key, 'CP_Key_Temp') as CP_Key_Temp,
    docType,
    docPathPattern,
    protocol,
    location,
    cutoff,
    version
FROM [lib://MSS_MCA_XDS_Base_Folder/QVD/Extractor/2.1/Raw/teble1.qvd] (qvd);

Concatenate(Document)
$(tmp_statement)
Load distinct
    autonumber(CP_Key, 'CP_Key') as CP_Key,
    docType,
    docPathPattern,
    protocol,
    location,
    cutoff,
    version
FROM [lib://MSS_MCA_XDS_Base_Folder/QVD/Extractor/2.1/Raw/teble2.qvd] (qvd)
Where not exists(CP_Key_Temp, autonumber(CP_Key, 'CP_Key'));
 
Can you please advise on this?
 
Thanks
 
 
 
5 Replies
Digvijay_Singh

Is there any problem with the way you are trying it? I normally use exactly this approach to avoid duplicates. Sample data would help us understand the problem better.

marcus_sommer

IMO the use of autonumber() is not expedient here: it adds complexity without any direct benefit. Further, whether a duplicated field is needed to define the wanted exists() check depends on the kind of merging requirements.

Be aware that exists() isn't related to any particular table; it checks against all field values loaded so far. This means the order of the load statements can affect the results, so it may be sensible to change that order and/or to divide the task across multiple applications - especially because with a two-parameter exists() the load can no longer be performed as an optimized QVD load.
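A minimal sketch of the single-parameter variant (field and file names taken from the original post, columns shortened for brevity; whether it fits depends on the merging requirements, as noted above):

```
// First load: CP_Key itself is the field that exists() will check,
// so no autonumber() and no duplicated CP_Key_Temp field are needed.
Document:
Load distinct
    CP_Key,
    docType,
    docPathPattern
FROM [lib://MSS_MCA_XDS_Base_Folder/QVD/Extractor/2.1/Raw/teble1.qvd] (qvd);

// Second load: a single-parameter exists() checks against all CP_Key
// values loaded so far. Caveats: as far as I know "not exists()" is
// still a non-optimized QVD load, and since each loaded row adds its
// CP_Key to the checked values, duplicates within teble2.qvd itself
// are also suppressed - which may or may not be wanted.
Concatenate(Document)
Load distinct
    CP_Key,
    docType,
    docPathPattern
FROM [lib://MSS_MCA_XDS_Base_Folder/QVD/Extractor/2.1/Raw/teble2.qvd] (qvd)
Where not exists(CP_Key);
```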

deep2021
Creator III
Author

Hi,

 

I am getting an out-of-memory issue when I try this on larger datasets.

My reload is failing with the error below:

 

20230426T170745.516+0100 9 fields found: CP_Key, CP_Key_Temp, docType, docPathPattern, protocol, location, cutoff, version, ccyIndexInstrument,
20230426T175509.029+0100 -129 =
20230426T175534.758+0100 Execution Failed
20230426T175534.764+0100 Execution finished.

 

Sometimes the task fails without giving any error message.

Can you please advise on this?

 

Thanks

marcus_sommer

Various reasons are possible. The simplest is that it really is too much data for your current environment. More likely, though, there is some mistake within the script, for example in further loads before or after the snippet you've shown, or in the $(tmp_statement) before the second load.

Helpful in such cases is to look at the document log (it should in general be enabled and set to instant-write mode, so that log information is available if the load breaks), and/or to comment out parts of the load to see what happens and what results the tables get, and/or to reduce the number of records with further where clauses or First statements.
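For the last point, a quick sketch of reducing the record count with a First load prefix while testing (path and fields taken from the original post; 1000 is an arbitrary test size):

```
// Debug run: load only the first 1000 records from the QVD to see
// whether the failure is volume-related or a script mistake.
First 1000
Load distinct
    autonumber(CP_Key, 'CP_Key') as CP_Key,
    docType,
    docPathPattern
FROM [lib://MSS_MCA_XDS_Base_Folder/QVD/Extractor/2.1/Raw/teble2.qvd] (qvd);
```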

deep2021
Creator III
Author

I have removed $(tmp_statement). My concern is how I can replace the where not exists() clause to remove duplicates.

Is there any alternative way?

I have huge data sets.

I am unable to perform the below calculation with the where condition:

 where not exists(CP_Key_Temp,autonumber(CP_Key,'CP_Key'))

 

Thanks