Anonymous
Not applicable

unique on large/huge file

Hi 

I have a file with 100M records and need to take unique rows across all columns. What is the best way to do this in terms of performance? I have memory set up at around 30 GB-50 GB, but it still takes too much time.

Thanks!!

2 Replies
Anonymous
Not applicable
Author

Hi,

 

    Considering the data volume, you will have to allocate temporary disk space so the interim data can be staged for comparison.

 

    Please refer to the Advanced settings tab to set up this configuration.

[screenshot: Advanced settings tab]
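For readers curious what the disk option does conceptually: intermediate data is spilled to temp files instead of keeping every row on the heap. Below is a minimal plain-Java sketch of the same idea (hash-partition rows into temp buckets, then deduplicate each bucket in memory). The file names and bucket count are illustrative assumptions, not Talend's actual internals.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class DiskBackedDedup {
    public static void main(String[] args) throws IOException {
        Path input = Paths.get("input.csv");   // illustrative file names
        Path output = Paths.get("unique.csv");
        int buckets = 64;                      // more buckets => smaller in-memory sets

        // Pass 1: hash-partition rows into temp files so duplicates land in the same bucket.
        Path tmpDir = Files.createTempDirectory("dedup");
        BufferedWriter[] writers = new BufferedWriter[buckets];
        for (int i = 0; i < buckets; i++) {
            writers[i] = Files.newBufferedWriter(tmpDir.resolve("bucket-" + i), StandardCharsets.UTF_8);
        }
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                int b = Math.floorMod(line.hashCode(), buckets);
                writers[b].write(line);
                writers[b].newLine();
            }
        }
        for (BufferedWriter w : writers) {
            w.close();
        }

        // Pass 2: each bucket is now small enough to deduplicate in memory with a HashSet.
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (int i = 0; i < buckets; i++) {
                Set<String> seen = new HashSet<>();
                try (BufferedReader in = Files.newBufferedReader(tmpDir.resolve("bucket-" + i), StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        if (seen.add(line)) {  // add() returns false for a duplicate row
                            out.write(line);
                            out.newLine();
                        }
                    }
                }
            }
        }
    }
}
```

With 100M rows, the bucket count would be tuned so that each bucket's distinct rows fit comfortably in the heap.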

 

 

If the answer has helped you, could you please mark the topic as resolved? Kudos are also welcome 🙂

 

Warm Regards,

 

Nikhil Thampi

vapukov
Master II

Using disk does not increase speed (and speed was the point of the question).

 

Generally, a data-volume problem like this can only be resolved by "brute force".

 

First of all, Talend (Java) utilizes the CPU well for sorting, and disk speed is not very critical as long as you do not use the disk to store temp data.
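To illustrate why the CPU dominates in this mode, here is a minimal sketch (assumed file names; requires a heap large enough to hold all rows, e.g. -Xmx50g) of a fully in-memory approach: a CPU-parallel sort followed by one pass that drops adjacent duplicates.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class InMemoryDedup {
    public static void main(String[] args) throws IOException {
        // Illustrative file names; the whole dataset must fit on the heap.
        List<String> rows = Files.readAllLines(Paths.get("input.csv"), StandardCharsets.UTF_8);
        String[] arr = rows.toArray(new String[0]);

        // parallelSort spreads the sort across all cores via the common ForkJoin pool,
        // which is why CPU cores and cache matter more than disk speed here.
        Arrays.parallelSort(arr);

        // After sorting, duplicates are adjacent: keep the first row of each run.
        try (BufferedWriter out = Files.newBufferedWriter(Paths.get("unique.csv"), StandardCharsets.UTF_8)) {
            String prev = null;
            for (String row : arr) {
                if (!row.equals(prev)) {
                    out.write(row);
                    out.newLine();
                }
                prev = row;
            }
        }
    }
}
```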

 

So the solutions could be:

- When disk usage is enabled, use the fastest disk possible: a standard HDD gives about 150 MB/s, an SSD about 500 MB/s, and NVMe about 3,300 MB/s. For example, AWS provides NVMe disks; Azure does not.

- when all "in memory"- memory speed and cpu (speed, cache) is important. it is complicated, but not always 4.7Ghz cpu win over 2.7Ghz, many other parameters affected, like an on-chip cache size, memory bus wide, frequency, number of clocks and etc

 

In both cases, whenever possible, reduce the number of columns used for sorting (a uniqueness check is a kind of sorting); see the sketch below.
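As a rough illustration of that last point, this sketch deduplicates on a chosen subset of columns instead of comparing whole rows. The column indices, delimiter, and file names are assumptions for the example, and the CSV parsing is deliberately naive.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class KeyColumnDedup {
    public static void main(String[] args) throws IOException {
        // Illustrative: columns 0 and 2 are assumed to identify a row,
        // so only they are compared instead of every column.
        int[] keyColumns = {0, 2};
        Set<String> seen = new HashSet<>();

        try (BufferedReader in = Files.newBufferedReader(Paths.get("input.csv"), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(Paths.get("unique.csv"), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.split(",", -1);       // naive CSV split, for illustration only
                StringBuilder key = new StringBuilder();
                for (int c : keyColumns) {
                    key.append(cols[c]).append('\u0001');  // separator avoids "a,b"+"c" colliding with "a"+"b,c"
                }
                if (seen.add(key.toString())) {            // first occurrence of each key wins
                    out.write(line);
                    out.newLine();
                }
            }
        }
    }
}
```

Fewer key columns mean smaller keys to hash and compare, which shrinks both the memory footprint and the per-row CPU cost regardless of whether the job runs in memory or spills to disk.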