Anonymous
Not applicable

Sorting before tuniq...

Hi

I have a MySQL table of 60m rows and need to dedupe it, keying on all 6 columns. Do I need to sort with tSortRow first and then follow with a tUniqRow, or can I go straight into a tUniqRow and let the component deal with it?

 

Any advice on whether this is the right approach or if there's a better way would be great!

 

Thanks

 


Accepted Solutions
Anonymous
Not applicable
Author

Why not dedupe and sort in your database? That is what a database is good at. If you have 60m rows and only a third are duplicates, that is 20m rows sent to Talend unnecessarily, only to be thrown away. Talend is a great ETL tool, but it runs on Java, and although Java is good at many things, it isn't as quick as a database at sorting and filtering.

 

I'd recommend sorting and filtering your data in the database by writing a query that does this in your DB component. That way only the necessary data enters your job, already in the correct order, and the job has a lot less work to do.
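As a rough sketch of what such a query could look like, the example below runs a `SELECT DISTINCT ... ORDER BY` over all six columns. It uses Python's built-in SQLite as a stand-in for MySQL so that it runs anywhere, and the table name `events` and the column names `c1` to `c6` are hypothetical, not taken from the original question.

```python
import sqlite3

# SQLite stands in for MySQL here so the sketch runs anywhere;
# the table name `events` and columns c1..c6 are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (c1, c2, c3, c4, c5, c6)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
    [
        (2, "b", 1, 1, 1, 1),
        (1, "a", 1, 1, 1, 1),
        (2, "b", 1, 1, 1, 1),  # exact duplicate of the first row
        (1, "a", 1, 1, 1, 2),  # differs in c6, so it is kept
    ],
)

# DISTINCT over all six columns removes exact duplicates; ORDER BY returns
# the surviving rows already sorted.
DEDUPE_QUERY = """
    SELECT DISTINCT c1, c2, c3, c4, c5, c6
    FROM events
    ORDER BY c1, c2, c3, c4, c5, c6
"""
rows = conn.execute(DEDUPE_QUERY).fetchall()
for row in rows:
    print(row)
```

In a Talend job, the same kind of `SELECT` statement would go into the query of the DB input component, so the duplicates never leave the database.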


2 Replies

manodwhb
Champion II

@DaveG2008, since your source is a database, I suggest removing the duplicates at the DB level.

If you feel your DB server will not be able to handle it, then go with tUniqRow.
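On the original question of whether a sort is needed first: as a general illustration only, de-duplication based on a hash set does not require sorted input, because each row is simply checked against the set of keys already seen. Whether tUniqRow works this way internally is an assumption here, not something stated in this thread, and the function name `unique_rows` is made up for the sketch.

```python
def unique_rows(rows):
    """Yield each distinct row once, in first-seen order, without sorting."""
    seen = set()  # holds one key per distinct row encountered so far
    for row in rows:
        key = tuple(row)  # key on all columns of the row
        if key not in seen:
            seen.add(key)
            yield row


# Unsorted sample input with two duplicated rows.
sample = [(2, "b"), (1, "a"), (2, "b"), (1, "a"), (3, "c")]
deduped = list(unique_rows(sample))
print(deduped)
```

The trade-off is memory: the set keeps a key for every distinct row, so with 60m rows its size grows with the number of unique rows, which is one more reason to let the database do the work where possible.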