sravanth49
Contributor III

[resolved] tFileList-Read CSV files

Hi Community,
I have many CSV files spread across several directories, and some file names are duplicated across those directories. I want to read each file name only once: if the same file name exists in more than one directory, only one copy should be read.
Example:
D:\test\a\: abc.csv, 123.csv, yud.csv
D:\test\b\: rd.csv, xy.csv
D:\test\: abc.csv, fty.csv
In the example above, abc.csv is located in two places. I want to read only one of those two files.
Please help.
Thanks,
Sravanth
1 Solution

Accepted Solutions
Anonymous
Not applicable

Try this:
tFileList (with the Include Subdirectories option selected) -----> tIterateToFlow -----> tUniqRow
Hope this helps.
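
For readers who want to see the logic of that flow outside Talend, here is a minimal sketch in plain Python. It is not Talend code and not what the components do internally; the function name `unique_files_by_name` and the `suffix` parameter are made up for illustration. It assumes that file names should be compared case-insensitively (as on Windows) and that the first copy met during traversal is the one to keep.

```python
import os

def unique_files_by_name(root, suffix=".csv"):
    """Return one full path per distinct file name found under root.

    Hypothetical helper: walks root and its subdirectories (the role of
    tFileList with Include Subdirectories) and keeps only the first path
    seen for each file name (the role of tUniqRow keyed on the file name).
    """
    seen = set()    # lower-cased file names already accepted
    chosen = []     # full paths, one per distinct file name
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            if not name.lower().endswith(suffix):
                continue
            key = name.lower()  # assumption: names compare case-insensitively
            if key in seen:
                continue        # duplicate file name, skip this copy
            seen.add(key)
            chosen.append(os.path.join(dirpath, name))
    return chosen
```

With the directory layout from the question, this would return six paths, and only one of them would be an abc.csv. Which copy is kept depends on the traversal order, so if a particular copy should win, the ordering needs to be controlled explicitly.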


4 Replies
Anonymous
Not applicable

You need to store the file names. Where you store them (memory, file, or database) depends on whether you want this de-duplication to persist across runs of your Job.
A database table of processed files may be the sensible option. You can insert each file once it has been processed successfully, and then check the table each time you pick up a new file.
If you don't have a database to hand, you can use SQLite; I always use it for this type of activity.
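
As a rough illustration of the processed-files table idea, here is a minimal sketch in plain Python using the standard `sqlite3` module. It is not Talend code; the table name `processed_files`, the column `file_name`, and the three helper functions are assumptions made up for this example.

```python
import sqlite3

def open_registry(db_path):
    """Open (or create) the SQLite database holding processed file names."""
    conn = sqlite3.connect(db_path)
    # Hypothetical schema: one row per file name that has been processed.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_files (file_name TEXT PRIMARY KEY)"
    )
    conn.commit()
    return conn

def already_processed(conn, file_name):
    """Return True if this file name was recorded by an earlier call or run."""
    row = conn.execute(
        "SELECT 1 FROM processed_files WHERE file_name = ?",
        (file_name.lower(),),
    ).fetchone()
    return row is not None

def mark_processed(conn, file_name):
    """Record a file name after it has been processed successfully."""
    conn.execute(
        "INSERT OR IGNORE INTO processed_files (file_name) VALUES (?)",
        (file_name.lower(),),
    )
    conn.commit()
```

The intended usage is to call `already_processed` before reading each file and `mark_processed` only after the read succeeds. Because the table lives in a file on disk, the check also holds across separate runs, which an in-memory list would not give you.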
sravanth49
Contributor III
Author

Hi Alan,
Thanks for the reply.
Could you please explain this in terms of a Talend implementation? Please show me which components I should use and in what sequence, with a screenshot if possible.
Thanks,
Sravanth
sravanth49
Contributor III
Author

Thanks, manish. Your suggestion makes a lot of sense.