Solved: Duplicate check between files while using tFilelis... - Qlik Community

Anonymous · 2019-10-15

Hi Team,

I am trying to load files from a directory into a MySQL output table.
I used a tFileList > tFileInputDelimited > tMap > tMysqlOutput design to iterate through the files.
Now I want to remove duplicate data between files, i.e., compare the data across the files based on one column or a combination of 2-3 columns.
For example: if the month column of the first file contains NOV and the second file also contains NOV data, the job should skip loading the second file.
Please help me implement this in my job.
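To make the requested rule concrete, here is a minimal plain-Java sketch (an illustration, not Talend-generated code) of file-level skipping. Files are processed in order, and a file is skipped entirely if any of its key values was already loaded from an earlier file. The file names and the in-memory map of key values per file are hypothetical. For a key made of 2-3 columns, the values could be joined into one string, such as NOV|2019.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FileLevelSkip {

    /**
     * Returns the files to load, in iteration order. A file is skipped entirely
     * when any of its key values was already loaded from an earlier file.
     * The map must preserve insertion order (e.g. LinkedHashMap).
     */
    static List<String> filesToLoad(Map<String, List<String>> keysPerFile) {
        Set<String> loadedKeys = new HashSet<>();
        List<String> loadedFiles = new ArrayList<>();
        for (Map.Entry<String, List<String>> file : keysPerFile.entrySet()) {
            List<String> keys = file.getValue();
            if (keys.stream().anyMatch(loadedKeys::contains)) {
                continue; // overlaps with an earlier file: neglect this whole file
            }
            loadedKeys.addAll(keys);
            loadedFiles.add(file.getKey());
        }
        return loadedFiles;
    }

    public static void main(String[] args) {
        // Hypothetical key values (the month column) found in each file.
        Map<String, List<String>> keysPerFile = new LinkedHashMap<>();
        keysPerFile.put("file1.csv", Arrays.asList("NOV", "NOV"));
        keysPerFile.put("file2.csv", Arrays.asList("NOV"));
        keysPerFile.put("file3.csv", Arrays.asList("DEC"));
        System.out.println(filesToLoad(keysPerFile)); // [file1.csv, file3.csv]
    }
}
```

Repeated keys inside the same file (the two NOV values in file1.csv) do not cause that file to be skipped; only overlap with an earlier file does.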

TRF · 2019-10-15

I suppose all your input files are based on the same schema. In that case, you can read all the input files, push the result into a single temporary file, and then eliminate the duplicate records before loading into MySQL.

The design should look like this:

tFileList--(iterate)-->tFileInputDelimited-->tFileOutputDelimited (with the Append option ticked)
|
+(OnSubjobOK)
|
tFileInputDelimited-->tUniqRow-->tMysqlOutput
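tUniqRow deduplicates on the columns you mark as keys in its settings; as far as I know, it sends the first occurrence of each key to its Uniques flow and later ones to Duplicates. Because the temporary file is appended in iteration order, "first occurrence" means the row from the earliest file read. The plain-Java sketch below mimics that first-occurrence-wins behaviour on a composite key. The column layout (month, year, amount) and the key indexes 0 and 1 are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqRowSketch {

    /** Keeps the first row seen for each combination of the given key column indexes. */
    static List<String[]> keepFirstByKey(List<String[]> rows, int... keyCols) {
        Set<List<String>> seenKeys = new HashSet<>(); // List equality is value-based
        List<String[]> uniques = new ArrayList<>();
        for (String[] row : rows) {
            List<String> key = new ArrayList<>();
            for (int c : keyCols) {
                key.add(row[c]);
            }
            if (seenKeys.add(key)) { // add() returns false when the key was already seen
                uniques.add(row);
            }
        }
        return uniques;
    }

    public static void main(String[] args) {
        // Rows in the order they were appended to the temporary file.
        List<String[]> rows = Arrays.asList(
                new String[] {"NOV", "2019", "100"},  // from the first file
                new String[] {"DEC", "2019", "200"},
                new String[] {"NOV", "2019", "999"}); // same month/year from a later file
        for (String[] r : keepFirstByKey(rows, 0, 1)) {
            System.out.println(String.join(";", r));
        }
    }
}
```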

Anonymous · 2019-10-15

Thanks TRF for providing the job design and concept. Can you please tell me how to identify and remove the duplicates from the temporary file, and how to tell whether the data comes from the first file or the second file, so that I keep the correct data?

TRF · 2019-10-15

You need to add a new field to the temporary file.

Change the design like this:

tFileList--(iterate)-->tFileInputDelimited-->tMap-->tFileOutputDelimited (with the Append option ticked)
|
+(OnSubjobOK)
|
tFileInputDelimited-->tUniqRow-->tMysqlOutput

In the tMap, add a field to the output flow (let's say filename) and use this expression to populate it:

((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))

Replace "tFileList_1" with the actual name of your tFileList component.

Is that what you expect?
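The filename column is what makes a "first file wins" rule possible. A tUniqRow keyed only on month would also collapse several NOV rows from the same file into one. The plain-Java sketch below illustrates the logic rather than any Talend component. It assumes the temporary file has already been read in the order the files were appended, and it keeps, for each month, only the rows from the first file that delivered that month. The Row class, its columns and the /data/in/... paths are hypothetical. In the job, the filename value would be whatever the tFileList_1_CURRENT_FILEPATH expression returned for that iteration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FirstFileWins {

    /** One row of the temporary file: a key column, a value and the added filename column. */
    static final class Row {
        final String month;
        final String amount;
        final String filename;

        Row(String month, String amount, String filename) {
            this.month = month;
            this.amount = amount;
            this.filename = filename;
        }
    }

    /** Keeps, for each month, only the rows coming from the first file that contained it. */
    static List<Row> keepRowsFromFirstFile(List<Row> rowsInFileOrder) {
        Map<String, String> firstFileForMonth = new HashMap<>();
        List<Row> kept = new ArrayList<>();
        for (Row r : rowsInFileOrder) {
            // putIfAbsent returns null the first time a month is seen, else the file that owns it.
            String owner = firstFileForMonth.putIfAbsent(r.month, r.filename);
            if (owner == null || owner.equals(r.filename)) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Row> temp = Arrays.asList(
                new Row("NOV", "100", "/data/in/file1.csv"),
                new Row("NOV", "150", "/data/in/file1.csv"),
                new Row("NOV", "999", "/data/in/file2.csv"), // dropped: NOV belongs to file1
                new Row("DEC", "200", "/data/in/file2.csv"));
        for (Row r : keepRowsFromFirstFile(temp)) {
            System.out.println(r.month + ";" + r.amount + ";" + r.filename);
        }
    }
}
```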

Anonymous · 2019-10-16

Thanks TRF, I have tried this approach and it works, following the order in which the files are placed in the directory. Does tFileList process the files from the last file in the directory to the first? Can we specify the order in which the files are loaded, in tFileList or with another component? Also, how should I specify the file names in tFileInputDelimited and tFileOutputDelimited in the main job, and in tFileInputDelimited in the subjob? Using ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))?
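On ordering: tFileList's Basic settings typically include an Order by option (file name, file size or modified date) and an ascending/descending Order action; check the component documentation for your version. As a plain-Java illustration of a deterministic, name-based order, the sketch below lists the .csv files of a directory sorted by file name. Each printed name corresponds to the file that tFileList_1_CURRENT_FILEPATH would point to on one iteration if the same order were used. The temporary directory and the sales_YYYY_MM.csv naming scheme are hypothetical. Zero-padded dates in file names make alphabetical order match chronological order.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OrderedFileList {

    /** Lists the regular files in dir whose names end with suffix, sorted by file name (ascending). */
    static List<Path> listSorted(Path dir, String suffix) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) { // the stream holds an open directory handle
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(suffix))
                    .sorted(Comparator.comparing((Path p) -> p.getFileName().toString()))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input directory with zero-padded dates in the file names.
        Path dir = Files.createTempDirectory("tfilelist-demo");
        for (String name : new String[] {"sales_2019_11.csv", "sales_2019_09.csv", "sales_2019_10.csv"}) {
            Files.createFile(dir.resolve(name));
        }
        for (Path p : listSorted(dir, ".csv")) {
            System.out.println(p.getFileName()); // one line per iteration, oldest month first
        }
    }
}
```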


SQL

Talend Data Integration

v7.x