Anonymous
Not applicable

Scenario-based question in Talend???

I have a scenario-based question for all the Talend forum members. It goes as follows:
I) Suppose you have a source text file named "A.txt" which contains billions of records.
The task is to extract all the data from the source file and load it into a target table named "Table D".
While millions of records are being inserted into the target table, the job fails partway through because of an error on a particular record.
After resolving the error, when I execute the job again, I want the job to do the following:
The records that were already inserted should not be inserted again into target "Table D", keeping the performance of the job in mind.
How can we do this?
Consider the cases: 1) the source file may contain only unique records,
and 2) it may contain duplicate records.
B) If the solution uses the tMemorize component, how can we achieve this task without using that component?
Thanks for your help in advance.
Regards
SWABHOSA
8 Replies
Anonymous
Not applicable
Author

Hi,
Have you checked the "Insert or update" option in t<DB>Output? In the schema of t<DB>Output, check the 'Key' box for the columns that are the primary key in the database. If you select the 'Insert or update' option, a record is inserted if it does not exist yet; otherwise, it is updated.
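As a rough plain-Java analogy for what "Insert or update" means per key (purely illustrative; Talend actually issues INSERT/UPDATE statements against the database, and the class and sample values here are made up):

```java
import java.util.HashMap;
import java.util.Map;

public class InsertOrUpdateDemo {

    // Conceptual view of "Insert or update": if the key is not present yet
    // the record is inserted, otherwise the existing record is updated.
    public static void upsert(Map<String, String> tableD, String key, String row) {
        tableD.put(key, row);   // Map.put inserts a new entry or replaces the existing one
    }

    public static void main(String[] args) {
        Map<String, String> tableD = new HashMap<>();
        upsert(tableD, "1", "first load");    // inserted
        upsert(tableD, "1", "second load");   // updated, not inserted twice
        System.out.println(tableD);           // prints {1=second load}
    }
}
```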
Best  regards
Sabrina
Anonymous
Not applicable
Author

Hello Sabrina,
Thanks for your reply. I am aware of the "Insert or update" option in the t<DB>Output component.
But if I only have to insert, there is no use for the "Update" part of that option.
When I re-run my job, it should insert only the new records, not the records that were already inserted.
Best Regards:-swabhosa
Anonymous
Not applicable
Author

If it has a key column, you can simply look up the data in target "Table D" and join it with your input file using a tMap. Records that match can be filtered out.
Anonymous
Not applicable
Author

Hello Egoetsch,
Thanks for reply!!!
It does not have a key column.
Regards
Swabhosa
Viswa560
Contributor

Any answer for this post????
_AnonymousUser
Specialist III

Can you tell me what the error could be? If it is a data mismatch error, you can always do schema-level checks, pull the rejected records out into a file, and take only the correct data forward for inserting into the database.
vaibs_parab
Contributor II

1. In case the source file contains only unique records:

==> In this case, before inserting into the destination table, use a tMap with the destination table as the lookup flow and join it to the main flow (the source file) on the unique/primary key column, using an inner join. On the output side of the tMap, set 'Catch lookup inner join reject' to true and send that output to the destination table.

This will insert only the records that are not yet in the destination table; records that are already present match the inner join and are therefore not passed on for insertion (see the Java sketch below for the idea behind this filter).
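Outside of Talend, the same "insert only what is missing" idea can be sketched in plain Java. This is only an illustration of the tMap inner-join-reject logic, not Talend-generated code; the class, method, and sample keys are hypothetical:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class NewRecordFilter {

    // Keep only the source keys that are not already present in the target table.
    // In the Talend job, existingKeys would come from the lookup on "Table D"
    // and sourceKeys from the rows read out of "A.txt".
    public static List<String> filterNewKeys(List<String> sourceKeys, Set<String> existingKeys) {
        return sourceKeys.stream()
                .filter(key -> !existingKeys.contains(key))   // inner-join reject: no match in target
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>(List.of("2", "5"));   // keys already in Table D
        List<String> source = List.of("2", "5", "8", "9");         // keys read from A.txt
        System.out.println(filterNewKeys(source, existing));       // prints [8, 9] -> rows to insert
    }
}
```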

 

2. In case the source file contains duplicate records:

==> I am assuming here that whole rows are never duplicated; only some columns have duplicated values.

For example:

| Id | Name | Age | Country |
|  2 | John |  25 | USA     |
|  5 | John |  28 | USA     |
|  8 | Jack |  32 | UAE     |

 

For the above case, use the tAddCRCRow component while inserting data into the destination table, and include all columns in the CRC, so that each record gets a (practically) unique CRC value. Then, before inserting into the destination table, join the main flow against the CRC values of the destination table as a lookup in a tMap, and set 'Catch lookup inner join reject' to true on the output side.

This will insert only the records whose CRC is not yet in the destination table; records that are already present match the inner join and are therefore filtered out (a sketch of the CRC idea follows below).
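To make the CRC idea concrete, here is a small plain-Java sketch of computing one CRC over every column of a row. tAddCRCRow's exact algorithm and configuration may differ; the class name, separator byte, and sample values are illustrative assumptions:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class RowCrc {

    // Compute one CRC over every column of a row, so the whole row can act as a key.
    // A separator byte between columns avoids collisions such as ("ab","c") vs ("a","bc").
    public static long rowCrc(String... columns) {
        CRC32 crc = new CRC32();
        for (String col : columns) {
            String value = (col == null) ? "" : col;
            crc.update(value.getBytes(StandardCharsets.UTF_8));
            crc.update(0x1F);                 // unit separator between columns
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        System.out.println(rowCrc("2", "John", "25", "USA"));   // identical rows give identical CRCs
        System.out.println(rowCrc("5", "John", "28", "USA"));   // a different row gives a different CRC
    }
}
```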

 

Don't forget to give kudos if this reply worked for you. Suggestions and improvements for this solution are also appreciated.

Anonymous
Not applicable
Author

Hi  @swapneel9218,

In this scenario we can create a hash key for each record (over the whole record, not just a key column) using an MDM hash key routine.

After creating hash keys for both the already-inserted data and the data to be inserted, look up those keys in a tMap; the rejected (non-matching) rows are exactly the new records.

Example:

Data1:                     Data2:

 Id   Name                  Id   Name
100   krane                101   Lunarg   (updated)
101   Lunar                103   Harry    (duplicate)
103   Harry                104   potter   (new record)
                           105   Henry    (new record)

So let's assume there is no primary key; we then create a hash key for the whole record. The hash keys might then look like this, for example:

Data1:                          | Data2:

 Id   Name    hashkey1          |  Id   Name     hashkey2
100   krane   1005867           | 101   Lunarg   5456878   (updated)
101   Lunar   1239878           | 103   Harry    4354587   (duplicate)
103   Harry   4354587           | 104   potter   5650112   (new record)
                                | 105   Henry    1008867   (new record)

In the above example you can see that the hash key for record 103 is the same in both data sets, while the updated record 101 gets a different hash key. Based on this, we look up the hash keys in a tMap and send the inner join rejects to the output, i.e., into target "Table D" (see the routine sketch below).
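A minimal sketch of such a hash key routine in plain Java is shown below. This is not the actual Talend MDM routine; the class name, the choice of MD5, and the column separator are assumptions made purely for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashKeyRoutine {

    // Build one hash key per record by hashing all of its columns together.
    // Identical records (e.g. 103 / Harry) get identical keys; any change produces a new key.
    public static String hashKey(String... columns) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String col : columns) {
                md.update((col == null ? "" : col).getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0x1F);       // separator between columns
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hashKey("103", "Harry"));    // same in Data1 and Data2 -> filtered out
        System.out.println(hashKey("101", "Lunarg"));   // differs from hashKey("101", "Lunar") -> kept as new/updated
    }
}
```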

 

Don't forget to give kudos if this reply worked for you. Suggestions and improvements for this solution are also appreciated.