Anonymous
Not applicable

Newbie question on writing to a SQL DB

How do I make sure that I do not duplicate data being copied from a flat file to a SQL table?

I am reading a txt file, passing it through tLogRow, and then writing to a SQL table. The process works, but every time I run the job the same data gets inserted into my SQL table.

I know that when writing to a txt file you have the option of append or overwrite. Is there such an option for writing to SQL?

 

Tom

 

Labels (1)
  • v7.x

4 Replies
vapukov
Master II

Hi Tom,

With files it is the same - if you append the same information, it will be duplicated in the file as well.

But you could save a row number in the database: add a column and populate it with a Talend sequence.

Next time, query the table for the max row number NNN and start reading the file from line NNN+1 (the Header setting of tFileInputDelimited).

You could also add more logic (see the sketch after this list):

  • count the number of rows in the file
  • read from NNN+1 if the number of rows in the file is bigger than NNN
  • read from the beginning if the number of rows is less than NNN - this helps if the file was truncated, but it also requires resetting the max NNN in the database
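
A minimal sketch of the bookkeeping query this approach needs - the table and column names are made up for illustration:

    -- Assumed schema: the target table (log_rows) has a row_num column
    -- that the Talend job populates with a sequence during the load.

    -- How many rows have already been loaded? COALESCE covers the
    -- first run, when the table is still empty.
    SELECT COALESCE(MAX(row_num), 0) AS last_loaded
    FROM log_rows;

    -- Feed last_loaded into the Header field of tFileInputDelimited
    -- (for example via a context variable) so the next run starts
    -- reading at line last_loaded + 1.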
Anonymous
Not applicable
Author

Thank you for the quick response. I understand that the data could be duplicated if the input file is used multiple times, which is what I see when I run the job more than once. Is there a way to parse the data itself, either via Talend or SQL, to stop duplicate rows from being added? I am trying to learn and become more proficient, so any support is very welcome.

Tom 

vapukov
Master II

There are many ways to achieve the same goal:

  1. Who generates these logs, and how?
  2. Is it possible to truncate the logs after reading them?
  3. You could use INSERT IGNORE if the output database supports it.
  4. You could add a database trigger to ignore duplicates if the database does not support INSERT IGNORE, or use tSQLRow to do the same (INSERT ... WHERE NOT EXISTS ...) - see the sketch after this list.
  5. You could read the existing data from the DB and reject duplicates in tMap if the data set is not too big.
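
For options 3 and 4, a rough sketch of the SQL involved - the exact syntax varies by database, and the table and column names here are placeholders:

    -- Both variants assume a unique key on the deduplication column.
    ALTER TABLE log_rows ADD CONSTRAINT uq_log_line UNIQUE (line_id);

    -- Option 3: MySQL's INSERT IGNORE silently skips rows that would
    -- violate the unique key (PostgreSQL's equivalent is
    -- INSERT ... ON CONFLICT DO NOTHING).
    INSERT IGNORE INTO log_rows (line_id, message)
    VALUES (42, 'some log line');

    -- Option 4: a portable form for tSQLRow - insert only when the row
    -- is not already present (on MySQL the SELECT needs FROM DUAL).
    INSERT INTO log_rows (line_id, message)
    SELECT 42, 'some log line'
    WHERE NOT EXISTS (
        SELECT 1 FROM log_rows WHERE line_id = 42
    );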
Anonymous
Not applicable
Author

This is not a production environment of any sort, just me trying to learn how to write to a SQL DB. One idea that came to me was to use a line ID and add logic to compare each new row's line ID against the largest one already in the DB. What I am finding while developing in Talend is that much of the work is in looking at all the ways the data needs to be handled prior to ingestion.

 

Tom
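
On the SQL side, Tom's line-ID idea could look roughly like this, assuming the incoming rows are first landed in a staging table and carry an increasing line ID (all names here are hypothetical):

    -- Insert only the lines newer than the largest line_id already
    -- stored; COALESCE handles the very first load.
    INSERT INTO log_rows (line_id, message)
    SELECT s.line_id, s.message
    FROM staging_rows AS s
    WHERE s.line_id > (SELECT COALESCE(MAX(line_id), 0) FROM log_rows);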