Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
Gourav_King_of_DataLand
Contributor II
Contributor II

check incoming data from JSON before inserting into database

Hi All,

 

I am extracting some chats from REST API, incoming data is in JSON. So far I have been able extract data from API and put in DataBase. what I want to do next is to check is chats already inserted before inserting into table. Some more information chat table has millions of records because it stores all the conversation, one row per message. So there are two table here 

CHAT_DETAILS - this stores everything (millions of records)

CHAT_MASTER - This Stores the unique chat_ID and some other information. (couple of hundred thousand records). This is the table where I want to check for incoming Chat_IDs before inserting.

If Chat_ID exists in CHAT_MASTER = STOP THE JOB 

If Chat_ID doesn't exist in CHAT_MASTER = continue inserting

 

0683p000009LzGz.png

 

 

 

 

 

 

 

 

 

0683p000009LzSg.png

 

 

Labels (4)
8 Replies
vapukov
Master II
Master II

You can:

1) use tFlowToIterate and read data from database with this id - tDBInput and use tRunIf if component return or not data, like:

 

SELECT CASE WHEN EXISTS (SELECT * FROM table where Chat_ID = Chat_ID) THEN 1 ELSE 0 END as ID_Exists

 

 

2) as I understand it oracle (?) and because component do not support INSERT/IGNORE You can use tOracleRow component and with prepared query run something like

INSERT INTO table (a,b,c) VALUES (?,?,?) WHERE NOT EXISTS (SELECT * FROM table where Chat_ID = Chat_ID)


in both case use variable instead of Chat_ID

 
3) use triggers on table

Anonymous
Not applicable

you can also:

 

put a tMap component between extractJSON an oracleOutput.

connect it with extractJSON and oracleOutput components

add a tOracleInput component (CHAT_MASTER) as lookup

 

0683p000009LzSv.png

 

tMap configuration looks like this:

 

0683p000009LzH9.png

 

Important: 

"Lookup Mode"l has to be set to: Load on every row,

"Catch lookup inner join reject" must be set to true)

you need to add a global Map Key - name it KEY_CHAT_ID and connect it with the CHAT_ID from the main row (row 3)

 

tOracleInput configuration/query looks like that 

 

0683p000009LzQh.png

 

in the WHERE condition you can see the key you´ve defined before in the tMap.

 

Add a tOracleCommit component after the oracleOutput. You have to commit after every inserted row

because in your JSON file the same CHAT_ID might be more than one time (maybe - I dont know your data).

 

Close the Oracle connection in the Post-Job, not in the oracleCommit component, deselect the check box

 

0683p000009LzSc.png

 

Gourav_King_of_DataLand
Contributor II
Contributor II
Author

Hi odisys,
Thanks a lot for taking out time to provide detailed example for solution, this works great. One of the most important step in all this for me is
" If Chat_ID exists in CHAT_MASTER = STOP THE JOB "
What it means is as soon as I hit any of the chat ID that is already existing in the chat_master table entire job should be stopped/killed. In the example you provided it is continuing the job by insert the ones that are not present and excluding the ones are already present. Can you please help to understand how can I get this condition to work " If Chat_ID exists in CHAT_MASTER = STOP THE JOB "

Gourav_King_of_DataLand
Contributor II
Contributor II
Author

Hi @vapukov,

 

Thanks for your advice and help. I am quite new to Talend and have never used any of those components and not sure what to do. Option 1 you provided seems to be the most suitable since for me it's important that I should be able to stop/kill job as soon as I hit the condition " If chat_id already exists stop the job". I will try to understand and learn about the components you mentioned. 

Anonymous
Not applicable

hi gr44,

 

just that i get it right.

 

lets say you have 200 different chats (CHAT_IDs) in your JSON.

Lets also say the first 100 are not yet in the CHAT_MASTER table.

Number 101 is.

 

Do you want to

  • write the first 100 records into the database and exit the job when reaching the 101st?   OR
  • you do not want any of the records to be written into the database?

If it´s the first variant: what if just  record 101 is already in the database but 102 -200 are not (as 1 - 100 are not). If you exit the job at 101 then 102 - 200 will not be written. Probably your use-case makes sense but it´s a bit different to understand for me 0683p000009MACn.png

 

best regards

 

Gourav_King_of_DataLand
Contributor II
Contributor II
Author

Hi Odisys,

 

I want to write the first 100 records into the database and exit the job when reaching the 101st.

 

"If it´s the first variant: what if just  record 101 is already in the database but 102 -200 are not (as 1 - 100 are not). If you exit the job at 101 then 102 - 200 will not be written. Probably your use-case makes sense but it´s a bit different to understand for me"

 

You point is completely valid and I was bit concerned about this, however the JSON output is always sorted by date. Let's say there are 500 records, I have already successful inserted 400 records. The first 100 record from the JSON output are the new ones that are not in my tables and 101 onward are already in my table. So in this scenario it wouldn't be an issue.

 

However in a scenario where JSON output was randomly sorted then my approach wouldn't work and what you mentioned is correct and would become an issue.

 

 

Why I want to exit as soon as I hit any Chat_ID already exist and not continue : 

  • Reason mentioned above
  • There are alternative ways to get only required data based on Date and other parameters but the REST API  I have have very limited options and is kind of inefficiently designed to get data out. Only one URL gives me data in my desired format rest all others are different formats and missing data field. Hence this kind of work around.

Hope that clarifies somewhat 🙂

 

 

 

vapukov
Master II
Master II


@gr44 wrote:

Hi @vapukov,

 

Thanks for your advice and help. I am quite new to Talend and have never used any of those components and not sure what to do. Option 1 you provided seems to be the most suitable since for me it's important that I should be able to stop/kill job as soon as I hit the condition " If chat_id already exists stop the job". I will try to understand and learn about the components you mentioned. 


Hi!

 

it is similar with example from @odisys

 

same idea - use CHAT_ID from request for query filtered data from database and than JOIN in tMap

Anonymous
Not applicable

Hi @gr44

 

OK i got it 🙂

 

Possible solution - I dont have my private notebook here where i created the screenshot from posting before, so i have to explain it by words.

 

Add a tDie component to the job.

Connect a second output row from tMap to tDie - name it stop/exit or something else.

Inside the tMap change the join model on the left side to Left outer join.

 

On the right side - in output update

Click the Filter symbol - it should be the arrow pointing to the right, just hoover the mouse over the buttons

Enter ->

Relational.ISNULL(row4.CHAT_ID)

 

On the right side - in output stop (or how you named it)

Click the Filter symbol

Enter ->

!Relational.ISNULL(row4.CHAT_ID)

 

I would also rename the output "update" to "insert" to have a better/clearer understanding what the jobparts is doing.

 

best regards, there might be typos in my posting but i guess you get the idea