Anonymous
Not applicable

Talend Big Data POC - Use Case Help

Hello Experts,
We are planning to start a POC covering around 15 use cases using the Talend Big Data open source edition, and if it is successful we plan to replace our existing commercial ETL tool with the Talend Enterprise Big Data edition.
Could someone please help me implement the following use case in Talend?
One of our SQL Server source tables is updated frequently, roughly every 2 hours, with transactional data (through a front-end application), and we need to load that data into an HDFS file in our Hadoop environment.
We need to load the data from the SQL Server table to the HDFS file every 2 hours, and on each run we should extract only the new or modified rows instead of reloading the whole table (to avoid wasting space).
There is a 'Load_date_Time' column in the source table, but we cannot trust it. So on each extraction we need to compare the data with what was loaded in the previous cycle and load only the new or changed rows into the target HDFS file. Also, we have no control over the source tables beyond extracting the data.
The Talend job should also be automated to run every 2 hours.
How do we achieve these two scenarios? Any help would be appreciated!
Thanks in advance!
Abhi
2 Replies
Anonymous
Not applicable
Author

Hi Abhikriti,
Thanks for posting your job requirement here.
We have redirected your requirement to our big data experts and will come back to you as soon as we can.
Best regards
Sabrina
amarouni
Contributor

Abhikriti,
Which DB are you using? MS SQL Server? What does the data look like?
You can offload the data from the DB to HDFS using the tSqoop* components (e.g. tSqoopImport) and then do some post-processing using Hive, MapReduce or Spark. (Please note that the MapReduce and Spark components are only available in the Talend enterprise edition, which you can download and try.)
The tELTHiveXXX components can be used for the post-processing, creating the Hive tables and result sets.
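For the offload step, this is roughly the kind of Sqoop import that tSqoopImport drives under the hood. It is a command-line sketch only: the host, database, credentials, table and directory names are all placeholder assumptions.

```
# Pull the SQL Server table into HDFS (all names below are placeholders).
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=SourceDB" \
  --username etl_user \
  --password-file /user/etl/.dbpass \
  --table Transactions \
  --target-dir /data/raw/transactions \
  --append
```

Sqoop also supports `--incremental lastmodified` with `--check-column`, but that relies on a trustworthy timestamp column; since your Load_date_Time cannot be trusted, a full (or keyed) extract followed by a diff against the previous cycle may be needed. For the 2-hour schedule, a cron entry such as `0 */2 * * *` can launch the exported job, or the Talend Administration Center scheduler in the enterprise edition.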
Best regards,