Hi,
I am very new to the Hadoop ecosystem and recently installed Talend Open Studio for Big Data 7.3.1 on my laptop.
I have one VM (CentOS 8 based) running on my Azure account, where I have created a single-node Hadoop cluster with Hive and a MySQL database.
Hadoop Version - 3.2.1
Hive Version - 3.1.2
MySQL Version - 8
Now I am trying to ingest data from a centralized Oracle 12c server into my Hive database through Talend.
The DB connection from Talend to the Oracle database is successful, but when I try to connect to the Hive database on the VM, Talend throws a ClassNotFoundException.
I have installed all the libraries, i.e. the required and third-party ones, but still cannot make a successful connection.
I have attached a screenshot of the Hive connection details and the error I got.
Please advise. Thanks in advance.
Hello,
The error indicates that the required JDBC driver is missing, or that the installed driver is not compatible with the Hive version 3.1.2 you are using. Have you tried to find the right driver file and use tLibraryLoad to load it?
Let us know if this works for you.
Best regards
Sabrina
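For anyone hitting the same ClassNotFoundException outside the Studio, here is a minimal standalone sketch of a Hive JDBC connection, assuming the standard Apache Hive driver (hive-jdbc and its dependencies on the classpath). The host, port, database, and credentials are placeholders, not values from this thread:

// Minimal Hive JDBC connection check. Assumes hive-jdbc and its
// dependencies are on the classpath; <vm-host> is a placeholder.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveConnectionCheck {
    public static void main(String[] args) throws Exception {
        // This is the class the Studio cannot find when the driver jar
        // is missing, hence the ClassNotFoundException.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://<vm-host>:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW DATABASES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}

If Class.forName fails here as well, the driver jar is genuinely absent from the classpath, which is exactly the gap tLibraryLoad is meant to fill.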
Hello,
I have installed all the third-party libraries and used the Cloudera distribution with CDH 6.1.1; with this configuration I have successfully created the Hive connection metadata. With Cloudera CDH 6.1.1 I am also able to create my Hadoop cluster.
My ultimate goal is to create a job that reads from the Oracle database and writes files to HDFS.
I have created the job with the tDBInput and tHDFSOutput components.
The job is failing with the error:
org.apache.hadoop.ipc.RemoteException: File /user/hive/warehouse/hive_db.db/approved_requests/test_3 could only be written to 0 of the 1 minReplication nodes. There are 2 datanode(s) running and 2 node(s) are excluded in this operation.
Please advise how to fix this problem.
Best Regards,
AGhosh1596267888
Attaching the exception received
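For context on the exception above: this error typically means the client can reach the NameNode but not the DataNodes themselves, which is common when the cluster runs on a cloud VM that advertises private IP addresses. Below is a diagnostic sketch rather than a guaranteed fix: a direct HDFS write using the usual client-side workaround (dfs.client.use.datanode.hostname=true). The NameNode host and port are placeholders, and hadoop-client must be on the classpath:

// Direct HDFS write check with the datanode-hostname workaround.
// <namenode-host> is a placeholder; requires hadoop-client on the classpath.
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://<namenode-host>:8020");
        // Ask the client to dial DataNodes by hostname instead of the
        // (possibly unroutable) internal IP the NameNode reports back.
        conf.setBoolean("dfs.client.use.datanode.hostname", true);

        try (FileSystem fs = FileSystem.get(conf);
             OutputStream out = fs.create(new Path("/tmp/write_check.txt"))) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Write succeeded");
    }
}

If this direct write also fails, check that the DataNode transfer port (9866 by default in Hadoop 3.x) is reachable from the machine running the Talend job.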
Talend and its community allow you to keep using a version in Talend products after its vendor has ceased to support it. For this reason, such a version may still be listed in the following tables and available in the products, but Talend no longer provides support for it.
Hello,
Thanks for the update. Can you please let me know which setup I should use to complete the mentioned workflow?
Also, please point me to any relevant documentation that will help me choose the supported versions.
Hello,
On the version list of the distributions, some versions are labelled Builtin. These versions were added by Talend via the Dynamic distribution mechanism and delivered with the Studio when it was released. They are certified by Talend and thus officially supported and ready to use.
Usually, we use tSqoopImport to call Sqoop to transfer data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS).
Please refer to this page: TalendHelpCenter: Sqoop scenarios
Best regards
Sabrina
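For reference, tSqoopImport drives a standard Sqoop 1 import under the hood. Below is a minimal sketch of the same import invoked through Sqoop's Java entry point, assuming sqoop and the Oracle JDBC driver are on the classpath; the JDBC URL, credentials, table name, and target directory are illustrative placeholders:

// Sqoop 1 import from Oracle to HDFS via Sqoop's Java entry point.
// All connection values below are placeholders; requires sqoop and the
// Oracle JDBC driver on the classpath.
import org.apache.sqoop.Sqoop;

public class OracleToHdfsImport {
    public static void main(String[] args) {
        String[] sqoopArgs = {
            "import",
            "--connect", "jdbc:oracle:thin:@<oracle-host>:1521:ORCL",
            "--username", "<user>",
            "--password", "<password>",
            "--table", "APPROVED_REQUESTS",
            "--target-dir", "/user/hive/warehouse/approved_requests",
            "--num-mappers", "1"
        };
        // Equivalent to running "sqoop import ..." on the command line.
        int exitCode = Sqoop.runTool(sqoopArgs);
        System.exit(exitCode);
    }
}

The argument array mirrors the command line, so the same import can be prototyped first with "sqoop import ..." on the cluster and then configured in tSqoopImport.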
Hello,
Thank you very much for pointing me in the right direction. I will definitely look into the Sqoop import to complete the workflow.
Also, please let me know if there is any option to capture and upload changed data (inserts, updates, and deletes) from Oracle to HDFS or Cassandra on a real-time basis.
Best Regards,
AGhosh1596267888
Hello,
With a Talend DI job, you can use the tFlowMeterCatcher and tFlowMeter components to catch the processed row count.
The tFlowMeterCatcher component catches the processing volumetrics from the tFlowMeter component and passes them on to the output component.
For example:
tFileInput--->tFlowMeter--->tMysqlOutput_1
tFlowMeterCatcher---tLogRow
There is a column called "count" in the schema of tFlowMeterCatcher, which counts the number of records passing through the specified flow.
Here is a new feature Jira issue about Monitoring Statistics for Spark Jobs:
https://jira.talendforge.org/browse/TBD-5137
Let us know if it is what you are looking for.
Best regards
Sabrina