Hi,
Does anyone know if it is possible to write multiple queries in the tHiveInput component? For example, I have a series of queries: create a table, run the main query, then drop the table.
If I put a semicolon between two queries, I get error messages such as "missing EOF" or "cannot recognize input near ''<1st query tail>'' ';' '<2nd query head>' in expression specification".
Thanks!
Dawn
The first component will be a tFixedFlowInput component; each line will hold one Hive statement.
Connect it to a tHiveRow component, and set the SQL text area to row1.sqlStmt (the String column in tFixedFlowInput should be named "sqlStmt").
Then use tHiveInput for the final select query.
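As a sketch, the tFixedFlowInput rows could carry statements like these (table and column names here are placeholders, not from the original post):

```sql
-- Row 1 of the sqlStmt column: create the temp table
CREATE TABLE temp AS SELECT id, val FROM source_table WHERE val > 0

-- Row 2 of the sqlStmt column: any other setup statement
-- Note: no trailing semicolons -- tHiveRow executes each row
-- as exactly one statement, so no separator is needed.
```

The final SELECT (e.g. the join against the temp table) then goes in tHiveInput, which expects a single query.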
Dear SachinD,
Thank you for your reply!
Do you mean the workflow will be tFixedFlowInput- tHiveRow - tHiveInput? If my queries are -
1. create table temp as ...
2. select * from A join temp ....
3. drop table temp
Can you give more details about it?
Thanks,
Dawn
tPreJob --> tHiveConnection --> tHDFSConnection
tHiveRow (1. create table temp as ...) --> tHiveLoad (loads data from an HDFS file into the temp table, if your file is in an HDFS location) --> OnSubjobOk --> tHiveInput (2. select * from A join temp ...)
tPostJob --> tHiveRow (3. drop table temp)
We can drop the temp table in tPostJob, or we can create the temp table with CREATE TEMPORARY TABLE, which is dropped automatically after use.
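A minimal sketch of the temporary-table variant (table and column names are placeholders, and CREATE TEMPORARY TABLE needs a reasonably recent Hive, 0.14 or later):

```sql
-- In tHiveRow: a session-scoped temp table, dropped automatically
-- when the Hive session (the tHiveConnection) closes,
-- so no tPostJob cleanup step is needed.
CREATE TEMPORARY TABLE temp AS
SELECT key, value FROM some_source;

-- In tHiveInput: the main query reusing the temp table.
SELECT a.*, t.value
FROM A a
JOIN temp t ON a.key = t.key;
```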
Thanks,
Sachin
Dear Sachin,
Sorry for the late reply! I was busy with my projects over the last couple of days.
I think your focus is on loading external files into the database using Talend components and then dropping them after the query. My main concern is how to create, use, and drop temp tables in the database through Talend, so I can reuse them multiple times in a query, which helps performance.
For now I am using Bash shell scripts to load external txt/csv files from my local machine into the Hive database, and also to create temp tables in Hive through those scripts. I then run the main query from there, output txt/csv/xlsx reports through Talend, and finally drop the temp tables (whether built from external sources or from Hive tables) with shell scripts again. I am sure this is not the best way to handle it...
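For what it's worth, the shell-script part described above usually boils down to Hive statements like the following, which could equally be issued from tHiveRow components instead of bash (the path and table names here are made up for illustration):

```sql
-- Load a local CSV into a staging table
-- (what the bash script does via the hive CLI)
LOAD DATA LOCAL INPATH '/path/to/input.csv'
OVERWRITE INTO TABLE staging_input;

-- Build the temp table used by the main query
CREATE TABLE temp AS
SELECT * FROM staging_input WHERE some_col IS NOT NULL;

-- ... main query and report export happen here ...

-- Clean up
DROP TABLE IF EXISTS temp;
```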
Dawn