We have a situation here that requires your input.
We have a Job A that needs to run without Talend Studio. We are planning to build the job and run it using the Windows scheduler.
The job loads data into BigQuery for a single provider (say Provider A). Now, if another provider joins (say Provider B), what is the best way to run the same job for Provider B as well? There will be no change in the requirement or the job at all; the only thing we may have to change is the BigQuery dataset name, which is different for each provider.
Each provider is independent of the others, so running the same job for two different providers in parallel would also be most welcome, if possible. We want to add the dataset name to a flat file and use it as a parameter, loading into the BigQuery table based on that dataset name. Can anyone give me some ideas on how to make this possible?
I know we can use a context variable, but for my requirement I need to pass the dataset name via a parameter file, which will be different for each provider. How do I make the first job point to the first path and the second job point to the second path? If that is not possible, can I put both datasets into a single parameter file and have the job run twice, once per dataset value?
Is there any other way to run the same job many times based on a value provided as a parameter outside the job, so that records are loaded into the database based on the dataset name given in that parameter?
Hi
You can read context variables from an external file: pass a parameter file on the command line and parse the params inside the Talend job.
When you run the second instance, pass the second param file so the job runs with the second set of params.
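As a sketch, assuming the job is exported as JobA with a context variable named paramFile (both names are hypothetical), the two Windows scheduler tasks could then call:

JobA_run.bat --context_param paramFile=C:\params\providerA.txt
JobA_run.bat --context_param paramFile=C:\params\providerB.txt

Inside the job, a tFileInputDelimited--main--tContextLoad pair reading context.paramFile can then overwrite the dataset context variable at runtime.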
HTH
Thanks
Raghu
Here is what I have been asked to check, from an implementation point of view.
Create a flat file with the list of datasets. If any dataset is to be added in the future (for another provider), the addition should be done here, with a number by which it is identified.
The flat file will look like this:
1. Dataset1/Provider A
2. Dataset2/Provider B
3. Dataset3/Provider C
44. Dataset4/Provider D
....
So while running the Talend job, it should ask which dataset the run is for. Talend should give us an option something like this:
Select the Dataset/Provider which has to be executed.
1. Dataset1/Provider A
2. Dataset2/Provider B
3. Dataset3/Provider C
44. Dataset4/Provider D
...
...
99. Exit
If any dataset is added to the flat file in the future, it should be displayed automatically while running the job, and its number should be the number given in the flat file.
If 1 is pressed, then Dataset1 should be executed; if 44, then Dataset4.
The number is based on the number given in the flat file, and Exit should end the program without executing anything further.
Can you please suggest how to work on this scenario? It would be very useful for me now, and for everyone in the future, if you can share a job relevant to this scenario.
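A minimal standalone Java sketch of such a menu, assuming the flat file is named datasets.txt and each line looks like "1. Dataset1/Provider A" (the file name and line format are assumptions; the selected name would then be handed to the exported job as a context param):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

public class DatasetMenu {
    public static void main(String[] args) throws Exception {
        // Read "number. Dataset/Provider" lines from the flat file.
        Map<Integer, String> options = new LinkedHashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader("datasets.txt"))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.matches("\\s*\\d+\\..*")) {
                    continue; // skip blank or malformed lines
                }
                String[] parts = line.split("\\.", 2); // e.g. "44. Dataset4/Provider D"
                options.put(Integer.parseInt(parts[0].trim()), parts[1].trim());
            }
        }
        System.out.println("Select the Dataset/Provider which has to be executed.");
        options.forEach((num, name) -> System.out.println(num + ". " + name));
        System.out.println("99. Exit");
        int choice = new Scanner(System.in).nextInt();
        if (choice == 99 || !options.containsKey(choice)) {
            return; // Exit without executing anything
        }
        String dataset = options.get(choice).split("/")[0].trim(); // e.g. "Dataset4"
        // Hand the chosen dataset to the exported job, e.g.
        // JobA_run.bat --context_param dataset=<chosen value>
        System.out.println("Selected dataset: " + dataset);
    }
}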
@shong wrote:
Hello
As raghumreddy suggested, read the param value from the flat file and pass it to the business job dynamically. For example:
You have a flat file that contains the dataset names:
dataset1
dataset2
tFileInputDelimited--main(row1)--tFlowToIterate--iterate--other components--main-->tBigQueryOutput
tFileInputDelimited: reads the dataset names from the flat file; define one column called "dataset" with String type. In the later components, you can get the current dataset name with this expression:
(String)globalMap.get("row1.dataset")
tFlowToIterate: loops the business processing multiple times, once for each dataset name.
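In plain Java, the iterate pattern above is roughly this sketch (the file name is an assumption; inside Talend the loop body is whatever sits on the iterate link):

import java.nio.file.Files;
import java.nio.file.Paths;

public class IteratePattern {
    public static void main(String[] args) throws Exception {
        // Mirrors tFileInputDelimited --main--> tFlowToIterate: one iteration per line.
        for (String dataset : Files.readAllLines(Paths.get("datasets.txt"))) {
            // tFlowToIterate publishes the current row to globalMap, so components
            // on the iterate link read it as (String)globalMap.get("row1.dataset").
            System.out.println("Loading into BigQuery dataset: " + dataset.trim());
        }
    }
}

In tBigQueryOutput, the Dataset field would then hold that same globalMap expression instead of a hard-coded name.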
Regards
Shong
Following is the error I am getting while trying to load the file as you described.
400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Invalid dataset ID \"\"Dataset_Dev\"\". Dataset IDs must be alphanumeric (plus underscores, dashes, and colons) and must be at most 1024 characters long.",
    "reason" : "invalid"
  } ]
}
Got this problem resolved. The CSV options had to be enabled for the job to load successfully.
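The doubled quotes in the error message show that the value read from the file still carried its text-enclosure characters, which enabling CSV options strips off. An equivalent defensive fix, if CSV options cannot be enabled, is to strip the enclosure in a tJavaRow (a sketch, assuming the column is named dataset):

output_row.dataset = input_row.dataset.replaceAll("^\"|\"$", "");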
If anyone can advise me on the earlier implementation I asked about, it will be most useful. Thanks again.
Hi Shong,
Thanks for the reply.
Instead, could you share a sample job that has this primary function working? It would be useful for everyone looking for a similar setup.
I am looking for something like
job 1 -----> job 2
(this job)   (runs based on the flat file value)
There is not much to be done in Job 2; it just uses the value extracted from the drop-down selection in Job 1.
Can you share your ideas please?
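A sketch of that parent/child layout, assuming the child job exposes a context variable named dataset (all names here are assumptions):

Job1: tFileInputDelimited--main(row1)--tFlowToIterate--iterate--tRunJob
tRunJob: point it at Job2 and, in its Context Param table, pass
  dataset = (String)globalMap.get("row1.dataset")
Job2: use context.dataset wherever the dataset name is needed, for example in the Dataset field of tBigQueryOutput.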