Automating file processing from cloud storage is one of the most common use cases. This article explains how to integrate Google Cloud Storage with Talend Cloud for this purpose.
Google Cloud Storage is the object storage service of Google Cloud Platform (GCP). If you are familiar with Amazon Web Services, the equivalent is S3.
Log in to your Google Cloud account, then go to the Storage page.
Click Create Bucket.
Configure your bucket.
Once your bucket is created, add a folder to classify your incoming files.
Create an access key for your application. In Google Cloud Storage, navigate to the Settings section, and click Create a new key.
Copy the access key and the secret key, and store them in a safe place.
Create a Talend Job to retrieve a file from Google Cloud Storage and display its data in the console. A real-life Job would of course be more complex. The demo Job is available in the GSRead.zip file attached to this article.
The Job is composed of the following steps:
Configure the context variables:
tempFolder: the folder used to store temporary data. On Talend Cloud Engine, it is /tmp.
fileKey: the file name from the GCP Bucket. It is composed of the folder and file name.
fileBucket: the Google bucket you created.
gcpAccessKey: the Google Storage access key you created.
gcpSecretKey: the Google Storage secret key you created.
Configure the tGSConnection component:
Configure the tGSGet component:
This component splits the fileKey because it looks like cloud-function/sample.csv and you only need the file name, sample.csv.
Configure the tFileInputRaw component:
Now your Job is configured. You can test it by adding valid parameters to the context.
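The split mentioned for the tGSGet component can be illustrated with a short sketch. The demo Job configures this inside Talend Studio, and that configuration is not reproduced here. The JavaScript below only illustrates the logic, in the same language as the Cloud Function used later in this article. The helper name fileNameFromKey is hypothetical.

```javascript
// Hypothetical helper, not part of the demo Job: returns the last path
// segment of an object key such as "cloud-function/sample.csv".
function fileNameFromKey(fileKey) {
  const segments = fileKey.split('/');
  return segments[segments.length - 1];
}

console.log(fileNameFromKey('cloud-function/sample.csv')); // prints "sample.csv"
```

Taking the last segment rather than a fixed index also works for keys without a folder or with several nested folders.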
For more information on publishing a Job to Talend Cloud, see Connecting Talend Studio to Talend Integration Cloud.
Right-click your Job, and select Publish to Cloud.
Configure your export as needed, then click Finish.
Once the Job is published, click Open Job Flow.
If you are logged in to Talend Cloud, you should see your configuration.
Configure the Job as appropriate.
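A Google Cloud Function allows you to implement the following logic: when a file is created in the bucket, the function calls the Talend Cloud API to execute the Job, passing the bucket and the file key as parameters.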
The Cloud Function in this article is written in JavaScript (Node.js). The function and package code are available in the gcp-cloud-function.zip file attached to this article.
Find your Flow Id. In Talend Cloud, select your flow and check the Flow Id:
In your Google Cloud Platform console, go to Cloud Functions.
Create a function.
Name: Give your function a name.
Memory allocated: Select the memory needed for your function.
Trigger: In your case, use the Cloud Storage bucket.
Event Type: Finalize/Create, which fires for each file created in the bucket.
Bucket: The bucket you created.
You can use the inline editor to create the function.
// Include the request library for Node.js
const request = require('request');

/**
 * Triggered from a message on a Cloud Storage bucket.
 *
 * @param {!Object} event The Cloud Functions event.
 * @param {!Function} callback The callback function.
 */
exports.processFile = (event, callback) => {
  console.log('Processing file: ' + event.data.name);

  // Body: the Flow to execute and the context parameters passed to the Job
  const body = {
    executable: '<Talend Flow ID>',
    parameters: {
      fileBucket: event.data.bucket,
      fileKey: event.data.name
    }
  };
  const jsonString = JSON.stringify(body);
  console.log(jsonString);

  // Basic Authentication credentials
  const username = '<Talend Cloud user>';
  const password = '<Talend Cloud Password>';
  // Buffer.from replaces the deprecated new Buffer() constructor
  const authenticationHeader = 'Basic ' + Buffer.from(username + ':' + password).toString('base64');

  // Request
  const options = {
    method: 'POST',
    url: 'https://ipaas.us.cloud.talend.com/api/v1.1/executions',
    body: jsonString,
    headers: {
      'Content-Type': 'application/json',
      'Accept': 'application/json',
      'Authorization': authenticationHeader
    }
  };

  // Call Executions. Signal completion only once Talend Cloud has answered;
  // otherwise the function may be terminated before the request completes.
  request(options, (error, response, responseBody) => {
    if (error) {
      console.error(error);
      callback(error);
      return;
    }
    console.log(response.statusCode);
    console.log(responseBody);
    console.log('Done!');
    callback();
  });
};
This function is rather simple. It does the following:
Builds the body of the request, for example:
{
  "executable": "57f64991e4b0b689a64feed0",
  "parameters": {
    "fileKey": "cloud-function/sample.csv",
    "fileBucket": "mgainhao-demo"
  }
}
You can test your API on the API documentation page, Talend Cloud API-Executions, of Talend Cloud.
Adds basic authentication for the connection to Talend Cloud.
Creates the request.
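The three steps listed above (building the body, adding basic authentication, creating the request) can also be sketched as small standalone helpers, which makes each step easy to check without calling Talend Cloud. This is a minimal sketch, not the code shipped in gcp-cloud-function.zip. The helper names buildExecutionBody, basicAuthHeader, and buildRequestOptions are hypothetical, and the credentials are placeholders.

```javascript
// Hypothetical helpers; the names are illustrative and not part of gcp-cloud-function.zip.

// Builds the JSON body of the executions call: the Flow to run and
// the context parameters of the Job.
function buildExecutionBody(flowId, event) {
  return JSON.stringify({
    executable: flowId,
    parameters: {
      fileKey: event.data.name,
      fileBucket: event.data.bucket
    }
  });
}

// Builds the value of the Authorization header for HTTP basic authentication.
function basicAuthHeader(username, password) {
  return 'Basic ' + Buffer.from(username + ':' + password).toString('base64');
}

// Assembles the options object passed to the request library.
function buildRequestOptions(flowId, event, username, password) {
  return {
    method: 'POST',
    url: 'https://ipaas.us.cloud.talend.com/api/v1.1/executions',
    body: buildExecutionBody(flowId, event),
    headers: {
      'Content-Type': 'application/json',
      'Accept': 'application/json',
      'Authorization': basicAuthHeader(username, password)
    }
  };
}

// Example with the sample values from this article and placeholder credentials.
const sampleEvent = { data: { bucket: 'mgainhao-demo', name: 'cloud-function/sample.csv' } };
const options = buildRequestOptions('57f64991e4b0b689a64feed0', sampleEvent, 'user', 'pass');
console.log(options.body);
console.log(options.headers.Authorization);
```

In a deployed function, the user name and password would more safely come from configuration than from literals in the source.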
Because the function uses the request module, you need to update package.json to declare it as a dependency.
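A package.json for this function might look like the following sketch. The name, version, and version range shown here are assumptions for illustration. The actual file is in the gcp-cloud-function.zip attachment.

```json
{
  "name": "process-file",
  "version": "0.0.1",
  "dependencies": {
    "request": "^2.88.0"
  }
}
```

Only the dependencies entry matters for this step: it tells Cloud Functions to install the request module when the function is deployed.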
Click Create. The function is created.
You can access the dashboard by clicking the name of the function.
Test your function. Create a new file in the Google Cloud bucket.
Your function is called.
In Talend Cloud, verify that there are new executions of your Job.
In the logs, you should be able to see the content of the file in the tLogRow_1 section.
Processing files from cloud storage with Talend Cloud is easier when you can call the Talend Cloud API from within a Cloud Function.