maayandermer
Contributor III

How-to-guide: Bulk Insert to Azure SQL Server

If you are trying to import/insert a lot of data into Azure SQL Server (not on-prem SQL Server, and not SQL Server hosted on a cloud VM), you will run into some issues, especially if you are trying to copy data from one Azure SQL Server to another.

My architecture is Azure cloud, Talend Cloud Data Integration, and a Remote Engine deployed on a virtual machine in the same datacentre/region as the rest of my resources (as much as possible) in order to get good performance.

 

I will spare you all the things that I tried; the main issues are:

  1. If you use tMsSqlOutput, it will create row-by-row transactions. Not very efficient when you want to insert millions and millions of records.
  2. Azure SQL Server is a PaaS – it doesn't have the concept of a filesystem, so a traditional bulk load will not work here: you cannot just point it at the location of a file, because there is no such thing…

 

 

The gist of the solution is using two jumphosts: the first is the Remote Engine, which acts as a filesystem; the second is an Azure Storage blob container, used as a location that Azure SQL Server can access.

 

The pipeline that eventually worked was:

 

[image: the pipeline diagram]

 

And the Talend pipeline that orchestrates it:

 

[image: the Talend job that orchestrates the pipeline]

 

Detailed instructions:

 

  1. First, create a SAS (Shared Access Signature) for the Azure Storage account that will be used as your second jumphost.

 

[image: generating the SAS in the Azure portal]

 

 

 

  2. If you followed the instructions correctly, you will get a SAS connection string in the following form:

 

sv=2015-12-11&ss=b&srt=sco&sp=rwac&se=2017-02-01T00:55:34Z&st=2016-12-29T16:55:34Z&spr=https&sig=copyFromAzurePortal

 

Don’t forget to remove the ‘?’ at the beginning of the string after copying the SAS connection string!

 

  3. The next stage is to create the credential in the target DB:

 

CREATE DATABASE SCOPED CREDENTIAL MyAzureBlobStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2015-12-11&ss=b&srt=sco&sp=rwac&se=2017-02-01T00:55:34Z&st=2016-12-29T16:55:34Z&spr=https&sig=copyFromAzurePortal';

 

The SECRET is the SAS you created in stage (2).
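Note: CREATE DATABASE SCOPED CREDENTIAL requires a database master key. If the statement above fails because the target DB has no master key yet, create one first – a minimal sketch, where the password is a placeholder you should replace with your own strong one:

-- Required once per database before creating a database scoped credential.
-- 'Str0ng!Passw0rd' is a placeholder - use your own strong password.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Str0ng!Passw0rd';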

 

  4. Now we need to create an External Data Source:

 

CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH ( TYPE = BLOB_STORAGE,
       LOCATION = 'https://myazureblobstorage.blob.core.windows.net',
       CREDENTIAL = MyAzureBlobStorageCredential);

 

Notice that myazureblobstorage in the LOCATION is the name of your storage account.
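To sanity-check stages (3) and (4), you can query the catalog views in the target DB – a quick, optional check:

-- List the database scoped credentials and external data sources in this DB
SELECT name, credential_identity FROM sys.database_scoped_credentials;
SELECT name, location FROM sys.external_data_sources;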

 

  5. The BULK INSERT statement itself will look something like this:

 

BULK INSERT Product
FROM 'data/product.csv'
WITH ( DATA_SOURCE = 'MyAzureBlobStorage');

 

* DATA_SOURCE is the external data source that was created in stage (4).

** Notice the 'data' in FROM 'data/product.csv' – it is the container in your Azure Storage account. You can also add subfolders if needed. (A fuller CSV variant follows below.)
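If your file is a real CSV (header row, quoted fields, maybe embedded commas or line breaks), a variant like the following may help – a sketch assuming Azure SQL Database / SQL Server 2017+ syntax and the same Product table and data source as above:

BULK INSERT Product
FROM 'data/product.csv'
WITH ( DATA_SOURCE = 'MyAzureBlobStorage',
       FORMAT = 'CSV',           -- parse the file as RFC 4180 CSV
       FIELDQUOTE = '"',         -- quoted fields may contain commas and line breaks
       FIRSTROW = 2,             -- skip the header row
       ROWTERMINATOR = '0x0a');  -- line-feed row terminator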

 

More info in Microsoft's BULK INSERT documentation.

 

And the result: [image: the successful bulk insert]

 

 

 

Cheers

Maayan

5 Replies
Anonymous
Not applicable

Good post, thanks for sharing, Maayan!

Regards
Shong
Anonymous
Not applicable

Hey,

thanks for the help. Amazing info!

 

Samridhi
Contributor II

Hi,

With tDBRow I am getting the error below:

"… could not be opened. Operating system error code 5 (Access is denied.)."

I have also tried it with tDBBulkExec and I am facing the same error.

Please suggest.

 

 

maayandermer
Contributor III
Author

From which component? This is weird – Azure SQL Server doesn't have an OS. Maybe it is from the Remote Engine?

Rathesh
Contributor

What can I do if a column has a newline in the data? While loading into the file, it is created as a new row, so I get an error while doing the bulk insert.