Dataguy46
Contributor III

Loading multiple CSV files from an Amazon S3 bucket

Hi,

I am trying to pull data from an Amazon S3 bucket; the path is ibv2-live\activity\server1\.

There are multiple CSV files in this path with the prefix "activity".

Could you please help me with a script to load them?

Thanks in advance.

Labels (1)
1 Solution

Accepted Solutions
datadiva
Contributor II

The best way to make this work without loops is to go to the Data Editor. There, connect to your S3 bucket, open the connection, navigate to the folder containing the files you want to load, select one file, and click "Insert Script".

You'll get something like:

LOAD
"Column 1",
"Column 2",
"Column 3"
FROM [lib://<SPACE>:<BUCKET>/<FOLDER>/<FOLDER>/<FILENAME>.csv]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);

Change the column list to LOAD *, remove the <FILENAME> portion along with its extension, and keep the last line of code. That way you end up with something like this:

LOAD * FROM [lib://<SPACE>:<BUCKET>/<FOLDER>/<FOLDER1>/]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);

That will pull all of the files.
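Since the original question mentions that the files share the "activity" prefix, it may also help to know that Qlik's FROM clause accepts * and ? wildcards in the file name. A minimal sketch, reusing the same placeholder connection and folder names from above:

```
// Load only the CSV files whose names start with "activity"
// (<SPACE>, <BUCKET>, <FOLDER>, <FOLDER1> are placeholders for your own connection and path)
Activity:
LOAD *
FROM [lib://<SPACE>:<BUCKET>/<FOLDER>/<FOLDER1>/activity*.csv]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
```

Because every matched file is loaded with the same field list, Qlik automatically concatenates them into the single Activity table.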

Also, for security, it would be wise not to share your actual S3 bucket name and file path in questions.


4 Replies

Charvick
Contributor III

Hi datadiva, I have used the same approach and excluded the filename from the path so that it points only to the folder containing the CSV files. However, I am getting a badFileSize error. This is my query:

LOAD Distinct
UM_USER_NAME as USERID,
Combo_ID
FROM [lib://Amazon_S3/qa/<S3>/<Folder>/<Folder 1>/<Folder 2>/<Folder 3>/<Folder 4>/]

All of my CSV files are inside folder 4.

Can you please help me with this?

Thanks in advance.
Dataguy46
Contributor III
Author

Hello Charvick,

Please note that this method works only if your files have the same schema.
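Some background on why: when a folder or wildcard load hits several files whose field names match, Qlik auto-concatenates them into one table. If the field lists differ, the files land in separate tables unless you force the merge with the Concatenate prefix. A sketch with hypothetical field and file names:

```
// Files with matching fields auto-concatenate; to combine files with
// differing fields into one table, force it with Concatenate.
// Field names (UserID, EventTime, SessionID) and file patterns are hypothetical.
Activity:
LOAD UserID, EventTime
FROM [lib://<SPACE>:<BUCKET>/<FOLDER>/activity_old*.csv]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);

Concatenate (Activity)
LOAD UserID, EventTime, SessionID   // extra field; rows from the first load get null here
FROM [lib://<SPACE>:<BUCKET>/<FOLDER>/activity_new*.csv]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
```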

swarnapavani
Contributor

Hello @Charvick 

Please use this script to load multiple files via web connectors.

// Manually define the list of file names, for example:
// 'Backup_Jobs_Report_2025-01-09.csv', 'xxxx.csv', 'yyyy.csv'
for each vFile in 'Backup_Jobs_Report_2025-01-09.csv', 'xxxx.csv', 'yyyy.csv'

    // Load the data from each specified file; same-named fields
    // auto-concatenate into the BackupJobs table
    BackupJobs:
    LOAD *
    FROM [Lib path/foldername/$(vFile)]
    (txt, codepage is 28591, embedded labels, delimiter is ',', msq);

next vFile

Exit Script;