Sireesha_Chapa
Contributor

Migrate data from MySQL to AWS S3 without any delimited files

Hi All,

I have two questions; I have searched through multiple documents but could not find concrete answers:

  1. Are there any components that can migrate data directly from MySQL to S3 without using a delimited file?
  2. Even if the data must be written to a file, can I write it from MySQL to Parquet and then to AWS S3? Most references I found write to CSV first and then load the CSV into S3. Is there an architecture in Talend that transfers data from MySQL directly to Parquet, and then from Parquet to S3, without using CSV?

I would appreciate a quick response; based on it, I need to assess the feasibility of using Talend in our project.


2 Replies
Anonymous
Not applicable

Hi Sireesha,

You have two options. First, look at Talend Stitch: it is designed to move data in bulk and supports both of your endpoints. If you can't use Stitch for some reason, you can create a Spark Job that reads from MySQL and writes Parquet. You will want to add as many executors as your hardware allows to improve performance; this also assumes you have a Talend Big Data license. Stitch is the preferred solution, and it is easy to get a trial license.
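For the Spark-Job alternative, a minimal PySpark sketch of the flow might look like the following. This is not a Talend-generated job; the host, credentials, table, and bucket names are all placeholders, and it assumes a Spark runtime with the MySQL JDBC driver and S3 (s3a) support on the classpath.

```python
# Hedged sketch: read MySQL over JDBC with Spark and write Parquet
# straight to S3 -- no delimited intermediate file. All connection
# details below are placeholder assumptions.

def jdbc_url(host: str, port: int, database: str) -> str:
    """Build the MySQL JDBC URL used by spark.read (placeholder values)."""
    return f"jdbc:mysql://{host}:{port}/{database}"

def run_export() -> None:
    # Requires a Spark / Talend Big Data runtime; call this on a cluster.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("mysql-to-s3-parquet")
             .getOrCreate())

    df = (spark.read.format("jdbc")
          .option("url", jdbc_url("mysql-host", 3306, "mydb"))   # placeholder
          .option("dbtable", "orders")                           # placeholder
          .option("user", "reader")
          .option("password", "secret")
          .load())

    # Write Parquet directly to the bucket -- no CSV step.
    df.write.mode("overwrite").parquet("s3a://my-bucket/exports/orders/")

    spark.stop()
```

Executor count and memory would be tuned on the Spark configuration (e.g. `spark.executor.instances`), per the sizing advice above.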

Sireesha_Chapa
Contributor
Author

Thank you @Thomas Dye. So you mean that if I use Talend Stitch, I can insert the data directly from MySQL into AWS S3 without any intermediate file? I will check in the meantime.

 

Regarding "the other alternative of using a Spark Job": can I get any references for this? I could not find any.

 

Also, if I use tDBInput_1 --> tFileOutputParquet_1, would this work? Would it write the data directly from the server without using any Spark jobs?
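Outside of Talend components, the same no-CSV data flow (MySQL rows to Parquet bytes in memory, then straight to S3) can be sketched in plain Python. This is only an illustration of the pipeline shape, not the Talend job itself; the libraries (pandas with a Parquet engine, SQLAlchemy, boto3) and the key layout are assumptions.

```python
# Illustrative, non-Talend sketch: MySQL -> Parquet (in memory) -> S3,
# with no CSV intermediate. Library and naming choices are assumptions.
import io

def s3_key(prefix: str, table: str) -> str:
    """Build the S3 object key for a table export (hypothetical layout)."""
    return f"{prefix.rstrip('/')}/{table}.parquet"

def export_table(engine, bucket: str, prefix: str, table: str) -> None:
    # Assumed available: pandas (with pyarrow), boto3, a SQLAlchemy engine
    # pointing at MySQL. Call on a machine with DB and AWS access.
    import pandas as pd
    import boto3

    df = pd.read_sql_table(table, engine)   # pull rows from MySQL
    buf = io.BytesIO()
    df.to_parquet(buf, index=False)         # Parquet in memory -- no CSV file
    buf.seek(0)
    boto3.client("s3").upload_fileobj(buf, bucket, s3_key(prefix, table))
```

This buffers each table in memory, so it suits modest table sizes; for large volumes the Spark approach described in the reply above distributes the work instead.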