pthomas
Contributor III

How can I pull the results of a query in AWS into Talend?

I have an S3 bucket with multiple Parquet files. I need the unique ids from all of the files within a folder, and I need them in Talend so that I can loop over them in the next steps.

In a terminal, using Spark, I can run the statements below and get the ids that I need. Is there a way to run them in a tSystem or some other component that will return the contents of the DataFrame?

 

df = spark.read.parquet("s3a://talend/bronze/books/").select("bookId").distinct()

df.show(truncate=False)


Accepted Solutions
Shicong_Hong
Employee

Hello 

You can run PySpark commands from a tSystem component, just as you did in the terminal. The following topics explain how to execute a Python script file with the tSystem component.

https://community.qlik.com/t5/Talend-Studio/how-to-execute-a-python-script-file-with-an-argument-usi...

https://community.qlik.com/t5/Talend-Studio/using-tsystem-in-studio-in-windows-to-call-python-script...

Before running the PySpark commands, you must have a Spark environment set up.
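
As a rough illustration, the statements from the question could be wrapped in a small script that prints one id per line, so that whatever reads the command's standard output in the Job can split it and iterate. This is only a sketch under assumptions: the file name distinct_book_ids.py and the helper names format_ids and distinct_book_ids are hypothetical, and it assumes the Spark environment is already configured to read s3a:// paths.

```python
import sys


def format_ids(values):
    """Return one id per line, skipping nulls, sorted for stable output."""
    return "\n".join(sorted(str(v) for v in values if v is not None))


def distinct_book_ids(path):
    """Read all Parquet files under `path` and collect the distinct bookId values."""
    # Imported here so that format_ids can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("distinct-book-ids").getOrCreate()
    try:
        rows = spark.read.parquet(path).select("bookId").distinct().collect()
        return [row["bookId"] for row in rows]
    finally:
        spark.stop()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation from tSystem:
    #   spark-submit distinct_book_ids.py s3a://talend/bronze/books/
    print(format_ids(distinct_book_ids(sys.argv[1])))
```

Printing plain lines instead of calling df.show() avoids the table borders that show() draws, which would otherwise have to be stripped before looping over the ids.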

Regards

Shicong

