Clever_Anjos
Employee

Qlik Request <-> Pandas Dataframe

I've been using pandas as the base for my Python SSEs.

I profiled the code to see which parts take the longest, so that I can improve performance.

My test case has 510K rows x 4 columns (https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data#GlobalLandTempera...). For each null value, I fill in the mean value for that country (using the DataFrame fillna() method).

 

When I measured, more than 70% of the time was spent converting the request values to a pandas DataFrame and back.
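For anyone who wants to take the same kind of per-stage measurement, here is a minimal sketch using only the standard library. The stage names, the `timed` helper and the placeholder workloads are illustrative assumptions, not the actual plugin code.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> accumulated wall-clock seconds


@contextmanager
def timed(stage):
    """Accumulate the time spent inside the with-block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start


def share_of_total(stage):
    """Fraction of all measured time that was spent in `stage`."""
    total = sum(timings.values())
    return timings[stage] / total if total else 0.0


# Placeholder workloads standing in for the real stages (assumptions).
with timed("request_to_dataframe"):
    rows = [(str(i), "X", str(i * 0.5), "") for i in range(10000)]
with timed("processing"):
    checksum = sum(float(r[2]) for r in rows)
with timed("dataframe_to_response"):
    out = [[str(v) for v in r] for r in rows]

for stage in timings:
    print(f"{stage}: {share_of_total(stage):.0%}")
```

Wrapping each stage of the real function in `with timed(...)` gives the split between conversion and actual processing.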

I'm using this to convert from request to dataframe

df = pd.DataFrame(
    [(row.duals[0].strData, row.duals[1].strData, row.duals[2].strData, row.duals[3].strData)
     for request_rows in request
     for row in request_rows.rows],
    columns=['a_date', 'b_country', 'c_value', 'd_value'])
 
and this to convert back
df = df.values.tolist()
response_rows = [iter([SSE.Dual(strData=row[0]),
                       SSE.Dual(strData=row[1]),
                       SSE.Dual(strData=str(row[2])),
                       SSE.Dual(strData=str(row[3]))])
                 for row in df]
response_rows = [SSE.Row(duals=duals) for duals in response_rows]
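To make the round trip reproducible without a running Qlik engine, here is a self-contained sketch. `FakeDual`, `FakeRow` and `FakeBundle` are hypothetical stand-ins for the SSE messages and model only the attributes used above. The sample data is made up, and the `pd.to_numeric` step is an assumption about how the string values get turned into numbers before `fillna()`.

```python
from dataclasses import dataclass, field
from typing import List

import pandas as pd


# Hypothetical stand-ins for the SSE protobuf messages (not the real classes).
@dataclass
class FakeDual:
    strData: str = ""


@dataclass
class FakeRow:
    duals: List[FakeDual] = field(default_factory=list)


@dataclass
class FakeBundle:
    rows: List[FakeRow] = field(default_factory=list)


COLUMNS = ["a_date", "b_country", "c_value", "d_value"]


def request_to_frame(request):
    """Flatten an iterable of bundles into a DataFrame of strings."""
    return pd.DataFrame(
        [tuple(d.strData for d in row.duals[:4])
         for bundle in request
         for row in bundle.rows],
        columns=COLUMNS,
    )


def fill_with_country_mean(df):
    """Parse the value columns as numbers; fill gaps with the country mean."""
    df = df.copy()
    for col in ("c_value", "d_value"):
        # Assumption: empty or unparsable strings count as missing values.
        df[col] = pd.to_numeric(df[col], errors="coerce")
        df[col] = df[col].fillna(df.groupby("b_country")[col].transform("mean"))
    return df


def frame_to_rows(df):
    """Turn each DataFrame row back into a row of string duals."""
    return [FakeRow(duals=[FakeDual(strData=str(v)) for v in values])
            for values in df.values.tolist()]


def make_row(*values):
    """Build a FakeRow from plain strings (helper for the sample data)."""
    return FakeRow(duals=[FakeDual(strData=v) for v in values])


# Made-up sample request with two missing values for Brazil.
request = [FakeBundle(rows=[
    make_row("2000-01-01", "Brazil", "20.0", "1.0"),
    make_row("2000-02-01", "Brazil", "", "3.0"),
    make_row("2000-03-01", "Brazil", "24.0", ""),
    make_row("2000-01-01", "Chile", "10.0", "0.5"),
])]

filled = fill_with_country_mean(request_to_frame(request))
response_rows = frame_to_rows(filled)
print([d.strData for d in response_rows[1].duals])
```

The two conversion functions are the parts that dominate the runtime in my measurements, so they are the ones to swap out when trying alternatives.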
 
 
Any hints on how this could be improved?
Thanks in advance
1 Reply
jcbsorensen
Contributor II

Have you tried parallel computing libraries that are built with pandas in mind, such as Modin, Dask or Swifter? Used correctly and in the right circumstances, these libraries can be complete game-changers.