Clever_Anjos
Employee

Qlik Request <-> Pandas Dataframe

I've been using pandas as the base for my Python SSEs.

I've profiled which parts of the code take the longest, so that I can improve performance.

My test case processes 510K rows x 4 columns (https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data#GlobalLandTempera...) and fills each null value with the mean value for its country (using the DataFrame fillna() method).

 

When I measured it, more than 70% of the time was spent converting the request values to a pandas DataFrame and back.
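For anyone who wants to reproduce this kind of breakdown, here is a minimal sketch of one way to time each stage separately. The `timed` helper, the stage labels, and the stand-in workloads are all hypothetical and only illustrate the idea; this is not the measurement code used for the numbers above.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage (labels are hypothetical).
timings = {}

@contextmanager
def timed(label):
    """Add the elapsed time of the enclosed block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + (time.perf_counter() - start)

def shares_of_total(stage_timings):
    """Return each stage's fraction of the total measured time."""
    total = sum(stage_timings.values())
    if total == 0:
        return {}
    return {label: seconds / total for label, seconds in stage_timings.items()}

# Stand-in workloads; in a real SSE these would be the three real stages.
with timed("request_to_df"):
    rows = [(str(i), "x") for i in range(100_000)]
with timed("processing"):
    checksum = sum(len(r[0]) for r in rows)
with timed("df_to_response"):
    out = [list(r) for r in rows]

shares = shares_of_total(timings)
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

Wrapping each stage in its own `with timed(...)` block keeps the measurement out of the conversion code itself, so it is easy to remove afterwards.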

I'm using this to convert from the request to a DataFrame:

# Assumes: import pandas as pd
# Flatten every row of every request bundle into one tuple of four strings
df = pd.DataFrame(
    [(row.duals[0].strData, row.duals[1].strData, row.duals[2].strData, row.duals[3].strData)
     for request_rows in request
     for row in request_rows.rows],
    columns=['a_date', 'b_country', 'c_value', 'd_value'])
 
and this to convert back:
# Convert the DataFrame to plain Python lists, then wrap each value in an SSE.Dual
values = df.values.tolist()
response_rows = [SSE.Row(duals=[SSE.Dual(strData=row[0]), SSE.Dual(strData=row[1]),
                                SSE.Dual(strData=str(row[2])), SSE.Dual(strData=str(row[3]))])
                 for row in values]
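One variation that might be worth measuring is to build the rows column by column with `zip` instead of going through `df.values`. The sketch below uses stand-in `Dual` and `Row` dataclasses in place of the real `SSE.Dual` and `SSE.Row` messages so that it runs on its own; `columns_to_rows` is a hypothetical helper, and whether it is actually faster on 510K rows is an assumption that would need to be timed.

```python
from dataclasses import dataclass, field
from typing import List

# Stand-ins for SSE.Dual and SSE.Row so the sketch is self-contained.
@dataclass
class Dual:
    strData: str = ""

@dataclass
class Row:
    duals: List[Dual] = field(default_factory=list)

def columns_to_rows(*columns):
    """Build one Row per record from equally long column lists.

    Every value is passed through str(), which leaves strings unchanged
    and converts numeric columns.
    """
    return [Row(duals=[Dual(strData=str(value)) for value in record])
            for record in zip(*columns)]

# Small sample data standing in for the four DataFrame columns.
dates = ["2000-01-01", "2000-02-01"]
countries = ["Brazil", "Denmark"]
c_values = [25.1, 3.4]
d_values = [0.3, 0.5]

rows = columns_to_rows(dates, countries, c_values, d_values)
print(rows[0].duals[2].strData)
```

With a real DataFrame, the column lists could come from `df['a_date'].tolist()`, `df['b_country'].tolist()`, and so on, which avoids materialising a mixed-type object array through `df.values`.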
 
 
Any hints on how this could be improved?
Thanks in advance
1 Reply
jcbsorensen
Contributor II

Have you tried experimenting with parallel computing libraries that are built with pandas in mind, such as Modin, Dask, or Swifter? Used correctly and in the right circumstances, these libraries can be complete game-changers.