Qlik Request <-> Pandas Dataframe

Clever_Anjos · ‎2019-11-04

I've been using pandas as the basis for my Python SSEs.

I profiled the code to see which parts take the longest, so that I can improve performance.

My test case has 510K rows x 4 columns (https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data#GlobalLandTempera...). It fills each null value with the mean value for that country (the DataFrame fillna() method).

When I measured, more than 70% of the time was spent converting the request values to a pandas DataFrame and back.
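For readers who want to see what that fill step might look like, here is a minimal sketch. It assumes the column names used in this post (b_country, c_value) and a tiny made-up frame in place of the Kaggle file; it is one common way to do a per-group fill, not necessarily the exact code in my SSE.

```python
import pandas as pd

# Tiny made-up sample standing in for the real temperature data.
df = pd.DataFrame({
    "b_country": ["A", "A", "A", "B", "B"],
    "c_value": [1.0, None, 3.0, 10.0, None],
})

# Per-country mean, aligned row by row with the original frame.
country_mean = df.groupby("b_country")["c_value"].transform("mean")

# Replace only the nulls with the mean of the matching country.
df["c_value"] = df["c_value"].fillna(country_mean)

filled = df["c_value"].tolist()
print(filled)
```

Using transform("mean") keeps the result the same length as the frame, so fillna() can align it by index without a merge.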

I'm using this to convert from request to dataframe

df = pd.DataFrame([(row.duals[0].strData, row.duals[1].strData, row.duals[2].strData, row.duals[3].strData)
                   for request_rows in request
                   for row in request_rows.rows],
                  columns=['a_date', 'b_country', 'c_value', 'd_value'])

and this to convert back

df = df.values.tolist()
response_rows = [iter([SSE.Dual(strData=row[0]), SSE.Dual(strData=row[1]),
                       SSE.Dual(strData=str(row[2])), SSE.Dual(strData=str(row[3]))])
                 for row in df]
response_rows = [SSE.Row(duals=duals) for duals in response_rows]
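To reproduce this kind of measurement outside Qlik, the two conversions can be timed separately. The sketch below is only an approximation: make_fake_request, timed, request_to_df and df_to_rows are hypothetical helpers, and SimpleNamespace objects stand in for the real SSE request, Row and Dual classes, so the absolute timings will differ from a live SSE call.

```python
import time
from types import SimpleNamespace

import pandas as pd


def make_fake_request(n_rows):
    """Build stand-ins exposing .rows, .duals and .strData (not the real SSE classes)."""
    rows = [
        SimpleNamespace(duals=[
            SimpleNamespace(strData="2019-01-01"),
            SimpleNamespace(strData="Brazil"),
            SimpleNamespace(strData=str(i)),
            SimpleNamespace(strData=str(i * 2)),
        ])
        for i in range(n_rows)
    ]
    return [SimpleNamespace(rows=rows)]


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def request_to_df(request):
    # Same shape as the conversion shown earlier in the post.
    return pd.DataFrame(
        [tuple(dual.strData for dual in row.duals)
         for request_rows in request
         for row in request_rows.rows],
        columns=["a_date", "b_country", "c_value", "d_value"],
    )


def df_to_rows(df):
    # Stand-in for building SSE.Row / SSE.Dual objects from the frame.
    return [
        SimpleNamespace(duals=[SimpleNamespace(strData=str(v)) for v in values])
        for values in df.values.tolist()
    ]


request = make_fake_request(10_000)
df, t_in = timed(request_to_df, request)
rows, t_out = timed(df_to_rows, df)
print(f"request -> DataFrame: {t_in:.4f}s, DataFrame -> rows: {t_out:.4f}s")
```

Timing each direction on its own makes it easier to tell whether the cost sits in reading the request or in building the response.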

Any hints on how this could be improved?

Thanks in advance

jcbsorensen · ‎2019-12-18

Have you tried experimenting with parallel computing libraries that are built with pandas in mind, such as Modin, Dask, or Swifter? Used well and in the right circumstances, these libraries can be complete game-changers.
