Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
lailarhc
Creator
Creator

Count active employees in a given period of time (trying to reproduce a pandas function in qlik sense)

Hello.

I'm working on a project where I need to:

  1. Count how many courses each employee has taken in a given time.
  2. Calculate the average number of courses that have been taken by employees in a given period of time
  3. Count how many employees were actively working during a given year, regardless of taking courses or not
  4. Calculate the average number of employees that have taken courses in a given period of time

 

I've already managed to do items 1 and 2 of the list.

Here's how I accomplished number 1:

For the number of courses taken by employees, I'm counting all distinct combinations of employee's names, name of the course, month and year the course was taken:

Count(distinct {$ <[Nome do servidor] *= {"*"}>} [Nome Evento] & '==' & [Nome do servidor] & '==' & [Data.autoCalendar.Month] & '==' & [Data.autoCalendar.Year])

 

For number 2, here's what I did:

I divided the above expression by the count of distinct employees:

(Count(distinct {$ <[Nome do servidor] *= {"*"}>} [Nome Evento] & '==' & [Nome do servidor] & '==' & [Data.autoCalendar.Month])) / Count({$} distinct [Nome do servidor])

 

Now I need to calculate number 3.
Here's how I managed to do it in pandas:

 

 

# Dataframe contaning employee's name (Nome do Servidor), year course was taken (Ano), month course was taken (Mês), course's name (Curso), duration in hours of course (Carga Horária), course category (Área de Concentração)
df_complete = pd.read_excel("complete_table.xlsx", na_values = '-')
df_complete.head(5)

 

 

Example:

lailarhc_0-1638192246035.png

 

 

 

# Dataframe containing the employee's name (Nome do Servidor) and the date they began working (Data Exercício)
df_data_exercicio = pd.read_excel("tabela_servidores_data_exercicio.xlsx", na_values= "-")
df_data_exercicio.head(5)

 

 

Example:

lailarhc_2-1638192522332.png

 

 

 

# Dataframe contaning just the year that courses were taken:
df_ano = pd.DataFrame(df_complete["Ano"].unique())
df_ano.sort_values(0)

 

 

Example:

lailarhc_1-1638192328734.png

 

 

#Function that returns number of active employees in a year
def calc_servidores_ativos(x, df):
    qtd_servidores = df[df_data_exercicio["Data Exercício"] <= x]
    return qtd_servidores["Nome do servidor"].nunique()

# Column that represents the number of active employees in a year:
df_ano["Qtd de Servidores Ativos"] = df_ano[0].apply(lambda x: calc_servidores_ativos(x, df_data_exercicio))
df_ano.sort_values(0)

 

 

Example:

lailarhc_3-1638192636802.png

How can I achieve the same result in qlik sense? I need the numbers in the column "Qtd de Servidores Ativos" in the last image.

Can someone help me achieve that?

Thank you very much in advance.

 

P.s.:

Let me explain the fields being used:

Field that represents employee's names: [Nome do servidor]

Field that represents the name of  the courses employees have taken: [Nome Evento] or just Curso

Field that represents the year the course has taken place: [Data.autoCalendar.Year] or just Ano

Field that represents the year the employee started working: [Data de Entrada.autoCalendar.Year] or just Data Exercício

0 Replies