Non-knowledge workers can be identified by checking whether their average person-level collaboration hours fall below a threshold. The default threshold is 5 collaboration hours, as adopted in the wpa R library (https://microsoft.github.io/wpa/reference/identify_nkw.html).
The code below gives an example Python function that does this, with an option to return either a diagnostic message or the `PersonId`s of those identified as knowledge workers:
import pandas as pd
# path to person query
sq_data = pd.read_csv('../data/demo spq.csv')
# Function for identifying non-knowledge workers
# data: pandas DataFrame containing a person query
# metric: string containing the name of the metric column
# threshold: numeric threshold; persons averaging below it are flagged as non-knowledge workers
# return_value: 'text' to print a diagnostic message, 'kw_id' to return knowledge-worker IDs
def identify_nkw(data, metric, threshold, return_value):
    # Average the metric for each person
    output = (data.groupby(by = ['PersonId'])
                  .agg({metric: 'mean'})
             )
    if return_value == 'text':
        # Print diagnostic message
        nkw_tb = output[output[metric] < threshold]
        print(f'{nkw_tb.shape[0]} non-knowledge workers identified with average collaboration hours below {threshold}.')
    elif return_value == 'kw_id':
        # Knowledge-worker IDs, returned as a one-column DataFrame
        kw_tb = output[output[metric] >= threshold].reset_index()
        return kw_tb[['PersonId']]
    else:
        print('invalid input to `return_value`')
# Run the function once for each return option
identify_nkw(data = sq_data, metric = 'Collaboration_hours', threshold = 15, return_value = 'text')
identify_nkw(data = sq_data, metric = 'Collaboration_hours', threshold = 15, return_value = 'kw_id')
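
A natural next step is to use the `kw_id` output to drop non-knowledge workers from the person query before further analysis. The sketch below is one possible way to do that, not part of the wpa library: it repeats the same per-person averaging on a small, made-up dataset (the `demo` values and the names `person_avg`, `kw_ids`, and `kw_data` are assumptions for illustration) and then filters rows with `isin`.

```python
import pandas as pd

# Synthetic person-query rows (made-up values, for illustration only)
demo = pd.DataFrame({
    'PersonId': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Collaboration_hours': [20.0, 18.0, 2.0, 4.0, 15.0, 17.0],
})

threshold = 15  # same threshold as the example calls above

# Average collaboration hours per person
person_avg = demo.groupby('PersonId')['Collaboration_hours'].mean()

# Persons at or above the threshold are treated as knowledge workers
kw_ids = person_avg[person_avg >= threshold].index

# Keep only the rows that belong to knowledge workers
kw_data = demo[demo['PersonId'].isin(kw_ids)]

print(sorted(kw_data['PersonId'].unique()))
```

With real data, the `PersonId` column returned by `identify_nkw(..., return_value = 'kw_id')` could be passed to `isin` in the same way.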