Martin_Chan
Viva Expert

Non-knowledge workers can be identified by checking whether their average person-level collaboration hours fall below a given threshold. The default threshold is 5 collaboration hours, as used by `identify_nkw()` in the wpa R library (https://microsoft.github.io/wpa/reference/identify_nkw.html).

The code below shows an example function for doing this in Python, with an option to return either a diagnostic message or the set of `PersonId`s identified as knowledge workers:

 

import pandas as pd

# path to person query
sq_data = pd.read_csv('../data/demo spq.csv')

# function for identifying non-knowledge workers
# data: pandas dataframe
# metric: string containing name of metric
# threshold: numeric value specifying threshold value
# return_value: 'text' or 'kw_id' to control what outputs to return
def identify_nkw(data, metric, threshold, return_value):
    output = (data.groupby(by = ['PersonId'])
          .agg({metric:'mean'})
          )
    nkw_tb = output[output[metric] < threshold]
    
    if (return_value == 'text'):
        # Print diagnostic message
        print(f'{nkw_tb.shape[0]} non-knowledge workers identified with average collaboration hours below {threshold}.')
    elif (return_value == 'kw_id'):
        # Knowledge-worker ID
        kw_tb = output[output[metric] >= threshold]
        # groupby already names the index 'PersonId'; reset_index turns it back into a column
        kw_tb = kw_tb.reset_index()[['PersonId']]
        return kw_tb

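    else:
        # Fail loudly instead of printing and silently returning None
        raise ValueError("invalid input to `return_value`: expected 'text' or 'kw_id'")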

# Run the function once for each return option
identify_nkw(data = sq_data, metric = 'Collaboration_hours', threshold = 15, return_value = 'text')
identify_nkw(data = sq_data, metric = 'Collaboration_hours', threshold = 15, return_value = 'kw_id')
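
As a rough sketch of how the knowledge-worker IDs might be used downstream, the example below applies the same averaging rule to a small made-up table and then keeps only the rows belonging to knowledge workers. The toy data and the names `toy`, `avg_hours`, `kw_ids`, and `kw_data` are assumptions for illustration only; they are not part of the function above or of the wpa library.

```python
import pandas as pd

# Made-up person-week rows (hypothetical data, for illustration only)
toy = pd.DataFrame({
    'PersonId': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Collaboration_hours': [2.0, 4.0, 10.0, 20.0, 5.0, 5.0],
})

threshold = 5

# Average collaboration hours per person
avg_hours = toy.groupby('PersonId')['Collaboration_hours'].mean()

# People at or above the threshold are treated as knowledge workers
kw_ids = avg_hours[avg_hours >= threshold].index

# Keep only the rows belonging to knowledge workers
kw_data = toy[toy['PersonId'].isin(kw_ids)]

print(sorted(kw_ids))  # ['B', 'C']
print(len(kw_data))    # 4
```

Person A averages 3 hours and is dropped, while B (15 hours) and C (exactly 5 hours) are kept, since the comparison treats the threshold itself as the lower bound for knowledge workers.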