Labeling rules
The labeling rules from our small-volume income dataset are as follows:
- Age rule: Based on an exploratory bi-variate analysis of age and income, we come up with the heuristic rule that if the age is between 28 and 58, then income is greater than $50K; otherwise, it is less than $50K.
- Education rule: Based on a bi-variate analysis of education and income, we come up with a heuristic rule that if education is bachelor’s or master’s, then income is > $50K; otherwise, income is < $50K.
- Working hours rule: Based on a bi-variate analysis of working hours and income, we come up with the heuristic rule that if working hours are greater than 40, then income is greater than $50K; otherwise, it is less than $50K.
- Work class rule: Based on a bi-variate analysis of work class and income, we come up with the heuristic rule that if the work class is Self-emp-inc or Federal-gov, then income is greater than $50K; otherwise, it is less than $50K.
Constants
Let us define various constants that are used in our labeling functions as follows.
- Each labeling function returns one of the following labels as output, or abstains:
  - income_high indicates income > $50K
  - income_low indicates income < $50K
- Let us assign the numerical values to the output labels as follows:
  - income_high = 1
  - income_low = 0
  - ABSTAIN = -1
ABSTAIN indicates that the labeling function makes no decision for that observation, that is, the rule does not place the data point in either income range.
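As a minimal sketch, the constants listed above could be declared in Python as module-level integers. The names follow those used in the labeling functions in this section; the LABEL_NAMES dictionary is a hypothetical helper added here only to make printed labels readable.

```python
# Numeric label values shared by every labeling function.
ABSTAIN = -1      # the rule makes no decision for this record
income_low = 0    # income < $50K
income_high = 1   # income > $50K

# Hypothetical helper (an assumption, not part of the original text):
# maps each numeric value back to a readable name.
LABEL_NAMES = {
    ABSTAIN: "ABSTAIN",
    income_low: "income_low",
    income_high: "income_high",
}

print(LABEL_NAMES[income_high])
```

Keeping the abstain value distinct from both class values lets later steps tell "no decision" apart from a low-income label.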
Labeling functions
Now, let’s create labeling functions using these labeling rules and constants. We’ll discuss each function one by one as follows.
Age rule function
The age rule function checks whether the age of the person is between 28 and 58; if so, the income is labeled as greater than $50K.
We define the function with the help of the @labeling_function() decorator. Applying this decorator to a Python function turns it into a labeling function that returns a label for each record it is given. Let us apply the decorator to the age function so that it returns a label based on age:
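from snorkel.labeling import labeling_function

@labeling_function()
def age(record):
    # Ages below 28 or above 58 are labeled as low income.
    if record['age'] < 28 or record['age'] > 58:
        return income_low
    # Ages strictly between 28 and 58 are labeled as high income.
    elif record['age'] > 28 and record['age'] < 58:
        return income_high
    # Ages of exactly 28 or 58 match neither condition.
    else:
        return ABSTAIN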
Later, we will apply this age rule labeling function to the unlabeled income dataset using the Pandas LF applier, which returns a label for each observation in the dataset.
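To see how the three branches of the age rule behave, the same logic can be exercised as a plain Python function on a few hand-made records. This is only a sketch: age_rule is a hypothetical undecorated stand-in for the labeling function, the ages are made up, and it assumes the intended rule is that ages below 28 or above 58 map to income_low.

```python
ABSTAIN, income_low, income_high = -1, 0, 1

def age_rule(record):
    """Hypothetical undecorated stand-in for the age labeling function."""
    if record["age"] < 28 or record["age"] > 58:
        return income_low
    elif 28 < record["age"] < 58:
        return income_high
    else:
        # Only ages of exactly 28 or 58 reach this branch.
        return ABSTAIN

# Made-up ages covering each branch, including both boundaries.
for a in (25, 28, 40, 58, 63):
    print(a, age_rule({"age": a}))
```

Note that joining the two low-income conditions with "and" instead of "or" would make the first branch unreachable, because no age can be both below 28 and above 58.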
Education rule function
The education rule function is used to check whether the education of the person is bachelor’s or master’s; if so, the income is greater than $50K.
We have defined our labeling function for education with the help of the labeling function decorator as follows:
@labeling_function()
def education(record):
    # Bachelor's and master's degrees are labeled as high income.
    if record['education'] in ("Bachelors", "Masters"):
        return income_high
    else:
        return income_low
As with the age rule, we will later apply this labeling function to the unlabeled income dataset using the Pandas LF applier, which returns a label for each observation in the dataset.
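Before moving to the applier, it may help to picture the output of applying several rules at once: a label matrix with one row per observation and one column per rule. The sketch below builds such a matrix in plain Python. It is an illustration only, not the Pandas LF applier itself; age_rule, education_rule, and the three records are hypothetical stand-ins made up for this example.

```python
ABSTAIN, income_low, income_high = -1, 0, 1

def age_rule(record):
    # Hypothetical stand-in for the age labeling function.
    if record["age"] < 28 or record["age"] > 58:
        return income_low
    if 28 < record["age"] < 58:
        return income_high
    return ABSTAIN

def education_rule(record):
    # Hypothetical stand-in for the education labeling function.
    if record["education"] in ("Bachelors", "Masters"):
        return income_high
    return income_low

# Made-up records, used only to illustrate the shape of the result.
records = [
    {"age": 35, "education": "Bachelors"},
    {"age": 22, "education": "HS-grad"},
    {"age": 58, "education": "Masters"},
]

rules = [age_rule, education_rule]

# One row per record, one column per rule.
label_matrix = [[rule(r) for rule in rules] for r in records]
print(label_matrix)
```

In the third row the age rule abstains while the education rule votes income_high, which shows how rules can disagree or stay silent on the same observation.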