Labeling functions – Labeling Data for Classification

Labeling rules

The labeling rules derived from our small income dataset are as follows:

  • Age rule: Based on an exploratory bivariate analysis of age and income, we come up with the heuristic rule that if the age is between 28 and 58, then income is greater than $50K; otherwise, it is less than $50K.
  • Education rule: Based on a bivariate analysis of education and income, we come up with the heuristic rule that if education is bachelor’s or master’s, then income is greater than $50K; otherwise, it is less than $50K.
  • Working hours rule: Based on a bivariate analysis of working hours and income, we come up with the heuristic rule that if working hours are greater than 40, then income is greater than $50K; otherwise, it is less than $50K.
  • Work class rule: Based on a bivariate analysis of work class and income, we come up with the heuristic rule that if the work class is Self-emp-inc or Federal-gov, then income is greater than $50K; otherwise, it is less than $50K.

Constants

Let us define the constants that our labeling functions use.

  • Each labeling function returns one of the following labels as output:
    • income_high indicates income > $50K
    • income_low indicates income < $50K
  • Let us assign numerical values to the output labels as follows:
    • income_high = 1
    • income_low = 0
    • ABSTAIN = -1

ABSTAIN indicates that the rule places the observation in neither income range, so the labeling function assigns no label to that data point.
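In Python, these constants might be declared as in the minimal sketch below. The names follow the ones used in the labeling functions later in this section; the comments restate the meanings given in the list.

```python
# Numeric values for the labels a labeling function can return.
ABSTAIN = -1      # the rule assigns no label to the observation
income_low = 0    # income < $50K
income_high = 1   # income > $50K
```

Using -1 for the abstain value keeps it distinct from the two class labels, 0 and 1.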

Labeling functions

Now, let’s create labeling functions using these labeling rules and constants. We’ll discuss each function one by one as follows.

Age rule function

The age rule function checks whether the age of the person falls between 28 and 58; if so, the income is labeled as greater than $50K, and ages outside that range are labeled as less than $50K.

We define the function with the help of the @labeling_function() decorator. This decorator wraps a Python function so that it can be applied to each record and return a label for it. Let us apply this decorator to the age function to return the label based on age:
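from snorkel.labeling import labeling_function

@labeling_function()
def age(record):
    # Outside the 28-58 range: low income
    if record['age'] < 28 or record['age'] > 58:
        return income_low
    # Strictly inside the range: high income
    elif record['age'] > 28 and record['age'] < 58:
        return income_high
    # Exactly 28 or 58 matches neither branch, so abstain
    else:
        return ABSTAIN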

Later, we will apply this age rule labeling function to the unlabeled income dataset using Snorkel’s PandasLFApplier, which returns a label for each observation in the dataset.

Education rule function

The education rule function checks whether the education of the person is bachelor’s or master’s; if so, the income is labeled as greater than $50K; otherwise, it is labeled as less than $50K.

We have defined our labeling function for education with the help of the labeling function decorator as follows:
@labeling_function()
def education(record):
    # Bachelor's or master's degree: high income; anything else: low income
    if record['education'] == "Bachelors" or record['education'] == "Masters":
        return income_high
    else:
        return income_low

As with the age rule, we will later apply this labeling function to our unlabeled income dataset using PandasLFApplier, which returns a label for each observation in the dataset.
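To give a feel for what applying labeling functions produces, here is a dependency-free sketch. It is not the Snorkel API: apply_lfs is a hypothetical stand-in for the applier, the two rules are written as plain, undecorated functions, and the three records are made up for illustration. The result is a label matrix with one row per record and one column per labeling function, which is the same shape of output the Pandas applier returns as a NumPy array.

```python
# Label values, as defined in the Constants section.
ABSTAIN, income_low, income_high = -1, 0, 1

def age(record):
    # Plain (undecorated) version of the age rule.
    if record["age"] < 28 or record["age"] > 58:
        return income_low
    elif 28 < record["age"] < 58:
        return income_high
    return ABSTAIN  # exactly 28 or 58

def education(record):
    # Plain (undecorated) version of the education rule.
    if record["education"] in ("Bachelors", "Masters"):
        return income_high
    return income_low

def apply_lfs(lfs, records):
    """Hypothetical helper: one row per record, one column per function."""
    return [[lf(record) for lf in lfs] for record in records]

# Made-up records, for illustration only.
records = [
    {"age": 35, "education": "Bachelors"},
    {"age": 22, "education": "HS-grad"},
    {"age": 28, "education": "Masters"},
]

label_matrix = apply_lfs([age, education], records)
for row in label_matrix:
    print(row)
```

In the third record, the age rule abstains because the age sits exactly on the boundary, while the education rule still votes for high income. Rows in which the functions disagree or abstain are what a later aggregation step has to resolve.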
