Discrete Categorical Variables in Predictions

Categorical variables have values that fall into distinct groups based on a characteristic. For example, Job Family is a categorical variable that can have values such as Specialist, Leader, and Supervisor. Our predictive models account for all categorical variables, including properties that can have many distinct values, such as Job Name.

Before we run the categorical variables through the prediction algorithm, we must transform them into numeric values. Based on our research and testing, the best data transformation method is Proportion Encoding, where each property value is assigned a numeric value based on how frequently it appears in the data.
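
As a rough illustration of the idea, the sketch below encodes each value as its share of the rows in the data. The function name `proportion_encode` and the sample Job Family values are hypothetical, and the exact formula used by the product is not documented here, so treat "count divided by total rows" as an assumption.

```python
from collections import Counter

def proportion_encode(values):
    """Replace each category with the fraction of rows that share it.

    Assumption: "how frequently it appears" means count / total rows.
    """
    counts = Counter(values)
    total = len(values)
    # Map each distinct category to its proportion of the data set.
    proportions = {category: count / total for category, count in counts.items()}
    # Encode every row by looking up the proportion of its category.
    encoded = [proportions[value] for value in values]
    return encoded, proportions

# Hypothetical sample data for the Job Family property.
job_family = ["Specialist", "Leader", "Specialist", "Supervisor", "Specialist"]
encoded, mapping = proportion_encode(job_family)
```

In this sample, Specialist appears in three of five rows, so it is encoded as 0.6, while Leader and Supervisor are each encoded as 0.2. Note that two values with the same frequency receive the same number under this scheme.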

Each level of a hierarchical dimension (leveled or parent-child) will be used in the predictions. For example, each level of the Location Hierarchy will be included as a separate property:

  1. North America
  2. North America, USA
  3. North America, USA, California
  4. North America, USA, California, Bay Area
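
The expansion of a hierarchy into one property per level could look roughly like the sketch below. The function name `expand_hierarchy` and the property names such as "Location Level 1" are hypothetical and chosen only for illustration; the actual property naming is not specified in this article.

```python
def expand_hierarchy(path, prefix="Location Level"):
    """Return one property per hierarchy level, each holding the path so far.

    `path` is the list of members from the top of the hierarchy down.
    The property names built from `prefix` are an illustrative assumption.
    """
    return {
        f"{prefix} {depth}": ", ".join(path[:depth])
        for depth in range(1, len(path) + 1)
    }

# Hypothetical location for a single record.
location = ["North America", "USA", "California", "Bay Area"]
properties = expand_hierarchy(location)

for name, value in properties.items():
    print(f"{name}: {value}")
```

Each resulting property is a categorical variable in its own right, so it would be transformed into a numeric value in the same way as any other categorical property before being used in the predictions.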