Keeping Frequent Values and Replacing Others in a Series

In this thread, you’ll learn how to keep only the two most frequent values of a Series as they are and replace all other values with ‘Other’. The same methods work if you want to use a different replacement value. The task breaks down into two steps:

  1. Calculate the frequency counts of the elements in the series.
  2. Replace everything in the series other than the top 2 frequent values.

I’ll use the same approach for computing frequency counts in every method, but if you want to learn other ways of doing it, see the thread Frequency counts of unique items in a series.
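Step 1 can be sketched as follows. This is a minimal example that assumes value_counts() as the counting method (the thread does not name the one it uses) and a small made-up Series `s`; the name `top_two` matches the one used in the np.where() description.

```python
import pandas as pd

# Hypothetical sample data: "a" appears 4 times, "b" 3, "c" 2, "d" once
s = pd.Series(["a", "b", "a", "c", "b", "a", "d", "b", "a", "c"])

# Step 1: frequency counts, sorted from most to least frequent
counts = s.value_counts()
print(counts)

# The labels of the two largest counts are the values to keep
top_two = counts.index[:2].tolist()
print(top_two)  # ['a', 'b']
```

value_counts() sorts in descending order by default, so the first two index labels are the two most frequent values.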

1. Using the "isin()" method:

  • isin() checks whether each value in a Series or DataFrame is contained in a given sequence of values and returns a boolean mask of the same shape.
  • Negating that mask with ~ selects every value that is not among the top 2 frequent values, and those positions are then replaced with ‘Other’.
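A minimal sketch of the isin() approach, assuming value_counts() for the frequency counts and a small made-up Series:

```python
import pandas as pd

# Hypothetical sample data: "a" x4, "b" x3, "c" x2, "d" x1
s = pd.Series(["a", "b", "a", "c", "b", "a", "d", "b", "a", "c"])
top_two = s.value_counts().index[:2]

# Work on a copy so the original Series stays unchanged
result = s.copy()
# ~ inverts the mask: True for values NOT among the top two
result.loc[~result.isin(top_two)] = "Other"

print(result.tolist())
# ['a', 'b', 'a', 'Other', 'b', 'a', 'Other', 'b', 'a', 'Other']
```

If you prefer not to mutate a copy, `s.where(s.isin(top_two), "Other")` gives the same result: where() keeps values where the condition is True and substitutes the second argument elsewhere.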

2. Using the "transform()" method:

  • transform() calls a function on a Series or DataFrame (or on each group of a GroupBy object) and returns an object with the same shape and index as the input.
  • Here, a one-line lambda passed to transform() keeps each value that is among the top 2 frequent values and replaces every other value with ‘Other’.
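A minimal sketch of the transform() approach, assuming value_counts() for the counts and a small made-up Series. With a plain Python function like this lambda, Series.transform() applies it to each value:

```python
import pandas as pd

# Hypothetical sample data: "a" x4, "b" x3, "c" x2, "d" x1
s = pd.Series(["a", "b", "a", "c", "b", "a", "d", "b", "a", "c"])
top_two = s.value_counts().index[:2].tolist()

# The lambda receives one value at a time and keeps it only if it is in top_two
result = s.transform(lambda x: x if x in top_two else "Other")

print(result.tolist())
# ['a', 'b', 'a', 'Other', 'b', 'a', 'Other', 'b', 'a', 'Other']
```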

3. Using the "np.where()" function:

  • np.where(condition, x, y) is a NumPy function that returns a new array, taking elements from x where the condition is True and from y where it is False.
  • Here, the condition is the isin() mask: elements found in the top_two list are kept as they are, and all other elements become ‘Other’.
  • Because np.where() returns a NumPy array rather than a Series, convert the result back with pd.Series(); pass the original Series’ index through the index argument if you want to keep it.
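A minimal sketch of the np.where() approach, assuming value_counts() for the counts and a small made-up Series:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: "a" x4, "b" x3, "c" x2, "d" x1
s = pd.Series(["a", "b", "a", "c", "b", "a", "d", "b", "a", "c"])
top_two = s.value_counts().index[:2]

# Keep s where the mask is True, use "Other" where it is False
arr = np.where(s.isin(top_two), s, "Other")

# np.where returns an ndarray, so wrap it back into a Series with the same index
result = pd.Series(arr, index=s.index)

print(result.tolist())
# ['a', 'b', 'a', 'Other', 'b', 'a', 'Other', 'b', 'a', 'Other']
```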

4. Using the "apply()" method:

  • apply() calls a function along an axis of a DataFrame, or on each value of a Series. The function can be a built-in function, a lambda function, or a user-defined function.
  • Here, a small user-defined function that returns top-two values unchanged and ‘Other’ for everything else is applied to the Series with apply().
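A minimal sketch of the apply() approach, assuming value_counts() for the counts and a small made-up Series. The function name `keep_or_other` is hypothetical; the set of values to keep is passed in through apply()'s `args` parameter rather than read from a global:

```python
import pandas as pd

# Hypothetical sample data: "a" x4, "b" x3, "c" x2, "d" x1
s = pd.Series(["a", "b", "a", "c", "b", "a", "d", "b", "a", "c"])
top_two = s.value_counts().index[:2].tolist()

def keep_or_other(value, keep):
    """Return value unchanged if it is in keep, otherwise 'Other'."""
    return value if value in keep else "Other"

# apply() calls keep_or_other(value, top_two) for each element of s
result = s.apply(keep_or_other, args=(top_two,))

print(result.tolist())
# ['a', 'b', 'a', 'Other', 'b', 'a', 'Other', 'b', 'a', 'Other']
```

Passing `keep` as an argument makes the function reusable with any list of values to keep; swapping "Other" for another label changes the replacement value.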