Finding Unique Elements in Two Series

In this thread, you’ll learn how to find the unique elements of two series, such as series1 and series2, meaning the values that appear in one series but not in the other. There are several ways to do this. To learn more about what a Series is and how to create one, have a look at the Building Pandas series with several datatypes thread. To learn how to filter out values from one series that are absent in another, see the Filtering out values from a series thread.

Here are the techniques for finding the unique elements of two series:

1. Using set "difference()" method both ways:

  • The difference() method in Python returns the set difference between two sets: a new set containing the elements present in the first set but not in the second.
  • For this technique to work, we first have to convert each series into a set using the set() constructor.

In this technique, difference() is applied first to set1 with respect to set2 to get the items found only in set1, and then to set2 with respect to set1 to get the items found only in set2.
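
The steps above can be sketched as follows. This is a minimal example, assuming two small integer series; the sample values and the result variable names are illustrative and not taken from the original thread.

```python
import pandas as pd

# Illustrative sample data (assumed, not from the original thread)
series1 = pd.Series([1, 2, 3, 4, 5])
series2 = pd.Series([4, 5, 6, 7, 8])

# Convert each series into a set
set1 = set(series1)
set2 = set(series2)

# Apply difference() in both directions
unique_to_series1 = set1.difference(set2)  # items only in series1
unique_to_series2 = set2.difference(set1)  # items only in series2

# Combine both results into a single sorted Series
unique_elements = pd.Series(sorted(unique_to_series1 | unique_to_series2))
print(unique_elements.tolist())
```

Note that converting to sets discards the original index and any repeated values within a series.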

2. Using set "symmetric_difference()" method:

  • The symmetric_difference() method is a Python set operation that returns a new set containing the elements unique to each set, i.e., elements that are in either set but not in both.
  • Since symmetric_difference() is a set operation, both series are first converted into sets with the set() constructor before the operation is applied.
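
A minimal sketch of this technique, again assuming two small integer series with illustrative values:

```python
import pandas as pd

# Illustrative sample data (assumed, not from the original thread)
series1 = pd.Series([1, 2, 3, 4, 5])
series2 = pd.Series([4, 5, 6, 7, 8])

# Elements that are in exactly one of the two sets
unique_set = set(series1).symmetric_difference(set(series2))

# Sets are unordered, so sort before building a Series
unique_elements = pd.Series(sorted(unique_set))
print(unique_elements.tolist())
```

This gives the same result as applying difference() both ways and combining the two results, but in a single call.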

3. Using boolean indexing with "isin()" method:

  • The isin() method checks whether each element in a Pandas DataFrame or Series is contained in a sequence of values, which in our case is another series.
  • It returns a boolean mask (True/False) indicating which values are in the sequence of values passed to the method.

In this technique, the method is applied separately to each series to get the unique values from both. The key point is that the mask is negated (using ~), so values that appear in both series get False and the unique ones get True. Indexing with the negated mask then keeps only the True values.
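
The masking steps described above can be sketched like this, assuming two small integer series with illustrative values. Joining the two filtered results with pd.concat() at the end is one possible way to combine them, not necessarily what the original example did.

```python
import pandas as pd

# Illustrative sample data (assumed, not from the original thread)
series1 = pd.Series([1, 2, 3, 4, 5])
series2 = pd.Series([4, 5, 6, 7, 8])

# isin() marks values found in the other series; ~ flips the mask
mask1 = ~series1.isin(series2)
mask2 = ~series2.isin(series1)

# Boolean indexing keeps only the rows where the mask is True
only_in_series1 = series1[mask1]
only_in_series2 = series2[mask2]

# Combine both filtered results into one Series with a fresh index
unique_elements = pd.concat([only_in_series1, only_in_series2], ignore_index=True)
print(unique_elements.tolist())
```

Unlike the set-based techniques, this approach keeps the results as Series and preserves the original order of the values.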

4. Using "pd.concat()" with the "drop_duplicates()" method:

  • The pd.concat() function concatenates two or more Pandas objects along a particular axis (row or column) into a single Pandas object. Here, it joins the 2 series objects into one.
  • The drop_duplicates(keep=False) method drops every value that occurs more than once, including its first occurrence. It returns a new Series containing only the values that appear exactly once in the combined series. Note that a value repeated within a single series is dropped as well, even if it is absent from the other series.
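
A minimal sketch of this technique, assuming two small integer series with illustrative values and no repeated values within either series:

```python
import pandas as pd

# Illustrative sample data (assumed, not from the original thread)
series1 = pd.Series([1, 2, 3, 4, 5])
series2 = pd.Series([4, 5, 6, 7, 8])

# Stack both series into one, renumbering the index
combined = pd.concat([series1, series2], ignore_index=True)

# keep=False removes every occurrence of any duplicated value
unique_elements = combined.drop_duplicates(keep=False)
print(unique_elements.tolist())
```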