This query is somewhat similar to my previous one, Efficiently Filter Values from a Pandas Series. This time, however, I want to find, or filter out, the values that are unique to each of two `Series` objects (present in one but not in the other). Please suggest different methods and techniques for doing this.
You can use the set `difference()` method, which returns a set containing the elements present in the first set but not in the second. For this technique to work, we first have to convert our `Series` objects into `set` objects using the `set()` constructor. Keep in mind that this conversion discards duplicate values as well as the original order and index.
In this technique, the `difference()` method is first applied to `set1` with respect to `set2` to get the items unique to `set1`, and then the same is done for `set2` with respect to `set1` to get the items unique to `set2`.
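A minimal sketch of the set-difference approach. The original example data was not preserved, so the two series below (`s1`, `s2`) are hypothetical.

```python
import pandas as pd

# Hypothetical example data.
s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([4, 5, 6, 7])

# Convert each Series into a built-in set (duplicates, order and index are lost).
set1 = set(s1)
set2 = set(s2)

# Elements in set1 but not in set2, and elements in set2 but not in set1.
unique_in_s1 = set1.difference(set2)  # contains 1, 2, 3
unique_in_s2 = set2.difference(set1)  # contains 6, 7
```

If you only need all non-shared values in a single set, `set1.symmetric_difference(set2)` returns the union of both results in one call.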
Hey @mubashir_rizvi, from your query I understand that you want to find the unique values in your data and filter them. For this, you can use boolean indexing with the `isin()` method, which checks whether each element of a Pandas `DataFrame` or `Series` is contained in a sequence of values (in your case, another `Series`). It returns a boolean mask (True/False) indicating which values are in the sequence passed to the method.
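To illustrate what the mask looks like, here is a small sketch using two hypothetical series (`s1`, `s2`); the values are made up for the example.

```python
import pandas as pd

# Hypothetical example data.
s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([4, 5, 6, 7])

# True where a value of s1 also appears somewhere in s2.
mask = s1.isin(s2)
print(mask.tolist())  # [False, False, False, True, True]
```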
Here, the method is applied separately to each `Series` to get the unique values from both. The key point is that the mask is negated (using `~`), so values shared between the series become `False` and the unique ones become `True`, which lets us keep only the `True` (unique) values.
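A minimal sketch of the negated-mask filtering, again with hypothetical data. Unlike the set approach, this keeps the original index labels and any duplicates within each series.

```python
import pandas as pd

# Hypothetical example data.
s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([4, 5, 6, 7])

# Negate the isin() mask with ~ so shared values are False and unique ones True,
# then use it for boolean indexing.
unique_in_s1 = s1[~s1.isin(s2)]
unique_in_s2 = s2[~s2.isin(s1)]

print(unique_in_s1.tolist())  # [1, 2, 3]
print(unique_in_s2.tolist())  # [6, 7]
```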
Hey @mubashir_rizvi, to accomplish your objective, you can use the Pandas function `pd.concat()` together with the `drop_duplicates(keep=False)` method.
- The `pd.concat()` function concatenates two or more Pandas objects along a particular axis (rows or columns) into a single Pandas object. In this example, we join two `Series` objects.
- The `drop_duplicates(keep=False)` method drops every value that is duplicated, including its first occurrence. It returns a new object (a `Series` here, since the input is a `Series`) containing only the values that occur exactly once, i.e., the unique values.
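A minimal sketch of this approach with two hypothetical series. `ignore_index=True` is an optional choice added here so the combined result gets a fresh 0..n-1 index.

```python
import pandas as pd

# Hypothetical example data.
s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([4, 5, 6, 7])

# Stack both Series end to end (axis=0 is the default).
combined = pd.concat([s1, s2], ignore_index=True)

# keep=False removes every copy of a repeated value, not just the later ones.
unique_values = combined.drop_duplicates(keep=False)

print(unique_values.tolist())  # [1, 2, 3, 6, 7]
```

Two things to watch for: the result is a single combined `Series`, so it no longer tells you which input each value came from, and a value repeated within one series is dropped as well, even if it never appears in the other.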