How to find the index positions of items in a Pandas Series that are also present in another Series using Python?

mubashir_rizvi · March 24, 2023, 5:50pm

I am working on a problem of finding the positions of items in one series that also appear in another series. I want to find the elements common to both series, along with the indexes of those common items in the other series. For example, if I have the following two series:

```python
import pandas as pd

ser1 = pd.Series(['a', 'b', 'c', 'd', 'e'])
ser2 = pd.Series(['b', 'd', 'f', 'a', 'g'])
```

I want to find the positions in ser2 of the common items (a, b, and d), with an output that tells me at which index each element is found. The desired output would be:

```
a       3
b       0
d       1
```

Please suggest methods that can accomplish this. The output does not have to be a Series as shown above; a dictionary, a list, etc. would also work.

sabih · April 20, 2023, 1:12pm

Hi @mubashir_rizvi, you can refer to this example:

In this method, we use two NumPy functions to get the positions of items in ser1 that are also in ser2.
The in1d() function returns a boolean array indicating whether each value in ser1 is also present in ser2 (recent NumPy releases recommend isin() in its place). The nonzero() function then returns the index positions where that boolean array is True.
Finally, the index positions and the values are converted to a dictionary with the to_dict() method.
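A minimal sketch of this approach, assuming both series keep their default integer index and using np.isin (the recommended replacement for np.in1d in NumPy 2.x) for the membership test. Here the test runs over ser2 rather than ser1, so the positions refer to ser2 as the question asks; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series(['a', 'b', 'c', 'd', 'e'])
ser2 = pd.Series(['b', 'd', 'f', 'a', 'g'])

# Boolean array: True where an element of ser2 also occurs in ser1.
mask = np.isin(ser2.to_numpy(), ser1.to_numpy())

# nonzero() returns a tuple of index arrays; [0] takes the 1-D positions.
positions = mask.nonzero()[0]

# Label each position with its value from ser2, then sort by value.
result = pd.Series(positions, index=ser2.iloc[positions].to_numpy()).sort_index()

# Convert to a plain dictionary mapping each common item to its position.
positions_dict = result.to_dict()
```

To get the positions in ser1 instead, swap the arguments: `np.isin(ser1.to_numpy(), ser2.to_numpy())` and index into ser1.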

Note:

The advantage of this code is that it uses vectorized NumPy functions, which are optimized for numerical computations and can be faster than the equivalent Pandas methods for certain tasks.
Compared to the previous methods, this code is more concise and can be faster for larger datasets. However, it relies on NumPy functions, which may be less familiar to some users than Pandas' methods.

nimrah · April 25, 2023, 4:53pm

Hi @mubashir_rizvi, you can use boolean indexing. The code creates a boolean mask that identifies the values in ser1 that are also in ser2. The mask is passed to where(), which keeps the values where the mask is True and replaces the rest with NaN. dropna() then removes the NaN values, and the remaining index positions and values are used to build a dictionary whose keys are the index positions and whose values are the items from ser1 that were found in ser2.
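A minimal sketch of the where()/dropna() approach, applied to ser2 so that the positions refer to ser2 as the question asks. It assumes ser2 keeps its default RangeIndex, so index labels equal positions; the variable names are illustrative.

```python
import pandas as pd

ser1 = pd.Series(['a', 'b', 'c', 'd', 'e'])
ser2 = pd.Series(['b', 'd', 'f', 'a', 'g'])

# Boolean mask over ser2: True where the value also appears in ser1.
mask = ser2.isin(ser1)

# where() keeps values where the mask is True and puts NaN elsewhere;
# dropna() then discards the NaN rows, leaving only the common items.
common = ser2.where(mask).dropna()

# Keys are index positions in ser2, values are the matching items.
position_to_item = common.to_dict()

# Invert the dictionary to get the item -> position layout from the question.
item_to_position = {item: pos for pos, item in position_to_item.items()}
```

`ser2[mask]` would select the same rows directly. Note that because dropna() removes every NaN, a value that is genuinely missing in ser2 would be dropped as well.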

Note:

The advantage of this code is that it uses built-in Pandas methods, which are optimized for handling data in tabular form.
The main difference between this method and the others is that this technique uses a boolean mask to filter out the non-matching values and then finds the index positions of the matching values.

safa · April 26, 2023, 5:13pm

Hey @mubashir_rizvi, the solution below will help you find the index.
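A minimal sketch of a plain-loop solution without apply() or lambda functions, assuming ser2 keeps its default integer index; the name `positions_in_ser2` is illustrative.

```python
import pandas as pd

ser1 = pd.Series(['a', 'b', 'c', 'd', 'e'])
ser2 = pd.Series(['b', 'd', 'f', 'a', 'g'])

positions_in_ser2 = {}
for item in ser1:
    # Compare all of ser2 against this one item and keep the matching labels.
    matches = ser2[ser2 == item].index
    if len(matches) > 0:
        # Keep the first position if the item occurs more than once in ser2.
        positions_in_ser2[item] = matches[0]
```

Each iteration scans all of ser2, so the work grows with len(ser1) * len(ser2), which is why this style of loop slows down on large datasets.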

Note:

The advantage of this solution is that it is simple, easy to understand, and does not require defining lambda functions.
It is also more efficient, as it avoids calling the apply() method for each element of ser1. Still, it won't be efficient for large datasets, because it has to loop through every value.