How do you remove duplicate observations from a DataFrame in Python?

Duplicate data means the same observation is recorded more than once, so two or more rows hold identical values.

This damages our analysis because it can inflate or deflate our numbers (e.g. we count more customers than there actually are, or an average shifts because some values are over-represented).
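As a small illustration of that effect, the sketch below uses made-up order data (the `orders` frame and its `customer` and `amount` columns are assumptions for this example, not part of the original question) in which one row was recorded twice. Both the row count and the mean change once the repeated row is removed.

```python
import pandas as pd

# Hypothetical order data: the row for customer "B" was recorded twice.
orders = pd.DataFrame(
    {
        "customer": ["A", "B", "B", "C"],
        "amount": [10.0, 40.0, 40.0, 10.0],
    }
)

# With the duplicate row present, both the count and the mean are inflated.
rows_with_dupes = len(orders)               # 4 rows
mean_with_dupes = orders["amount"].mean()   # (10 + 40 + 40 + 10) / 4 = 25.0

# After removing the exact duplicate row, the figures change.
deduped = orders.drop_duplicates()
rows_deduped = len(deduped)                 # 3 rows
mean_deduped = deduped["amount"].mean()     # (10 + 40 + 10) / 3 = 20.0

print(rows_with_dupes, mean_with_dupes)
print(rows_deduped, mean_deduped)
```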

We can use the following code to remove duplicates from a pandas DataFrame:

import pandas as pd

Drop rows that are exact duplicates across all columns (the first occurrence is kept)

df = df.drop_duplicates()

Drop rows that repeat a value in a specific column of the DataFrame

df = df.drop_duplicates(subset="column")

Drop rows that repeat the same pair of values across two columns of the DataFrame

df = df.drop_duplicates(subset=["column", "column2"])
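
To see how the three calls differ, here is a minimal sketch on made-up data. The sample values and the extra `value` column are assumptions for illustration; `column` and `column2` are the placeholder names used above.

```python
import pandas as pd

# Hypothetical sample data; the column names are placeholders.
df = pd.DataFrame(
    {
        "column": ["a", "a", "a", "b"],
        "column2": [1, 1, 2, 2],
        "value": [0.5, 0.5, 0.7, 0.9],
    }
)

# Rows 0 and 1 are identical in every column, so row 1 is dropped.
all_cols = df.drop_duplicates()

# Only the first row for each distinct value of "column" is kept.
one_col = df.drop_duplicates(subset="column")

# Only the first row for each ("column", "column2") combination is kept.
two_cols = df.drop_duplicates(subset=["column", "column2"])

print(all_cols.index.tolist())   # [0, 2, 3]
print(one_col.index.tolist())    # [0, 3]
print(two_cols.index.tolist())   # [0, 2, 3]
```

By default `drop_duplicates` keeps the first occurrence and preserves the original index labels. Passing `keep="last"` keeps the last occurrence instead, `keep=False` drops every row that has a duplicate, and `reset_index(drop=True)` renumbers the rows afterwards if a clean index is needed.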