Duplicate data means that the same values are recorded more than once for a single observation.
This damages our analysis because it can inflate or deflate our figures: for example, we count more customers than there actually are, or an average shifts because some values are overrepresented.
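To make the effect concrete, here is a minimal sketch using a made-up table (the `orders` frame and its column names are assumptions for illustration, not real data). One order is recorded twice, which changes both the row count and the average.

```python
import pandas as pd

# Hypothetical data: customer 2's order was recorded twice.
orders = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": [10.0, 40.0, 40.0, 10.0],
})

# With the duplicate row included
rows_with_duplicates = len(orders)                        # 4 rows
mean_with_duplicates = orders["amount"].mean()            # 25.0

# After dropping rows that are identical in every column
deduplicated = orders.drop_duplicates()
rows_without_duplicates = len(deduplicated)               # 3 rows
mean_without_duplicates = deduplicated["amount"].mean()   # 20.0
```

In this toy case the duplicate adds one extra row and pulls the average up from 20.0 to 25.0, because the repeated value is the largest one.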
We can use the following code to remove duplicates from a pandas DataFrame:
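import pandas as pd
# Drop rows that are identical in every column (df is an existing DataFrame)
df = df.drop_duplicates()
# Drop rows that repeat a value in one column, keeping the first occurrence
df = df.drop_duplicates(subset="column")
# Drop rows that repeat the same combination of values across several columns
df = df.drop_duplicates(subset=["column", "column2"])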
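By default, drop_duplicates keeps the first occurrence of each duplicate and discards the rest, so it can help to inspect the duplicates before removing them. The sketch below is one way to do that; the `customers` frame and its columns are hypothetical and only serve as an example.

```python
import pandas as pd

# Hypothetical data: the same customer appears twice with different emails.
customers = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "email": ["old@example.com", "new@example.com", "b@example.com"],
})

# Show every row that shares a customer_id with another row.
# keep=False marks all copies as duplicates, not just the later ones.
flagged = customers[customers.duplicated(subset="customer_id", keep=False)]

# Keep the last row per customer_id instead of the default first one.
latest = customers.drop_duplicates(subset="customer_id", keep="last")
```

Whether the first or the last row is the right one to keep depends on how the data was collected, so this choice is worth checking against the row order before relying on it.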