How to get summary statistics of a Pandas DataFrame in Python?

In Pandas, is there a method to get summary statistics for all DataFrame columns at once, or do I have to compute them individually for each column? Getting an overall picture of the data through summary statistics is often helpful before a detailed analysis. Feel free to use my sample dataframe to illustrate.

Yes, you can get summary statistics of the Pandas dataframe by using the describe() method. Here’s how it’s done:

  • This method returns a new dataframe containing, for each column, the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum.
  • By default, df.describe() only includes columns with numeric data types, but you can include non-numeric columns as well by passing the include parameter, for example df.describe(include="all").
  • When all columns are included, the result contains many NaN values: statistics like mean, standard deviation, and percentiles only apply to numeric columns, while unique, top, and freq only apply to non-numeric ones.
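Since the original sample dataframe is not shown in this thread, here is a minimal sketch of the points above on a made-up one. The column names and values are assumptions chosen purely for illustration.

```python
import pandas as pd

# Hypothetical sample data; the dataframe from the question is not available.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dan"],
    "age": [28, 35, 42, 35],
    "salary": [50000.0, 62000.0, 58000.0, None],  # one missing value
})

# Default: numeric columns only (age, salary).
numeric_summary = df.describe()

# include="all" also summarises the non-numeric "name" column.
# Cells that do not apply to a column's type come back as NaN.
full_summary = df.describe(include="all")

print(numeric_summary)
print(full_summary)
```

Note that count ignores missing values, so the salary column reports a count of 3 rather than 4.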

@mubashir_rizvi you can also check the df.info() method. It prints a concise summary of a DataFrame, including:

→ the number of non-null values in each column,
→ the data type of each column, and
→ the memory usage of the DataFrame.

This method can be used to quickly assess the shape and structure of a DataFrame, as well as identify potential issues such as missing values or incorrect data types.
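To make this concrete, here is a small sketch using a made-up dataframe (the columns and values are assumptions, not the one from the question). df.info() prints directly to the console and returns None, so the second call below writes into a buffer in case you want the text as a string.

```python
import io

import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dan"],
    "age": [28, 35, 42, 35],
    "salary": [50000.0, 62000.0, 58000.0, None],  # one missing value
})

# Prints row count, per-column non-null counts, dtypes, and memory usage.
df.info()

# info() returns None; pass a buffer to capture the report as text.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()
```

In the printed report, the salary row shows fewer non-null values than the total number of entries, which is how a missing value shows up here.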

Hey @mubashir_rizvi, summary statistics is what we all crave, and you can also get it by using the following method:

If you cannot get the code flow, please let me know. I would love to explain it.