How to get summary statistics of a Pandas Dataframe in Python?

mubashir_rizvi · March 24, 2023, 4:12pm

In Pandas, is there a method to get summary statistics for all dataframe columns at once, or do I have to compute them individually for each column? Getting an overall view of the data through collective statistics is often helpful before a detailed analysis. You can use the following sample dataframe that I have to share insights.

sabih · April 20, 2023, 1:25pm

Yes, you can get summary statistics of the Pandas dataframe by using the describe() method. Here’s how it’s done:

This method returns a new dataframe containing statistics such as count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum for each column.
By default, when the dataframe has a mix of column types, df.describe() only includes columns with numeric data types, but non-numeric columns can be included through the include parameter, for example include='all'.
We have used this parameter to include all columns, so there are many NaN values for non-numeric columns, because statistics like mean, standard deviation, and percentiles are only defined for numeric columns.
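As a minimal sketch of the two calls described above: the dataframe below is hypothetical illustration data (its column names and values are assumptions, not the sample dataframe from the question), and it only shows how the default call and include="all" differ.

```python
import pandas as pd

# Hypothetical sample data; not the dataframe from the original question.
df = pd.DataFrame({
    "name": ["Ali", "Sara", "Omar", "Zoya"],
    "age": [25, 32, 47, 32],
    "salary": [50000.0, 64000.0, 58000.0, 72000.0],
})

# Default call: only the numeric columns (age, salary) are summarized.
numeric_summary = df.describe()

# include="all": every column is summarized. Rows that do not apply to a
# column (e.g. mean for "name", unique for "age") are filled with NaN.
full_summary = df.describe(include="all")

print(numeric_summary)
print(full_summary)
```

Both calls return a regular dataframe, so a single value can be read with something like numeric_summary.loc["mean", "age"].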

nimrah · April 25, 2023, 3:59pm

@mubashir_rizvi you can check this method. df.info() prints a concise summary of a DataFrame, including:

→ the number of non-null values in each column,
→ the data type of each column, and
→ the memory usage of the DataFrame.

This method can be used to quickly assess the shape and structure of a DataFrame, as well as identify potential issues such as missing values or incorrect data types.
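Here is a rough sketch of how that could look. The dataframe is made-up illustration data with a few missing values (an assumption, not the one from the question), and the isna().sum() line is an extra I added as one possible way to count the missing values per column directly.

```python
import io

import pandas as pd

# Hypothetical sample data with some missing values, for illustration only.
df = pd.DataFrame({
    "name": ["Ali", "Sara", None, "Zoya"],
    "age": [25, 32, 47, 32],
    "salary": [50000.0, None, 58000.0, 72000.0],
})

# info() prints its summary to stdout and returns None.
df.info()

# To keep the summary as text, pass a buffer via the buf parameter.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()

# Count of missing values per column, as a complement to the non-null counts.
missing_per_column = df.isna().sum()
print(missing_per_column)
```

Comparing the non-null count of each column with the total number of entries shows where values are missing.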

safa · April 26, 2023, 3:45pm

Hey @mubashir_rizvi, summary statistics are what we all crave, and you can also get them by using the following method:

If you cannot follow the code flow, please let me know. I would love to explain it.