Question about pandas.DataFrame.describe

sabbyiqbal · January 19, 2020, 8:40pm

I am trying to get back into Python so I can do basic exploratory data analysis before building dashboards in JS/D3. But I guess I’ve forgotten more than I realized.

I am following along with this blog to start getting my feet wet: https://medium.com/datadriveninvestor/introduction-to-exploratory-data-analysis-682eb64063ff

The results I get from df.describe() are different from the author’s and I can’t figure out why.

The author gets STD, percentiles, mean, min, max. As you can see, I get other results.
(Apologies for the very long file; just scroll directly to the end and then scroll up. I didn’t realize that GitHub would show all the results, unlike Jupyter, which truncates them by default): https://github.com/SabahatPK/TestDataFiles/blob/master/EDA%20on%20All_Data.ipynb

I checked the API reference page: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html

But that did not shed any light.

I’m using Python 3.7. Could it be a version problem? I did look up changes to Python, but without knowing the author’s version, this is just a shot in the dark. And even if I knew it, these change logs are not the easiest to read.

Help! And thanks!

sabbyiqbal · January 19, 2020, 8:57pm

Never mind. It’s because the data in my columns was not stored as a numeric type (integer or float), so df.describe() did not produce the numeric summary.
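
For anyone who lands here with the same problem, here is a minimal sketch of the behaviour, not the actual notebook. The DataFrame and the column name "value" are made up for illustration, and I am assuming the numbers were read in as strings. When no column is numeric, describe() falls back to count/unique/top/freq; once the column is converted with pd.to_numeric, the mean, std, min, max, and percentiles appear.

```python
import pandas as pd

# Hypothetical data: numbers stored as strings, which can happen after loading a file
df = pd.DataFrame({"value": ["1", "2", "3", "4"]})

# No numeric columns, so describe() summarises the text column instead
object_summary = df.describe()
print(object_summary.index.tolist())

# Convert the column to a numeric dtype, then describe again
df["value"] = pd.to_numeric(df["value"])
numeric_summary = df.describe()
print(numeric_summary.index.tolist())
```

Checking df.dtypes before calling describe() is a quick way to spot columns that were not parsed as numbers.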