Pandas Features.

The pandas documentation is quite dense and long-winded. Below I list the main topics it covers.

Bold indicates I have tried out this feature end to end myself. I intend to keep track of my progress as I try out the different features, hence this list for bookkeeping.

  1. 10 Minutes to Pandas
  2. Essential Basic Functionality
  3. Indexing and Selecting Data
  4. MultiIndex
  5. Working with Missing Data
  6. Group By
  7. Merge, Join, Concatenate
  8. Reshaping and Pivot Tables
  9. Time Series / Date Functionality
  10. Visualization

Pandas. Views versus copies

In pandas, setting values in a DataFrame can be tricky. It is easy to hit the SettingWithCopyWarning message.

The crux of the problem is this:

  • “The reason for having the SettingWithCopy warning is this. Sometimes when you slice an array you will simply get a view back, which means you can set it no problem. However, even a single dtyped array can generate a copy if it is sliced in a particular way. A multi-dtyped DataFrame (meaning it has say float and object data), will almost always yield a copy. Whether a view is created is dependent on the memory layout of the array”

–> Note: pandas issues this warning because of that ambiguity. It cannot tell whether the assignment actually reached the original DataFrame. In some cases the write lands on a temporary copy and is silently lost, in other cases it works fine. So pandas just warns the user that the operation may be acting on a copy.
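The view-versus-copy distinction from the quote can be seen directly in NumPy, which pandas builds on. This is a minimal sketch with a made-up array `arr`, not how pandas itself decides what to return: a basic slice gives back a view, while indexing with a list of positions gives back a copy.

```python
import numpy as np

arr = np.arange(10)

basic = arr[2:5]        # basic slice: a view onto arr's memory
fancy = arr[[2, 3, 4]]  # integer-list ("fancy") indexing: a fresh copy

basic[0] = 100  # writes through to arr
fancy[1] = -1   # only changes the copy

shares_basic = np.shares_memory(arr, basic)
shares_fancy = np.shares_memory(arr, fancy)
print(shares_basic, shares_fancy)
```

Both selections pick the same three elements, yet only the first one can change `arr`. That is the ambiguity the warning is about.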

  • Sometimes a SettingWithCopy warning will arise at times when there’s no obvious chained indexing going on. These are the bugs that SettingWithCopy is designed to catch! Pandas is probably trying to warn you that you’ve done this:
    def do_something(df):
        foo = df[['bar', 'baz']]
        # Is foo a view? A copy? Nobody knows!
        # ... many lines here ...
        foo['quux'] = value
        # We don't know whether this will modify df or not!
        return foo
    

    Yikes!
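To make the snippet above concrete, here is a runnable sketch. The frame contents and the extra `value` parameter are my own assumptions for illustration. Whether a warning is actually recorded depends on the pandas version, so the code only collects the warning names rather than expecting a particular one.

```python
import warnings
import pandas as pd

def do_something(df, value):
    foo = df[["bar", "baz"]]  # selecting a list of columns
    foo["quux"] = value       # this is where pandas may warn
    return foo

df = pd.DataFrame({"bar": [1, 2], "baz": [3, 4], "other": [5, 6]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = do_something(df, 0)

# Names of any warnings raised inside do_something (may be empty).
warning_names = [w.category.__name__ for w in caught]
print(warning_names)
print(list(result.columns))
print(list(df.columns))
```

In this case the new column only appears on `result`, and `df` keeps its original columns.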

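  • In particular, the warning text can be misleading when there is no chained indexing going on (as described above). It recommends .loc even though the real question is whether the object being modified is a view or a copy. The message reads: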

__main__:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
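When the warning does come from chained indexing, the suggested .loc form looks roughly like this. This is a small sketch on an invented frame, with warnings silenced so it runs quietly on different pandas versions.

```python
import warnings
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Chained indexing: df[mask] builds a new object, and the assignment lands on it.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the warning for this demo
    df[df["a"] > 1]["b"] = 0
chained_result = df["b"].tolist()  # the original frame is untouched

# Single .loc call: rows and column are selected in one step on df itself.
df.loc[df["a"] > 1, "b"] = 0
loc_result = df["b"].tolist()

print(chained_result)
print(loc_result)
```

The chained form leaves `df` unchanged because a boolean-mask selection returns a copy. The single .loc call updates `df` in place.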

  • If I know I want to work with a copy, I can create one explicitly using .copy() (see below). This suppresses the warning.

 

Code:
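A minimal sketch of the explicit-copy approach, assuming a small example frame with columns `bar`, `baz` and `other` (names borrowed from the snippet above, data invented). It records warnings to check that no SettingWithCopy-style warning is emitted once .copy() is used.

```python
import warnings
import pandas as pd

df = pd.DataFrame({"bar": [1, 2], "baz": [3, 4], "other": [5, 6]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    foo = df[["bar", "baz"]].copy()  # explicit copy: foo owns its data
    foo["quux"] = 0                  # modifies foo only

# Keep only warnings whose category name mentions SettingWithCopy.
copy_warnings = [w for w in caught if "SettingWithCopy" in w.category.__name__]
print(len(copy_warnings))
print(list(foo.columns))
print(list(df.columns))
```

The .copy() call states the intent outright: `foo` is independent, so changes to it are never expected to reach `df`.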

 

Reference: