Data Wrangling Using Pandas

As part of a data wrangling exercise, this is what I had to do recently:

  1. Open a 2.7 GB file of tabular data (rows and columns).
  2. Filter the file to extract the rows that satisfy certain conditions.
    • The conditions required a couple of columns to hold specific values.
  3. Write the result out to a new file.

Tips / Insights:

  • Approach 1: Read the file line by line and apply the filters to each line.
    • Below I have shown the code in both Python and Perl.
  • Approach 2: With pandas, it's two lines of code.
    • Go pandas!!
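
As a rough illustration of the two approaches above, here is a minimal sketch in Python. It assumes the file is a comma-separated file with a header row, and that each condition is an exact match on a column value. The function names, the column names, and the `conditions` dictionary are hypothetical, made up for this example.

```python
import csv

import pandas as pd


def filter_line_by_line(src, dst, conditions):
    """Approach 1: stream the file one row at a time and keep matching rows.

    `conditions` is a hypothetical dict of {column name: required value}.
    """
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Keep the row only if every condition matches exactly.
            if all(row[col] == val for col, val in conditions.items()):
                writer.writerow(row)


def filter_with_pandas(src, dst, conditions):
    """Approach 2: load the file into a DataFrame and filter with a mask."""
    # dtype=str keeps every column as text, so comparisons behave the same
    # way as in the line-by-line version.
    df = pd.read_csv(src, dtype=str)
    mask = pd.Series(True, index=df.index)
    for col, val in conditions.items():
        mask &= df[col] == val
    df[mask].to_csv(dst, index=False)
```

With the column names hard-coded, the pandas version collapses to roughly two statements: one `pd.read_csv` call and one filter-and-`to_csv` call. Note that `pd.read_csv` as used here loads the whole file into memory, while the line-by-line version only holds one row at a time.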

Code: