As part of a data wrangling exercise, this is what I had to do recently:
- Open a 2.7 GB file of tabular data (rows and columns).
- Filter the file to extract the rows that satisfy some conditions.
- The conditions require a couple of columns to hold specific values.
- Write out the result to a new file.
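To make the steps above concrete, here is a minimal Python sketch of the line-by-line version. It assumes the file is comma-delimited with a header row; the function name `filter_rows` and the column names in the usage note are made up for illustration and are not from the actual data.

```python
import csv

def filter_rows(src_path, dst_path, conditions):
    """Copy rows from src_path to dst_path when every column named in
    `conditions` holds the required value. Returns the number of rows kept.

    Assumes a comma-delimited file with a header row (an assumption,
    not something known about the original 2.7 GB file).
    """
    kept = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)          # streams one row at a time
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Values are compared as strings, exactly as they appear in the file.
            if all(row[col] == val for col, val in conditions.items()):
                writer.writerow(row)
                kept += 1
    return kept
```

A call would look something like `filter_rows("big.csv", "filtered.csv", {"country": "IN", "status": "active"})`, where the file and column names are placeholders. Since only one row is held in memory at a time, the file size should not matter much.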
Tips / Insights:
- Approach 1: Read the file line by line and apply the filters to each line.
- The code below shows this in both Python and Perl.
- Approach 2: With pandas, it's two lines of code.
- Go pandas!!
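For a rough idea of what Approach 2 can look like, here is a small pandas sketch. The column names `country` and `status` and both function names are hypothetical, and the file is assumed to be comma-delimited with a header row. The first function is the plain two-line version, which loads the whole file into memory. The second is a variant I am adding for illustration: it uses the `chunksize` option of `read_csv` to process the file in pieces, which may help if a file of this size does not fit in RAM.

```python
import pandas as pd

def filter_with_pandas(src_path, dst_path, country, status):
    # The two-line version: read everything, then filter and write.
    df = pd.read_csv(src_path)
    df[(df["country"] == country) & (df["status"] == status)].to_csv(dst_path, index=False)

def filter_in_chunks(src_path, dst_path, country, status, chunksize=100_000):
    # Same filter, but read `chunksize` rows at a time to limit memory use.
    first = True
    for chunk in pd.read_csv(src_path, chunksize=chunksize):
        matched = chunk[(chunk["country"] == country) & (chunk["status"] == status)]
        # Write the header only once, then append the later chunks.
        matched.to_csv(dst_path, mode="w" if first else "a", header=first, index=False)
        first = False
```

Both functions should produce the same output file; the chunked one trades a little extra code for a smaller memory footprint.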
Code: