Reading files in Python.

Let's say we have a file, coffee.csv:

 

$ cat -n ../_resources/coffee.csv
1 "Coffee","Water","Milk","Icecream"
2 "Espresso","No","No","No"
3 "Long Black","Yes","No","No"
4 "Flat White","No","Yes","No"
5 "Cappuccino","No","Yes,Frothy","No"
6 "Affogato","No","No","Yes"
7
8
9 abcd
$ wc -l ../_resources/coffee.csv
8 ../_resources/coffee.csv

Note how cat -n shows 9 lines but wc -l reports 8. This is because the last line of the file does not end in a newline, and wc -l only counts newline characters.
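The same difference can be reproduced in Python. This is only a minimal sketch: the string `text` is an illustrative stand-in for a file whose last line is unterminated, not the coffee.csv file itself.

```python
# Stand-in for a file whose last line has no trailing newline.
text = "first\nsecond\n\nlast"

# What wc -l reports: the number of newline characters.
newline_count = text.count("\n")

# What cat -n numbers: every line, including an unterminated last one.
line_count = len(text.splitlines())

print(newline_count)  # 3
print(line_count)     # 4
```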

Note that the file has two empty lines (not truly empty: each one holds the single newline character '\n'). Also, one of the fields contains a comma. So we need to be careful.

So let's explore ways to read in this file:

[1] f.read().

f.read() reads the entire contents of the file and returns them as a single string, newline characters included.
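A minimal sketch of f.read(), assuming Python 3. To keep it self-contained it writes a shortened stand-in for coffee.csv to a temporary file first; the names `sample` and `path` are illustrative only.

```python
import os
import tempfile

# Shortened stand-in for coffee.csv: a data row, two blank lines,
# and a last line without a trailing newline.
sample = '"Affogato","No","No","Yes"\n\n\nabcd'
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", newline="") as f:
    f.write(sample)

# read() returns the whole file as one string, newlines included.
with open(path, newline="") as f:
    contents = f.read()

print(repr(contents))
print(len(contents.splitlines()))  # 4
```

Since everything ends up in memory at once, this is most convenient for small files.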

[2] f.readline().

f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn't end in a newline. This makes the return value unambiguous: if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.
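The end-of-file rule can be sketched as a loop that stops on the empty string. This assumes Python 3 and uses a shortened stand-in for coffee.csv written to a temporary file; `sample` and `path` are illustrative names.

```python
import os
import tempfile

# Shortened stand-in for coffee.csv (last line has no newline).
sample = '"Affogato","No","No","Yes"\n\n\nabcd'
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", newline="") as f:
    f.write(sample)

lines = []
with open(path, newline="") as f:
    while True:
        line = f.readline()
        if line == "":      # empty string: end of file reached
            break
        lines.append(line)  # blank lines arrive as "\n", not ""

print(lines)
```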

[3] f.readlines().

f.readlines() reads the whole file and returns it as a list of lines, each still carrying its trailing newline.
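A minimal sketch of f.readlines(), again assuming Python 3 and a shortened stand-in file (`sample` and `path` are illustrative). Stripping the newlines afterwards is a common follow-up step, not something readlines() does itself.

```python
import os
import tempfile

# Shortened stand-in for coffee.csv (last line has no newline).
sample = '"Affogato","No","No","Yes"\n\n\nabcd'
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", newline="") as f:
    f.write(sample)

with open(path, newline="") as f:
    lines = f.readlines()

# Each element keeps its "\n"; strip it explicitly if it is not wanted.
stripped = [line.rstrip("\n") for line in lines]

print(lines)
print(stripped)
```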

[4] Looping over the file object.
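Iterating over the file object yields one line at a time, so the whole file never has to be held in memory. A minimal sketch, assuming Python 3 and a shortened stand-in file (`sample`, `path` and `non_blank` are illustrative names); skipping the blank lines is just one possible way to deal with them.

```python
import os
import tempfile

# Shortened stand-in for coffee.csv (last line has no newline).
sample = '"Affogato","No","No","Yes"\n\n\nabcd'
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", newline="") as f:
    f.write(sample)

non_blank = []
with open(path, newline="") as f:
    # enumerate() gives the same numbering that cat -n would show.
    for number, line in enumerate(f, start=1):
        line = line.rstrip("\n")
        if line:  # skip lines that held only a newline
            non_blank.append((number, line))

print(non_blank)
```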

[5] Using the csv library.

This is especially important since CSV fields can contain commas themselves (e.g. "Yes,Frothy" above).

In this case we first create a csv reader from the file handle, then iterate over the csv reader. Each iteration gives us one row as a list of strings.

#CSV Code

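import csv

filename = '../_resources/coffee.csv'

# newline='' lets the csv module handle line endings itself (Python 3)
with open(filename, newline='') as f:
    # read the header as a raw string
    header = f.readline().strip()
    print(header)

    # the file object is its own iterator, so next() returns the next raw line
    line1 = next(f).strip()
    print("line1 : {0}".format(line1))

    # creating the csv reader; it continues from the current file position
    csvreader = csv.reader(f)

    line2 = next(csvreader)
    print("line2 : {0}".format(line2))

    # iterate over the csvreader
    for line in csvreader:
        print(line)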

Output:

"Coffee","Water","Milk","Icecream"
line1 : "Espresso","No","No","No"
line2 : ['Long Black', 'Yes', 'No', 'No']
['Flat White', 'No', 'Yes', 'No']
['Cappuccino', 'No', 'Yes,Frothy', 'No']
['Affogato', 'No', 'No', 'Yes']
[]
[]
['abcd']
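To see why the csv reader is worth the extra step, the Cappuccino row can be parsed both ways. This is a small sketch assuming Python 3; the `row`, `naive` and `parsed` names are illustrative.

```python
import csv

row = '"Cappuccino","No","Yes,Frothy","No"'

# A naive split cuts the quoted field in two and keeps the quote characters.
naive = row.split(",")

# csv.reader accepts any iterable of lines, so a one-item list works.
parsed = next(csv.reader([row]))

print(len(naive))  # 5
print(parsed)      # ['Cappuccino', 'No', 'Yes,Frothy', 'No']
```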

 

[6] Pandas.

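(Note how pandas skips the empty lines, since read_csv defaults to skip_blank_lines=True, and fills the missing fields of the short last row with NaN.)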

import pandas as pd
df = pd.read_csv(filename)

df
Out[114]:
       Coffee Water        Milk Icecream
0    Espresso    No          No       No
1  Long Black   Yes          No       No
2  Flat White    No         Yes       No
3  Cappuccino    No  Yes,Frothy       No
4    Affogato    No          No      Yes
5        abcd   NaN         NaN      NaN

 
