This is an archived version of the course. Please see the latest version of the course.

Reading CSV files

A Comma-Separated Values (CSV) file is a plain text file that arranges tabular data in a fixed structure, much like a spreadsheet.

Generally, CSV files use a comma (,) to separate each data value (hence the name), but other delimiters are also used, such as tab (\t), colon (:), and semicolon (;).

The first row usually contains the names of the columns, like the header row of a table. It is usually followed by one record per line.

Let’s say I have a CSV file called students.csv with the content below (just copy and paste the text below into an editor and save it as students.csv).

name,faculty,department
Alice Smith,Science,Chemistry
Ben Williams,Eng,EEE
Bob Jones,Science,Physics
Andrew Taylor,Eng,Computing

Let’s use the csv module to read this file.

import csv

with open("students.csv") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    column_data = next(csv_reader)  # first row holds the column names
    print(f"Column names are {', '.join(column_data)}")

    for row in csv_reader:  # each remaining row is a list of str
        print(f"Student {row[0]} is from faculty of {row[1]}, {row[2]} dept.")

The expected output is:

Column names are name, faculty, department
Student Alice Smith is from faculty of Science, Chemistry dept. 
Student Ben Williams is from faculty of Eng, EEE dept. 
Student Bob Jones is from faculty of Science, Physics dept. 
Student Andrew Taylor is from faculty of Eng, Computing dept.

Going back to the code:

  • with open("students.csv") as csv_file: opens the CSV file as a text file and returns a file object.
  • csv_reader = csv.reader(csv_file, delimiter=",") constructs a csv.reader object from the file object and specifies a comma as the separator (a comma is also the default).
  • column_data = next(csv_reader) reads the first line, which holds the column headers, using the next() function.
  • for row in csv_reader: iterates over the remaining rows. Each row is a list of str items obtained by splitting the line at the delimiter.
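
The same steps apply to the other delimiters mentioned earlier. The following is a minimal sketch, assuming a small semicolon-separated sample held in memory: the sample text and the use of io.StringIO in place of a real file are assumptions made for illustration, not part of students.csv.

```python
import csv
import io

# Hypothetical semicolon-separated sample; io.StringIO stands in for an open file.
sample_text = (
    "name;faculty;department\n"
    "Alice Smith;Science;Chemistry\n"
    "Ben Williams;Eng;EEE\n"
)

csv_reader = csv.reader(io.StringIO(sample_text), delimiter=";")
column_data = next(csv_reader)  # first row: the column headers
rows = list(csv_reader)         # remaining rows, each a list of str

print(column_data)
print(rows)
```

Every value comes back as a str, so numeric columns would need an explicit conversion such as int() or float().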

Reading CSV files into a dictionary

You can also read each row of a CSV file into a dictionary. You can then access the values using the column names (taken from the first row) as keys.

import csv

with open("students.csv") as csv_file: 
    csv_reader = csv.DictReader(csv_file)

    for row in csv_reader: 
        print(f"Student {row['name']} is from faculty of {row['faculty']}, "
              f"{row['department']} dept. ")
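
Access by column name makes it easier to reorganise the records. As a rough sketch, the example below groups student names by faculty. The students_by_faculty dictionary is a hypothetical name introduced here, and the file content is held in an in-memory string (an assumption, so the example runs without students.csv on disk).

```python
import csv
import io
from collections import defaultdict

# Same content as students.csv, held in memory so the sketch is self-contained.
students_text = """name,faculty,department
Alice Smith,Science,Chemistry
Ben Williams,Eng,EEE
Bob Jones,Science,Physics
Andrew Taylor,Eng,Computing
"""

# Hypothetical structure: faculty name -> list of student names.
students_by_faculty = defaultdict(list)

for row in csv.DictReader(io.StringIO(students_text)):
    students_by_faculty[row["faculty"]].append(row["name"])

for faculty, names in students_by_faculty.items():
    print(f"{faculty}: {', '.join(names)}")
```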

If the CSV file does not contain the column names, you will need to specify your own keys by setting the fieldnames parameter to a list containing the keys. The first row is then read as data rather than as a header.

fieldnames = ['name', 'faculty', 'department'] 
csv_reader = csv.DictReader(csv_file, fieldnames=fieldnames)
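
Put together, a complete version of this might look like the sketch below. It assumes a headerless variant of the student data, held in an in-memory string rather than a file (both are assumptions for illustration).

```python
import csv
import io

# Hypothetical headerless data: the same kind of records without a header row.
headerless_text = "Alice Smith,Science,Chemistry\nBen Williams,Eng,EEE\n"

fieldnames = ["name", "faculty", "department"]
csv_reader = csv.DictReader(io.StringIO(headerless_text), fieldnames=fieldnames)

# Because fieldnames is given, the first line is treated as a record.
records = list(csv_reader)
for row in records:
    print(f"Student {row['name']} is from faculty of {row['faculty']}, "
          f"{row['department']} dept.")
```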