Reading CSV files
A comma-separated values (CSV) file is a plain text file that uses a specific structure to arrange tabular data. Think spreadsheets.
Generally, CSV files use a comma (,) to separate each data value (hence the name), but other delimiters can be used: tab (\t), colon (:), and semicolon (;).
The first row usually contains the names of the columns. Think of headers in tables. This is usually followed by one record per line.
Let’s say I have a CSV file called students.csv with the content below (just copy and paste the text below into an editor and save it as students.csv).
name,faculty,department
Alice Smith,Science,Chemistry
Ben Williams,Eng,EEE
Bob Jones,Science,Physics
Andrew Taylor,Eng,Computing
Let’s use the csv module to read this file.
import csv

with open("students.csv") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    column_data = next(csv_reader)
    print(f"Column names are {', '.join(column_data)}")
    for row in csv_reader:
        print(f"Student {row[0]} is from faculty of {row[1]}, {row[2]} dept.")
The expected output is:
Column names are name, faculty, department
Student Alice Smith is from faculty of Science, Chemistry dept.
Student Ben Williams is from faculty of Eng, EEE dept.
Student Bob Jones is from faculty of Science, Physics dept.
Student Andrew Taylor is from faculty of Eng, Computing dept.
Going back to the code:
with open("students.csv") as csv_file:
    Opens the CSV file as a text file and returns a file object.
csv_reader = csv.reader(csv_file, delimiter=",")
    Constructs a csv.reader object by passing the file object to its constructor, and specifies that the separator is a comma.
column_data = next(csv_reader)
    Gets the column headers from the first line using the next() function.
for row in csv_reader:
    Each remaining row is a list of str items containing the data found by splitting the line on the delimiter.
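The delimiter argument is the part that changes when a file uses one of the other separators mentioned earlier. As a minimal sketch, the snippet below parses semicolon-separated data. The in-memory io.StringIO text and the parse_rows helper are illustrative stand-ins, not part of the original example.

```python
import csv
import io


def parse_rows(text, delimiter=","):
    """Return the header and the data rows of CSV text (hypothetical helper)."""
    # io.StringIO stands in for an open text file.
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)  # first row holds the column names
    rows = list(reader)    # each remaining row is a list of str
    return header, rows


# Assumed sample data: two of the student records, separated by semicolons.
sample = "name;faculty;department\nAlice Smith;Science;Chemistry\nBen Williams;Eng;EEE\n"

header, rows = parse_rows(sample, delimiter=";")
print(header)
for row in rows:
    print(row)
```

Passing delimiter="\t" instead would handle tab-separated data in the same way.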
Reading CSV files into a dictionary
You can also read each row of a CSV file into a dictionary. You can then access the values in a row using the column names (taken from the first row) as keys.
import csv

with open("students.csv") as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        print(f"Student {row['name']} is from faculty of {row['faculty']}, "
              f"{row['department']} dept.")
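To make the shape of each row concrete, here is a small sketch that collects the rows from csv.DictReader into a list and inspects them. It keeps the sample data inline in an io.StringIO object instead of opening students.csv; that is an assumption made only so the snippet runs on its own.

```python
import csv
import io

# Inline copy of the first rows of students.csv (assumption: same content as the file).
data = "name,faculty,department\nAlice Smith,Science,Chemistry\nBen Williams,Eng,EEE\n"

reader = csv.DictReader(io.StringIO(data))
students = list(reader)  # one dictionary per data row

# The keys come from the header row, which DictReader reads for itself.
print(reader.fieldnames)
print(students[0]["department"])
print(len(students))
```

Note that the header row is not returned as a record, so only the two data rows end up in the list.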
If the CSV file does not contain the column names, you will need to specify your own keys. You can do this by setting the fieldnames parameter to a list containing the keys.
fieldnames = ['name', 'faculty', 'department']
csv_reader = csv.DictReader(csv_file, fieldnames=fieldnames)
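Put together as a runnable sketch, with the caveat that the headerless data below is made up for illustration and held in an io.StringIO object rather than a real file:

```python
import csv
import io

# Assumed headerless data: two student records with no column-name row.
headerless = "Alice Smith,Science,Chemistry\nBen Williams,Eng,EEE\n"

fieldnames = ["name", "faculty", "department"]
csv_reader = csv.DictReader(io.StringIO(headerless), fieldnames=fieldnames)

records = list(csv_reader)  # the first line is treated as data, not as a header
for row in records:
    print(f"Student {row['name']} is from faculty of {row['faculty']}, "
          f"{row['department']} dept.")
```

Because the first line is treated as data when fieldnames is given, passing it for a file that does have a header row would return that header as the first record.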