This is an archived version of the course. Please see the latest version of the course.

Handling JSON files

Many Machine Learning datasets nowadays store their data in JSON format. One example is the COCO dataset, which stores its annotations in JSON.

JSON is also used quite often for communication between a web server and your web app or browser. That tweet or Facebook Timeline post that you just received? It was most likely sent as JSON in the background (I have not verified this though!).

If you look at an example JSON file (below), it may look awfully familiar. What does it remind you of? A dict perhaps?

{
    "name": "Smith", 
    "interests": ["maths", "programming"], 
    "age": 25, 
    "courses": [ 
        {
            "name": "Python", 
            "term": 1
        }, 
        {
            "name": "Soft Eng", 
            "term": 2
        } 
    ] 
}

The root of a JSON document is generally either an array (which maps to a Python list) or an object (which maps to a Python dictionary).
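
To make the resemblance concrete, here is a minimal sketch that parses the example above with json.loads() (covered later in this section) and inspects the resulting Python types. The names json_text and person are only illustrative.

```python
import json

# The example JSON document from above, held in a Python string.
json_text = """
{
    "name": "Smith",
    "interests": ["maths", "programming"],
    "age": 25,
    "courses": [
        {"name": "Python", "term": 1},
        {"name": "Soft Eng", "term": 2}
    ]
}
"""

# Parse the JSON text into ordinary Python objects.
person = json.loads(json_text)

print(type(person))                   # <class 'dict'>
print(type(person["interests"]))      # <class 'list'>
print(person["courses"][1]["name"])   # Soft Eng
```

Once parsed, the data is accessed exactly like any nested dict and list.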

JSON serialisation

We can easily convert a Python data structure into a JSON string (serialisation).

To write your data into a JSON file, use json.dump(). The following code serialises data to JSON and saves it in data.json.

import json

data = { "course": { "name": "Introduction to Machine Learning", "term": 2 } }

with open("data.json", "w") as f: 
    json.dump(data, f)

To write your data to a string (and do something else with it later), use json.dumps().

json_string = json.dumps(data)
print(json_string)  # {"course": {"name": "Introduction to Machine Learning", "term": 2}}
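
By default the output is a single compact line. If a more readable string is needed, json.dumps() also accepts the optional indent and sort_keys arguments. The sketch below assumes the same data dictionary as above; the name pretty is only illustrative.

```python
import json

data = {"course": {"name": "Introduction to Machine Learning", "term": 2}}

# indent=4 puts each key on its own line, indented by four spaces per level;
# sort_keys=True orders the keys alphabetically at every level.
pretty = json.dumps(data, indent=4, sort_keys=True)
print(pretty)
```

The same two arguments can be passed to json.dump() when writing to a file.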

JSON deserialisation

We can also convert a JSON string representation into a Python data structure (deserialisation).

Similar to serialisation, we have json.load(fileobject) and json.loads(json_string) for this.

# load JSON from file
with open("data.json", "r") as f: 
    data = json.load(f)

print(data)

# this works too, but the file is never closed explicitly,
# so the with block above is usually the safer choice
data = json.load(open("data.json", "r"))

# load JSON from a string
# assuming we still have json_string from earlier
data = json.loads(json_string)
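
One thing to keep in mind is that a round trip through JSON does not always give back exactly the same Python object, because JSON has fewer types than Python. The following sketch uses a made-up dictionary (original) to illustrate two common cases: tuples come back as lists, and non-string dictionary keys come back as strings.

```python
import json

# Hypothetical data containing a tuple and integer dictionary keys.
original = {"point": (1, 2), "scores": {1: "low", 2: "high"}}

# Serialise to a JSON string, then deserialise it again.
restored = json.loads(json.dumps(original))

print(restored)              # {'point': [1, 2], 'scores': {'1': 'low', '2': 'high'}}
print(restored == original)  # False
```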