Before we begin, I want to ensure you’re able to follow along. For you to be able to run this successfully and copy my code, you’ll need the following items:
- A Google account that has the timeline feature set up in Google Maps
- An export of your location history in JSON format, available through Google Takeout.
- A Mac or Linux machine with the Google Cloud SDK installed.
- At least the free tier of Google Cloud Platform, set up and ready to go.
Make sure all of this is ready before you begin; a quick sanity check for the Cloud SDK piece is sketched below.
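If you want to confirm the SDK portion is set up before diving in, a couple of `gcloud` commands will verify that you're authenticated and pointed at the right project. The project ID below is a placeholder, so substitute your own:

```bash
# Sanity check for the Google Cloud SDK setup.
# "my-gcp-project" is a placeholder -- substitute your own project ID.
gcloud auth login                        # authenticate with your Google account
gcloud config set project my-gcp-project # point the SDK at your project
gcloud config list                       # confirm the active account and project
```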
I’ve been using Google’s Timeline feature to track my location since 2011, when it was called Google Latitude. When I first signed up, I thought it was an interesting way to share my location with family and friends, but I pretty much forgot about it until recently.
I decided that it might be a good time to take the information that Google has been gathering for the past seven years and start a new project to see what I could learn about myself.
The first issue that I ran into is the size of the GeoJSON file that Google lets you export. My exported file was around 400 MB once it was unzipped, and I don’t really want to load the entire file into R to work with, so I thought this would be a great use case for Google’s Cloud Platform: I could easily store the file in Cloud Storage and then load the data into Google BigQuery’s data warehouse to analyze with R.
Writing SQL is similar to working with the `dplyr` package in R. I also get the added benefit of offloading all of the processing to Google BigQuery, which I can use for free since I’m on the free tier of GCP.
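As a hypothetical taste of that workflow, once the data is loaded you can run standard SQL against BigQuery straight from the command line; the dataset and table names here are placeholders for whatever you end up creating:

```bash
# Count how many location points were loaded -- the heavy lifting happens in BigQuery, not locally.
bq query --use_legacy_sql=false \
  'SELECT COUNT(*) AS points FROM `location_history.raw_locations`'
```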
Today I will share the `bash` script that I’ve written to automate extracting, uploading, and storing the information. Before you begin, be sure to create a folder for the project and download your Takeout file here. Let’s take a look at the script:
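What follows is a minimal sketch of a script along these lines. The archive name, paths, bucket, dataset, and table are placeholders (and the layout of the Takeout export is an assumption), so adjust them to match your own project:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholders -- swap these for your own project, bucket, dataset, and table.
PROJECT="my-gcp-project"
BUCKET="gs://my-location-history-bucket"
DATASET="location_history"
TABLE="raw_locations"

# 1. Unzip the Takeout archive (assumes it sits in the project folder as takeout.zip).
unzip -o takeout.zip -d takeout

# 2. Use jq to flatten the export into newline-delimited JSON, the format BigQuery requires.
#    The top-level "locations" array and the file path are assumptions about the Takeout layout.
jq -c '.locations[]' "takeout/Takeout/Location History/Location History.json" > locations.json

# 3. Upload to Cloud Storage; -m runs uploads in parallel and -q keeps gsutil quiet.
gsutil -mq cp locations.json "${BUCKET}/"

# 4. Load the file from Cloud Storage into BigQuery, letting it infer the schema.
#    (Assumes the dataset already exists; create it first with `bq mk` if it doesn't.)
bq --project_id="${PROJECT}" load \
  --source_format=NEWLINE_DELIMITED_JSON \
  --autodetect \
  "${DATASET}.${TABLE}" \
  "${BUCKET}/locations.json"
```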
The script above is going to do a few things for us. It will:
- Unzip the file that we exported from Google. Read more about the process here.
- Use `jq` to parse the JSON file and save it as a newline-delimited file, which is the format BigQuery requires for JSON.
- Upload the file to Google Cloud Storage for long-term storage. This part uses the `gsutil` tool from the Google Cloud SDK to load the data into a storage bucket. The `-m` flag lets you upload multiple files in parallel and `-q` keeps the process quiet.
- Finally, load the file from Cloud Storage into BigQuery for analysis. For `bq`, I’m using two flags: `--source_format=NEWLINE_DELIMITED_JSON` and `--autodetect`. The autodetect feature will attempt to infer the schema from the JSON file and set the field names for you, which I find super helpful, especially when you have a lot of key-value pairs. The other flag identifies which type of file you’re uploading; in this case it’s the newline-delimited file we created in step two.
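If you want to see what `--autodetect` came up with, you can inspect the inferred schema after the load finishes. The dataset and table names are the same placeholders used above:

```bash
# Print the schema BigQuery inferred from the newline-delimited JSON.
bq show --schema --format=prettyjson location_history.raw_locations
```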
This script is written for a Mac, but could easily be rewritten for other machines. It also uses the Google Cloud SDK, so I’m going to assume you already have that set up on your machine locally.
In the next post, we’ll dive into the data and take an initial look at things through an exploratory data analysis using R.
Check out part II of How to analyze your Google location history here.