How To: Automating BigQuery jobs

By Chip Oglesby

One of my favorite tools from the past few months is bq, the Google BigQuery command line tool. It lets you interact with Google BigQuery from your terminal, and it ships as part of the Google Cloud SDK.
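
If you don't have it yet, installing the Cloud SDK gets you the bq command. Here's a minimal sketch of the one-time setup, where my-project stands in for your own project ID:

# Authenticate and point the SDK at your project.
gcloud auth login
gcloud config set project my-project

# Sanity check: list the datasets bq can see.
bq ls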

So, why would you want to use this tool?

Great question. With this tool, you can run a query like bq query "SELECT name, count FROM mydataset.babynames WHERE gender = 'M' ORDER BY count DESC LIMIT 6" and get the result back programmatically.
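
If you want those results in a machine-readable form, bq's global --format flag can emit CSV or JSON instead of the default pretty-printed table. A quick sketch, using the babynames table from the BigQuery quickstart:

# Run the query and capture the result as CSV; --format also
# accepts json, prettyjson, sparse, and pretty.
bq --format=csv query "SELECT name, count FROM mydataset.babynames WHERE gender = 'M' ORDER BY count DESC LIMIT 6" > top_names.csv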

At my job, we use BigQuery as our data warehouse, and we have specific tables that need to be built or rebuilt daily using the same queries. With that in mind, I set up a cronjob that runs on a fixed schedule and automatically builds those datasets for me.
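
To give you an idea, the crontab entry looks something like the sketch below; the script path, log file, and schedule are all placeholders for whatever fits your setup:

# Rebuild the tables every day at 3:00 AM and append output to a log.
0 3 * * * /path/to/rebuild_tables.sh >> /var/log/bq_rebuild.log 2>&1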

Example

Here’s an example that we’ll walk through.

#!/bin/bash

# Variables: a GitHub personal access token, the API URL of the SQL
# file, and the destination table (all placeholders).
token='token'
url='https://github.com'
table='table'

# Fetch the file from the GitHub API, pull the base64-encoded content
# out of the JSON response, and decode it into a query string.
sql=$(curl -s -H "Content-Type: application/json" -H "Authorization: token $token" "$url" | jq -r '.content' | base64 --decode)

# Run the query and write the results to the destination table.
bq --quiet query --destination_table "$table" --allow_large_results --replace "$sql"

In this script, I’m fetching a large SQL file that’s stored on GitHub. bq query then runs the query held in the $sql variable and saves the output to the given table. The --allow_large_results flag lets the query return large result sets, and --replace overwrites all of the data in the destination table.
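
Once the cronjob has run, a quick way to confirm the rebuild worked is to inspect the destination table; for example, with a placeholder table name:

# Print the schema, row count, and last-modified time of the table.
bq show mydataset.mytable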

If you’re interested in how the sql variable is set up, you can check out my post from yesterday.