By the end of this tutorial, you will have created a dashboard that estimates the time it takes to get to the airport from different parts of a city.

If you are using your own machine, install the following packages and set the required environment variables. For GOOGLE_KEY, please enable the Distance Matrix API.

export GOOGLE_KEY=YOUR-GOOGLE-KEY
pip install \
    jupyterlab-crosscompute \
    crosscompute-views-map \
    h3 \
    matplotlib \
    requests \
    shapely
jupyter lab

Phase 0: Plan Dashboard Variables

The first step is to decide what variables you want to show in your dashboard. Choose your output variables carefully -- decide what data is appropriate to share with the outside world.

  1. Configure automation
  2. Define batches
  3. Add styles
  4. Add scripts

Create Automation

We have three input variables and two output variables. If you are starting from scratch, save this configuration in a file called automate.yml.

crosscompute: 0.9.2
name: See Airport Traffic
description: See how long it takes to get to the airport from different parts of a city
version: 0.0.1

input:
  variables:
    - id: districts_uri
      view: string
      path: variables.dictionary
    - id: destination_address
      view: string
      path: variables.dictionary
    - id: travel_mode
      view: string
      path: variables.dictionary

output:
  variables:
    - id: districts_map
      view: map-mapbox
      path: map.geojson
      configuration:
        style: mapbox://styles/mapbox/dark-v10
        layers:
          - type: fill
            paint:
              fill-color: ['interpolate', ['linear'], ['get', 't'], 1, 'blue', 60, 'red']
              fill-opacity: 0.8
    - id: time_histogram
      view: image
      path: histogram.png

batches:
  - folder: batches/{city_name | slug}-{destination_name | slug}-{travel_mode | slug}
    name: '{city_name} - {destination_name} - {travel_mode}'
    slug: '{city_name | slug}-{destination_name | slug}-{travel_mode | slug}'
    configuration:
      path: datasets/batches.csv

scripts:
  - path: run.ipynb

environment:
  interval: 25 hours

display:
  styles:
    - path: style.css
  pages:
    - id: automation
      configuration:
        design: output
    - id: output
      configuration:
        design: none

Define Batches

In automate.yml, there is a section that defines the batches for your automation. Batches are pre-defined runs.


batches:

  # Set batch folder
  - folder: batches/{city_name | slug}-{destination_name | slug}-{travel_mode | slug}

    # Set batch name
    name: '{city_name} - {destination_name} - {travel_mode}'

    # Set batch uri
    slug: '{city_name | slug}-{destination_name | slug}-{travel_mode | slug}'

    # Configure batch variables from a file
    configuration:
      path: datasets/batches.csv

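Create a folder called datasets. Then, create a file called batches.csv in the datasets folder. The header row names the variables used in the batch templates and each following row defines one batch. These batches run automatically.

city_name,districts_uri,destination_name,destination_address,travel_mode
NYC,"*&outSR=4326&f=pgeojson",JFK,"JFK Airport",transit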
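To see how one row of batches.csv could turn into a batch folder and name, here is a minimal sketch in plain Python. It is not CrossCompute's implementation: the `slugify` and `render` helpers are hypothetical, and the exact slug rules that CrossCompute applies may differ.

```python
import re

def slugify(text):
    # Hypothetical slug rule: lowercase, then turn runs of
    # non-alphanumeric characters into single hyphens
    return re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')

def render(template, row):
    # Replace {name} and {name | slug} with values from the row
    def replace(match):
        value = row[match.group(1)]
        return slugify(value) if match.group(2) else value
    return re.sub(r'\{\s*(\w+)\s*(\|\s*slug\s*)?\}', replace, template)

row = {
    'city_name': 'NYC',
    'destination_name': 'JFK',
    'travel_mode': 'transit',
}
folder = render(
    'batches/{city_name | slug}-{destination_name | slug}-{travel_mode | slug}',
    row)
name = render('{city_name} - {destination_name} - {travel_mode}', row)
print(folder)
print(name)
```

Under these assumptions, the row above yields one batch whose folder is derived from the slugged values and whose name keeps the original capitalization.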

Add Styles

In automate.yml, styles are part of the display configuration.



display:

  styles:
    # Define styles using CSS
    - path: style.css

  pages:
    # Set the automation page to show the output of the first batch
    - id: automation
      configuration:
        design: output

    # Remove default design for output page
    - id: output
      configuration:
        design: none

Create a file called style.css.

.districts_map {
  height: 50vh;
}
._image {
  max-width: 100%;
}
You can style your views using CSS for variable names or view names. Note that view names are prefixed by an underscore.

Add Scripts

Create a notebook and name it run.ipynb.

Click the CrossCompute logo on the right to open the CrossCompute sidebar, then click Launch to start the development server.

Plan Variables

Phase 1: Prepare Simplified Polygons

Now let's draw the polygons that we will color in our choropleth map.

  1. Prepare inputs.
  2. Download polygons.
  3. Simplify features.
  4. Save polygons.

Prepare Inputs

Create the following file structure.

tests/
    standard/
        input/
            variables.dictionary


Save the following data in variables.dictionary.

{"districts_uri": "*&outSR=4326&f=pgeojson", "destination_address": "JFK Airport", "travel_mode": "transit"}
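If you prefer to create the input file from Python instead of by hand, a minimal sketch follows. The `save_variables` helper is hypothetical and the URI value is a placeholder; the demonstration writes into a temporary folder so that it does not overwrite your own files.

```python
import json
import tempfile
from pathlib import Path

def save_variables(input_folder, variables):
    # Create the folder if needed and write the variables as JSON
    input_folder = Path(input_folder)
    input_folder.mkdir(parents=True, exist_ok=True)
    path = input_folder / 'variables.dictionary'
    with path.open('wt') as f:
        json.dump(variables, f)
    return path

variables = {
    'districts_uri': 'YOUR-DISTRICTS-URI',  # Placeholder, not a real URI
    'destination_address': 'JFK Airport',
    'travel_mode': 'transit',
}
with tempfile.TemporaryDirectory() as temporary_folder:
    path = save_variables(Path(temporary_folder) / 'input', variables)
    saved_variables = json.loads(path.read_text())
print(saved_variables['travel_mode'])
```

To write the real file, pass 'tests/standard/input' as the folder and your actual districts URI in the dictionary.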

Open run.ipynb. Get the input and output folders at the top of the notebook.

from os import getenv
from pathlib import Path

input_folder = Path(getenv(
    'CROSSCOMPUTE_INPUT_FOLDER', 'tests/standard/input'))
output_folder = Path(getenv(
    'CROSSCOMPUTE_OUTPUT_FOLDER', 'tests/standard/output'))

output_folder.mkdir(parents=True, exist_ok=True)

Load your input variables.

import json

with (input_folder / 'variables.dictionary').open('rt') as f:
    d = json.load(f)
districts_uri = d['districts_uri']

Download Polygons

from urllib.request import urlretrieve as download_uri

# Download the raw GeoJSON into the output folder
districts_path = output_folder / 'raw.geojson'
download_uri(districts_uri, districts_path)

# List the downloaded file with its size (IPython shell shortcut)
ls $output_folder -l -h

Download Polygons

Note that the raw download is over 5 MB. As the district polygons are not likely to change significantly, we can cache the downloaded file to speed up subsequent runs. We hash the URI to get a mostly unique filename, then we check whether that file already exists before attempting to download it.

from hashlib import blake2b

def get_hash(text):
    # Hash the text to get a filename-safe identifier
    h = blake2b()
    h.update(text.encode())
    return h.hexdigest()

import json
from urllib.request import urlretrieve as download_uri

datasets_folder = Path('datasets')
raw_path = (
    datasets_folder / 'districts' / get_hash(districts_uri))
# Download only if the file is not already cached
if not raw_path.exists():
    raw_path.parent.mkdir(parents=True, exist_ok=True)
    download_uri(districts_uri, raw_path)
with raw_path.open('rt') as f:
    d = json.load(f)
    features = d['features']
districts_geojson = d

We could map our polygons at this point, but requiring users to download a 5 MB GeoJSON file will make the dashboard lag on slower connections. In the next section, we will simplify the polygons.

Simplify Polygons

import json

with raw_path.open('rt') as f:
    districts_geojson = json.load(f)
features = districts_geojson['features']

from shapely.geometry import shape

# Simplify the first polygon; the tolerance is in coordinate units (degrees)
feature = features[0]
polygon = shape(feature['geometry'])
simplified_polygon = polygon.simplify(0.001)
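To build intuition for what the tolerance means, here is a minimal pure-Python sketch of the Douglas-Peucker line simplification algorithm. It is an illustration only and not Shapely's implementation, which delegates to GEOS and by default also preserves topology; the function names and the sample line are made up for this example.

```python
import math

def get_distance_to_segment(point, start, end):
    # Distance from a point to the segment between start and end
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    # Project the point onto the segment and clamp to its ends
    t = ((x - x1) * dx + (y - y1) * dy) / (dx * dx + dy * dy)
    t = max(0, min(1, t))
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))

def simplify_line(points, tolerance):
    # Keep the end points; split at the farthest point if it
    # deviates from the straight line by more than the tolerance
    if len(points) < 3:
        return list(points)
    distances = [
        get_distance_to_segment(p, points[0], points[-1])
        for p in points[1:-1]]
    index = max(range(len(distances)), key=distances.__getitem__) + 1
    if distances[index - 1] <= tolerance:
        return [points[0], points[-1]]
    left = simplify_line(points[:index + 1], tolerance)
    right = simplify_line(points[index:], tolerance)
    return left[:-1] + right

line = [(0, 0), (1, 0.0005), (2, 0), (3, 1), (4, 0)]
simplified_line = simplify_line(line, 0.001)
print(simplified_line)
```

The second point deviates from its neighbors by less than the tolerance, so it is dropped, while the sharp corner at (3, 1) is kept. A larger tolerance removes more vertices and produces a smaller file at the cost of coarser boundaries.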

Simplify Polygons

To speed subsequent runs, we can cache the simplified polygons. Let's combine the above steps to save our simplified polygons.

from shapely.geometry import mapping

for feature in features:
    polygon = shape(feature['geometry'])
    simplified_polygon = polygon.simplify(0.001)
    feature['geometry'] = mapping(simplified_polygon)

from shapely.geometry import mapping, shape

def simplify_feature(feature, tolerance):
    # Replace the feature geometry with its simplified version
    raw_geometry = shape(feature['geometry'])
    simplified_geometry = raw_geometry.simplify(tolerance)
    feature['geometry'] = mapping(simplified_geometry)
    return feature

import json
from urllib.request import urlretrieve as download_uri

SIMPLIFICATION_TOLERANCE = 0.001

datasets_folder = Path('datasets')
districts_path = (
    datasets_folder / 'districts' / get_hash(districts_uri))
if not districts_path.exists():
    # Download, simplify and cache the polygons
    districts_path.parent.mkdir(parents=True, exist_ok=True)
    raw_path = districts_path.with_suffix('.raw')
    download_uri(districts_uri, raw_path)
    with raw_path.open('rt') as f:
        d = json.load(f)
        d['features'] = features = [simplify_feature(
            _, SIMPLIFICATION_TOLERANCE) for _ in d['features']]
    with districts_path.open('wt') as f:
        json.dump(d, f)
else:
    # Load the cached simplified polygons
    with districts_path.open('rt') as f:
        d = json.load(f)
        features = d['features']
districts_geojson = d

ls $districts_path.parent -l -h

File Sizes

Simplifying the polygons reduced our GeoJSON size to a much more manageable 124 KB.

Color Polygons

Let's save our polygons.

import json

with (output_folder / 'map.geojson').open('wt') as f:
    json.dump(districts_geojson, f)

Relaunch the development server by clicking the Stop button in the CrossCompute sidebar in JupyterLab and clicking Launch again. You should see a shadow where our district boundaries should be. This is because in automate.yml, we defined our mapbox fill-color to look for a property named t in each feature, and our features do not have that property yet.

fill-color: ['interpolate', ['linear'], ['get', 't'], 1, 'blue', 60, 'red']
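To make the fill-color expression above more concrete, here is a minimal Python sketch of linear interpolation between two color stops. It only approximates what the map renderer does: the `interpolate_color` function is hypothetical, and it assumes that blue and red are plain RGB triples and that values outside the stops take the color of the nearest stop.

```python
def interpolate_color(t, t0=1, t1=60, c0=(0, 0, 255), c1=(255, 0, 0)):
    # Clamp t to the range of the stops, then blend the two colors
    ratio = (min(max(t, t0), t1) - t0) / (t1 - t0)
    return tuple(round(a + (b - a) * ratio) for a, b in zip(c0, c1))

for minutes in [1, 30, 60, 90]:
    print(minutes, interpolate_color(minutes))
```

A district that is 1 minute away is pure blue, a district that is 60 minutes or more away is pure red, and a district that is 30 minutes away is a purple in between.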

Null Times

Add a random value for t to each polygon to see the district boundaries more clearly. We vary t from 1 to 60 because we defined fill-color to expect values from 1 to 60.

from random import choice

for feature in features:
    # range(1, 61) covers 1 through 60 inclusive
    feature['properties'] = {'t': choice(range(1, 61))}

Relaunch the development server by clicking the Stop button in the CrossCompute sidebar in JupyterLab and clicking Launch again. You should see each district painted randomly.

Random Times

Change the update interval to 10 seconds in automate.yml to see the map colors update in real time.

environment:
  interval: 10 seconds

Random Times

Phase 2: Add Time Histogram

Next, we add a histogram to show the distribution of travel times.

import matplotlib.pyplot as plt

ts = [_['properties']['t'] for _ in features]
plt.hist(ts, bins=10)

Raw Histogram

Add some labels.

import json

with (input_folder / 'variables.dictionary').open('rt') as f:
    d = json.load(f)
districts_uri = d['districts_uri']
destination_address = d['destination_address']
travel_mode = d['travel_mode']

# Look up a readable name for the travel mode
travel_name = {
    'driving': 'car',
    'transit': 'public transit',
}[travel_mode]

plt.hist(ts, bins=10)
plt.title(f'Time to {destination_address} by {travel_name.title()}')

Labelled Histogram

Add a color map.

import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

# Times at or above this value map to the red end of the color map
REFERENCE_TIME_IN_MINUTES = 60

n, bins, patches = plt.hist(ts, bins=10)
bin_centers = 0.5 * (bins[:-1] + bins[1:])
color_indices = bin_centers / REFERENCE_TIME_IN_MINUTES
color_map = LinearSegmentedColormap.from_list('', ['blue', 'red'])
for i, p in zip(color_indices, patches):
    plt.setp(p, 'facecolor', color_map(i))
plt.title(f'Time to {destination_address} by {travel_name.title()}')

Colorful Histogram

Adjust dimensions, remove padding and save!

import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

REFERENCE_TIME_IN_MINUTES = 60

# Size the figure in pixels
px = 1 / plt.rcParams['figure.dpi']
plt.figure(figsize=(800 * px, 200 * px))
n, bins, patches = plt.hist(ts, bins=10)
# Color each bar by the travel time at its center
bin_centers = 0.5 * (bins[:-1] + bins[1:])
color_indices = bin_centers / REFERENCE_TIME_IN_MINUTES
color_map = LinearSegmentedColormap.from_list('', ['blue', 'red'])
for i, p in zip(color_indices, patches):
    plt.setp(p, 'facecolor', color_map(i))
plt.title(f'Time to {destination_address} by {travel_name.title()}')
# Reduce the padding around the plot before saving
plt.tight_layout()
plt.savefig(output_folder / 'histogram.png')

Times Histogram

Phase 3: Compute Travel Times

Finally, we will use the Google Distance Matrix API to compute travel time to the airport from each district. You need a valid GOOGLE_KEY enabled with the Distance Matrix API to complete this phase.

Add the GOOGLE_KEY environment variable to automate.yml. Remember to adjust your dashboard update interval to control your Google API budget. You might need to restart JupyterLab if you forgot to export GOOGLE_KEY before starting JupyterLab.

environment:
  variables:
    - id: GOOGLE_KEY
  interval: 25 hours
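To choose an interval that fits your budget, it helps to estimate how many origin and destination pairs (elements) the dashboard requests per month, since the Distance Matrix API counts usage per element. The sketch below is simple arithmetic; the `estimate_monthly_elements` function and the district count of 70 are hypothetical, so substitute the number of features in your own GeoJSON.

```python
import math

def estimate_monthly_elements(
        origin_count, destination_count, interval_in_hours,
        batch_count=1, days=30):
    # Each run requests one element per origin and destination pair
    run_count = math.floor(days * 24 / interval_in_hours)
    return origin_count * destination_count * batch_count * run_count

district_count = 70  # Hypothetical; use len(features) in the notebook
daily_estimate = estimate_monthly_elements(district_count, 1, 25)
hourly_estimate = estimate_monthly_elements(district_count, 1, 1)
print(daily_estimate)
print(hourly_estimate)
```

Under these assumptions, a 25 hour interval requests a few thousand elements per month, while an hourly interval requests roughly 25 times as many.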

Add the following snippets to run.ipynb.

from os import getenv

GOOGLE_KEY = getenv('GOOGLE_KEY')

import json

with (input_folder / 'variables.dictionary').open('rt') as f:
    d = json.load(f)
districts_uri = d['districts_uri']
destination_address = d['destination_address']
travel_mode = d['travel_mode']

import requests
import sys

def get_travel_packs(origin_strings, destination_strings):
    # Set this to the JSON endpoint of the Distance Matrix API
    endpoint_uri = ''
    origins_string = '|'.join(origin_strings)
    destinations_string = '|'.join(destination_strings)
    uri = f'{endpoint_uri}?origins={origins_string}&destinations={destinations_string}&mode={travel_mode}&key={GOOGLE_KEY}'
    response = requests.get(uri)
    d = response.json()
    if d['status'] != 'OK':
        print(d, file=sys.stderr)
        return []
    origin_addresses = d['origin_addresses']
    travel_packs = []
    for origin_address, row in zip(origin_addresses, d['rows']):
        # Each row has one element per destination; we send one destination
        element = row['elements'][0]
        time_in_seconds = element['duration']['value']
        travel_packs.append((origin_address, time_in_seconds))
    return travel_packs
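You can check the parsing logic without spending API quota. The sketch below parses a hand-written dictionary shaped like a Distance Matrix JSON response, where each row holds one element per destination and the duration value is in seconds. The `parse_travel_packs` helper and the sample values are made up for illustration and are not part of the notebook.

```python
def parse_travel_packs(d):
    # Pair each origin address with the travel time to the first destination
    travel_packs = []
    for origin_address, row in zip(d['origin_addresses'], d['rows']):
        element = row['elements'][0]
        if element['status'] != 'OK':
            # No route was found for this origin
            travel_packs.append((origin_address, None))
            continue
        travel_packs.append((origin_address, element['duration']['value']))
    return travel_packs

sample_response = {
    'status': 'OK',
    'origin_addresses': ['Origin A', 'Origin B'],
    'destination_addresses': ['JFK Airport'],
    'rows': [
        {'elements': [{
            'status': 'OK',
            'duration': {'value': 1800, 'text': '30 mins'},
        }]},
        {'elements': [{'status': 'ZERO_RESULTS'}]},
    ],
}
sample_travel_packs = parse_travel_packs(sample_response)
print(sample_travel_packs)
```

Recording None for an origin without a route keeps the results aligned with the origins, which matters when you later match each travel time back to its district.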

Use Representative Points

from shapely.geometry import GeometryCollection, shape
geometry = shape(features[0]['geometry'])
GeometryCollection([geometry.representative_point(), geometry])

Representative Point

origin_points = []
origin_indices = []
for index, feature in enumerate(features):
    geometry = shape(feature['geometry'])
    sample_points = [geometry.representative_point()]
    origin_points.extend(sample_points)
    origin_indices.extend([index] * len(sample_points))

def split(l, s):
    # Yield successive chunks of at most s items
    l = list(l)
    for i in range(0, len(l), s):
        yield l[i:i + s]

from shapely.geometry import Point

def get_coordinate_string(point):
    # Return 'latitude,longitude' for a point stored as (longitude, latitude)
    return ','.join(str(_) for _ in reversed(point.coords[0]))

get_coordinate_string(Point(0, 1))

import math

# Batch size is an assumption: the Distance Matrix API accepts
# at most 25 origins per request
MAXIMUM_ORIGIN_COUNT = 25

time_packs = []
destination_strings = [destination_address]

for some_origin_packs in split(zip(
        origin_points, origin_indices), MAXIMUM_ORIGIN_COUNT):
    some_origin_points, some_origin_indices = zip(*some_origin_packs)
    some_origin_strings = [get_coordinate_string(_) for _ in some_origin_points]
    travel_packs = get_travel_packs(some_origin_strings, destination_strings)
    for index, (origin_address, time_in_seconds) in zip(some_origin_indices, travel_packs):
        time_packs.append((origin_address, time_in_seconds))
        feature = features[index]
        # Store the travel time in whole minutes for the map fill-color
        feature['properties'] = {'t': math.ceil(time_in_seconds / 60)}
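The batching step is easier to follow with plain numbers. The sketch below repeats the split and coordinate string logic using tuples instead of Shapely points so that it runs on its own. The coordinates are made up, and the chunk size of 25 is an assumption that reflects the limit of 25 origins per Distance Matrix request.

```python
def split(items, size):
    # Yield successive chunks of at most size items
    items = list(items)
    for i in range(0, len(items), size):
        yield items[i:i + size]

def get_coordinate_string(xy):
    # GeoJSON stores (longitude, latitude); the API expects 'latitude,longitude'
    return ','.join(str(_) for _ in reversed(xy))

# Made-up origins, one per hypothetical district
origin_xys = [(-73.9 + i * 0.01, 40.7) for i in range(60)]
chunk_sizes = [len(_) for _ in split(origin_xys, 25)]
coordinate_string = get_coordinate_string((-73.78, 40.64))
print(chunk_sizes)
print(coordinate_string)
```

Sixty origins become three requests, and each origin is sent as a latitude and longitude pair separated by a comma.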

Times Map

Use Random Points

We experimented with using random points within each geometry to estimate average travel time from each district. We decided not to use random points because it would create too much variability between runs.

import random
from shapely.geometry import MultiPoint, Point
from shapely.ops import unary_union

def make_random_points(region_geometry, target_count):
    points = []
    minimum_x, minimum_y, maximum_x, maximum_y = region_geometry.bounds
    while len(points) < target_count:
        # Generate random points inside bounds
        random_points = [Point(
            random.uniform(minimum_x, maximum_x),
            random.uniform(minimum_y, maximum_y),
        ) for _ in range(target_count)]
        # Retain points inside region
        collection = unary_union(random_points + points)
        intersection = collection.intersection(region_geometry)
        if intersection.is_empty:
            points = []
        elif intersection.geom_type == 'Point':
            points = [intersection]
        else:
            points = list(intersection.geoms)
    # Trim if there are too many
    return points[:target_count]

from shapely.geometry import GeometryCollection
geometry = shape(features[0]['geometry'])
GeometryCollection(make_random_points(geometry, 3) + [geometry])

Random Points

Use Grid Points

We experimented with using grid points within each geometry to estimate average travel time from each district. We used the h3 package to generate the grid points. However, grid points would greatly increase the number of Google Distance Matrix API calls. We could offset this by updating the dashboard less frequently. In the end, we decided that it is more important to have the dashboard updated more often than for it to be more accurate.

import h3

def make_grid_points(geometry, resolution):
    xys = []
    geometries = geometry.geoms if hasattr(geometry, 'geoms') else [geometry]
    for g in geometries:
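        if g.area < 0.0001:
            # Skip geometries that are too small to sample
            continue
        # polyfill and h3_to_geo are the h3 version 3 function names
        hexs = h3.polyfill(g.__geo_interface__, resolution, geo_json_conformant=True)
        if len(hexs) > 1:
            xys.extend(tuple(reversed(h3.h3_to_geo(_))) for _ in hexs)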
    return [Point(_) for _ in xys]

geometry = shape(features[0]['geometry'])
GeometryCollection(make_grid_points(geometry, 8) + [geometry])

Hexagonal Points

Phase 4: Embed in Website

You can now deploy and embed this dashboard in your website. Click the embed icon and copy the code snippet. Paste the code snippet into your website.

