Modernizing Access to Statistics Canada Data 📊

A five-minute walkthrough.

Presented on July 11, 2025

Current State

  • Vast amounts of public data exist, but they are difficult to access and analyze.
    • 12,207 “tables”, of which 7,919 are available via the Web Data Service (WDS).
    • 284 “Profiles of a community or region”. Some examples include:
      • 2021 Census of Population.
      • National Address Register (NAR).
  • Thousands of CSVs (>3 TB) and other file formats.
  • Many datasets are trapped behind archived web pages and in legacy file formats (e.g., ARC/INFO, MapInfo, IVT).

Use Case: A Basic Data Task Made Difficult

Let’s say that you want to visualize:

What’s Required Today?

To complete this simple analysis, you would need to:

  1. Download a 2.25 GB ZIP file.
  2. Extract it, which yields 26.60 GB of CSVs.
  3. Parse the CSVs and filter for your chosen characteristic.
  4. Download an additional 97 MB ZIP file containing the dissemination area (DA) boundary shapefile.
  5. Extract that ZIP file, leaving a 171.72 MB file.
  6. Join the filtered CSV from step 3 to the shapefile from step 5.

All of this just to make one map.
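Step 3 is where most of the effort goes. A minimal standard-library sketch of it streams the CSV row by row, so the 26.60 GB never has to fit in memory. The column names `DGUID`, `CHARACTERISTIC_ID`, and `C1_COUNT_TOTAL` are assumptions about the Census Profile CSV layout, and `filter_characteristic` is a hypothetical helper; check the header row of the file you actually download.

```python
import csv
import io
from typing import Dict, Iterable, Optional, TextIO

# Assumed column names; verify them against the header row of the real CSV.
GEO_COL = "DGUID"
CHAR_COL = "CHARACTERISTIC_ID"
VALUE_COL = "C1_COUNT_TOTAL"


def filter_characteristic(
    rows: TextIO, characteristic_ids: Iterable[str]
) -> Dict[str, Dict[str, Optional[float]]]:
    """Stream a Census Profile CSV and keep only the requested characteristics.

    Returns {dguid: {characteristic_id: count}}; empty counts become None.
    """
    wanted = set(characteristic_ids)
    result: Dict[str, Dict[str, Optional[float]]] = {}
    for row in csv.DictReader(rows):  # reads one row at a time
        char_id = row[CHAR_COL]
        if char_id not in wanted:
            continue
        value = (row.get(VALUE_COL) or "").strip()
        result.setdefault(row[GEO_COL], {})[char_id] = float(value) if value else None
    return result


if __name__ == "__main__":
    # Tiny in-memory stand-in for one extracted CSV (made-up IDs and values).
    sample = io.StringIO(
        "DGUID,CHARACTERISTIC_ID,C1_COUNT_TOTAL\n"
        "DA_A,1,500\n"
        "DA_A,155,200\n"
        "DA_A,168,50\n"
        "DA_B,1,300\n"
    )
    print(filter_characteristic(sample, ["155", "168"]))
    # {'DA_A': {'155': 200.0, '168': 50.0}}
```

Steps 4 to 6 still need a GIS library to read the shapefile and perform the join, which is exactly the overhead the query below avoids.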

Solution

SELECT
    geo.da_dguid,
    cop.count_total_1,
    cop.count_total_155,
    cop.count_total_168,
    CASE
        WHEN cop.count_total_168 = 0.0 THEN 0
        WHEN cop.count_total_155 = 0.0 THEN 0
        WHEN cop.count_total_168 IS NULL THEN 0
        WHEN cop.count_total_155 IS NULL THEN 0
        ELSE (cop.count_total_168 / cop.count_total_155) * 100
    END AS percentage_over_100k,
    geo.geom
FROM
    'https://data.source.coop/dataforcanada/d4c-datapkg-statistical/processed/ca_statcan_2021A000011124_d4c-datapkg-statistical_census_pop_dissemination_areas_2021_v0.1.0-beta.parquet' AS cop,
    'https://data.source.coop/dataforcanada/d4c-datapkg-statistical/processed/ca_statcan_2021A000011124_d4c-datapkg-statistical_dissemination_areas_digital_2021_v0.1.0-beta.parquet' AS geo
WHERE geo.csd_dguid IN (
    '2021A00056001009', -- Whitehorse, YT
    '2021A00056106023', -- Yellowknife, NT
    '2021A00056204003', -- Iqaluit, NU
    '2021A00055915022', -- Vancouver, BC
    '2021A00054806016', -- Calgary, AB
    '2021A00054706027', -- Regina, SK
    '2021A00054611040', -- Winnipeg, MB
    '2021A00053506008', -- Ottawa, ON
    '2021A00052466023', -- Montréal, QC
    '2021A00051301006', -- Saint John, NB
    '2021A00051102075', -- Charlottetown, PE
    '2021A00051209034', -- Halifax, NS
    '2021A00051001519' -- St. John's, NL
    ) 
AND cop.da_dguid = geo.da_dguid;

🚀 132.75 MB, 2.89 seconds
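The `CASE` expression guards against zero and missing counts before computing `count_total_168` as a percentage of `count_total_155`. Restating the same rule in Python, as a hypothetical `percentage_over_100k` helper that is not part of any published package, makes the edge cases easy to check:

```python
from typing import Optional


def percentage_over_100k(count_155: Optional[float], count_168: Optional[float]) -> float:
    """Mirror of the SQL CASE: 0 when either count is 0 or missing, else a percentage."""
    if count_155 is None or count_168 is None:  # the two IS NULL branches
        return 0.0
    if count_155 == 0 or count_168 == 0:  # the two = 0.0 branches
        return 0.0
    return (count_168 / count_155) * 100  # the ELSE branch


print(percentage_over_100k(200.0, 50.0))  # 25.0
print(percentage_over_100k(0.0, 50.0))    # 0.0
print(percentage_over_100k(None, 50.0))   # 0.0
```

Mapping missing or zero counts to 0 keeps every DA drawable, at the cost of making areas with suppressed data look identical to areas whose true share is 0. Returning `None` (SQL `NULL`) in those branches is an alternative if the map should show them as "no data".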

Want a DGUID for your region? Use the StatCan Geo Search Tool (2021 vintage).
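Each DGUID in the `WHERE` clause packs several fields into one string. Judging by the values above, a 2021 census subdivision DGUID is a four-digit vintage (`2021`), a one-letter type (`A`), a four-digit schema (`0005`), and the geographic code (`6001009` for Whitehorse). A small parser, a hypothetical helper that assumes this fixed-width prefix, can split or rebuild them:

```python
from typing import NamedTuple


class DGUID(NamedTuple):
    vintage: str     # e.g. "2021"
    dguid_type: str  # e.g. "A"
    schema: str      # e.g. "0005" (census subdivisions, as in the query)
    geo_code: str    # e.g. "6001009" (Whitehorse)

    def __str__(self) -> str:
        # Reassemble the original fixed-width string.
        return f"{self.vintage}{self.dguid_type}{self.schema}{self.geo_code}"


def parse_dguid(value: str) -> DGUID:
    """Split a DGUID into its assumed fixed-width prefix fields and the geographic code."""
    if len(value) <= 9 or not value[:4].isdigit() or not value[5:9].isdigit():
        raise ValueError(f"not a DGUID: {value!r}")
    return DGUID(value[:4], value[4], value[5:9], value[9:])


whitehorse = parse_dguid("2021A00056001009")
print(whitehorse.geo_code)                       # 6001009
print(str(whitehorse) == "2021A00056001009")     # True
```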

Query Performance Snapshot

DuckDB EXPLAIN ANALYZE

Progress So Far

Processed tables are published as Parquet, one file per product ID:

https://source.coop/dataforcanada/d4c-datapkg-statistical/processed/tables/{productId}.parquet
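StatCan tables are usually cited by table number (for example `14-10-0287-01`), while the WDS identifies them by an eight-digit `productId` formed from the first eight digits (`14100287`). Assuming the `{productId}` placeholder in the URL above takes that eight-digit WDS ID, a hypothetical `table_url` helper could map one to the other:

```python
import re

# URL template copied from the slide; what {productId} expects is an assumption.
TABLE_URL_TEMPLATE = (
    "https://source.coop/dataforcanada/d4c-datapkg-statistical/"
    "processed/tables/{productId}.parquet"
)

# Matches "14-10-0287-01" or "14-10-0287"; the trailing "-01" is optional.
_TABLE_NUMBER = re.compile(r"^(\d{2})-(\d{2})-(\d{4})(?:-\d{2})?$")


def product_id_from_table_number(table_number: str) -> str:
    """Turn a table number like '14-10-0287-01' into the 8-digit productId '14100287'."""
    match = _TABLE_NUMBER.match(table_number.strip())
    if match is None:
        raise ValueError(f"unexpected table number: {table_number!r}")
    return "".join(match.groups())


def table_url(table_number: str) -> str:
    """Hypothetical helper: build the Parquet URL for one processed table."""
    return TABLE_URL_TEMPLATE.format(productId=product_id_from_table_number(table_number))


print(table_url("14-10-0287-01"))
# https://source.coop/dataforcanada/d4c-datapkg-statistical/processed/tables/14100287.parquet
```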

What’s Next?

  • Build a Dagster pipeline to auto-refresh WDS tables.
  • Process every Census of Population and Census of Agriculture at the highest level of detail available, as far back as possible (working backwards: 2016, 2011, 2006, 2001, etc.).

What’s Next (Continued)

  • Build Python and R bindings for programmatic access.
  • Generate vector tiles for geographies and Census data.