Reproduction of: Rapidly measuring spatial accessibility of COVID-19 healthcare resources: a case study of Illinois, USA
Original study by Kang, J. Y., A. Michels, F. Lyu, Shaohua Wang, N. Agbodo, V. L. Freeman, and Shaowen Wang. 2020. Rapidly measuring spatial accessibility of COVID-19 healthcare resources: a case study of Illinois, USA. International Journal of Health Geographics 19 (1):1–17. DOI:10.1186/s12942-020-00229-x.
Reproduction Authors: Joe Holler, Derrick Burt, and Kufre Udoh, with contributions from Peter Kedron, Drew An-Pham, Katie Heo, and the Spring 2021 Open Source GIScience class at Middlebury
Reproduction Materials Available at: github.com/HEGSRR/RPr-Kang-2020
Created: 2021-06-01
Revised: 2023-12-16
by Katie Heo, Middlebury College
To perform the E2SFCA method, three types of data are required: (1) a road network, (2) population data, and (3) hospital information. The road network can be obtained with OSMnx, a Python library for downloading and analyzing OpenStreetMap data. The population data are available from the American Community Survey. Lastly, hospital information is publicly available from the Homeland Infrastructure Foundation-Level Data.
The original Kang et al. (2020) study aimed to assess the spatial accessibility of COVID-19 healthcare resources in Chicago, Illinois through network analysis. Utilizing hospital point data, OSM road networks, population data (ages over 50 representing at-risk populations), and COVID-19 case data, catchment areas were calculated for hospitals at 10, 20, and 30-minute radii. Weighted service ratios were then computed based on population centroids and hospital catchment polygons, aggregated into a hexagonal grid.
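For reference, the two steps of the enhanced two-step floating catchment area (E2SFCA) method (Luo & Qi 2009) can be summarized as follows, with notation adapted for this notebook: $S_j$ is the capacity of hospital $j$ (ICU beds or ventilators), $P_k$ the population at centroid $k$, $D_{jr}$ the set of centroids falling within hospital $j$'s travel-time zone $r$, and $W_r$ the distance-decay weight for zone $r$:

$$R_j = \frac{S_j}{\sum_{r=1}^{3} \sum_{k \in D_{jr}} W_r P_k} \qquad\qquad A_i = \sum_{r=1}^{3} \sum_{j \in D_{ir}} W_r R_j$$

Step one computes a service-to-population ratio $R_j$ for each hospital; step two sums the weighted ratios of all hospitals reachable from each location $i$.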
Our original reproduction of Kang et al. was mostly successful but had areas for improvement, which this reanalysis attempts to address. The reanalysis intends to resolve boundary issues by buffering the road network, reduce uncertainty by improving road network speed data with OSMnx, and refine the accessibility calculations by using area-weighted reaggregation.
Specifically, the reanalysis makes three changes to the original study. First, it addresses boundary issues by extending the road network to 15 miles outside of Chicago. Next, it replaces missing speed limit values using the OSMnx speed module. Lastly, it uses area-weighted reaggregation to aggregate catchment values into hexagons, improving the accuracy of the study.
Import necessary libraries to run this model. See environment.yml for the library versions used for this analysis.
# Import modules
import numpy as np
import pandas as pd
import geopandas as gpd
import networkx as nx
import osmnx as ox
import re
from shapely.geometry import Point, LineString, Polygon
import matplotlib.pyplot as plt
from tqdm import tqdm
import multiprocessing as mp
import folium
import itertools
import os
import time
import warnings
import IPython
import requests
from IPython.display import display, clear_output
warnings.filterwarnings("ignore")
print('\n'.join(f'{m.__name__}=={m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))
ipywidgets==7.6.5
numpy==1.22.0
pandas==1.3.5
geopandas==0.10.2
networkx==2.6.3
osmnx==1.1.2
re==2.2.1
folium==0.12.1.post1
IPython==8.3.0
requests==2.27.1
Because we have restructured the repository for replication, we need to check our working directory and make necessary adjustments.
# Check working directory
os.getcwd()
'/home/jovyan/work/RPr-Kang-2020/procedure/code'
# Use to set work directory properly
if os.path.basename(os.getcwd()) == 'code':
os.chdir('../../')
os.getcwd()
'/home/jovyan/work/RPr-Kang-2020'
If you would like to use the data generated from the pre-processing scripts, use the following code:
covid_data = gpd.read_file('./data/raw/public/Pre-Processing/covid_pre-processed.shp')
atrisk_data = gpd.read_file('./data/raw/public/Pre-Processing/atrisk_pre-processed.shp')
# Read in at risk population data
atrisk_data = gpd.read_file('./data/raw/public/PopData/Illinois_Tract.shp')
atrisk_data.head()
GEOID | STATEFP | COUNTYFP | TRACTCE | NAMELSAD | Pop | Unnamed_ 0 | NAME | OverFifty | TotalPop | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 17091011700 | 17 | 091 | 011700 | Census Tract 117 | 3688 | 588 | Census Tract 117, Kankakee County, Illinois | 1135 | 3688 | POLYGON ((-87.88768 41.13594, -87.88764 41.136... |
1 | 17091011800 | 17 | 091 | 011800 | Census Tract 118 | 2623 | 220 | Census Tract 118, Kankakee County, Illinois | 950 | 2623 | POLYGON ((-87.89410 41.14388, -87.89400 41.143... |
2 | 17119400951 | 17 | 119 | 400951 | Census Tract 4009.51 | 5005 | 2285 | Census Tract 4009.51, Madison County, Illinois | 2481 | 5005 | POLYGON ((-90.11192 38.70281, -90.11128 38.703... |
3 | 17119400952 | 17 | 119 | 400952 | Census Tract 4009.52 | 3014 | 2299 | Census Tract 4009.52, Madison County, Illinois | 1221 | 3014 | POLYGON ((-90.09442 38.72031, -90.09360 38.720... |
4 | 17135957500 | 17 | 135 | 957500 | Census Tract 9575 | 2869 | 1026 | Census Tract 9575, Montgomery County, Illinois | 1171 | 2869 | POLYGON ((-89.70369 39.34803, -89.69928 39.348... |
# Read in covid case data
covid_data = gpd.read_file('./data/raw/public/PopData/Chicago_ZIPCODE.shp')
covid_data['cases'] = covid_data['cases']
covid_data.head()
ZCTA5CE10 | County | State | Join | ZONE | ZONENAME | FIPS | pop | cases | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 60660 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 43242 | 78 | POLYGON ((-87.65049 41.99735, -87.65029 41.996... |
1 | 60640 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 69715 | 117 | POLYGON ((-87.64645 41.97965, -87.64565 41.978... |
2 | 60614 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 71308 | 134 | MULTIPOLYGON (((-87.67703 41.91845, -87.67705 ... |
3 | 60712 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 12539 | 42 | MULTIPOLYGON (((-87.76181 42.00465, -87.76156 ... |
4 | 60076 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 31867 | 114 | MULTIPOLYGON (((-87.74782 42.01540, -87.74526 ... |
Note that 999 is treated as a "NULL"/"NA" value, so those hospitals are filtered out (a sketch of this rule appears after the preview table below). This data contains the number of ICU beds and ventilators at each hospital.
# Read in hospital data
hospitals = gpd.read_file('./data/raw/public/HospitalData/Chicago_Hospital_Info.shp')
hospitals.head()
FID | Hospital | City | ZIP_Code | X | Y | Total_Bed | Adult ICU | Total Vent | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2 | Methodist Hospital of Chicago | Chicago | 60640 | -87.671079 | 41.972800 | 145 | 36 | 12 | MULTIPOINT (-87.67108 41.97280) |
1 | 4 | Advocate Christ Medical Center | Oak Lawn | 60453 | -87.732483 | 41.720281 | 785 | 196 | 64 | MULTIPOINT (-87.73248 41.72028) |
2 | 13 | Evanston Hospital | Evanston | 60201 | -87.683288 | 42.065393 | 354 | 89 | 29 | MULTIPOINT (-87.68329 42.06539) |
3 | 24 | AMITA Health Adventist Medical Center Hinsdale | Hinsdale | 60521 | -87.920116 | 41.805613 | 261 | 65 | 21 | MULTIPOINT (-87.92012 41.80561) |
4 | 25 | Holy Cross Hospital | Chicago | 60629 | -87.690841 | 41.770001 | 264 | 66 | 21 | MULTIPOINT (-87.69084 41.77000) |
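The 999 filtering described above was handled in pre-processing, so the shapefile read here is already clean. A minimal sketch of the equivalent rule, assuming the sentinel would appear in the capacity columns (illustrative only):
# Hypothetical sketch of the pre-processing rule described above;
# the shipped shapefile is already filtered, so this is illustrative only
sentinel = 999
hospitals = hospitals[(hospitals['Total_Bed'] != sentinel) &
                      (hospitals['Adult ICU'] != sentinel) &
                      (hospitals['Total Vent'] != sentinel)]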
# Plot hospital data
m = folium.Map(location=[41.85, -87.65], tiles='cartodbpositron', zoom_start=10)
for i in range(0, len(hospitals)):
folium.CircleMarker(
location=[hospitals.iloc[i]['Y'], hospitals.iloc[i]['X']],
popup="{}{}\n{}{}\n{}{}".format('Hospital Name: ',hospitals.iloc[i]['Hospital'],
'ICU Beds: ',hospitals.iloc[i]['Adult ICU'],
'Ventilators: ', hospitals.iloc[i]['Total Vent']),
radius=5,
color='blue',
fill=True,
fill_opacity=0.6,
).add_to(m)
legend_html = '''<div style="position: fixed; width: 20%; height: auto;
bottom: 10px; left: 10px;
border: 2px solid grey; z-index: 9999; font-size: 14px;
background-color: white;"> Legend<br>Blue circles: hospitals</div>'''
m.get_root().html.add_child(folium.Element(legend_html))
m
# Read in and plot grid file for Chicago
grid_file = gpd.read_file('./data/raw/public/GridFile/Chicago_Grid.shp')
grid_file.plot(figsize=(8,8))
<AxesSubplot:>
If Chicago_Network_Buffer.graphml does not already exist, this cell will query the road network from OpenStreetMap. Each of the road network code blocks may take a few minutes to run.
%%time
# To create a new graph from OpenStreetMap, delete or rename data/raw/private/Chicago_Network_Buffer.graphml
# (if it exists), and set OSM to True
OSM = False
# if buffered street network is not saved, and OSM is preferred,
# generate a new graph from OpenStreetMap and save it
if not os.path.exists("./data/raw/private/Chicago_Network_Buffer.graphml") and OSM:
print("Loading buffered Chicago road network from OpenStreetMap. Please wait... runtime may exceed 9min...", flush=True)
G = ox.graph_from_place('Chicago', network_type='drive', buffer_dist=24140.2)
print("Saving Chicago road network to raw/private/Chicago_Network_Buffer.graphml. Please wait...", flush=True)
ox.save_graphml(G, './data/raw/private/Chicago_Network_Buffer.graphml')
print("Data saved.")
# otherwise, if buffered street network is not saved, download graph from the OSF project
elif not os.path.exists("./data/raw/private/Chicago_Network_Buffer.graphml"):
print("Downloading buffered Chicago road network from OSF...", flush=True)
url = 'https://osf.io/download/z8ery/'
r = requests.get(url, allow_redirects=True)
print("Saving buffered Chicago road network to file...", flush=True)
open('./data/raw/private/Chicago_Network_Buffer.graphml', 'wb').write(r.content)
# if the buffered street network is already saved, load it
if os.path.exists("./data/raw/private/Chicago_Network_Buffer.graphml"):
print("Loading buffered Chicago road network from raw/private/Chicago_Network_Buffer.graphml. Please wait...", flush=True)
G = ox.load_graphml('./data/raw/private/Chicago_Network_Buffer.graphml')
print("Data loaded.")
else:
print("Error: could not load the road network from file.")
Loading buffered Chicago road network from raw/private/Chicago_Network_Buffer.graphml. Please wait...
Data loaded.
CPU times: user 36.5 s, sys: 290 ms, total: 36.8 s
Wall time: 36.7 s
%%time
ox.plot_graph(G, node_size = 1, bgcolor = 'white', node_color = 'black', edge_color = "#333333", node_alpha = 0.5, edge_linewidth = 0.5)
CPU times: user 53 s, sys: 112 ms, total: 53.2 s Wall time: 52.9 s
(<Figure size 576x576 with 1 Axes>, <AxesSubplot:>)
Display all the unique speed limit values and count how many network edges (road segments) have each value. We will compare this to our cleaned network later.
%%time
# Turn nodes and edges into geodataframes
nodes, edges = ox.graph_to_gdfs(G, nodes=True, edges=True)
# Get unique counts of road segments for each speed limit
print(edges['maxspeed'].value_counts())
print(str(len(edges)) + " edges in graph")
# can we also visualize highways / roads with higher speed limits to check accuracy?
# the code above converts the graph into an edges geodataframe, which could theoretically be filtered
# by fast road segments and mapped, e.g. in folium
25 mph                        6016
30 mph                        4873
35 mph                        4803
20 mph                        3621
40 mph                        2842
45 mph                        2423
55 mph                         876
60 mph                         293
50 mph                         287
15 mph                         107
70 mph                          79
[40 mph, 45 mph]                54
10 mph                          44
[30 mph, 35 mph]                36
65 mph                          36
[40 mph, 35 mph]                36
[45 mph, 35 mph]                34
[45 mph, 55 mph]                29
45,30                           24
[50 mph, 45 mph]                19
25                              14
25, east                        14
[30 mph, 25 mph]                13
[30 mph, 40 mph]                11
[20 mph, 35 mph]                 6
[25 mph, 35 mph]                 6
[60 mph, 55 mph]                 5
[60 mph, 65 mph]                 5
[20 mph, 25 mph]                 4
20                               4
[65 mph, 55 mph]                 4
[60 mph, 70 mph]                 4
[50 mph, 55 mph]                 3
[50 mph, 40 mph]                 3
[40 mph, 45 mph, 35 mph]         3
[50 mph, 45, east, 55 mph]       2
[60 mph, 45 mph]                 2
[40 mph, 25 mph, 35 mph]         2
5 mph                            2
[30 mph, 25, east]               2
[50 mph, 45 mph, 55 mph]         2
[15 mph, 45 mph]                 2
[15 mph, 25 mph]                 2
[30 mph, 45 mph]                 2
[30 mph, 15 mph, 40 mph]         2
[30 mph, 15 mph]                 2
[5 mph, 35 mph]                  2
[45mph, 45 mph]                  1
[30 mph, 20 mph]                 1
[5 mph, 70 mph, 45 mph]          1
[70 mph, 5 mph, 45 mph]          1
[55 mph, 35 mph]                 1
[40 mph, 55 mph, 35 mph]         1
[35 mph, 40 mph, 55 mph]         1
[15 mph, 15]                     1
Name: maxspeed, dtype: int64
384974 edges in graph
CPU times: user 30 s, sys: 9 ms, total: 30 s
Wall time: 30 s
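The comment above asks about visually checking the faster roads; a minimal sketch (handling only the single-valued 'maxspeed' strings seen in the counts above) could overlay them on the full network:
# Sketch: map edges with high posted speeds to eyeball them against known expressways
fast = edges[edges['maxspeed'].isin(['55 mph', '60 mph', '65 mph', '70 mph'])]
ax = edges.plot(figsize=(8, 8), color='lightgray', linewidth=0.2)
fast.plot(ax=ax, color='red', linewidth=0.8)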
edges.head()
osmid | oneway | lanes | ref | name | highway | maxspeed | length | geometry | bridge | tunnel | access | junction | width | area | service | |||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
u | v | key | ||||||||||||||||
738776 | 768967302 | 0 | [61699092, 918557247] | True | [5, 4] | I 294 | Tri-State Tollway | motorway | 55 mph | 467.708 | LINESTRING (-87.68109 41.58525, -87.68096 41.5... | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
738920 | 348225363 | 0 | [61431949, 31298719] | True | 5 | I 80;I 94 | Kingery Expressway | motorway | 55 mph | 1220.747 | LINESTRING (-87.56225 41.57764, -87.55790 41.5... | yes | NaN | NaN | NaN | NaN | NaN | NaN |
739113 | 1875082688 | 0 | 60862616 | True | 2 | NaN | NaN | motorway_link | NaN | 549.609 | LINESTRING (-87.34349 41.56738, -87.34277 41.5... | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
739130 | 0 | 292493273 | True | 4 | I 80;I 94;US 6 | Borman Expressway | motorway | 55 mph | 1191.046 | LINESTRING (-87.34349 41.56738, -87.34104 41.5... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
739117 | 739113 | 0 | 292493271 | True | 5 | I 80;I 94;US 6 | Borman Expressway | motorway | 55 mph | 381.798 | LINESTRING (-87.34806 41.56768, -87.34689 41.5... | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Cleans the OSMnx network to work better with drive-time analysis.
First, we remove all nodes with 0 outdegree because any hospital assigned to such a node would be unreachable from everywhere. Next, we remove small (under 10 node) strongly connected components to reduce erroneously small ego-centric networks. Lastly, we ensure that the max speed is set and in the correct units before calculating travel time.
Replacing the original speed-limit cleaning in network_setting with the osmnx.speed module refines the accuracy of the speed limit information in the transportation network model: missing values are imputed from the observed speeds of other edges of the same road type. This allows for a more realistic representation of road speeds, impacting the overall accuracy of accessibility results.
Args: network (networkx.MultiDiGraph): the street network graph loaded above
Returns: network (networkx.MultiDiGraph): the cleaned graph, with edge speeds and travel times added
# view all highway types
print(edges['highway'].value_counts())
residential                     293005
secondary                        35062
tertiary                         33478
primary                          14566
unclassified                      2474
motorway_link                     2378
motorway                          1378
primary_link                       808
trunk                              544
secondary_link                     464
living_street                      282
trunk_link                         195
tertiary_link                      150
[tertiary, residential]             60
[unclassified, residential]         58
[secondary, primary]                19
[secondary, tertiary]               13
[living_street, residential]         9
[unclassified, tertiary]             6
[motorway, motorway_link]            5
busway                               4
[trunk, primary]                     4
[secondary, residential]             2
[motorway, trunk]                    2
[secondary, secondary_link]          2
[secondary, motorway_link]           1
[motorway, primary]                  1
[primary, motorway_link]             1
[tertiary, tertiary_link]            1
[tertiary, motorway_link]            1
[tertiary, trunk_link]               1
Name: highway, dtype: int64
# two things about this function:
# 1) the work to remove nodes is hardly worth it now that OSMnx cleans graphs by default
# the function is now only pruning < 300 nodes
# 2) this version uses the OSMnx speed module for setting speeds and travel times
# https://osmnx.readthedocs.io/en/stable/user-reference.html#module-osmnx.speed
# just be careful about units of speed and time!
# osmnx.speed stores edge 'travel_time' in seconds, so the distance thresholds
# used later in this notebook (600, 1200, 1800) are in seconds (10, 20, and 30 minutes)
def network_setting(network):
_nodes_removed = len([n for (n, deg) in network.out_degree() if deg ==0])
network.remove_nodes_from([n for (n, deg) in network.out_degree() if deg ==0])
for component in list(nx.strongly_connected_components(network)):
if len(component)<10:
for node in component:
_nodes_removed+=1
network.remove_node(node)
ox.speed.add_edge_speeds(network)
ox.speed.add_edge_travel_times(network)
print("Removed {} nodes ({:2.4f}%) from the OSMNX network".format(_nodes_removed, _nodes_removed/float(network.number_of_nodes())))
print("Number of nodes: {}".format(network.number_of_nodes()))
print("Number of edges: {}".format(network.number_of_edges()))
return(network)
%%time
# G, hospitals, grid_file, pop_data = file_import (population_dropdown.value)
G = network_setting(G)
# Create point geometries for each node in the graph, to make constructing catchment area polygons easier
for node, data in G.nodes(data=True):
data['geometry']=Point(data['x'], data['y'])
# Modify code to react to processor dropdown (got rid of file_import function)
Removed 315 nodes (0.2206%) from the OSMNX network
Number of nodes: 142777
Number of edges: 384591
CPU times: user 41.3 s, sys: 386 ms, total: 41.6 s
Wall time: 41.6 s
Display all the unique speed limit values and count how many network edges (road segments) have each value. Compare to the previous results.
%%time
## Get unique counts for each road network
# Turn nodes and edges into geodataframes
nodes, edges = ox.graph_to_gdfs(G, nodes=True, edges=True)
CPU times: user 31.5 s, sys: 15.6 ms, total: 31.5 s Wall time: 31.4 s
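As a quick sanity check on the speed imputation, one could summarize the new edge attributes; osmnx.speed adds a 'speed_kph' attribute and a 'travel_time' attribute measured in seconds:
# Sanity-check sketch: summarize imputed speeds (km/h) and travel times (seconds)
print(edges[['speed_kph', 'travel_time']].describe())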
def hospital_setting(hospitals, G):
# Create an empty column
hospitals['nearest_osm']=None
# Fill the nearest_osm column with each hospital's nearest OSM network node
for i in tqdm(hospitals.index, desc="Find the nearest network node from hospitals", position=0):
hospitals['nearest_osm'][i] = ox.get_nearest_node(G, [hospitals['Y'][i], hospitals['X'][i]], method='euclidean') # find the nearest node from hospital location
print ('hospital setting is done')
return(hospitals)
Converts geodata to centroids
Args: pop_data (GeoDataFrame): census tracts or ZIP code polygons with population attributes; pop_type (str): "pop" for the at-risk population or "covid" for case counts
Returns: pop_centroid (GeoDataFrame): centroid points with 'code' and 'pop' columns
def pop_centroid (pop_data, pop_type):
pop_data = pop_data.to_crs({'init': 'epsg:4326'})
# If pop is selected in dropdown, keep tracts with an at-risk population of at least 0
if pop_type =="pop":
pop_data=pop_data[pop_data['OverFifty']>=0]
# If covid is selected in dropdown, keep ZIP codes with at least 0 COVID-19 cases
if pop_type =="covid":
pop_data=pop_data[pop_data['cases']>=0]
pop_cent = pop_data.centroid # taking the centroid discards all attributes, leaving only point geometry
# Convert to gdf
pop_centroid = gpd.GeoDataFrame()
i = 0
for point in tqdm(pop_cent, desc='Pop Centroid File Setting', position=0):
if pop_type== "pop":
pop = pop_data.iloc[i]['OverFifty']
code = pop_data.iloc[i]['GEOID']
if pop_type =="covid":
pop = pop_data.iloc[i]['cases']
code = pop_data.iloc[i].ZCTA5CE10
pop_centroid = pop_centroid.append({'code':code,'pop': pop,'geometry': point}, ignore_index=True)
i = i+1
return(pop_centroid)
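A later comment in this notebook asks why the centroids are built by iterating over every row; a vectorized sketch (untested against the original outputs, assuming the same column names) could replace the loop:
# Vectorized alternative sketch: build the centroid GeoDataFrame without a Python loop
def pop_centroid_vectorized(pop_data, pop_type):
    pop_data = pop_data.to_crs(epsg=4326)
    if pop_type == "pop":
        pop_data = pop_data[pop_data['OverFifty'] >= 0]
        code, pop = pop_data['GEOID'], pop_data['OverFifty']
    else:  # "covid"
        pop_data = pop_data[pop_data['cases'] >= 0]
        code, pop = pop_data['ZCTA5CE10'], pop_data['cases']
    return gpd.GeoDataFrame({'code': code.values, 'pop': pop.values},
                            geometry=pop_data.centroid.values)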
This is a function written by Joe Holler and Derrick Burt. It is a more efficient way to calculate distance-weighted catchment areas for each hospital, running more quickly than the original algorithm ("calculate_catchment_area"). It first creates a dictionary (with each node and its corresponding drive time from the hospital) of all nodes within a 30-minute drive time (using the single_source_dijkstra_path_length function). From here, two more dictionaries are constructed by querying the original one. From these dictionaries, single-part convex hulls are created for each drive-time interval and appended into a single list (one list with 3 polygon geometries). Within the list, the polygons are differenced from each other to produce three catchment areas.
Args: G (networkx.MultiDiGraph): street network with 'travel_time' edge attributes and point geometries on nodes; nearest_osm (int): network node nearest to the hospital; distances (list): three travel-time thresholds in seconds; distance_unit (str): name of the edge attribute used as travel cost
Returns: polygons (list): three mutually exclusive catchment GeoDataFrames (10, 20, and 30 minutes)
def dijkstra_cca_polygons(G, nearest_osm, distances, distance_unit = "travel_time"):
'''
Before running: must assign point geometries to street nodes
# create point geometries for the entire graph
for node, data in G.nodes(data=True):
data['geometry']=Point(data['x'], data['y'])
'''
## CREATE DICTIONARIES
# create dictionary of nearest nodes
nearest_nodes_30 = nx.single_source_dijkstra_path_length(G, nearest_osm, distances[2], distance_unit) # create the largest node set, from which the 10- and 20-minute drive times are extracted
# extract values within 20 and 10 (respectively) minutes drive times
nearest_nodes_20 = dict()
nearest_nodes_10 = dict()
for key, value in nearest_nodes_30.items():
if value <= distances[1]:
nearest_nodes_20[key] = value
if value <= distances[0]:
nearest_nodes_10[key] = value
## CREATE POLYGONS FOR 3 DISTANCE CATEGORIES (10 min, 20 min, 30 min)
# 30 MIN
# If the graph already has a geometry attribute with point data,
# this line will create a GeoPandas GeoDataFrame from the nearest_nodes_30 dictionary
points_30 = gpd.GeoDataFrame(gpd.GeoSeries(nx.get_node_attributes(G.subgraph(nearest_nodes_30), 'geometry')))
# This line converts the nearest_nodes_30 dictionary into a Pandas data frame and joins it to points
# left_index=True and right_index=True are options for merge() to join on the index values
points_30 = points_30.merge(pd.Series(nearest_nodes_30).to_frame(), left_index=True, right_index=True)
# Re-name the columns and set the geodataframe geometry to the geometry column
points_30 = points_30.rename(columns={'0_x':'geometry','0_y':'z'}).set_geometry('geometry')
# Create a convex hull polygon from the points
polygon_30 = gpd.GeoDataFrame(gpd.GeoSeries(points_30.unary_union.convex_hull))
polygon_30 = polygon_30.rename(columns={0:'geometry'}).set_geometry('geometry')
# 20 MIN
# Select nodes within the second travel-time threshold (distances[1] = 1200 seconds = 20 minutes)
points_20 = points_30.query("z <= {}".format(distances[1]))
# Create a convex hull polygon from the points
polygon_20 = gpd.GeoDataFrame(gpd.GeoSeries(points_20.unary_union.convex_hull))
polygon_20 = polygon_20.rename(columns={0:'geometry'}).set_geometry('geometry')
# 10 MIN
# Select nodes within the first travel-time threshold (distances[0] = 600 seconds = 10 minutes)
points_10 = points_30.query("z <= {}".format(distances[0]))
# Create a convex hull polygon from the points
polygon_10 = gpd.GeoDataFrame(gpd.GeoSeries(points_10.unary_union.convex_hull))
polygon_10 = polygon_10.rename(columns={0:'geometry'}).set_geometry('geometry')
# Create empty list and append polygons
polygons = []
# Append
polygons.append(polygon_10)
polygons.append(polygon_20)
polygons.append(polygon_30)
# Clip the overlapping distance polygons (create two donuts + hole)
for i in reversed(range(1, len(distances))):
polygons[i] = gpd.overlay(polygons[i], polygons[i-1], how="difference")
return polygons
Measures the effect of a single hospital on the surrounding area (uses dijkstra_cca_polygons).
Args: _thread_id (int): worker index used to reassemble results in order; hospital (Series): one hospital record with its 'nearest_osm' node; pop_data (GeoDataFrame): population centroids; distances (list): travel-time thresholds in seconds; weights (list): distance-decay weights
Returns: tuple of (_thread_id, list of catchment polygons carrying service-to-population ratios)
def hospital_measure_acc (_thread_id, hospital, pop_data, distances, weights):
# Create polygons
polygons = dijkstra_cca_polygons(G, hospital['nearest_osm'], distances)
# Calculate accessibility measurements
num_pops = []
for j in pop_data.index:
point = pop_data['geometry'][j]
# Multiply polygons by weights
for k in range(len(polygons)):
if len(polygons[k]) > 0: # skip degenerate catchments whose convex hull is not a polygon
if (point.within(polygons[k].iloc[0]["geometry"])):
num_pops.append(pop_data['pop'][j]*weights[k])
total_pop = sum(num_pops)
for i in range(len(distances)):
polygons[i]['time']=distances[i]
polygons[i]['total_pop']=total_pop
polygons[i]['hospital_icu_beds'] = float(hospital['Adult ICU'])/polygons[i]['total_pop'] # ICU beds per distance-weighted person served
polygons[i]['hospital_vents'] = float(hospital['Total Vent'])/polygons[i]['total_pop'] # ventilators per distance-weighted person served
polygons[i].crs = { 'init' : 'epsg:4326'}
polygons[i] = polygons[i].to_crs({'init':'epsg:32616'})
print('{:.0f}'.format(_thread_id), end=" ", flush=True)
return(_thread_id, [ polygon.copy(deep=True) for polygon in polygons ])
Parallel implementation of accessibility measurement.
Args: hospitals, pop_data, network, distances, weights: as defined above; num_proc (int): number of worker processes
Returns: catchments (list): one GeoDataFrame per travel-time zone, pooling the catchments of all hospitals
def hospital_acc_unpacker(args):
return hospital_measure_acc(*args)
# WHERE THE RESULTS ARE POOLED AND THEN REAGGREGATED
def measure_acc_par (hospitals, pop_data, network, distances, weights, num_proc = 4):
catchments = []
for distance in distances:
catchments.append(gpd.GeoDataFrame())
pool = mp.Pool(processes = num_proc)
hospital_list = [ hospitals.iloc[i] for i in range(len(hospitals)) ]
print("Calculating", len(hospital_list), "hospital catchments...\ncompleted number:", end=" ")
results = pool.map(hospital_acc_unpacker, zip(range(len(hospital_list)), hospital_list, itertools.repeat(pop_data), itertools.repeat(distances), itertools.repeat(weights)))
pool.close()
results.sort()
results = [ r[1] for r in results ]
for i in range(len(results)):
for j in range(len(distances)):
catchments[j] = catchments[j].append(results[i][j], sort=False)
return catchments
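An aside on the zip-plus-unpacker pattern above: Pool.starmap unpacks argument tuples natively, so an equivalent sketch (same argument order that hospital_measure_acc expects) would be:
# Equivalent sketch using Pool.starmap instead of the manual unpacker
results = pool.starmap(
    hospital_measure_acc,
    zip(range(len(hospital_list)), hospital_list, itertools.repeat(pop_data),
        itertools.repeat(distances), itertools.repeat(weights)))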
Calculates and aggregates accessibility statistics for one catchment on our grid file.
Args: _id (int): catchment index used to reassemble results; poly (GeoDataFrame): a single catchment polygon; grid_file (GeoDataFrame): the hexagon grid; weight (float): distance-decay weight for this catchment's zone; service_type (str): name of the resource column
Returns: tuple of (_id, Counter mapping hexagon id to the weighted service value)
from collections import Counter
def overlap_calc(_id, poly, grid_file, weight, service_type):
value_dict = Counter()
if poly.iloc[0][service_type] is not None:
value = float(poly[service_type])*weight
intersect = gpd.overlay(grid_file, poly, how='intersection')
intersect['overlapped']= intersect.area
intersect['percent'] = intersect['overlapped']/intersect['area']
intersect=intersect[intersect['percent']>=0.5]
intersect_region = intersect['id']
for intersect_id in intersect_region:
try:
value_dict[intersect_id] +=value
except:
value_dict[intersect_id] = value
return(_id, value_dict)
def overlap_calc_unpacker(args):
return overlap_calc(*args)
Calculates how all catchment areas overlap with and affect the accessibility of each grid in our grid file.
Args: grid_file, catchments, service_type, weights: as defined above; num_proc (int): number of worker processes
Returns: grid_file (GeoDataFrame): the hexagon grid with accumulated service values
def overlapping_function (grid_file, catchments, service_type, weights, num_proc = 4):
grid_file[service_type]=0
pool = mp.Pool(processes = num_proc)
acc_list = []
for i in range(len(catchments)):
acc_list.extend([ catchments[i][j:j+1] for j in range(len(catchments[i])) ])
acc_weights = []
for i in range(len(catchments)):
acc_weights.extend( [weights[i]]*len(catchments[i]) )
results = pool.map(overlap_calc_unpacker, zip(range(len(acc_list)), acc_list, itertools.repeat(grid_file), acc_weights, itertools.repeat(service_type)))
pool.close()
results.sort()
results = [ r[1] for r in results ]
service_values = results[0]
for result in results[1:]:
service_values+=result
for intersect_id, value in service_values.items():
grid_file.loc[grid_file['id']==intersect_id, service_type] += value
return(grid_file)
Min-max normalizes our result (GeoDataFrame) for a given resource column (res), scaling values to the range [0, 1].
def normalization (result, res):
result[res]=(result[res]-min(result[res]))/(max(result[res])-min(result[res]))
return result
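A quick worked example of the min-max scaling, with made-up values:
# Worked example (hypothetical values): min-max scaling maps the column onto [0, 1]
demo = pd.DataFrame({'hospital_vents': [2.0, 5.0, 8.0]})
print(normalization(demo, 'hospital_vents')['hospital_vents'].tolist())  # [0.0, 0.5, 1.0]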
Maps the normalized accessibility measure over the hexagon grid, with hospitals overlaid.
NOTE: even though we calculate accessibility for just Chicago, we should not use a road network clipped to the city boundary: hospitals near but outside Chicago would appear unreachable (infinite distance) because the roads would not extend past the city limits. This is why the buffered network loaded above extends 15 miles beyond Chicago.
Args: output_grid (GeoDataFrame): hexagon grid with accessibility values; base_map (GeoDataFrame): background geography; hospitals (GeoDataFrame): hospital points; resource (str): name of the column to map
Returns: None (renders a matplotlib map)
def output_map(output_grid, base_map, hospitals, resource):
ax=output_grid.plot(column=resource, cmap='PuBuGn',figsize=(18,12), legend=True, zorder=1)
# Next two lines set bounds for our x- and y-axes because it looks like there's a weird
# Point at the bottom left of the map that's messing up our frame (Maja)
ax.set_xlim([314000, 370000])
ax.set_ylim([540000, 616000])
base_map.plot(ax=ax, facecolor="none", edgecolor='gray', lw=0.1)
hospitals.plot(ax=ax, markersize=10, zorder=1, c='blue')
Below you can customize the input of the model:
import ipywidgets
from IPython.display import display
processor_dropdown = ipywidgets.Dropdown( options=[("1", 1), ("2", 2), ("3", 3), ("4", 4)],
value = 4, description = "Processor: ")
population_dropdown = ipywidgets.Dropdown( options=[("Population at Risk", "pop"), ("COVID-19 Patients", "covid") ],
value = "pop", description = "Population: ")
resource_dropdown = ipywidgets.Dropdown( options=[("ICU Beds", "hospital_icu_beds"), ("Ventilators", "hospital_vents") ],
value = "hospital_icu_beds", description = "Resource: ")
hospital_dropdown = ipywidgets.Dropdown( options=[("All hospitals", "hospitals"), ("Subset", "hospital_subset") ],
value = "hospitals", description = "Hospital:")
display(processor_dropdown,population_dropdown,resource_dropdown,hospital_dropdown)
Dropdown(description='Processor: ', index=3, options=(('1', 1), ('2', 2), ('3', 3), ('4', 4)), value=4)
Dropdown(description='Population: ', options=(('Population at Risk', 'pop'), ('COVID-19 Patients', 'covid')), …
Dropdown(description='Resource: ', options=(('ICU Beds', 'hospital_icu_beds'), ('Ventilators', 'hospital_vents…
Dropdown(description='Hospital:', options=(('All hospitals', 'hospitals'), ('Subset', 'hospital_subset')), val…
if population_dropdown.value == "pop":
pop_data = pop_centroid(atrisk_data, population_dropdown.value)
elif population_dropdown.value == "covid":
pop_data = pop_centroid(covid_data, population_dropdown.value)
distances=[600, 1200, 1800] # Distances in travel time
weights=[1.0, 0.68, 0.22] # Weights where weights[0] is applied to distances[0]
# Other weighting options representing different distance decays
# weights1, weights2, weights3 = [1.0, 0.42, 0.09], [1.0, 0.75, 0.5], [1.0, 0.5, 0.1]
# it is surprising how long this function takes just to calculate centroids.
# why not do it with the geopandas/pandas functions rather than iterating through every item?
# (a vectorized sketch appears after the pop_centroid definition above)
Pop Centroid File Setting: 100%|██████████| 3121/3121 [03:23<00:00, 15.33it/s]
If you have already run this code and changed the Hospital selection, rerun the Load Hospital Data block.
# Set hospitals according to hospital dropdown
if hospital_dropdown.value == "hospital_subset":
hospitals = hospital_setting(hospitals[:1], G)
else:
hospitals = hospital_setting(hospitals, G)
resources = ["hospital_icu_beds", "hospital_vents"] # resources
# this is also slower than it needs to be; if network nodes and hospitals are both
# geopandas data frames, it should be possible to do a much faster spatial join rather than iterating through every hospital
Find the nearest network node from hospitals: 100%|██████████| 66/66 [01:16<00:00, 1.16s/it]
hospital setting is done
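The comment above suggests a faster approach than iterating through every hospital; OSMnx (≥ 1.1) offers a vectorized nearest-node query that could replace the loop in hospital_setting. A sketch, assuming scikit-learn is installed for unprojected graphs:
# Vectorized sketch: assign every hospital's nearest network node in one call
hospitals['nearest_osm'] = ox.distance.nearest_nodes(
    G, X=hospitals['X'].values, Y=hospitals['Y'].values)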
# Create point geometries for entire graph
# These two lines would duplicate the geometry assignment already performed after
# network_setting, so they are left commented out:
# for node, data in G.nodes(data=True):
# data['geometry']=Point(data['x'], data['y'])
# which hospital to visualize?
fighosp = 7
# Create catchment polygons for the selected hospital
poly = dijkstra_cca_polygons(G, hospitals['nearest_osm'][fighosp], distances)
# Reproject polygons
for i in range(len(poly)):
poly[i].crs = { 'init' : 'epsg:4326'}
poly[i] = poly[i].to_crs({'init':'epsg:32616'})
# Reproject hospitals
# Possible to map from the hospitals data rather than creating hospital_subset?
hospital_subset = hospitals.iloc[[fighosp]].to_crs(epsg=32616)
fig, ax = plt.subplots(figsize=(12,8))
min_10 = poly[0].plot(ax=ax, color="royalblue", label="10 min drive")
min_20 = poly[1].plot(ax=ax, color="cornflowerblue", label="20 min drive")
min_30 = poly[2].plot(ax=ax, color="lightsteelblue", label="30 min drive")
hospital_subset.plot(ax=ax, color="red", legend=True, label = "hospital")
# Add legend
ax.legend()
<matplotlib.legend.Legend at 0x7ff969d98d90>
poly
[         geometry
 0  POLYGON ((441787.793 4610504.026, 433342.680 4...,
          geometry
 0  POLYGON ((433849.512 4600241.594, 427684.513 4...,
          geometry
 0  POLYGON ((451766.066 4587286.033, 438932.445 4...]
%%time
catchments = measure_acc_par(hospitals, pop_data, G, distances, weights, num_proc=processor_dropdown.value)
Calculating 66 hospital catchments... completed number: 5 15 0 10 6 1 16 11 7 2 17 12 3 8 18 13 4 9 19 14 20 25 30 35 26 21 31 36 27 22 32 37 28 23 33 38 29 24 34 39 40 45 50 55 41 46 51 56 42 47 57 52 43 48 58 53 44 49 59 54 60 65 61 62 63 64 CPU times: user 2.1 s, sys: 471 ms, total: 2.57 s Wall time: 2min 5s
%%time
for j in range(len(catchments)):
catchments[j] = catchments[j][catchments[j][resource_dropdown.value]!=float('inf')]
result=overlapping_function(grid_file, catchments, resource_dropdown.value, weights, num_proc=processor_dropdown.value)
CPU times: user 6.15 s, sys: 382 ms, total: 6.53 s Wall time: 17.1 s
# add weight field to each catchment polygon
for i in range(len(weights)):
catchments[i]['weight'] = weights[i]
# combine the three sets of catchment polygons into one geodataframe
geocatchments = pd.concat([catchments[0], catchments[1], catchments[2]])
geocatchments
geometry | time | total_pop | hospital_icu_beds | hospital_vents | weight | |
---|---|---|---|---|---|---|
0 | POLYGON ((448183.185 4637565.186, 445181.084 4... | 600 | 774246.84 | 0.000046 | 0.000015 | 1.00 |
0 | POLYGON ((438560.568 4609646.631, 432067.664 4... | 600 | 719649.38 | 0.000272 | 0.000089 | 1.00 |
0 | POLYGON ((444065.785 4649104.166, 442871.481 4... | 600 | 455860.76 | 0.000195 | 0.000064 | 1.00 |
0 | POLYGON ((421468.232 4621045.724, 421031.920 4... | 600 | 718206.62 | 0.000091 | 0.000029 | 1.00 |
0 | POLYGON ((443313.505 4615987.097, 440030.033 4... | 600 | 694887.00 | 0.000095 | 0.000030 | 1.00 |
... | ... | ... | ... | ... | ... | ... |
0 | POLYGON ((443223.331 4604956.986, 440431.675 4... | 1800 | 999144.68 | 0.000027 | 0.000009 | 0.22 |
0 | MULTIPOLYGON (((420251.781 4675101.028, 428590... | 1800 | 758913.48 | 0.000059 | 0.000018 | 0.22 |
0 | POLYGON ((415910.447 4618609.875, 409824.239 4... | 1800 | 963567.82 | 0.000087 | 0.000028 | 0.22 |
0 | POLYGON ((444519.744 4602914.274, 438784.832 4... | 1800 | 930027.68 | 0.000066 | 0.000022 | 0.22 |
0 | POLYGON ((416033.726 4607281.060, 413086.072 4... | 1800 | 784872.50 | 0.000120 | 0.000038 | 0.22 |
198 rows × 6 columns
%%time
# set weighted to False for original 50% threshold method
# switch to True for area-weighted overlay
weighted = True
# if the value to be calculated is already in the hexagon grid, delete it
# otherwise, the field name gets a suffix _1 in the overlay step
if resource_dropdown.value in list(grid_file.columns.values):
grid_file = grid_file.drop(resource_dropdown.value, axis = 1)
# calculate hexagon 'target' areas
grid_file['area'] = grid_file.area
# Intersection overlay of hospital catchments and hexagon grid
print("Intersecting hospital catchments with hexagon grid...")
fragments = gpd.overlay(grid_file, geocatchments, how='intersection')
# Calculate percent coverage of the hexagon by the hospital catchment as
# fragment area / target(hexagon) area
fragments['percent'] = fragments.area / fragments['area']
# if using weighted aggregation...
if weighted:
print("Calculating area-weighted value...")
# multiply the service/population ratio by the distance weight and the percent coverage
fragments['value'] = fragments[resource_dropdown.value] * fragments['weight'] * fragments['percent']
# if using the 50% coverage rule for unweighted aggregation...
else:
print("Calculating value for hexagons with >=50% overlap...")
# filter for only the fragments with > 50% coverage by hospital catchment
fragments = fragments[fragments['percent']>=0.5]
# multiply the service/population ratio by the distance weight
fragments['value'] = fragments[resource_dropdown.value] * fragments['weight']
# select just the hexagon id and value from the fragments,
# group the fragments by the (hexagon) id,
# and sum the values
print("Summarizing results by hexagon id...")
sum_results = fragments[['id', 'value']].groupby(by = ['id']).sum()
# join the results to the hexagon grid_file based on hexagon id
print("Joining results to hexagons...")
result_new = pd.merge(grid_file, sum_results, how="left", on = "id")
# rename value column name to the resource name
result_new = result_new.rename(columns = {'value' : resource_dropdown.value})
result_new
Intersecting hospital catchments with hexagon grid...
Calculating area-weighted value...
Summarizing results by hexagon id...
Joining results to hexagons...
CPU times: user 11.3 s, sys: 69.1 ms, total: 11.4 s
Wall time: 11.4 s
left | top | right | bottom | id | area | geometry | hospital_icu_beds | |
---|---|---|---|---|---|---|---|---|
0 | 440843.416087 | 4.638515e+06 | 441420.766356 | 4.638015e+06 | 4158 | 216506.350946 | POLYGON ((440843.416 4638265.403, 440987.754 4... | 0.003569 |
1 | 440843.416087 | 4.638015e+06 | 441420.766356 | 4.637515e+06 | 4159 | 216506.350946 | POLYGON ((440843.416 4637765.403, 440987.754 4... | 0.003607 |
2 | 440843.416087 | 4.639515e+06 | 441420.766356 | 4.639015e+06 | 4156 | 216506.350946 | POLYGON ((440843.416 4639265.403, 440987.754 4... | 0.003651 |
3 | 440843.416087 | 4.639015e+06 | 441420.766356 | 4.638515e+06 | 4157 | 216506.350946 | POLYGON ((440843.416 4638765.403, 440987.754 4... | 0.003593 |
4 | 440843.416087 | 4.640515e+06 | 441420.766356 | 4.640015e+06 | 4154 | 216506.350946 | POLYGON ((440843.416 4640265.403, 440987.754 4... | 0.003608 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
3274 | 440843.416087 | 4.643015e+06 | 441420.766356 | 4.642515e+06 | 4149 | 216506.350946 | POLYGON ((440843.416 4642765.403, 440987.754 4... | 0.003535 |
3275 | 440843.416087 | 4.644515e+06 | 441420.766356 | 4.644015e+06 | 4146 | 216506.350946 | POLYGON ((440843.416 4644265.403, 440987.754 4... | 0.003515 |
3276 | 440843.416087 | 4.644015e+06 | 441420.766356 | 4.643515e+06 | 4147 | 216506.350946 | POLYGON ((440843.416 4643765.403, 440987.754 4... | 0.003525 |
3277 | 440843.416087 | 4.645515e+06 | 441420.766356 | 4.645015e+06 | 4144 | 216506.350946 | POLYGON ((440843.416 4645265.403, 440987.754 4... | 0.003403 |
3278 | 440843.416087 | 4.645015e+06 | 441420.766356 | 4.644515e+06 | 4145 | 216506.350946 | POLYGON ((440843.416 4644765.403, 440987.754 4... | 0.003472 |
3279 rows × 8 columns
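To make the difference between the two aggregation rules concrete, a toy calculation with hypothetical numbers:
# Toy example (hypothetical numbers): a catchment fragment covers 30% of a hexagon,
# with service/population ratio 0.01 and distance weight 0.68
percent, value, weight = 0.30, 0.01, 0.68
print(value * weight * percent)                    # area-weighted credit: 0.00204
print(value * weight if percent >= 0.5 else 0.0)   # 50%-threshold credit: 0.0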
%%time
result = normalization (result, resource_dropdown.value)
CPU times: user 2.28 ms, sys: 17 µs, total: 2.29 ms Wall time: 2.09 ms
result.head()
left | top | right | bottom | id | area | geometry | hospital_icu_beds | |
---|---|---|---|---|---|---|---|---|
0 | 440843.416087 | 4.638515e+06 | 441420.766356 | 4.638015e+06 | 4158 | 216661.173 | POLYGON ((440843.416 4638265.403, 440987.754 4... | 0.925107 |
1 | 440843.416087 | 4.638015e+06 | 441420.766356 | 4.637515e+06 | 4159 | 216661.168 | POLYGON ((440843.416 4637765.403, 440987.754 4... | 0.947280 |
2 | 440843.416087 | 4.639515e+06 | 441420.766356 | 4.639015e+06 | 4156 | 216661.169 | POLYGON ((440843.416 4639265.403, 440987.754 4... | 0.952407 |
3 | 440843.416087 | 4.639015e+06 | 441420.766356 | 4.638515e+06 | 4157 | 216661.171 | POLYGON ((440843.416 4638765.403, 440987.754 4... | 0.933176 |
4 | 440843.416087 | 4.640515e+06 | 441420.766356 | 4.640015e+06 | 4154 | 216661.171 | POLYGON ((440843.416 4640265.403, 440987.754 4... | 0.940998 |
%%time
hospitals = hospitals.to_crs({'init': 'epsg:26971'})
result = result.to_crs({'init': 'epsg:26971'})
output_map(result, pop_data, hospitals, resource_dropdown.value)
CPU times: user 1.93 s, sys: 408 ms, total: 2.33 s Wall time: 1.73 s
def output_map_classified(output_grid, hospitals, resource):
ax=output_grid.plot(column=resource,
scheme='Equal_Interval',
k=5,
linewidth=0,
cmap='Blues',
figsize=(18,12),
legend=True,
label="Acc Measure",
zorder=1)
# Next two lines set bounds for our x- and y-axes because it looks like there's a weird
# Point at the bottom left of the map that's messing up our frame (Maja)
ax.set_xlim([325000, 370000])
ax.set_ylim([550000, 600000])
hospitals.plot(ax=ax,
markersize=10,
zorder=2,
c='black',
legend=True,
label="Hospital"
)
# Add hospital legend
output_map_classified(result, hospitals, resource_dropdown.value)
# save as image with file name including the resource value, population value, and buffered / not buffered
plt.savefig('./results/figures/reproduction/{}_{}_buff_classified_spdLimit.png'.format(population_dropdown.value, resource_dropdown.value))
The modifications introduced in this reanalysis shed light on key aspects of healthcare resource accessibility in Illinois. Notable changes were observed in the Classified Accessibility Map, particularly when considering different settings such as buffered areas, populations over 50 (pop50), and ICU beds. The zone of highest accessibility expanded, revealing that more healthcare resources were accessible to a larger population than initially assumed in the reproduction. Altering the setting to "COVID patients" reinforced the assumption that hospital beds and ventilators exhibit high correlation in influencing accessibility. The lack of substantial changes for COVID-19 patients suggests that the modifications might be more relevant for the population over 50, prompting further investigation into the relationship between the expanded road network and its impact on COVID patient data.
Expanding the road network beyond Chicago's limits, coupled with improved road speed dataset accuracy, played a pivotal role in the observed larger zones of accessibility. The expansion likely accommodated individuals residing outside the city who might access healthcare facilities within city boundaries. The modification of the speed limit dataset, using more accurate values based on road types, contributed to enhanced construct validity, rectifying the initial overgeneralization issue. The improved speed limit data, with segments ranging from 25 mph to 60-70 mph, showcased a more nuanced representation of road speeds, impacting the overall accuracy of accessibility results.
Implementation of area-weighted reaggregation (AWR) in place of the 50% threshold from the original study and reproduction addressed uncertainties, offering a more refined representation in the hexagonal grid. This change not only improved the study's internal validity but also contributed to the observed increase in accessibility across various zones. The shift from a binary assessment of overlap to a more nuanced, weighted approach allowed for a more accurate depiction of accessibility.
However, despite the improvements, geographic threats to validity persist. The study grapples with partition distortion arising from hexagonal grid division and simplification of population data to centroids. A continued lack of consideration for multimodal transit introduces assumptions about patient transportation, potentially impacting accessibility results.
Additionally, the study remains susceptible to space-time limitations that assume standard speed limits for travel time calculations. Ambulance transport, traffic priority, and traffic variations introduce potential biases. These challenges underline the need for ongoing refinement in the methodology to enhance the study's overall validity.
Furthermore, the speed limit modification, considered on its own, revealed a nuanced impact on accessibility, primarily affecting the population over 50. The smaller catchment zones resulting from slower (but more accurate) average speed limits influenced the accessibility maps for ICU beds in combination with the population over 50. However, these modifications had minimal impact on accessibility for COVID-19 patients, reinforcing the original assumption of a high correlation between ICU beds and ventilators.
The alternate accessibility map showcased differences in accessibility patterns, emphasizing the importance of area-weighted reaggregation. By addressing the modifiable areal unit problem, the reanalysis provided a more accurate representation of hospital accessibility, leading to higher content validity. However, challenges persist, such as the potential underrepresentation of people living within the extended catchment areas due to reliance on centroid-based population data.
In conclusion, our reanalysis of Kang et al. 2020 has significantly contributed to refining the original study, addressing key issues, and enhancing its reproducibility and validity. One major improvement focused on mitigating the threat of partition distortion inherent in the hexagonal grid representation of the study area. Our findings suggest that further investigation into the impact of varying hexagon sizes on distortion and spatial accessibility could provide valuable insights, promoting a more accurate representation of healthcare resource accessibility. Moreover, it is important to understand that while our results are more precise than those of the original study, the overall accessibility pattern does not change drastically.
Expanding the scope of the study, we recognized the limitation of relying solely on a transportation network based on personal vehicles. The study's assumption neglects significant portions of the population that depend on alternative modes of transportation such as public transit, walking, or biking. To bolster the study's applicability, we recommend incorporating a more diverse transportation network to better capture accessibility patterns, especially among underserved low-income populations.
Additionally, a modification addressing the time-space threat to validity would introduce a more nuanced understanding of how traffic congestion at different times of the day influences access to healthcare resources. Incorporating traffic data would allow for a more accurate estimation of travel times along busy road segments, shedding light on temporal variations in accessibility.
The introduction of speed limit adjustments in our reanalysis further bolstered the validity of the study, specifically enhancing the accuracy of speed limit data. Notably, this modification influenced the spatial accessibility results for the population over 50, highlighting the importance of considering variations in speed limits, especially in residential areas. Caution should be exercised when interpreting the original study results, as they may have wrongly inflated accessibility metrics due to higher default speed limits.
Our reanalysis underscores the value of refining and improving upon existing methodologies, which aligns with the philosophy of reproduction and replication studies. While our modifications have advanced the study, it is crucial to acknowledge that there are still opportunities for further enhancement and exploration. Future research endeavors could delve into refining hexagonal partitioning, incorporating a diverse transportation network, and continuing to address threats to validity.
Luo, W., and Y. Qi. 2009. An enhanced two-step floating catchment area (E2SFCA) method for measuring spatial accessibility to primary care physicians. Health & Place 15 (4):1100–1107.