scrollytelling · ~5 min · static OD snapshot

The subway tide

Four million subway riders move through New York City every weekday. The MTA used to know where they boarded but not where they got off. Then they built an algorithm.

The destination problem

For its entire pre-2024 history, the NYC subway captured ridership data the way it captured fares: at the turnstile. You swipe in. The system records the entry. You travel. You exit through a turnstile that doesn't read your card. The system has no idea where you went.

This was, for transit-planning purposes, a catastrophe. You can't optimize service routes against unknown destinations. You can't build evidence-based cases for new station investments if you can't show how riders move. You can't even compute basic things like average trip distance with anything but FOIL'd survey data and modeling assumptions. The MTA had a high-resolution origin dataset and a complete absence of destination data, and the informational asymmetry shaped two decades of transit politics in the city.

The algorithm

The agency's response, codified under the MTA Open Data Act, was to infer destinations. The reasoning runs like this: every subway rider eventually swipes in again somewhere, to start a new trip. That second entry is presumably from a station near the destination of the first trip. By analyzing the time, location, and frequency pattern of each rider's subsequent entries, the agency can probabilistically reconstruct exit locations. The probability is highest for commuters with predictable round-trip patterns and lowest for occasional or one-off riders, but in aggregate it produces a usable origin-destination matrix.

The result is the dataset this story draws from: every (origin, destination, hour, day-of- week) tuple, weighted by estimated average ridership. From it, you can reconstruct the city's daily commuter geography in a way that's never been publicly available before.

First, the whole weekday

The origin-destination matrix turns the subway from a set of station counts into a field of currents. Each line is a station-pair relationship, weighted by estimated riders, and the largest flows immediately trace the trunk geography of the system.

This is the baseline: Monday through Friday, all hours, excluding trips that start and end at the same complex.

Morning pulls inward

Between 7 and 9 AM, the visible flow tightens toward Manhattan and the transfer complexes that feed it. The map is not measuring turnstile entries alone; it is showing where those riders are estimated to land.

That distinction is the point of the dataset. The morning commute is not just many people entering the system. It is many borough-to-core relationships firing at the same time.

Then the pressure reverses

From 4 to 7 PM, the same network releases riders back outward. Some routes do not simply mirror the morning because errands, transfers, nightlife, and hybrid schedules bend the return trip, but the tidal structure is still visible.

The story is less "one downtown" than a set of repeated directional pulses through the stations that can absorb and redistribute the load.

The hubs hold the pattern together

Filter the map to flows involving Penn Station, Times Square, Grand Central, Atlantic Av-Barclays, and Fulton Street, and the abstraction becomes more legible. These complexes are not merely busy destinations; they are routing machinery.

They concentrate the city's movement into places where line choice, transfers, and job geography meet. The bars below prove the same structure numerically.

Weekday system baseline 300 flows

The tide

The hour-of-day distribution is the cleanest visible signature of the morning-evening tidal pattern that defines the system's rhythm:

estimated average daily ridership by hour

The morning peak around 8 AM (~21,735,742.088 estimated riders/hour at the busiest), the evening peak from 5–7 PM (~23,552,707.493), and the pre-dawn trough at 3–5 AM (~441,448.22 — orders of magnitude lower) tell you what every New Yorker already knows experientially: this is fundamentally a commuter system. The off-peak ridership is real and growing, but the system's load-bearing function is moving people from residential boroughs to Manhattan business districts in the morning and back out at night.

Where they board

estimated average daily ridership

Top origin station complexes

Times Sq-42 St (N,Q,R,W,S,1,2,3,7)/42 St (A,C,E)

8.6M
Grand Central-42 St (S,4,5,6,7)

6.9M
34 St-Herald Sq (B,D,F,M,N,Q,R,W)

4.7M
14 St-Union Sq (L,N,Q,R,W,4,5,6)

4.3M
Fulton St (A,C,J,Z,2,3,4,5)

3.9M
34 St-Penn Station (A,C,E)

3.7M
34 St-Penn Station (1,2,3)

3.3M
59 St-Columbus Circle (A,B,C,D,1)

3.1M
Flushing-Main St (7)

2.7M
Chambers St (A,C)/WTC (E)/Park Pl (2,3)/Cortlandt (R,W)

2.7M
74-Broadway (7)/Jackson Hts-Roosevelt Av (E,F,M,R)

2.6M
47-50 Sts-Rockefeller Ctr (B,D,F,M)

2.6M

0 8.6M max

The top origin stations are where commuters live (or transfer through on their way in from further out). The top destination stations are where commuters work. Plotted side by side, the asymmetry is the asymmetry of NYC's labor geography:

estimated average daily ridership

Top destination station complexes

Times Sq-42 St (N,Q,R,W,S,1,2,3,7)/42 St (A,C,E)

8.5M
Grand Central-42 St (S,4,5,6,7)

6.8M
34 St-Herald Sq (B,D,F,M,N,Q,R,W)

4.8M
14 St-Union Sq (L,N,Q,R,W,4,5,6)

4.6M
Fulton St (A,C,J,Z,2,3,4,5)

4.0M
34 St-Penn Station (A,C,E)

3.5M
59 St-Columbus Circle (A,B,C,D,1)

3.3M
34 St-Penn Station (1,2,3)

3.1M
47-50 Sts-Rockefeller Ctr (B,D,F,M)

2.8M
Chambers St (A,C)/WTC (E)/Park Pl (2,3)/Cortlandt (R,W)

2.7M
Lexington Av-53 St (E,M)/51 St (6)

2.7M
Bryant Pk (B,D,F,M)/5 Av (7)

2.6M

0 8.5M max

The names of these complexes — Times Sq-42 St, Union Sq, Grand Central, Penn Station — are the daily commuter geography of NYC. The same pattern would have looked very different even five years ago, before the MTA's algorithmic O-D reconstruction made the asymmetry measurable. Pre-2024, "where do my riders go?" was a survey question. Now it's a SoQL query.

What the algorithm gets wrong

The destination data is an estimate, not a measurement, and it has predictable failure modes. Riders who don't make a return trip within the inference window — visitors, occasional travelers, anyone who took a one-way trip and used a different mode to come back — fall back to population-statistical averages. This means low-frequency riders are systematically less well-represented in destination data than commuters. Late-night and weekend trips are estimated less accurately than weekday commute hours. None of which makes the dataset useless — it makes it the cleanest version of an estimate that was previously unavailable in any form at all.

That uncertainty is why the visual story moves from animated flows back to bars and rankings. The map shows the structure; the charts pin it down with the aggregates the agency publishes. Together, they describe a city that breathes with the workday at a resolution we couldn't see two years ago.