NYC Mobility
The MTA O-D dataset is the cleanest public reconstruction of NYC's transit circulatory system that exists, built from probabilistic inference on entry-only turnstile data. The TLC trip records are a separate beast — too big to serve live, distributed as Parquet, and the subject of an ongoing pipeline-design effort documented in the TLC placeholder story.
Stories
After the parade
The 2024 NYC Pride march concluded on June 30 around 6 PM. In the four hours that followed, hundreds of thousands of attendees dispersed to bars, dinners, after-parties, and homes across the five boroughs. The taxi drop-off pattern shows you exactly where the post-parade economy lives.
The taxi data found the cellular dead zones
Every yellow cab logs its trip to the TLC's central server in real time. When the cellular signal drops, the meter buffers the trip locally and uploads it later. The TLC publishes the flag that marks these buffered trips (store_and_fwd_flag in the trip records). They probably did not realize they were also publishing a map of NYC's cellular dead spots.
What the congestion toll did to yellow cabs
On January 5, 2025, the Congestion Relief Zone toll went live. Battery Park lost 40% of its yellow cab pickups. World Trade Center dropped 22%. The TLC trip records show which zones the toll hit hardest — and which barely moved.
The tip tells you where you are
Yellow cab tip percentages by pickup zone don't track the income map as neatly as you'd expect. Airport runs, tourist corridors, and short hops have their own tipping logic — all of it baked into every credit card receipt since 2008.
NYC at 3 am
Every yellow cab drop-off between midnight and 5 am in 2023, aggregated by zone. The East Village handles more late-night arrivals than most of the outer boroughs combined. The nocturnal city has a geography — and it's not where you think.
The black car takeover
In 2017 there were more yellow cab trips than Uber and Lyft combined. By 2023 it wasn't close. Seven years of TLC data tells the story of the largest disruption in urban transportation since the car replaced the horse.
The taxi data is coming
1.5 billion rows of NYC taxi trips. The largest mobility dataset any U.S. city publishes — and the first to include the new Manhattan congestion-toll field. Why it doesn't fit our live-Socrata pattern, and what the planned pipeline looks like.
The subway tide
Four million weekday riders. The MTA used to know where they boarded but not where they got off — turnstiles only read entries. Then they built an algorithm. The cleanest public view of NYC's transit circulatory system that has ever existed.
Datasets
NYC TLC taxi trip records
One and a half billion yellow, green, and for-hire vehicle (FHV) trips since 2009. Stories use aggregates computed with DuckDB at build time. The Playground tab runs DuckDB-Wasm in the browser: ad-hoc SQL against remote Parquet, no server required.
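As a rough illustration of what a build-time aggregate looks like, here is a minimal sketch of a late-night drop-off count by zone. It is not the site's pipeline: it runs the GROUP BY against an in-memory SQLite table from Python's standard library instead of DuckDB over Parquet, so it needs no extra dependencies. The column names follow the published yellow-cab schema (tpep_dropoff_datetime, DOLocationID); the sample rows, the zone IDs in them, and the function name are made up for the example.

```python
import sqlite3


def late_night_dropoffs(rows):
    """Count drop-offs between midnight and 5 am, grouped by drop-off zone.

    rows: iterable of (tpep_dropoff_datetime as 'YYYY-MM-DD HH:MM:SS', DOLocationID).
    Returns a list of (DOLocationID, dropoffs), busiest zone first.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE trips (tpep_dropoff_datetime TEXT, DOLocationID INTEGER)"
    )
    con.executemany("INSERT INTO trips VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT DOLocationID, COUNT(*) AS dropoffs
        FROM trips
        -- hours 00 through 04 cover midnight up to, but not including, 5 am
        WHERE CAST(strftime('%H', tpep_dropoff_datetime) AS INTEGER) < 5
        GROUP BY DOLocationID
        ORDER BY dropoffs DESC, DOLocationID
        """
    ).fetchall()
    con.close()
    return result


# Hypothetical sample rows; the zone IDs are placeholders, not real results.
sample = [
    ("2023-03-04 00:15:00", 79),
    ("2023-03-04 03:40:00", 79),
    ("2023-03-04 04:59:59", 148),
    ("2023-03-04 05:00:00", 79),   # exactly 5 am: excluded
    ("2023-03-04 14:20:00", 148),  # afternoon: excluded
]
counts = late_night_dropoffs(sample)
print(counts)
```

The query shape (filter on the drop-off hour, group by zone, count) should carry over to DuckDB largely unchanged, though the date functions differ between the two engines.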
MTA Subway Origin-Destination
The MTA's algorithmic reconstruction of where four million weekday subway riders actually go. Turnstiles only capture entries; exits are probabilistically inferred from each rider's next entry. The cleanest public view of NYC's transit circulatory system.
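The next-entry idea can be sketched in a few lines. This is a deliberately simplified, deterministic version and not the MTA's algorithm, which is probabilistic: here each entry's destination is simply assumed to be the station of the same rider's next entry. The rider IDs, record layout, and function name are hypothetical.

```python
from collections import Counter, defaultdict


def infer_od_pairs(entries):
    """Infer origin-destination counts from entry-only records.

    entries: iterable of (rider_id, timestamp, station), where timestamps
    sort chronologically (ISO 8601 strings work).
    Returns a Counter keyed by (origin_station, inferred_destination).
    """
    by_rider = defaultdict(list)
    for rider_id, timestamp, station in entries:
        by_rider[rider_id].append((timestamp, station))

    od_counts = Counter()
    for taps in by_rider.values():
        taps.sort()  # chronological order per rider
        # Pair each entry with the rider's next entry; the last entry has
        # no successor, so its destination stays unknown in this sketch.
        for (_, origin), (_, next_station) in zip(taps, taps[1:]):
            if next_station != origin:  # skip re-entries at the same station
                od_counts[(origin, next_station)] += 1
    return od_counts


# Hypothetical sample records.
sample = [
    ("rider-a", "2024-05-01T08:00", "Bedford Av"),
    ("rider-a", "2024-05-01T18:00", "Union Sq"),
    ("rider-b", "2024-05-01T09:00", "Union Sq"),
    ("rider-b", "2024-05-01T17:30", "Jay St"),
    ("rider-c", "2024-05-01T07:30", "Bedford Av"),
    ("rider-c", "2024-05-01T17:00", "Union Sq"),
]
od = infer_od_pairs(sample)
print(od.most_common())
```

A probabilistic version would spread each trip across plausible exit stations rather than committing to one, which matters when a rider's next entry is far from where they most likely got off.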