New York City
Stories from NYC Open Data, NYS Open Data, and the MTA: twenty-four million 311 records, roughly 860,000 tax lots, every restaurant inspection, the lead-pipe inventory, the subway's tidal flow, and 1.5 billion taxi trips.
Stories
What the inspectors heard
DOHMH's restaurant inspection data carries a cuisine_description column — one of the few structured cuisine fields on any city portal. Scroll through pizza, Chinese, Latin American, and Japanese density across NYC, and the neighborhoods name themselves.
NYC's decade of pipe work
231,000 properties. One EPA deadline: 2037. Framing NYC's lead service line replacement program as a construction-industry challenge — which boroughs carry the heaviest load, and what annual pace the city needs to hit.
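The "annual pace" question is simple arithmetic, sketched below. The 231,000 properties and the 2037 deadline come from the blurb above; the 2025 start year and the choice to count both the start and deadline years as working years are illustrative assumptions, not the city's plan.

```python
import math


def required_annual_pace(properties: int, start_year: int, deadline_year: int) -> int:
    """Replacements per year needed to finish by the end of deadline_year.

    Assumes work is spread evenly and that start_year and deadline_year
    both count as full working years.
    """
    years = deadline_year - start_year + 1
    if years <= 0:
        raise ValueError("deadline_year must not precede start_year")
    return math.ceil(properties / years)


# 231,000 and 2037 are from the text; 2025 is a hypothetical start year.
pace = required_annual_pace(231_000, 2025, 2037)
print(f"Roughly {pace:,} replacements per year")
```

Moving the start year later shrinks the window and raises the required pace, which is why the planning horizon matters as much as the total count.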
After the parade
The 2024 NYC Pride march concluded on June 30 around 6 PM. In the four hours that followed, hundreds of thousands of attendees dispersed to bars, dinners, after-parties, and homes across the five boroughs. The taxi drop-off pattern shows you exactly where the post-parade economy lives.
Boundaries that don't match
Stand on the corner of any block in New York City, and you are simultaneously inside at least seven different administrative jurisdictions — none of which share boundaries with any of the others. The geographic fragmentation is one of the under-appreciated reasons NYC civic data is so hard to use, and so easy to misuse.
The taxi data found the cellular dead zones
Every yellow cab logs its trip to the TLC's central server in real time. When the cellular signal drops, the meter buffers the trip locally and uploads it later. The TLC published the flag that marks these buffered trips. They probably did not realize they were also publishing a map of NYC's cellular dead spots.
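A minimal sketch of how the buffered-trip share could be computed per pickup zone. It assumes the flag in question is the `store_and_fwd_flag` column in the TLC yellow-cab records, with "Y" marking a trip held in the vehicle before upload; the input tuples are a simplified stand-in for real trip rows.

```python
from collections import Counter


def buffered_share_by_zone(trips):
    """Return {zone_id: share of trips flagged as stored and forwarded}.

    trips: iterable of (pickup_zone_id, store_and_fwd_flag) pairs.
    A flag of "Y" is assumed to mean the trip was buffered locally.
    """
    total = Counter()
    buffered = Counter()
    for zone, flag in trips:
        total[zone] += 1
        if flag == "Y":
            buffered[zone] += 1
    return {zone: buffered[zone] / total[zone] for zone in total}


# Invented sample rows, for illustration only.
sample = [(1, "Y"), (1, "N"), (2, "N")]
print(buffered_share_by_zone(sample))
```

Zones with an unusually high share are candidates for dead spots, though a high share could also reflect a few faulty meters rather than poor coverage.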
The upzoning paradox
Between 2002 and 2010, New York City upzoned hundreds of blocks — increasing legal density, raising allowed building heights, opening capacity for thousands of new residential units. Eight years later, an academic team rolled the data forward to see what the rents had done. The answer punctured one of the most durable assumptions in U.S. urban policy.
What the congestion toll did to yellow cabs
On January 5, 2025, the Congestion Relief Zone toll went live. Battery Park lost 40% of its yellow cab pickups. World Trade Center dropped 22%. The TLC trip records show which zones the toll hit hardest — and which barely moved.
The tip tells you where you are
Yellow cab tip percentages by pickup zone don't track the income map as neatly as you'd expect. Airport runs, tourist corridors, and short hops have their own tipping logic — all of it baked into every credit card receipt since 2008.
NYC at 3 am
Every yellow cab drop-off between midnight and 5 am in 2023, aggregated by zone. The East Village handles more late-night arrivals than most of the outer boroughs combined. The nocturnal city has a geography — and it's not where you think.
The black car takeover
In 2017 there were more yellow cab trips than Uber and Lyft combined. By 2023 it wasn't close. Seven years of TLC data tells the story of the largest disruption in urban transportation since the car replaced the horse.
The taxi data is coming
1.5 billion rows of NYC taxi trips. The largest mobility dataset any U.S. city publishes — and the first to include the new Manhattan congestion-toll field. Why it doesn't fit our live-Socrata pattern, and what the planned pipeline looks like.
Less than 2 percent
Drug activity, drinking, disorderly youth, graffiti — the categories most invoked when 311 gets framed as a 'social disorder hotline' — together account for under 2% of NYC's 311 calls. Noise alone is roughly 30%. The chaos isn't disorder; the chaos is plumbing.
The unknown pipes
NYC's lead service line inventory looks like a public health story. Scroll through the density maps and it starts to look like a construction program — 231,000 properties across five boroughs, all with a 2037 deadline. The 'Unknown' classification is where the mandate gets complicated.
Mice vs roaches
DOHMH doesn't track pest type as a column in its restaurant inspection data, but it's all there in the violation descriptions the inspector recorded. Mice, roaches, flies: the urban biome of NYC's kitchens, mapped per borough.
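One way to recover pest type from the description text is keyword matching, sketched below. The regular expressions and the sample descriptions are hypothetical; the real violation wording may need broader patterns.

```python
import re
from collections import Counter

# Hypothetical keyword patterns; adjust to the actual violation wording.
PEST_PATTERNS = {
    "mice": re.compile(r"\b(mice|mouse)\b", re.IGNORECASE),
    "roaches": re.compile(r"\b(cock)?roach(es)?\b", re.IGNORECASE),
    "flies": re.compile(r"\b(fly|flies)\b", re.IGNORECASE),
}


def pests_in(description: str) -> set[str]:
    """Return the pest names whose pattern appears in one description."""
    return {name for name, pat in PEST_PATTERNS.items() if pat.search(description)}


def count_by_borough(rows):
    """Count violations mentioning each pest, grouped by borough.

    rows: iterable of (borough, violation_description) pairs.
    A missing description is treated as an empty string.
    """
    counts: dict[str, Counter] = {}
    for boro, desc in rows:
        counts.setdefault(boro, Counter()).update(pests_in(desc or ""))
    return counts
```

Each violation is counted at most once per pest, so a description that mentions mice twice still adds one to the borough's mice total.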
The algorithmic city
Every year NYC publishes a list of the algorithmic tools its agencies use to make decisions affecting residents' rights and benefits. Local Law 35, the ACS predictive risk-score controversy, the GUARD Act response — and the next horizon of open data.
Twenty-seven months in the Bronx
A 180-unit affordable building was complete in 2022. Eighteen months after the lottery had closed and the waitlist filled, no one had moved in. The lease-up bottleneck that suppresses affordable housing availability for years after construction is physically done.
Who owns this building?
Twelve buildings citywide have each generated more than 20 Class C (immediately hazardous) housing violations since 2024. The named owner is always an LLC. The beneficial owner is always findable. The corporate-veil-piercing pattern at the heart of NYC tenant advocacy.
The sound of the city
NYC's noise complaints have grown every year since 2010 — population is roughly flat, awareness was already high, but the calls keep coming. Epidemiologists treat the 311 noise feed as a city-scale environmental surveillance layer. The growth is a public-health signal.
The subway tide
Four million weekday riders. The MTA used to know where they boarded but not where they got off — turnstiles only read entries. Then they built an algorithm. The cleanest public view of NYC's transit circulatory system that has ever existed.
Datasets
NYC 311 service requests
Live snapshot of New York City's 311 service requests, twenty-four million rows from 2010 to today. Queried via SODA v3 against data.cityofnewyork.us through our Cloudflare worker proxy. Noise dominates; less than 2% is what most people would call 'social disorder.'
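To give a feel for the kind of query behind the complaint-type breakdown, here is a sketch that builds a SoQL aggregation URL. It targets the classic `/resource/` endpoint with `$select`-style parameters rather than SODA v3 or our proxy, and the dataset ID `erm2-nwe9` and the `complaint_type` column are assumptions about the 311 dataset, so treat it as an illustration of the query shape only.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint for the 311 dataset; not the proxy route described above.
BASE = "https://data.cityofnewyork.us/resource/erm2-nwe9.json"


def complaint_counts_url(limit: int = 10) -> str:
    """Build a URL asking for the most frequent complaint types."""
    params = {
        "$select": "complaint_type, count(*) AS n",
        "$group": "complaint_type",
        "$order": "n DESC",
        "$limit": str(limit),
    }
    # Keep "$" readable; everything else is percent-encoded as usual.
    return f"{BASE}?{urlencode(params, safe='$')}"


def decode_params(url: str) -> dict:
    """Parse the query string back into a dict, handy for inspection."""
    return parse_qs(urlsplit(url).query)


print(complaint_counts_url(5))
```

Dividing each returned count by the grand total gives the shares quoted in the stories; the function only builds the URL and does not make a network request.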
NYC TLC taxi trip records
One-and-a-half billion yellow / green / FHV trips since 2009. Stories use build-time DuckDB aggregates. The Playground tab runs DuckDB WASM in the browser — ad-hoc SQL against remote Parquet, no server required.
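The ad-hoc pattern comes down to pointing DuckDB's `read_parquet` at a remote file. The sketch below only composes the SQL string; the example URL is a placeholder, and `PULocationID` is assumed to be the pickup-zone column in the yellow-cab files.

```python
def pickups_by_zone_sql(parquet_url: str, top_n: int = 10) -> str:
    """Compose a DuckDB query counting pickups per zone in one Parquet file."""
    if "'" in parquet_url:
        # Guard against breaking out of the quoted SQL string literal.
        raise ValueError("parquet_url must not contain a single quote")
    return (
        "SELECT PULocationID AS zone_id, count(*) AS pickups\n"
        f"FROM read_parquet('{parquet_url}')\n"
        "GROUP BY PULocationID\n"
        "ORDER BY pickups DESC\n"
        f"LIMIT {int(top_n)}"
    )


# Placeholder URL; substitute a real monthly trip file.
query = pickups_by_zone_sql("https://example.com/yellow_tripdata.parquet", 5)
print(query)
```

With the `duckdb` Python package installed, `duckdb.sql(query).fetchall()` should run the query, provided the build can read over HTTPS (the httpfs extension). The same SQL text can be pasted into a browser-side DuckDB session.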
PLUTO — every NYC tax lot
The Department of City Planning's Primary Land Use Tax Lot Output. ~860K tax lots, ~70 fields each — zoning district, land use, building class, year built, residential units, assessed value. The substrate beneath nearly every quantitative urban-policy paper written about NYC.
NYC Restaurant Inspections
Every sustained violation issued to every food establishment by the Department of Health and Mental Hygiene. One row per violation per inspection. The grade card hung in your favorite spot's window comes from this dataset, and so do the famous 1900-01-01 placeholder dates.
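Any analysis by date has to set the 1900-01-01 rows aside first. The sketch below assumes the date arrives as an ISO-style string in a field named `inspection_date`, and that the placeholder marks establishments with no inspection on record yet; both are assumptions to check against the dataset's documentation.

```python
from datetime import date

PLACEHOLDER = date(1900, 1, 1)


def split_uninspected(rows):
    """Split rows into (inspected, pending) by the placeholder date.

    rows: iterable of dicts with an 'inspection_date' string such as
    '2023-05-04T00:00:00.000'. Only the first ten characters are parsed.
    """
    inspected, pending = [], []
    for row in rows:
        day = date.fromisoformat(row["inspection_date"][:10])
        (pending if day == PLACEHOLDER else inspected).append(row)
    return inspected, pending
```

Leaving the placeholder rows in would drag any "oldest inspection" or "time since last visit" figure back more than a century.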
NYC Lead Service Line Inventory
Per-property classification of which NYC buildings are still served by lead pipes. Published per the EPA's 2024 Lead and Copper Rule Improvements. The headline isn't the lead count — it's the staggering "Unknown" classification, the public-health data void at the heart of the city's 2037 replacement deadline.
HPD Maintenance Code Violations
Every Housing Maintenance Code violation issued by HPD. Joins to PLUTO via BBL. The substrate beneath every "worst landlord" feature, the join key for tenant-advocacy tools that pierce LLC corporate-veil opacity to identify serial offenders.
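A BBL is a ten-digit key: one digit for the borough, five for the block, four for the lot. The sketch below builds it and attaches PLUTO attributes to violation rows; the field names `boroid`, `block`, and `lot` and the shape of the PLUTO lookup are assumptions for illustration.

```python
def make_bbl(boro: int, block: int, lot: int) -> str:
    """Build a 10-digit BBL: borough (1), block (5), lot (4), zero-padded."""
    if not 1 <= boro <= 5:
        raise ValueError("borough code must be 1-5")
    if not 0 <= block <= 99999 or not 0 <= lot <= 9999:
        raise ValueError("block or lot out of range")
    return f"{boro}{block:05d}{lot:04d}"


def attach_lot_info(violations, pluto_by_bbl):
    """Left-join violation rows to PLUTO rows on BBL.

    violations: iterable of dicts with 'boroid', 'block', 'lot'.
    pluto_by_bbl: dict mapping BBL strings to PLUTO attribute dicts.
    Rows with no matching lot get lot_info=None.
    """
    joined = []
    for v in violations:
        bbl = make_bbl(int(v["boroid"]), int(v["block"]), int(v["lot"]))
        joined.append({**v, "bbl": bbl, "lot_info": pluto_by_bbl.get(bbl)})
    return joined
```

Keeping the BBL as a zero-padded string avoids the common bug where a numeric column silently drops leading zeros in the block or lot.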
HPD Affordable Housing Production
Every affordable housing project the city has financed under Housing New York and successor programs. Unit counts segmented by AMI band — what "affordable" actually means depends on which bands you include in the headline.
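How much the headline moves with the choice of bands is easy to show with a toy project. The band names and the unit counts below are invented for illustration and are not taken from the dataset.

```python
# Hypothetical band labels, ordered from lowest to highest income.
AMI_BANDS = ("extremely_low", "very_low", "low", "moderate", "middle")


def headline_units(project: dict, bands=AMI_BANDS) -> int:
    """Sum a project's units over the chosen AMI bands."""
    unknown = set(bands) - set(AMI_BANDS)
    if unknown:
        raise ValueError(f"unknown bands: {sorted(unknown)}")
    return sum(int(project.get(band, 0)) for band in bands)


# Invented unit counts for one project.
sample = {"extremely_low": 20, "very_low": 30, "low": 60, "moderate": 40, "middle": 30}

print(headline_units(sample))                       # all five bands
print(headline_units(sample, AMI_BANDS[:3]))        # low income and below
```

In this made-up case the same building is either 180 "affordable" units or 110, depending only on where the cutoff is drawn.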
MTA Subway Origin-Destination
The MTA's algorithmic reconstruction of where 4M daily subway riders actually go. Turnstiles only capture entries; exits are probabilistically inferred from each rider's next entry. The cleanest public view of NYC's transit circulatory system.
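The next-entry idea can be illustrated with a deterministic trip-chaining heuristic. This is a simplified stand-in, not the MTA's probabilistic method: each trip's destination is taken to be the station of the rider's next entry, and the last trip of the day is assumed to return to the first station. The rider IDs and tap tuples are hypothetical.

```python
from collections import defaultdict


def infer_destinations(taps):
    """Infer (rider, origin, destination) trips from entry-only taps.

    taps: iterable of (rider_id, timestamp, station) for a single day.
    Riders with only one tap are skipped, since there is no next entry.
    """
    by_rider = defaultdict(list)
    for rider, ts, station in taps:
        by_rider[rider].append((ts, station))

    trips = []
    for rider, entries in by_rider.items():
        entries.sort()  # order each rider's taps by timestamp
        if len(entries) < 2:
            continue
        # Destination of each trip is the station of the following entry.
        for (_, origin), (_, nxt) in zip(entries, entries[1:]):
            trips.append((rider, origin, nxt))
        # Assume the final trip closes the loop back to the first station.
        trips.append((rider, entries[-1][1], entries[0][1]))
    return trips
```

A probabilistic version would replace the single "next station" guess with a distribution over plausible exit stations, which also covers riders whose next entry is far from where they likely got off.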
NYC LL35 Algorithmic Tools Report
The city's annual algorithmic-tools disclosure required by Local Law 35 of 2022. AI/ML systems used by city agencies that affect rights, liberties, benefits, or safety, including the controversial ACS predictive risk scores that prompted the GUARD Act response. Static editorial; no live adapter.