article · 2026-06-02 · ~4 min · frozen TLC aggregate

The taxi data found the cellular dead zones

Every yellow cab logs its trips to a central server in real time. When the cellular signal drops, the meter buffers the trip locally and uploads it later. The TLC publishes the flag that marks these buffered trips. It probably did not realize it was also publishing a map of NYC's cellular dead spots.

The Yellow Cab trip record schema includes a column called store_and_fwd_flag. Almost always it's 'N' — the meter logged the trip's start, distance, fare, and end position to the central server in real time. Sometimes, though, it's 'Y'. That means the meter couldn't reach the server when it tried, so it stored the trip locally on the device and forwarded it once cellular service came back.
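The flag's two values can be read with a very small predicate. The following is a minimal sketch, assuming trip records have already been parsed into plain objects. Only the column name store_and_fwd_flag comes from the schema described above; the TripRecord type, the function names, and the sample rows are hypothetical and exist only for illustration.

```typescript
// Hypothetical, pared-down trip record. Only store_and_fwd_flag is taken
// from the article; real records carry many more columns.
interface TripRecord {
  store_and_fwd_flag: string | null;
}

// A trip counts as buffered only when the flag is exactly "Y".
// Anything else ("N", null, blank) is treated as not forwarded.
function isForwarded(trip: TripRecord): boolean {
  return trip.store_and_fwd_flag === "Y";
}

// Tally forwarded versus non-forwarded records in one pass.
function countByFlag(trips: TripRecord[]): { forwarded: number; notForwarded: number } {
  let forwarded = 0;
  for (const trip of trips) {
    if (isForwarded(trip)) forwarded += 1;
  }
  return { forwarded, notForwarded: trips.length - forwarded };
}

// Made-up sample rows, not real TLC data.
const flagSample: TripRecord[] = [
  { store_and_fwd_flag: "N" },
  { store_and_fwd_flag: "Y" },
  { store_and_fwd_flag: "N" },
  { store_and_fwd_flag: null },
];

console.log(countByFlag(flagSample)); // forwarded: 1, notForwarded: 3
```

Treating a missing flag as "not forwarded" is a design choice in this sketch rather than something the TLC specifies; a stricter analysis might count missing values separately.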

The flag was added for operational reasons: the TLC needs to know which trip records came in via the buffered path so it can audit timing for fare discrepancies. It was not designed as a connectivity-monitoring tool. But because the meter only sets the flag when it can't reach the network at the moment of the trip start, the spatial distribution of flagged trips works as a surprisingly good proxy map of where in the city cellular service is unreliable for vehicles in motion.
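Raw counts of flagged trips also rise with how busy a zone is, so one hedged way to read the spatial distribution is as a share of each zone's trips rather than a total. The sketch below illustrates that idea and is not the article's build script. It assumes trips are already parsed into objects carrying the published column names PULocationID (pickup zone) and store_and_fwd_flag; the function name, the minTrips cutoff, and the sample rows are hypothetical.

```typescript
// Hypothetical minimal row: pickup zone id plus the store-and-forward flag.
interface ZoneTrip {
  PULocationID: number;
  store_and_fwd_flag: string | null;
}

interface ZoneTally {
  total: number;
  forwarded: number;
}

interface ZoneShare {
  zoneId: number;
  total: number;
  forwarded: number;
  share: number; // forwarded divided by total, between 0 and 1
}

// Compute the forwarded share per pickup zone, highest share first.
// Zones with fewer than minTrips trips are skipped so that a single
// buffered trip in a quiet zone does not dominate the ranking.
function forwardedShareByZone(trips: ZoneTrip[], minTrips = 1): ZoneShare[] {
  const tallies: Record<number, ZoneTally> = {};
  for (const trip of trips) {
    const tally = tallies[trip.PULocationID] ?? { total: 0, forwarded: 0 };
    tally.total += 1;
    if (trip.store_and_fwd_flag === "Y") tally.forwarded += 1;
    tallies[trip.PULocationID] = tally;
  }

  const shares: ZoneShare[] = [];
  for (const key of Object.keys(tallies)) {
    const zoneId = Number(key);
    const tally = tallies[zoneId];
    if (tally === undefined || tally.total < minTrips) continue;
    shares.push({
      zoneId,
      total: tally.total,
      forwarded: tally.forwarded,
      share: tally.forwarded / tally.total,
    });
  }
  // Sort by share descending; break ties by zone id for a stable order.
  return shares.sort((a, b) => b.share - a.share || a.zoneId - b.zoneId);
}

// Made-up rows with arbitrary zone ids, not real TLC data.
const shareSample: ZoneTrip[] = [
  { PULocationID: 1, store_and_fwd_flag: "Y" },
  { PULocationID: 1, store_and_fwd_flag: "N" },
  { PULocationID: 1, store_and_fwd_flag: "N" },
  { PULocationID: 1, store_and_fwd_flag: "N" },
  { PULocationID: 2, store_and_fwd_flag: "Y" },
  { PULocationID: 2, store_and_fwd_flag: "N" },
  { PULocationID: 3, store_and_fwd_flag: "N" },
];

console.log(forwardedShareByZone(shareSample, 2));
```

The figures in this article are counts, not shares; a rate-based view like this would need the full month of trips per zone, not only the forwarded ones.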

What it picks up

Aggregated by TLC pickup zone across the 65,703 forwarded trips in the June 2016 data, the zones with the most buffered trips are:

forwarded trips · top 10 pickup zones
1. LaGuardia Airport: 3,658
2. JFK Airport: 3,413
3. Midtown Center: 2,364
4. Times Sq/Theatre District: 2,268
5. Murray Hill: 2,255
6. Upper East Side South: 2,187
7. Midtown East: 2,059
8. Penn Station/Madison Sq West: 2,019
9. Union Sq: 1,935
10. Upper East Side North: 1,925
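A ranking like the one above can be produced with a single counting pass. This is a minimal sketch under stated assumptions, not the contents of scripts/build-tlc-aggregates.ts: it assumes rows are already loaded into memory as objects with the published columns PULocationID and store_and_fwd_flag, and the type names, function name, and sample rows are hypothetical. It returns zone ids; mapping ids to zone names would need the TLC's zone lookup table, which is left out here.

```typescript
// Hypothetical minimal row: pickup zone id plus the store-and-forward flag.
interface PickupRecord {
  PULocationID: number;
  store_and_fwd_flag: string | null;
}

interface ZoneCount {
  zoneId: number;
  forwarded: number;
}

// Count forwarded ("Y") trips per pickup zone and keep the top `limit` zones.
function topForwardedZones(trips: PickupRecord[], limit = 10): ZoneCount[] {
  const counts: Record<number, number> = {};
  for (const trip of trips) {
    if (trip.store_and_fwd_flag !== "Y") continue; // skip real-time records
    counts[trip.PULocationID] = (counts[trip.PULocationID] ?? 0) + 1;
  }
  return Object.keys(counts)
    .map((key) => ({ zoneId: Number(key), forwarded: counts[Number(key)] ?? 0 }))
    // Highest count first; break ties by zone id for a stable order.
    .sort((a, b) => b.forwarded - a.forwarded || a.zoneId - b.zoneId)
    .slice(0, limit);
}

// Made-up rows with arbitrary zone ids, not real TLC data.
const pickupSample: PickupRecord[] = [
  { PULocationID: 5, store_and_fwd_flag: "Y" },
  { PULocationID: 5, store_and_fwd_flag: "Y" },
  { PULocationID: 5, store_and_fwd_flag: "Y" },
  { PULocationID: 9, store_and_fwd_flag: "Y" },
  { PULocationID: 9, store_and_fwd_flag: "N" },
  { PULocationID: 2, store_and_fwd_flag: "Y" },
  { PULocationID: 2, store_and_fwd_flag: "Y" },
  { PULocationID: 4, store_and_fwd_flag: "N" },
];

console.log(topForwardedZones(pickupSample, 2));
```

A full month of trip records is large, so a real build step would more likely stream the Parquet file or push the grouping into a query engine than hold every row in an array.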

The zones that rise to the top match where veteran cab drivers will tell you their meters glitch: midtown west of Penn Station, the tunnel approaches, the lower-Manhattan canyons, Central Park, the industrial edges of Long Island City and Sunset Park. The TLC published sub-zone coordinates from 2009 through mid-2016, and with them a 500-metre hex-bin view exposed those patterns down to the block. The agency re-encoded the archive to a zone-only schema in 2022, so this map now lives at zone granularity. The thesis survives the coarsening: the same zones still light up.

The accidental surveillance layer

Cell carriers do not publish coverage maps at this resolution. The FCC's broadband-deployment data is too coarse and too laggy for vehicle-mobility-relevant analysis. There is, in fact, no public dataset that says "here's where your phone won't work in NYC, by neighbourhood." But there is now an indirect one, built on a column added for unrelated reasons by an unrelated agency.

This is one of the more delightful patterns in open data: a dataset designed for one purpose becomes useful for an entirely different purpose because of what it incidentally records. DOHMH didn't design its restaurant inspection records to function as a pest-distribution map. The MTA didn't design its O-D ridership reconstruction to function as a commuter-geography research tool. The TLC didn't design its store_and_fwd_flag column to function as a cellular-coverage map. All three repurposings happen because the underlying data is sufficiently granular and sufficiently public.

Whether anyone — a carrier, the city, residents — should do anything with this map is a separate question. The point is that it exists, hidden in plain sight in a column the TLC has been publishing since 2009. Open data does this. The interesting use cases are rarely the ones the publisher had in mind.

Source: NYC TLC Yellow Cab trip records (Parquet via AWS Open Data). Aggregates pre-baked by scripts/build-tlc-aggregates.ts and committed to static/nyc/tlc-aggregates/. Source month: 2016-06.