UrbanFlowDB: A Multimodal Urban Mobility Database for Traffic, Transit, and Micromobility Intelligence

Xiaolong Pan; Ruoxi Jiang; Tao Cheng; Yanmei Xu

doi:10.63646/datamind.2023.010204


Published 2023-06-30

Xiaolong Pan

School of Transportation Engineering, Chang’an University, Xi’an 710064, China

Ruoxi Jiang*

College of Civil Engineering, Fuzhou University, Fuzhou 350108, China
jiang.ruoxi@fzu.edu.cn

Tao Cheng

School of Geographic Sciences, East China Normal University, Shanghai 200241, China

Yanmei Xu

School of Computer Science, Northwest A&F University, Yangling 712100, China


Abstract

Urban mobility intelligence increasingly depends on the joint analysis of transit smart card transactions, taxi GPS probes, shared-bike trips, road-side sensor counts, and meteorological observations. Yet these five data sources are typically curated in isolation, stored in incompatible formats, indexed by incompatible spatial and temporal keys, and exposed under inconsistent privacy regimes, which makes integrated analytical workflows unnecessarily fragile. This article presents UrbanFlowDB, a multimodal urban mobility database that treats the database itself as the principal research artifact. We document the schema, the field dictionary, the spatiotemporal index family, the ingestion and quality control pipeline, the pseudonymization and ethics processing flow, and the reusable application programming interfaces that expose the integrated data to downstream models. The database is co-resident across a Parquet-plus-Delta lakehouse, a PostGIS-extended relational store, a Neo4j property graph for congestion-propagation analysis, and a pgvector index for trajectory similarity search; this polyglot layout is deliberately chosen because each mobility analytical pattern aligns most naturally with a different storage paradigm. We benchmark the database on a runnable urban experiment using one year of data from a Chinese second-tier city (1.42 billion transit taps, 396 million taxi GPS pings, 21.6 million dockless bike trips, 8.4 million sensor records, 215,860 weather observations) and demonstrate that UrbanFlowDB lowers origin-destination demand prediction RMSE from 23.6 to 18.9 trips per 15-minute window relative to the strongest baseline, raises congestion early-warning F1 from 0.793 to 0.851, and reduces trajectory imputation error by 35.4 percent at a 30 percent missing rate. End-to-end ingestion latency is below 19 seconds at the 95th percentile for all five sources, and the system sustains 14,200 trajectory queries per second on the production-scale dataset. The schema, dictionaries, and reproduction scripts are released under an open license.
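
The abstract reports the RMSE and F1 results as before-and-after pairs. As a reading aid, the minimal Python sketch below converts those pairs into relative changes. The helper names `relative_reduction` and `relative_gain` are illustrative only and are not part of any UrbanFlowDB interface; the numeric inputs are the figures quoted in the abstract.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a lower-is-better metric (e.g. RMSE) dropped."""
    return (before - after) / before


def relative_gain(before: float, after: float) -> float:
    """Fraction by which a higher-is-better metric (e.g. F1) rose."""
    return (after - before) / before


# Figures quoted in the abstract (hypothetical variable names).
rmse_baseline, rmse_urbanflowdb = 23.6, 18.9   # trips per 15-minute window
f1_baseline, f1_urbanflowdb = 0.793, 0.851     # congestion early-warning F1

rmse_drop_pct = round(relative_reduction(rmse_baseline, rmse_urbanflowdb) * 100, 1)
f1_abs_gain = round(f1_urbanflowdb - f1_baseline, 3)
f1_gain_pct = round(relative_gain(f1_baseline, f1_urbanflowdb) * 100, 1)

print(f"OD-demand RMSE reduction: {rmse_drop_pct}%")                       # 19.9%
print(f"F1 gain: {f1_abs_gain} absolute, {f1_gain_pct}% relative")         # 0.058, 7.3%
```

Under this reading, the reported RMSE change corresponds to roughly a 19.9 percent reduction, and the F1 change to a gain of 0.058 points (about 7.3 percent relative).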

Keywords: urban mobility; smart card transit; taxi GPS trajectory; shared bike; spatiotemporal database; origin–destination prediction; congestion propagation; multimodal data integration

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Pan, X., Jiang, R., Cheng, T., & Xu, Y. (2023). UrbanFlowDB: A Multimodal Urban Mobility Database for Traffic, Transit, and Micromobility Intelligence. DATAMIND, 1(2), 33-46. https://doi.org/10.63646/datamind.2023.010204
