Given a block's coffee shops, schools, and street layout, can you tell whether it is residential, commercial, or something else, and how much of the answer lives in the spatial structure itself? A tutorial published this week on MarkTechPost walks through exactly that question with a reproducible pipeline, and the methodological comparison inside it is the real story.
The task is called urban function inference: given a point of interest in a city, predict its land-use category from its spatial context. Instead of treating the city as a grid of points, the tutorial represents it as a network of connected entities, in which intersections, streets, and points of interest are nodes linked by relationships. A graph neural network, a model class that learns over network-shaped data, can read that structure directly. This is a working approach, not a research demo, and the tutorial makes the whole workflow runnable in Python.
The stack is three layers. OSMnx, a Python library, pulls live street networks and points of interest from OpenStreetMap, the volunteer-built map of the world. city2graph, the pipeline's construction piece, converts that raw geospatial data into graph structures suitable for machine learning. PyTorch Geometric, a graph-learning extension to PyTorch, handles training and inference. The model trained on top is GraphSAGE, a graph neural network architecture that learns a vector representation for each node by sampling and aggregating features from that node's neighbors.
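To make "sampling and aggregating" concrete, here is a minimal pure-Python sketch of one GraphSAGE-style step with a mean aggregator. It is not the tutorial's code and it does not use PyTorch Geometric. The function name `sage_mean_layer`, the toy nodes, and the two-element feature vectors are all made up for illustration, and the learned weight matrix and nonlinearity that a real GraphSAGE layer applies after concatenation are left out.

```python
import random

def sage_mean_layer(features, neighbors, sample_size, rng):
    """One GraphSAGE-style step without learned weights (illustrative only).

    For each node: sample up to `sample_size` neighbors, average their
    feature vectors, and concatenate that mean onto the node's own features.
    """
    out = {}
    for node, own in features.items():
        nbrs = neighbors.get(node, [])
        # Sampling step: cap the neighborhood at a fixed size.
        if len(nbrs) > sample_size:
            nbrs = rng.sample(nbrs, sample_size)
        # Aggregation step: element-wise mean of the sampled neighbors.
        if nbrs:
            mean = [
                sum(features[n][i] for n in nbrs) / len(nbrs)
                for i in range(len(own))
            ]
        else:
            mean = [0.0] * len(own)
        # Concatenate self features with the neighborhood summary.
        out[node] = list(own) + mean
    return out

# Hypothetical toy graph; features are [is_cafe, is_school].
features = {
    "cafe": [1.0, 0.0],
    "school": [0.0, 1.0],
    "corner": [0.0, 0.0],
}
neighbors = {
    "cafe": ["corner"],
    "school": ["corner"],
    "corner": ["cafe", "school"],
}

embeddings = sage_mean_layer(
    features, neighbors, sample_size=2, rng=random.Random(0)
)
print(embeddings["corner"])  # [0.0, 0.0, 0.5, 0.5]
```

The street corner starts with no features of its own, yet its output vector records that it sits next to one cafe and one school. That is the mechanism by which spatial context can reach a node's representation.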
The methodological hinge sits in how city2graph builds the graph. The tutorial engineers spatial features and constructs multiple proximity graph families, then explicitly compares how different graph-building strategies represent the same urban environment. It also builds two kinds of graph and trains GraphSAGE on each. Heterogeneous graphs have multiple node and edge types, with a point of interest as one kind of node and a street intersection as another. Homogeneous graphs collapse everything to a single node and edge type.
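A small sketch shows why the choice of proximity graph family matters. The article does not list which families the tutorial builds, so the two below, k-nearest-neighbor and fixed-radius, are assumed as common examples. The functions `knn_edges` and `radius_edges` and the four toy coordinates are hypothetical and do not come from city2graph.

```python
import math

def knn_edges(points, k):
    """Connect each point to its k nearest neighbors (undirected edges)."""
    edges = set()
    for a, pa in points.items():
        ranked = sorted(
            (math.dist(pa, pb), b) for b, pb in points.items() if b != a
        )
        for _, b in ranked[:k]:
            edges.add(tuple(sorted((a, b))))
    return edges

def radius_edges(points, radius):
    """Connect every pair of points no farther apart than `radius`."""
    names = sorted(points)
    return {
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if math.dist(points[a], points[b]) <= radius
    }

# Hypothetical points of interest in arbitrary planar units.
points = {
    "cafe": (0.0, 0.0),
    "bakery": (1.0, 0.0),
    "school": (0.0, 1.5),
    "depot": (10.0, 10.0),  # far from everything else
}

knn = knn_edges(points, k=1)
rad = radius_edges(points, radius=2.0)

print(sorted(knn))
# [('bakery', 'cafe'), ('cafe', 'school'), ('depot', 'school')]
print(sorted(rad))
# [('bakery', 'cafe'), ('bakery', 'school'), ('cafe', 'school')]
```

The same four points produce two different graphs. The nearest-neighbor rule ties the distant depot to the school, while the radius rule leaves the depot isolated and links the bakery to the school instead. A model trained on one graph sees different neighborhoods than a model trained on the other.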
That comparison is the publishable claim, not the absolute accuracy number. Which proximity graph family you pick, and whether you preserve the difference between a coffee shop and a street corner, materially changes what the model can learn. Planners and analysts who care about land-use inference can now test those choices in code rather than treat the graph as an implementation detail.
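What collapsing to a homogeneous graph discards can be seen in a few lines of plain Python. The node names and the relation names (`near`, `on_street`, `connects`) are invented for this sketch and are not taken from the tutorial. Keying edges by a (source type, relation, target type) triple follows the convention PyTorch Geometric uses for heterogeneous graphs, but nothing here calls that library.

```python
# Heterogeneous view: edges grouped by (source type, relation, target type).
# All names below are hypothetical.
hetero_edges = {
    ("poi", "near", "poi"): [("cafe", "bakery")],
    ("poi", "on_street", "intersection"): [
        ("cafe", "corner_1"),
        ("bakery", "corner_1"),
    ],
    ("intersection", "connects", "intersection"): [("corner_1", "corner_2")],
}

def to_homogeneous(typed_edges):
    """Flatten typed edges into one untyped edge list, dropping the types."""
    flat = []
    for pairs in typed_edges.values():
        flat.extend(pairs)
    return flat

homo_edges = to_homogeneous(hetero_edges)
n_relation_types = len(hetero_edges)

print(n_relation_types)  # 3
print(len(homo_edges))   # 4
```

After flattening, the edge between the cafe and the bakery is indistinguishable from the edge between two street corners. A model trained on the homogeneous graph can only recover that distinction if it is encoded some other way, for example in the node features.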
The honest limits sit in plain sight. The tutorial includes a synthetic point-of-interest dataset as a fallback, so the workflow still runs when live OpenStreetMap coverage is incomplete. Results come from a fixed random seed, a single model family, and a coarse land-use label space. There is no held-out city test and no independent benchmark. The comparison is the point, and the comparison is reproducible. The broader conclusion, that one graph construction beats another for land-use inference in general, is not established.
What to watch next: whether the same pipeline gets run on a second city with a different street and point-of-interest density, and whether the heterogeneous-versus-homogeneous gap holds up across that test. The city2graph library and the upstream pieces are version-controlled, so the right follow-up is methodological. That means more cities, more seeds, and more graph families, all of which can be run from the existing code.