Stadestér is an internal urban database of roughly 32,000-40,000 real-world cities and their population figures from 3000 BC to 2025 AD, developed as part of Operation HALIENNE to formalise the upgrade of Eoscala and Velkscala to versions 1.5 and 1.0 respectively. Data was primarily sourced from Populstat, Reba et al., Chandler, and Modelski, as well as Wikipedia; OSM data is planned as a future addition.
This data was then geolocated using Google Maps. Manual search tools and population figure patching for duplicate names (e.g. Birmingham) were also applied to the Reba et al. dataset, alongside various other metadata fixes. As of 2 May 2025, Chandler, Modelski, and Populstat all have cleaned and geolocated databases that are included for use. Unlike Eoscala/Velkscala, Stadestér runs on a separate CLI and UF software stack.
1. Wikipedia Cleaning.
- Compare each Wikipedia figure against the closest data point or cubic spline of Populstat's true figures (in thousands). If the Wikipedia figure deviates by a factor of more than 4 (i.e. one quartile), rescale it by dividing by 1,000 and, if it still does not fit, by 1,000,000. DONE
- If none of these rescalings fits the Populstat data by that metric, the Wikipedia data should be rendered void. DONE
- To standardise metropolitan figures with urban-area figures (Populstat uses urban areas, whilst Wikipedia is inconsistent), Wikipedia populations should be mean-scaled to Populstat data. DONE
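The Wikipedia cleaning rules above can be sketched roughly as follows. This is a minimal illustration rather than the Stadestér implementation: the function names `rescale_to_reference` and `mean_scale`, the year-to-population dict layout, and the reading of "thousands, then millions" as divisors of 1,000 and 1,000,000 applied to the raw value are all assumptions.

```python
from statistics import mean
from typing import Dict, Optional

MAX_FACTOR = 4.0                    # deviation threshold taken from the notes
DIVISORS = (1, 1_000, 1_000_000)    # raw value, then thousands, then millions (assumed)


def rescale_to_reference(wiki_value: float, reference: float) -> Optional[float]:
    """Hypothetical helper: express wiki_value in the reference's units.

    Returns the first rescaled value within MAX_FACTOR of the reference,
    or None when no divisor fits (the data point is treated as void).
    """
    if wiki_value <= 0 or reference <= 0:
        return None
    for divisor in DIVISORS:
        candidate = wiki_value / divisor
        ratio = max(candidate, reference) / min(candidate, reference)
        if ratio <= MAX_FACTOR:
            return candidate
    return None


def mean_scale(wiki: Dict[int, float], populstat: Dict[int, float]) -> Dict[int, float]:
    """Hypothetical helper: scale a Wikipedia series so that its mean matches
    the Populstat mean over the years both series share."""
    shared = sorted(set(wiki) & set(populstat))
    if not shared:
        return dict(wiki)  # nothing to calibrate against; leave unchanged
    factor = mean(populstat[y] for y in shared) / mean(wiki[y] for y in shared)
    return {year: value * factor for year, value in wiki.items()}
```

For example, a raw Wikipedia figure of 1,100,000 against a Populstat reference of 1,000 (thousands) would be accepted after dividing by 1,000, whereas a figure of 50 would be voided.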
2. Chandler/Modelski Cleaning.
- Develop a function to link Chandler/Modelski city names to Populstat city names. Note that getPopulstatCity() may be used and iterated over until a suitable match is found (iterate over both the .City and .OtherName columns wherever possible). DONE
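One possible shape for the name-linking step is sketched below. It does not reproduce getPopulstatCity(), whose signature is not given here; `link_city_name`, the list-of-dicts row layout, and the 0.85 similarity cutoff are hypothetical choices for illustration, and fuzzy matching via the standard library's `difflib` is a swapped-in technique.

```python
from difflib import get_close_matches
from typing import Dict, List, Optional


def normalise(name: str) -> str:
    """Case-fold and collapse whitespace so spellings compare consistently."""
    return " ".join(name.casefold().split())


def link_city_name(
    name: str,
    populstat_rows: List[Dict[str, str]],
    cutoff: float = 0.85,  # assumed similarity threshold
) -> Optional[str]:
    """Hypothetical helper: return the Populstat .City value that best
    matches `name`, checking both the City and OtherName columns."""
    # Map every known spelling (City or OtherName) to its canonical City.
    lookup: Dict[str, str] = {}
    for row in populstat_rows:
        for column in ("City", "OtherName"):
            value = row.get(column) or ""
            if value:
                lookup.setdefault(normalise(value), row["City"])

    key = normalise(name)
    if key in lookup:  # exact match on either column
        return lookup[key]

    # Fall back to the closest fuzzy match above the cutoff, if any.
    matches = get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None
```

Names that return `None` would still need the manual search tools mentioned earlier.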
3. UUD Merging.
- Merge compatible Wikipedia and Populstat data: where the two overlap, take the geometric mean.
- Remove any city entries that share duplicate .coords. When merging such entries, combine them into a single .population that keeps any unique year data points not already included.
- Apply cubic spline interpolation to the resultant population graph over the given domain. DONE
- Subtract suburban population from the metropolitan agglomeration for overlapping years. This prevents double-counting.
- GHS_UCDB_MTUC_GLOBE_R2024A.xlsx data should also be integrated into the given UUD where possible.
- Target keys in .csv: MT_POP_TOT_1975 MT_POP_TOT_1980 MT_POP_TOT_1985 MT_POP_TOT_1990 MT_POP_TOT_1995 MT_POP_TOT_2000 MT_POP_TOT_2005 MT_POP_TOT_2010 MT_POP_TOT_2015 MT_POP_TOT_2020 MT_POP_TOT_2025 MT_POP_TOT_2030
- Output UUD in a separate outputs/file.
The resultant merger from Chandler/Modelski/Wikipedia cleaning should be placed in a separate UUD (Unified Urban Database).
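Two of the UUD merging rules from step 3 (geometric mean on overlapping years, and collapsing entries with duplicate coordinates) can be sketched as below. The function names and the dict layout with "name", "coords", and "population" keys are assumptions made for illustration, not the actual UUD schema.

```python
from math import sqrt
from typing import Any, Dict, List


def merge_series(a: Dict[int, float], b: Dict[int, float]) -> Dict[int, float]:
    """Hypothetical helper: merge two year->population series, taking the
    geometric mean wherever both series have a value for the same year."""
    merged = dict(a)
    for year, value in b.items():
        if year in merged:
            merged[year] = sqrt(merged[year] * value)
        else:
            merged[year] = value
    return dict(sorted(merged.items()))


def merge_duplicate_coords(cities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: collapse entries that share identical coords.

    The first entry seen for a coordinate pair is kept; later duplicates
    only contribute years that the kept entry does not already have.
    """
    by_coords: Dict[tuple, Dict[str, Any]] = {}
    for city in cities:
        key = tuple(city["coords"])
        if key not in by_coords:
            by_coords[key] = {
                "name": city["name"],
                "coords": key,
                "population": dict(city["population"]),
            }
        else:
            kept = by_coords[key]["population"]
            for year, value in city["population"].items():
                kept.setdefault(year, value)
    return list(by_coords.values())
```

Exact coordinate equality is used here for simplicity; a tolerance-based comparison may be more appropriate if coordinates come from different geocoding runs.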
1. Raster Output.
- Map all latlng coords to an equirectangular projection akin to HYDE. If a city falls on a non-land cell, exclude it and emit a warning.
- Rasters should be produced per HYDE interval only. Do not output more granular data by default, even though it exists; there should be an option to extend the data to 1-year intervals where needed.
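The coordinate-to-cell mapping might look like the sketch below. It assumes a global 5-arcminute equirectangular grid (4320 x 2160 cells, row 0 at the north edge), and the `is_land` callback stands in for whatever land-area lookup the HYDE data provides; both function names are hypothetical.

```python
import math
import warnings
from typing import Callable, Dict, Tuple

CELLS_PER_DEGREE = 12            # 5 arcminutes per cell (assumed grid)
N_COLS = 360 * CELLS_PER_DEGREE  # 4320
N_ROWS = 180 * CELLS_PER_DEGREE  # 2160

Cell = Tuple[int, int]


def latlng_to_cell(lat: float, lng: float) -> Cell:
    """Map a lat/lng pair to (row, col) on the equirectangular grid.

    Row 0 is the northernmost row and column 0 starts at 180 degrees west.
    Points on the south or east edge are clamped into the last cell.
    """
    row = math.floor((90.0 - lat) * CELLS_PER_DEGREE)
    col = math.floor((lng + 180.0) * CELLS_PER_DEGREE)
    return min(max(row, 0), N_ROWS - 1), min(max(col, 0), N_COLS - 1)


def place_cities(
    cities: Dict[str, Tuple[float, float]],
    is_land: Callable[[Cell], bool],  # hypothetical land-mask lookup
) -> Dict[str, Cell]:
    """Assign each city to a cell, skipping (with a warning) any city
    whose cell is not land."""
    placed: Dict[str, Cell] = {}
    for name, (lat, lng) in cities.items():
        cell = latlng_to_cell(lat, lng)
        if not is_land(cell):
            warnings.warn(f"{name} ({lat}, {lng}) falls on a non-land cell; excluded")
            continue
        placed[name] = cell
    return placed
```

Multiplying by cells-per-degree rather than dividing by the cell size in degrees avoids floating-point drift at cell boundaries.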
2. Raster Output Post-processing.
- Apply a median absolute deviation (MAD) filter to the logarithm of population with a k=3 threshold, so that the retained values follow a Zipfian power law. This should remove any statistical outliers during raster processing.
- Remove any urban pixels that are not on tiles with land area (check HYDE from Eoscala/Velkscala to fetch the land area map).
- Use the GHS-UCDB alternatives from https://human-settlement.emergency.copernicus.eu/ghs_ucdb_2024.php for 1975-2025. These should be separately processed rasters.
- Scale Stadestér rasters to HYDE.
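The log-MAD outlier rule from the post-processing step could be sketched as follows. This is one plausible reading of the note, not a confirmed implementation: it uses the raw MAD without the 1.4826 normal-consistency constant, works in base-10 logarithms, and drops non-positive values before filtering. The name `log_mad_filter` is hypothetical.

```python
from math import log10
from statistics import median
from typing import List


def log_mad_filter(populations: List[float], k: float = 3.0) -> List[float]:
    """Hypothetical helper: keep values whose log10 lies within k MADs
    of the median log10 value.

    Non-positive values are dropped first because their logarithm is
    undefined. The raw (unscaled) MAD is used; this is an assumption.
    """
    positive = [p for p in populations if p > 0]
    if not positive:
        return []
    logs = [log10(p) for p in positive]
    centre = median(logs)
    mad = median(abs(x - centre) for x in logs)
    if mad == 0:
        return positive  # no spread to measure against; keep everything
    return [p for p, x in zip(positive, logs) if abs(x - centre) <= k * mad]
```

Working in log space means a city ten times larger than the median is treated the same as one ten times smaller, which suits heavy-tailed city-size distributions better than filtering on raw counts.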
3. Use and output separate GHS population grid rasters for 1975-2025.
- Translate these to the main dataset at 5-arcminute resolution. This should be done by using QGIS to convert the rasters to .asc and reading from those files (as was done with HYDE3.3).
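Reading the converted .asc files might be handled along the lines below. The sketch assumes the standard Esri ASCII grid layout (a short "key value" header followed by whitespace-separated rows); the function name `parse_asc` and the choice to represent NODATA cells as `None` are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Grid = List[List[Optional[float]]]


def parse_asc(text: str) -> Tuple[Dict[str, float], Grid]:
    """Hypothetical helper: parse Esri ASCII grid text into (header, rows).

    Header keys are lower-cased (ncols, nrows, xllcorner, ...). Cells equal
    to the header's NODATA_value become None. Rows are top to bottom.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header: Dict[str, float] = {}
    index = 0
    # Header lines are "key value" pairs whose key starts with a letter.
    while index < len(lines):
        parts = lines[index].split()
        if len(parts) == 2 and parts[0][0].isalpha():
            header[parts[0].lower()] = float(parts[1])
            index += 1
        else:
            break

    nodata = header.get("nodata_value")
    grid: Grid = []
    for line in lines[index:]:
        values = [float(token) for token in line.split()]
        grid.append([None if nodata is not None and v == nodata else v for v in values])

    # Basic shape check against the declared dimensions.
    if len(grid) != int(header["nrows"]) or any(len(r) != int(header["ncols"]) for r in grid):
        raise ValueError("grid shape does not match ncols/nrows in header")
    return header, grid
```

A file on disk would be read with `parse_asc(open(path).read())`; for full-resolution global rasters a streaming or array-based reader is likely to be preferable.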
After this step is complete, we should have urban population rasters. This allows us to move on to urban radial density scaling (at least per region or time period) and helps to define these functions.