• We begin by attempting to find census places in the NHGIS place point data, and we then
proceed to look in the GNIS file, where we consider the following feature classes in order:
populated place, locale, civil, census, area, beach, harbor, island, military, mine, park, post
office, unknown, basin, bay, falls, rapids, reserve, reservoir, ridge, spring, stream, valley.
The classes are ordered in terms of importance. That is, we first try to match the census
place names to cities in the GNIS feature class ‘populated place’, then we match the census
place names to locales, and so on.
• For each class of places in GNIS, we first drop the “distant duplicates,” which we define as
all the places that have the same exact name and are more than 5km apart, because we are
unsure which place to match to. Note that if two places have the same name and are within
5km, we keep the first listed GNIS place.
• We look for matches in the NHGIS and GNIS features (in the order described above), first
completing round 1 for NHGIS places, GNIS populated places, GNIS locales, GNIS civil
features,… until we match GNIS valleys. We then proceed to round 2, searching for NHGIS
places, then GNIS populated places,… We then proceed to round 3.
• We have an extra step for the GNIS data where we take all places with duplicates more
than 5km apart and we attempt to match our raw census place name to the GNIS duplicate
in the correct county. If there are multiple matches within the same county, we keep the
match with the lowest latitude.
• For 1850 data and all census years at or after 1880, the census data contains enumeration
districts and we use them to impute the coordinates of towns that we are not able to geocode
in the previous steps. The procedure works as follows:
– if an as-of-yet ungeocoded place is in the same state, county, and enumeration district
as one or more places with known latitude and longitude, we assign the place
name the same longitude and latitude as the already geocoded place (if multiple places
within the same enumeration district have already been successfully geocoded, we use
the mean of the previously geocoded coordinates);
– if an as-of-yet ungeocoded place is in an enumeration district that is numerically
between two (not necessarily adjacent) enumeration districts that contain already
geocoded places and if the distance between these two enumeration districts is smaller
than 50km, then we assign to the ungeocoded place the mean of the means of the
already geocoded latitudes and longitudes of places in those enumeration districts.
Note that this corresponds to the midpoint between the two mean coordinates of the
geocoded enumeration districts. Note that we always use this simple average, irrespective
of where the ungeocoded district falls between the two. For example, if
we have enumeration districts 1, 2, 3, 4 and we have geocoded coordinates for 1 and 4
that are within 50km of each other, then enumeration districts 2 and 3 would receive
the same coordinates, equal to the average of the coordinates of geocoded places in
enumeration districts 1 and 4.
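
As an illustration only, the two enumeration-district rules above can be sketched in Python. This is a minimal sketch under stated assumptions, not the code used to build the data: the function names (`haversine_km`, `mean_point`, `impute_by_district`) and the input layout are hypothetical, distances are taken as great-circle (haversine) distances between the mean coordinates of the two districts, and the two bracketing districts are taken to be the nearest geocoded districts on either side. The text does not specify any of these choices.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees.

    Assumption: the text does not say which distance measure is used.
    """
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def mean_point(points):
    """Arithmetic mean of a non-empty list of (lat, lon) pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def impute_by_district(geocoded, ungeocoded_districts, max_km=50.0):
    """Impute coordinates for ungeocoded places within one state and county.

    geocoded: dict mapping enumeration district number -> list of (lat, lon)
        for places already geocoded in that district (hypothetical layout).
    ungeocoded_districts: district numbers of places still lacking coordinates.
    Returns a dict district -> imputed (lat, lon); districts where neither
    rule applies are omitted.
    """
    means = {ed: mean_point(pts) for ed, pts in geocoded.items() if pts}
    known = sorted(means)
    imputed = {}
    for ed in ungeocoded_districts:
        if ed in means:
            # Rule 1: same district as geocoded places, use their mean.
            imputed[ed] = means[ed]
            continue
        lower = [k for k in known if k < ed]
        upper = [k for k in known if k > ed]
        if not lower or not upper:
            continue  # not numerically between two geocoded districts
        lo, hi = means[lower[-1]], means[upper[0]]
        if haversine_km(lo, hi) < max_km:
            # Rule 2: midpoint of the two district means, regardless of
            # where `ed` falls between them.
            imputed[ed] = mean_point([lo, hi])
    return imputed


# Districts 1 and 4 are geocoded; places in districts 2 and 3 are not.
example = impute_by_district(
    {1: [(40.0, -75.0), (40.2, -75.2)], 4: [(40.3, -75.1)]},
    [2, 3],
)
```

In this example, districts 2 and 3 receive identical coordinates, matching the worked case in the text where both intermediate districts are assigned the average of districts 1 and 4.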