15 Mapping the geographic distribution of names

The goal of this exercise is to collect data on the geographic distribution of surnames in Germany. Such maps are a popular visualization in genealogical and onomastic research, that is, research on names and their origins (Barratt 2008; Christian 2012; Osborn 2012). It has been shown that, in spite of increased labor mobility in recent decades, surnames that were bound to a certain regional context continue to retain their geographic strongholds (Barrai et al. 2001; Fox and Lasker 1983; Yasuda et al. 1974). Apart from their scientific value, name maps have a more general appeal for those who are interested in the roots of their namesakes. Plus, they present the data in one of the most beautiful and insightful forms of visualization: geographic maps.

In this chapter, we briefly introduce the visualization of geographic data in R. This can be a difficult task, and if your data do not match the specifications of the data treated in this chapter, we recommend a look at Kahle and Wickham (2013) and Bivand et al. (2013b) for more advanced tools for visualizing spatial data with R. To acquire the necessary data, we rely on the online directory of a German phone book provider (www.dastelefonbuch.de). As a showcase, we visualize the geographic distribution of a set of surnames in Germany. The goal is to write a program that can be fed any surname and that produces a surname map with a single function call. Further, the call should return ...
