Foundations of spatial data in R
We are in the age of data. Everywhere around the globe, data professionals in various roles are in high demand. As a data professional myself, I try to help you prepare for all sorts of difficult tasks. One of those tasks is dealing with spatial data.
Spatial data, by definition, is a type of data that is associated with a particular location. This location can be anywhere around the globe and can be recorded in various formats.
I felt the necessity of writing this short introduction because spatial data analysis is used more than ever. It is one of the most essential skills and every data professional should be equipped with the right skillset to import, manipulate, analyze, visualize and export spatial data.
This post aims to give a tutorial on spatial data analysis. I have tried to keep it as short as possible to make it readable. I have to say that this is not a complete spatial data tutorial but is rather a tour of R’s spatial data ecosystem to get you started quickly. By the end of it, I hope you will be able to handle spatial data properly. Let’s get started!
In the Geographical Information System (GIS) literature there are two main types of data: Vector and Raster.
Humankind has been interested in recording geographic entities for a very long time. The invention of the geographic coordinate system with latitude and longitude, as we use it today, is attributed to Eratosthenes of Cyrene, an ancient Greek philosopher, in the 3rd century BC. That system spread around the world, and geographers have used it to communicate with each other for centuries. After the invention of modern computers and the rise of database systems, digital geographic information systems became popular.
Vector data can be seen as the digital version of communicating with coordinates. Just as people once shared spatial information by writing coordinates on paper, they now share it by writing coordinates to files. It is as simple as that.
There are three subcategories of vector data: point, line, and polygon.
A single coordinate pair is a point. Houses, cars, and places where a particular incident occurred are usually represented by points. A series of points connected to each other is a line. Roads, rivers, cables, and pipelines are typical examples of line data. An enclosed area, created by connecting a number of lines, is a polygon. Neighborhoods, cities, and countries are examples of polygon data.
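To make the three vector types concrete, here is a minimal sketch that builds one of each by hand with the sf package. The coordinates are made up purely for illustration, and the CRS (EPSG:4326, plain longitude and latitude) is an assumption.

```r
library(sf)

# A point: a single coordinate pair (x = longitude, y = latitude)
pt <- st_point(c(2.35, 48.86))

# A line: an ordered series of points, one row per vertex
ln <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 1)))

# A polygon: a closed ring, so the first and last vertices are identical
pg <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))

# Combine them into a geometry column with a coordinate reference system
geoms <- st_sfc(pt, ln, pg, crs = 4326)

# Report the geometry type of each element
st_geometry_type(geoms)
```

In practice you rarely type coordinates like this; they usually come from a file, but the objects you get back are built from these same pieces.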
After WWII the USA and the USSR started the so-called “Space Race”. It was a competition between these two superpowers to conquer space, as an extension of the Cold War. The mission was to explore outer space, but the satellites took many photos of the earth as well. One of the most famous photos of the earth from space, the Blue Marble, was taken by the crew of Apollo 17 in 1972 from a distance of about 29,000 kilometers. Since then, numerous spacecraft have taken photos of the surface of the earth. Those photos were early examples of raster data.
Raster data, also known as grid data, is a spatial data type that is typically created by capturing images of the earth from above. Raster data is stored as a grid of pixels (sometimes called cells), where the grid is an array of rows and columns. Satellite images and aerial photographs are typical examples of raster data.
Raster data is divided into two kinds: Single-Band and Multi-Band (or Single-Layer and Multi-Layer). If a raster has only one grid of pixels, it is called a single-band raster. But sometimes a raster contains more than one kind of information for the same area. In that case there is one grid per kind of information, all of the same size, stacked on top of each other. Such rasters are called multi-band rasters.
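The difference between single-band and multi-band rasters can be sketched with the terra package. The grid size, the cell values, and the layer names below are arbitrary assumptions chosen only to keep the example small.

```r
library(terra)

# Single-band raster: one 3 x 4 grid of cell values
single <- rast(nrows = 3, ncols = 4, vals = 1:12)

# Multi-band raster: three grids of the same size stacked on top of each other
multi <- rast(nrows = 3, ncols = 4, nlyrs = 3, vals = 1:36)
names(multi) <- c("red", "green", "blue")  # hypothetical band names

nlyr(single)  # number of layers in the single-band raster
nlyr(multi)   # number of layers in the multi-band raster
dim(multi)    # rows, columns, layers
```

A color satellite image is the usual real-world case of a multi-band raster, with one band per color channel.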
Under this topic, we will cover how to import spatial data in R. The criterion for the methods we discuss here is universality. What do we mean by that? All of the methods here work for spatial data from anywhere around the globe.
There are some excellent packages and APIs for specific parts of the world. For example, our American readers can easily get spatial data for their country with the urbnmapr package. It is very intuitive and easy to use. Packages like this are outside the scope of this article.
The most common file format of vector data is Shapefile with the extension “.shp”.
The sp package has been the de facto package for importing vector data for years. The sf package is the successor of sp. Both of them were created by Edzer Pebesma, a professor at the University of Münster.
We will import vector Shapefile data of India. You can download the data from the link below.
Here we import the libraries, set the working directory, and import the data.
library(sf)
# Point the working directory at the folder that contains the Shapefile
setwd(dir = "C:/Users/ugurc/Desktop/Medium Blog/Geo-Spatial Data in R/StatPlanet_India_Hindi/StatPlanet_India_Hindi/map")
india <- read_sf("map.shp")
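Once a Shapefile is loaded, it is worth checking what you actually got. The sketch below does this on the North Carolina example file that ships with sf, which is an assumption made only so the code runs without downloading anything. The same calls should work on any object returned by read_sf().

```r
library(sf)

# Example Shapefile bundled with sf (not the file used in this post)
nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

class(nc)                     # an sf object is also a data frame
nrow(nc)                      # one row per feature
unique(st_geometry_type(nc))  # geometry type(s) stored in the file
st_crs(nc)$input              # label of the coordinate reference system
```

Each row is one feature, the ordinary columns hold its attributes, and the geometry column holds its shape.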
Now we will see how to import raster data. For that, the raster package has long been dominant and is the de facto package for raster data. It was created by Robert J. Hijmans, an environmental scientist from the University of California, Davis. Recently he released the terra package, the successor of raster.
The file we use here can be downloaded from this link.
library(terra)
# Point the working directory at the folder that contains the GeoTIFF
setwd("C:/Users/ugurc/Desktop/Medium Blog/Geo-Spatial Data in R/HYP_LR/HYP_LR")
world <- rast(x = "HYP_LR.tif")
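As with vector data, a quick inspection after import tells you what the raster contains. This sketch uses the small elevation GeoTIFF that ships with terra, an assumption made so that the example is self-contained. The same functions can be called on any object created with rast().

```r
library(terra)

# Example GeoTIFF bundled with terra (not the file used in this post)
elev <- rast(system.file("ex/elev.tif", package = "terra"))

nlyr(elev)                  # number of bands (layers)
res(elev)                   # cell size in the x and y directions
ext(elev)                   # spatial extent covered by the grid
crs(elev, describe = TRUE)  # short description of the CRS
```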
Visualization is an essential part of spatial data analysis. There are a number of tools to visualize spatial data in R. Two of the best known of those tools are ggplot2 and tmap.
Even if you are an absolute beginner in R, it is highly likely that you have heard of ggplot2. It is a data visualization package created by Hadley Wickham, a prolific R programmer and statistician. Generic ggplot2 usage is not the subject of this article. We will rather focus on its spatial data aspect.
The data we use here is the map of the regions of France. You can download it from the links (1 and 2).
If you know nothing about ggplot2, here is the short version: every graph is initiated with the ggplot() function, and other layers and geometries are added to it with the + operator. The layer function we will use is geom_sf().
library(sf)
library(ggplot2)
library(tmap)
setwd("C:/Users/ugurc/Desktop/Medium Blog/Geo-Spatial Data in R/france1")
france <- read_sf("FRA_adm1.shp")
We can use the plot() function too, although the output is not so appealing.
plot(france)
## Warning: plotting the first 9 out of 12 attributes; use max.plot = 12 to plot
This is how we use ggplot2 for spatial data visualization. It is quite simple.
ggplot(france) + geom_sf()
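Because geom_sf() is an ordinary ggplot2 layer, the usual aesthetics and scales apply to it. The following sketch colors each polygon by one of its attributes. It uses the North Carolina example file bundled with sf and its AREA column, both of which are assumptions standing in for your own data.

```r
library(sf)
library(ggplot2)

# Example Shapefile bundled with sf (not the France data used in this post)
nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# Map the AREA attribute to the fill color of each polygon
p <- ggplot(nc) +
  geom_sf(aes(fill = AREA), color = "white") +
  scale_fill_viridis_c() +
  theme_minimal()

p  # printing the object draws the map
```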
The other package we will use is tmap. Its use is very similar to that of ggplot2 and pretty intuitive. Both packages are designed around the “Grammar of Graphics” philosophy.
Since our data consists of polygons, we can draw the map as follows.
tm_shape(france) + tm_polygons()
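tmap can also color polygons by an attribute. In this sketch the attribute name is passed as the first argument of tm_polygons(). The North Carolina example file and its AREA column are again assumptions used only to keep the code self-contained.

```r
library(sf)
library(tmap)

# Example Shapefile bundled with sf (not the France data used in this post)
nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# Fill each polygon according to its AREA attribute
m <- tm_shape(nc) + tm_polygons("AREA")

m  # printing the object draws the map
```

Passing the attribute name positionally avoids depending on the argument name, which has changed between major versions of tmap.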
As a data engineer, I constantly have to deal with very different types of data. This process is always painful and frustrating. Whatever your title is, if you are in the software industry you have to manage different types of data and only a systematic study can make this process easier.
I try to help others so that they can benefit from my mistakes and frustration and not repeat them. In this post, we tackled one of the trickiest types of data: spatial. If you have read it thoroughly and run the code yourself, you should be armed with a solid set of tools to deal with spatial data. I hope you enjoyed it and found it useful.