Unit 2 - Appendix

Spatial Data Analysis with R (01:450:320)

1 Module 1 practice answers

1.1 Practice 1

1.1.1 Questions

Answers here.
Assuming the tibble has x and y or lat/long coordinates, you apply the function st_as_sf with the “coords” argument set to specify which columns contain the x and y coordinates.
sf::plot by default plots one panel per variable. You can create a single panel by specifying the variable you want, or by using the st_geometry argument to strip out the geometry from object. It also prevent distortions that sometimes occur when overlaying subsequent features on top of a base map.
For a single point you provide an x and y coordinate, otherwise you give an input matrix containing x and y coordinates. A polygon requires that the last point pair in the matrix is the same as the first point pair, to close the polygon.

1.1.2 Code

species %>% st_geometry()
st_geometry(species)

st_crs(species) <- st_crs(districts)
p <- "path/to/your/project/notebooks/data"
st_write(species, dsn = file.path(p, "species.sqlite"))
rm(species)
species <- st_read(file.path(p, "species.sqlite"))

class(roads)
str(roads)
class(districts)
str(districts)

plot(roads %>% st_geometry(), col = "blue")

plot(districts %>% select(distName), main = "Tanzania Districts")

pts <- st_multipoint(x = cbind(x = c(35, 36, 37), y = c(-5, -6, -7)))
plot(districts %>% st_geometry(), col = "grey")
plot(pts, add = TRUE, col = "orange", pch = 16)

1.2 Practice 2

1.2.1 Questions

At least with this example, pretty negligible–well less than 1% mean absolute error. It might matter more in other places and with other scales though. The reason st_area knows how to estimate areas is because it invokes lwgeom::st_geod_area, which calculates a geodetic surface area.
Because for the time being sf::plot is a fair bit faster. However, this recent twitter thread suggests that may change soon.
By using mutate with cut that has break values based on those properties. In our example, we found the breaks using quantile and different probabilities/percentile levels that creating tertiles of area.

1.2.2 Code

set.seed(1)
districts %>% 
  sample_n(20) %>% 
  st_area() %>% 
  units::set_units("ha") %>% 
  mean()

set.seed(1)
roads %>% 
  sample_n(100) %>% 
  st_length() %>% 
  units::set_units("km") %>% 
  mean()

plot(st_geometry(districts), col = "lightgrey")
set.seed(1)
species %>% filter(species == "Syncerus caffer") %>% sample_n(200) %>% 
    st_geometry() %>% 
    plot(col = "red", pch = 20, add = TRUE)

districts %>% st_transform(st_crs(roads)) %>% st_geometry() %>% 
  plot(col = "lightgrey")
roads %>%  
  mutate(length = as.numeric(st_length(.) / 1000)) %>% 
  filter(length > 50 & length < 100) %>% st_geometry() %>% 
  plot(col = "red", pch = 20, add = TRUE)

deciles <- function(x) quantile(x, probs = seq(0, 1, 0.1))
dist_deciles <- districts %>% mutate(area = as.numeric(st_area(.)) / 10^6) %>%
  mutate(acls = cut(area, breaks = deciles(area), include.lowest = TRUE)) %>% 
  group_by(acls) %>% summarize(sum_area = sum(area))  
dist_deciles
#
# #2
cols <- heat.colors(10)
par(mar = rep(0, 4))
plot(st_geometry(dist_deciles), col = cols)
legend(x = "bottomright", pch = 15, col = cols, bty = "n", 
       legend = paste0(1:10, c("st", "nd", "rd", rep("th", 7))))

1.2.3 Practice 3

1.2.3.1 Questions

It changes features from one type to another (e.g. POLYGON to MULTIPOLYGON), either one specified by the user or the simplest possible common feature, if left unspecified. Casting is sometimes necessary to avoid mixed feature types that cause failures for subsequent operations.
st_union runs under the hood of summarise.sf, so a summarize operation on an sf will result in a merged/dissolved set of spatial features.
It affects the order of the fields in the resulting unioned sf object–the fields from the object passed to the “x” argument appear first.

1.2.3.2 Code

coords <- cbind("x" = c(35, 35.5, 35.5, 35, 35), 
                "y" = c(-4.5, -4.5, -4, -4, -4.5))
pol2 <- st_polygon(x = list(coords)) %>% st_sfc %>% st_sf(ID = 1, crs = 4326)

par(mar = rep(0, 4))
plot(st_geometry(districts), col = "grey")
plot(pol2, col = "blue", add = TRUE)

pol2_int_dists <- st_intersection(pol2, districts)
districts[1] %>% plot(col = "grey", main = NULL)
pol2_int_dists[2] %>% plot(col = rainbow(nrow(pol2_int_dists)), add = TRUE)

pol2 is nearly 26 hectares larger

pol2_int_dists %>% st_area() %>% as.numeric() %>% sum() / 10000 -
  pol2 %>% st_area() %>% as.numeric() %>% sum() / 10000

st_difference(districts, pol2)[2] %>% plot(col = "grey", main = NULL)

# Try compare the
set.seed(1)
species_alb %>% filter(species == "Syncerus caffer") %>%
  sample_n(size = 5) %>% 
  st_buffer(dist = 30000) %>% 
  st_geometry %>% 
  plot()

# With 
set.seed(1)
species_alb %>% 
  filter(species == "Syncerus caffer") %>%
  st_sample(size = 5) %>%
  st_buffer(dist = 30000) %>% 
  st_geometry %>% 
  plot()

roads_gt40 <- roads %>% 
  filter(as.numeric(st_length(.)) / 1000 > 40)
par(mar = rep(0, 4))
st_transform(districts, st_crs(roads)) %>%
  st_geometry %>% 
  plot(col = "grey", main = NULL)
roads_gt40 %>% 
  st_buffer(25000) %>%
  plot(col = "green", add = TRUE)

par(mar = rep(0, 4))
st_transform(districts, st_crs(roads)) %>% 
  st_geometry %>% 
  plot(col = "grey", main = NULL)

roads_gt40 %>% st_buffer(25000) %>% 
  plot(col = "tan", add = TRUE)

set.seed(1)
roads_gt40 %>% st_buffer(25000) %>% 
  st_sample(size = rep(20, nrow(.)), exact = TRUE) %>% 
  plot(col = "red", pch = 20, add = TRUE)

2 Module 2 practice answers

2.1 Practice 1

2.1.1 Questions

terra uses the S4 object-oriented system, where sf uses S3. Slots in S4 objects are accessed using the @ operator.
terra::rast, the same function you would use for reading and writing a single-layer raster.
The output of terra’s vectorization functions are vect objects, so you have to convert them to sf objects using st_as_sf.

2.1.2 Code

# recreate r, r2, r3
e <- ext(c("xmin" = 34, "xmax" = 36, "ymin" = -6, "ymax" = -4))
r <- rast(x = e, res = 0.25, crs = crs(districts))
set.seed(1)  
values(r) <- sample(1:100, size = ncell(r), replace = TRUE)  # 3
r2 <- r > 50
r3 <- r
set.seed(1)
values(r3) <- rnorm(n = ncell(r3), mean = 10, sd = 2)

# 
r4 <- r3
set.seed(1)
values(r4) <- runif(n = ncell(r4), 0, 1)
r5 <- r4 > 0.5

s2 <- c(r, r2, r3, r4, r5)
names(s2) <- c("r", "r2", "r3", "r4", "r5")
plot(s2)

We use tempdir() here, but you should use your notebooks/data folder.

b <- writeRaster(s2, file = file.path(tempdir(), "b2.tif"))

tzar <- rast(x = ext(districts), crs = crs(districts), res = 0.2)
values(tzar) <- 1:ncell(tzar)

speciesr <- species %>% 
  dplyr::select(x, y) %>% 
  mutate(count = 1) %>% 
  st_as_sf(coords = c("x", "y"), crs = 4326) %>% 
  rasterize(x = ., y = tzar, field = "count", fun = sum)

par(mar = c(0, 0, 1, 4))
plot(st_union(districts), col = "grey", border = "grey")
terra::plot(speciesr, add = TRUE)

tzar_alb <- project(x = tzar, y = crs(roads), res = 20000, method = "near")
speciesr_alb <- project(x = speciesr, y = tzar_alb, method = "bilinear")

par(mar = c(0, 0, 1, 4), mfrow = c(1, 2))
plot(st_union(districts), col = "grey", border = "grey")
plot(speciesr, add = TRUE)

plot(st_transform(st_union(districts), crs(roads)), col = "grey", 
     border = "grey")
plot(speciesr_alb, add = TRUE)

# Note currently there is a bug in terra::plot when mixed with `sf` plots. A 
# workaround is to coerse districts to vect and plot. 
par(mar = c(0, 0, 1, 4), mfrow = c(1, 2))
plot(vect(st_union(districts)), col = "grey", border = "grey", axes = FALSE)
plot(speciesr, add = TRUE)
plot(vect(st_transform(st_union(districts), crs(roads))), col = "grey", 
     border = "grey", axes = FALSE)
plot(speciesr_alb, add = TRUE)

par(mar = c(0, 0, 0, 0))
speciesr %>% 
  as.polygons() %>% 
  st_as_sf %>% 
  plot(main = NULL)

2.2 Practice 2

2.2.1 Questions

global provides summary statistics over an entire layer; zonal calculates statistics within pre-defined zones; focal calculates statistics within a moving window.
Use method = bilinear.
You don’t. You have to resample them to a common resolution and extent first. And then you have to stack them (with c).

2.2.2 Code

as.Date("10-11-2017", "%m-%d-%Y")
as.Date("10-11-17", "%m-%d-%y")
as.Date("101117", "%m%d%y")
as.Date("10112017", "%m%d%Y")
lubridate::mdy("10-11-2017")
lubridate::as_date("20171011")

speciesr2 <- species %>% 
  mutate(count = 1) %>% 
  select(x, y, count) %>% 
  st_as_sf(coords = c("x", "y")) %>% 
  rasterize(., distsr, field = "count", fun = sum)

subtab <- zonal(speciesr2, distsr, fun = sum, na.rm = TRUE)
subst(distsr, from = subtab$ID, to = subtab$sum) %>% plot_noaxes

wmat3 <- matrix(1, nrow = 3, ncol = 3) 
wmat5 <- matrix(1, nrow = 5, ncol = 5) 
fstack <- c(focal(x = bioclimz[[12]], w = wmat3, fun = sd), 
            focal(x = bioclimz[[12]], w = wmat5, fun = sd),
            focal(x = bioclimz[[12]], w = wmat3, fun = max), 
            focal(x = bioclimz[[12]], w = wmat5, fun = max)) 
names(fstack) <- c("sd3", "sd5", "max3", "max5")
plot_noaxes(fstack)

bioclim_d57 <- bioclimz[[1]] %>% 
  crop(., ext(districts[57, ]))
s <- c(disagg(bioclim_d57, fact = 5), 
       disagg(bioclim_d57, fact = 5, method = "bilinear"))
names(s) <- c("d1", "d2")
plot_noaxes(s)

2.3 Practice 3

2.3.1 Questions

Using the cut approach,with vector of breakpoints (e.g. quantiles) passed to classify, or a reclassification matrix.
spatSample, using either the “random” or “stratified” methods. There is also “regular” as an option, and you can pass a weights vector when method is “stratified”.

2.3.2 Code

temp_range <- app(bioclimz[[c("bio05", "bio06")]], 
                  fun = function(x) { x[1] - x[2] })

# Plot to verify the result
plot_noaxes(temp_range, main = "Annual Temp Range (BIO5 - BIO6)")

(temp_range < unlist(global(temp_range, mean, na.rm = TRUE))) %>% plot

qtiles <- global(precip, function(x) {
  quantile(x, probs = seq(0, 1, 0.2), na.rm = TRUE)
}) 
plot_noaxes(classify(precip, rcl = unlist(qtiles)))

set.seed(11)
randdistsr <- districts %>% 
  sample_n(size = 15) %>% 
  rasterize(., precip)
plot_noaxes(randdistsr)

newrandrain <- mask(precip, randdistsr)
plot_noaxes(newrandrain)

set.seed(1)
randsamp <- spatSample(x = newrandrain, size = 300, na.rm = TRUE)

set.seed(1)
ind <- spatSample(x = randdistsr, size = 300 / 15, cells = TRUE, na.rm = TRUE)
stratsamp <- newrandrain[ind[, 1]]

rand_rain_stats <- bind_rows(
  tibble(rain = randsamp$bio12, dat = "Simple"),
  tibble(rain = stratsamp$bio12, dat = "Stratified"),
) %>% drop_na

bp_theme <- theme(legend.title = element_blank(), axis.text.x = element_blank(),
                  axis.ticks.x = element_blank(), 
                  panel.grid.major.x = element_blank(), 
                  panel.grid.minor.x = element_blank(), 
                  panel.background = element_rect(fill = "grey95"))

rand_rain_stats %>% ggplot() +
  geom_boxplot(mapping = aes(y = rain, fill = dat), position = "dodge2") +
  scale_fill_manual(values = c("lightblue", "steelblue")) + 
  ggtitle("Rainfall distributions") + xlab(NULL) + ylab("mm") + bp_theme

2.4 Practice 4

2.4.1 Questions

The cellSize function is your friend for this.
Use expression together combined with paste as needed for more complex labels.
Make names(predstack) matches the predictor names used by the model.

2.4.2 Code

demalb145 <- districts %>% filter(ID == 145) %>% st_transform(st_crs(roads)) %>% 
  crop(demalb, .)
vars <- c("slope", "aspect", "flowdir", "TRI")
terrvars <- do.call(c, lapply(1:length(vars), function(x) {
  terrain(x = demalb145, v = vars[x], unit = "degrees")
}))

names(terrvars) <- vars

plot_noaxes(terrvars)

# #1
raintotalb <- project(x = precip, y = crs(roads), res = 5000)
names(raintotalb) <- "rain"
r <- rast(ext(raintotalb), res = res(raintotalb), 
          crs = crs(raintotalb), vals = 1)

# lapply approach to interpolation
idw_list <- do.call(c, lapply(c(1000, 500, 250), function(x) {
  set.seed(1)
  rainsamp <- spatSample(raintotalb, size = x, xy = TRUE, na.rm = TRUE)
  invdist <- gstat(id = "rain", formula = rain ~ 1, locations = ~x + y, 
                   data = rainsamp)
  invdistr <- interpolate(object = r, model = invdist)
  mask(x = invdistr[[1]], mask = raintotalb)
}))

idws <- c(raintotalb, idw_list)

titles <- c("bio12", "1000 pt IDW", "500 pt IDW", "250 pt IDW")
plot_noaxes(idws, main = titles, zlim = c(0, 150))

districts %>% filter(ID %in% seq(15, 50, 5)) %>% 
    st_transform(st_crs(roads)) %>% st_geometry %>% st_centroid %>% 
    st_as_sf() %>% 
    rasterize(., raintotalb) %>% distance() %>% 
    mask(., raintotalb) %>% plot_noaxes

The AUC is still high, but the suitability map is very different. And variable importance changed as well.

# #1. Subset occurrence data for Syncerus caffer
species_pts <- species %>% filter(species == "Syncerus caffer") %>% 
    st_as_sf(coords = c("x", "y"), crs = 4326)

# #2. Select our 3 relevant bioclim predictors
predictors <- bioclimz[[c("bio01", "bio02", "bio12")]]

# #3. Generate 1000 random background (pseudo-absence) points
set.seed(42) # For reproducibility
bg_pts <- spatSample(predictors, size = 100, method = "random", 
                     as.points = TRUE, na.rm = TRUE)

# #4. Extract environmental values at presence and background locations
pres_env <- terra::extract(predictors, species_pts, ID = FALSE)
bg_env <- terra::extract(predictors, bg_pts, ID = FALSE)

# #5. Combine into a training data frame
# 1 = Presence, 0 = Background
sdm_data <- data.frame(
  pa = c(rep(1, nrow(pres_env)), rep(0, nrow(bg_env))),
  rbind(pres_env, bg_env)
) %>% na.omit()

# #6. Fit the Random Forest Model
load_quietly("randomForest") # load randomForest package

mod_rf <- randomForest(as.factor(pa) ~ bio01 + bio02 + bio12, data = sdm_data,
                       importance = TRUE)

mod_rf

# #7 Accuracy assessment
# 1. Predict values for the training data points to check accuracy
# 'type = "prob"' gives us the probability (0 to 1)
test_preds <- predict(mod_rf, sdm_data, type = "prob")[, 2]

# 2. AUC (Area Under the Curve)
# AUC = 0.5 is random; AUC > 0.8 is generally considered a good model
load_quietly("pROC") # load pROC package

roc_obj <- roc(sdm_data$pa, test_preds)
auc_val <- auc(roc_obj)

auc_val
plot(roc_obj, main = "Variable Importance for Species Model")

# #8. Visualize Variable Importance
# This shows which of our 3 bioclim vars was most influential
varImpPlot(mod_rf, main = "Variable Importance for Species Model")

# #9. Predict across all of Tanzania to create a suitability map
suitability_map <- predict(predictors, mod_rf, type = "prob", index = 2)

# #10. Plot the result
plot_noaxes(suitability_map, main = "Predicted Species Suitability (Syncerus caffer)")
points(species_pts, pch = 16, cex = 0.5, col = "red")

Back to home