Unit 2 - Appendix
Spatial Data Analysis with R (01:450:320)
1 Module 1 practice answers
1.1 Practice 1
1.1.1 Questions
Answers here.
Assuming the
tibblehas x and y or lat/long coordinates, you apply the functionst_as_sfwith the “coords” argument set to specify which columns contain the x and y coordinates.sf::plotby default plots one panel per variable. You can create a single panel by specifying the variable you want, or by using thest_geometryargument to strip out the geometry from object. It also prevent distortions that sometimes occur when overlaying subsequent features on top of a base map.For a single point you provide an x and y coordinate, otherwise you give an input matrix containing x and y coordinates. A polygon requires that the last point pair in the matrix is the same as the first point pair, to close the polygon.
1.2 Practice 2
1.2.1 Questions
At least with this example, pretty negligible–well less than 1% mean absolute error. It might matter more in other places and with other scales though. The reason
st_areaknows how to estimate areas is because it invokeslwgeom::st_geod_area, which calculates a geodetic surface area.Because for the time being
sf::plotis a fair bit faster. However, this recent twitter thread suggests that may change soon.By using
mutatewithcutthat has break values based on those properties. In our example, we found the breaks usingquantileand different probabilities/percentile levels that creating tertiles of area.
1.2.2 Code
plot(st_geometry(districts), col = "lightgrey")
set.seed(1)
species %>% filter(species == "Syncerus caffer") %>% sample_n(200) %>%
st_geometry() %>%
plot(col = "red", pch = 20, add = TRUE)districts %>% st_transform(st_crs(roads)) %>% st_geometry() %>%
plot(col = "lightgrey")
roads %>%
mutate(length = as.numeric(st_length(.) / 1000)) %>%
filter(length > 50 & length < 100) %>% st_geometry() %>%
plot(col = "red", pch = 20, add = TRUE)deciles <- function(x) quantile(x, probs = seq(0, 1, 0.1))
dist_deciles <- districts %>% mutate(area = as.numeric(st_area(.)) / 10^6) %>%
mutate(acls = cut(area, breaks = deciles(area), include.lowest = TRUE)) %>%
group_by(acls) %>% summarize(sum_area = sum(area))
dist_deciles
#
# #2
cols <- heat.colors(10)
par(mar = rep(0, 4))
plot(st_geometry(dist_deciles), col = cols)
legend(x = "bottomright", pch = 15, col = cols, bty = "n",
legend = paste0(1:10, c("st", "nd", "rd", rep("th", 7))))1.2.3 Practice 3
1.2.3.1 Questions
It changes features from one type to another (e.g. POLYGON to MULTIPOLYGON), either one specified by the user or the simplest possible common feature, if left unspecified. Casting is sometimes necessary to avoid mixed feature types that cause failures for subsequent operations.
st_unionruns under the hood ofsummarise.sf, so asummarizeoperation on ansfwill result in a merged/dissolved set of spatial features.It affects the order of the fields in the resulting unioned
sfobject–the fields from the object passed to the “x” argument appear first.
1.2.3.2 Code
coords <- cbind("x" = c(35, 35.5, 35.5, 35, 35),
"y" = c(-4.5, -4.5, -4, -4, -4.5))
pol2 <- st_polygon(x = list(coords)) %>% st_sfc %>% st_sf(ID = 1, crs = 4326)
par(mar = rep(0, 4))
plot(st_geometry(districts), col = "grey")
plot(pol2, col = "blue", add = TRUE)pol2_int_dists <- st_intersection(pol2, districts)
districts[1] %>% plot(col = "grey", main = NULL)
pol2_int_dists[2] %>% plot(col = rainbow(nrow(pol2_int_dists)), add = TRUE)pol2is nearly 26 hectares larger
pol2_int_dists %>% st_area() %>% as.numeric() %>% sum() / 10000 -
pol2 %>% st_area() %>% as.numeric() %>% sum() / 10000# Try compare the
set.seed(1)
species_alb %>% filter(species == "Syncerus caffer") %>%
sample_n(size = 5) %>%
st_buffer(dist = 30000) %>%
st_geometry %>%
plot()
# With
set.seed(1)
species_alb %>%
filter(species == "Syncerus caffer") %>%
st_sample(size = 5) %>%
st_buffer(dist = 30000) %>%
st_geometry %>%
plot()roads_gt40 <- roads %>%
filter(as.numeric(st_length(.)) / 1000 > 40)
par(mar = rep(0, 4))
st_transform(districts, st_crs(roads)) %>%
st_geometry %>%
plot(col = "grey", main = NULL)
roads_gt40 %>%
st_buffer(25000) %>%
plot(col = "green", add = TRUE)par(mar = rep(0, 4))
st_transform(districts, st_crs(roads)) %>%
st_geometry %>%
plot(col = "grey", main = NULL)
roads_gt40 %>% st_buffer(25000) %>%
plot(col = "tan", add = TRUE)
set.seed(1)
roads_gt40 %>% st_buffer(25000) %>%
st_sample(size = rep(20, nrow(.)), exact = TRUE) %>%
plot(col = "red", pch = 20, add = TRUE)2 Module 2 practice answers
2.1 Practice 1
2.1.1 Questions
terrauses the S4 object-oriented system, wheresfuses S3. Slots in S4 objects are accessed using the@operator.terra::rast, the same function you would use for reading and writing a single-layer raster.The output of
terra’s vectorization functions arevectobjects, so you have to convert them tosfobjects usingst_as_sf.
2.1.2 Code
# recreate r, r2, r3
e <- ext(c("xmin" = 34, "xmax" = 36, "ymin" = -6, "ymax" = -4))
r <- rast(x = e, res = 0.25, crs = crs(districts))
set.seed(1)
values(r) <- sample(1:100, size = ncell(r), replace = TRUE) # 3
r2 <- r > 50
r3 <- r
set.seed(1)
values(r3) <- rnorm(n = ncell(r3), mean = 10, sd = 2)
#
r4 <- r3
set.seed(1)
values(r4) <- runif(n = ncell(r4), 0, 1)
r5 <- r4 > 0.5
s2 <- c(r, r2, r3, r4, r5)
names(s2) <- c("r", "r2", "r3", "r4", "r5")
plot(s2)- We use
tempdir()here, but you should use yournotebooks/datafolder.
tzar <- rast(x = ext(districts), crs = crs(districts), res = 0.2)
values(tzar) <- 1:ncell(tzar)
speciesr <- species %>%
dplyr::select(x, y) %>%
mutate(count = 1) %>%
st_as_sf(coords = c("x", "y"), crs = 4326) %>%
rasterize(x = ., y = tzar, field = "count", fun = sum)
par(mar = c(0, 0, 1, 4))
plot(st_union(districts), col = "grey", border = "grey")
terra::plot(speciesr, add = TRUE)tzar_alb <- project(x = tzar, y = crs(roads), res = 20000, method = "near")
speciesr_alb <- project(x = speciesr, y = tzar_alb, method = "bilinear")
par(mar = c(0, 0, 1, 4), mfrow = c(1, 2))
plot(st_union(districts), col = "grey", border = "grey")
plot(speciesr, add = TRUE)
plot(st_transform(st_union(districts), crs(roads)), col = "grey",
border = "grey")
plot(speciesr_alb, add = TRUE)
# Note currently there is a bug in terra::plot when mixed with `sf` plots. A
# workaround is to coerse districts to vect and plot.
par(mar = c(0, 0, 1, 4), mfrow = c(1, 2))
plot(vect(st_union(districts)), col = "grey", border = "grey", axes = FALSE)
plot(speciesr, add = TRUE)
plot(vect(st_transform(st_union(districts), crs(roads))), col = "grey",
border = "grey", axes = FALSE)
plot(speciesr_alb, add = TRUE)2.2 Practice 2
2.2.1 Questions
globalprovides summary statistics over an entire layer;zonalcalculates statistics within pre-defined zones;focalcalculates statistics within a moving window.Use
method = bilinear.You don’t. You have to resample them to a common resolution and extent first. And then you have to stack them (with
c).
2.2.2 Code
as.Date("10-11-2017", "%m-%d-%Y")
as.Date("10-11-17", "%m-%d-%y")
as.Date("101117", "%m%d%y")
as.Date("10112017", "%m%d%Y")
lubridate::mdy("10-11-2017")
lubridate::as_date("20171011")speciesr2 <- species %>%
mutate(count = 1) %>%
select(x, y, count) %>%
st_as_sf(coords = c("x", "y")) %>%
rasterize(., distsr, field = "count", fun = sum)subtab <- zonal(speciesr2, distsr, fun = sum, na.rm = TRUE)
subst(distsr, from = subtab$ID, to = subtab$sum) %>% plot_noaxeswmat3 <- matrix(1, nrow = 3, ncol = 3)
wmat5 <- matrix(1, nrow = 5, ncol = 5)
fstack <- c(focal(x = bioclimz[[12]], w = wmat3, fun = sd),
focal(x = bioclimz[[12]], w = wmat5, fun = sd),
focal(x = bioclimz[[12]], w = wmat3, fun = max),
focal(x = bioclimz[[12]], w = wmat5, fun = max))
names(fstack) <- c("sd3", "sd5", "max3", "max5")
plot_noaxes(fstack)2.3 Practice 3
2.3.1 Questions
Using the cut approach,with vector of breakpoints (e.g. quantiles) passed to
classify, or a reclassification matrix.spatSample, using either the “random” or “stratified” methods. There is also “regular” as an option, and you can pass a weights vector when method is “stratified”.
2.3.2 Code
temp_range <- app(bioclimz[[c("bio05", "bio06")]],
fun = function(x) { x[1] - x[2] })
# Plot to verify the result
plot_noaxes(temp_range, main = "Annual Temp Range (BIO5 - BIO6)")qtiles <- global(precip, function(x) {
quantile(x, probs = seq(0, 1, 0.2), na.rm = TRUE)
})
plot_noaxes(classify(precip, rcl = unlist(qtiles)))set.seed(11)
randdistsr <- districts %>%
sample_n(size = 15) %>%
rasterize(., precip)
plot_noaxes(randdistsr)
newrandrain <- mask(precip, randdistsr)
plot_noaxes(newrandrain)set.seed(1)
randsamp <- spatSample(x = newrandrain, size = 300, na.rm = TRUE)
set.seed(1)
ind <- spatSample(x = randdistsr, size = 300 / 15, cells = TRUE, na.rm = TRUE)
stratsamp <- newrandrain[ind[, 1]]
rand_rain_stats <- bind_rows(
tibble(rain = randsamp$bio12, dat = "Simple"),
tibble(rain = stratsamp$bio12, dat = "Stratified"),
) %>% drop_na
bp_theme <- theme(legend.title = element_blank(), axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.background = element_rect(fill = "grey95"))
rand_rain_stats %>% ggplot() +
geom_boxplot(mapping = aes(y = rain, fill = dat), position = "dodge2") +
scale_fill_manual(values = c("lightblue", "steelblue")) +
ggtitle("Rainfall distributions") + xlab(NULL) + ylab("mm") + bp_theme2.4 Practice 4
2.4.1 Questions
The
cellSizefunction is your friend for this.Use
expressiontogether combined withpasteas needed for more complex labels.Make
names(predstack)matches the predictor names used by the model.
2.4.2 Code
demalb145 <- districts %>% filter(ID == 145) %>% st_transform(st_crs(roads)) %>%
crop(demalb, .)
vars <- c("slope", "aspect", "flowdir", "TRI")
terrvars <- do.call(c, lapply(1:length(vars), function(x) {
terrain(x = demalb145, v = vars[x], unit = "degrees")
}))
names(terrvars) <- vars
plot_noaxes(terrvars)# #1
raintotalb <- project(x = precip, y = crs(roads), res = 5000)
names(raintotalb) <- "rain"
r <- rast(ext(raintotalb), res = res(raintotalb),
crs = crs(raintotalb), vals = 1)
# lapply approach to interpolation
idw_list <- do.call(c, lapply(c(1000, 500, 250), function(x) {
set.seed(1)
rainsamp <- spatSample(raintotalb, size = x, xy = TRUE, na.rm = TRUE)
invdist <- gstat(id = "rain", formula = rain ~ 1, locations = ~x + y,
data = rainsamp)
invdistr <- interpolate(object = r, model = invdist)
mask(x = invdistr[[1]], mask = raintotalb)
}))
idws <- c(raintotalb, idw_list)
titles <- c("bio12", "1000 pt IDW", "500 pt IDW", "250 pt IDW")
plot_noaxes(idws, main = titles, zlim = c(0, 150))districts %>% filter(ID %in% seq(15, 50, 5)) %>%
st_transform(st_crs(roads)) %>% st_geometry %>% st_centroid %>%
st_as_sf() %>%
rasterize(., raintotalb) %>% distance() %>%
mask(., raintotalb) %>% plot_noaxes- The AUC is still high, but the suitability map is very different. And variable importance changed as well.
# #1. Subset occurrence data for Syncerus caffer
species_pts <- species %>% filter(species == "Syncerus caffer") %>%
st_as_sf(coords = c("x", "y"), crs = 4326)
# #2. Select our 3 relevant bioclim predictors
predictors <- bioclimz[[c("bio01", "bio02", "bio12")]]
# #3. Generate 1000 random background (pseudo-absence) points
set.seed(42) # For reproducibility
bg_pts <- spatSample(predictors, size = 100, method = "random",
as.points = TRUE, na.rm = TRUE)
# #4. Extract environmental values at presence and background locations
pres_env <- terra::extract(predictors, species_pts, ID = FALSE)
bg_env <- terra::extract(predictors, bg_pts, ID = FALSE)
# #5. Combine into a training data frame
# 1 = Presence, 0 = Background
sdm_data <- data.frame(
pa = c(rep(1, nrow(pres_env)), rep(0, nrow(bg_env))),
rbind(pres_env, bg_env)
) %>% na.omit()
# #6. Fit the Random Forest Model
load_quietly("randomForest") # load randomForest package
mod_rf <- randomForest(as.factor(pa) ~ bio01 + bio02 + bio12, data = sdm_data,
importance = TRUE)
mod_rf
# #7 Accuracy assessment
# 1. Predict values for the training data points to check accuracy
# 'type = "prob"' gives us the probability (0 to 1)
test_preds <- predict(mod_rf, sdm_data, type = "prob")[, 2]
# 2. AUC (Area Under the Curve)
# AUC = 0.5 is random; AUC > 0.8 is generally considered a good model
load_quietly("pROC") # load pROC package
roc_obj <- roc(sdm_data$pa, test_preds)
auc_val <- auc(roc_obj)
auc_val
plot(roc_obj, main = "Variable Importance for Species Model")
# #8. Visualize Variable Importance
# This shows which of our 3 bioclim vars was most influential
varImpPlot(mod_rf, main = "Variable Importance for Species Model")
# #9. Predict across all of Tanzania to create a suitability map
suitability_map <- predict(predictors, mod_rf, type = "prob", index = 2)
# #10. Plot the result
plot_noaxes(suitability_map, main = "Predicted Species Suitability (Syncerus caffer)")
points(species_pts, pch = 16, cex = 0.5, col = "red")