Helpful functions
Spatial Data Analysis with R (01:450:320)
1 strings
1.1 paste, paste0
Concatenate strings. paste includes the option to add a
specific separator, like a space or hyphen.
paste0 assumes there is no separator.
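A minimal sketch of the difference, using two made-up strings (the variable names are only for illustration):

```r
first <- "spatial"
second <- "data"

with_space  <- paste(first, second)             # default separator is a single space
with_hyphen <- paste(first, second, sep = "-")  # custom separator
no_sep      <- paste0(first, second)            # no separator at all

print(c(with_space, with_hyphen, no_sep))
#> [1] "spatial data" "spatial-data" "spatialdata"
```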
1.2 str_replace
From the stringr package. Replaces the first match of a pattern in a string.
1.3 str_replace_all
Similar to str_replace, but str_replace_all
replaces every match, not just the first.
library(stringr)
v <- "it was the best of times it was the worst of times"
print(v)
#> [1] "it was the best of times it was the worst of times"
w <- stringr::str_replace(v, "times", "spaghetti" )
print(w) ## only first "times" replaced
#> [1] "it was the best of spaghetti it was the worst of times"
x <- stringr::str_replace_all(v, "times", "spaghetti" )
print(x) ## all "times" replaced
#> [1] "it was the best of spaghetti it was the worst of spaghetti"
1.4 str_sub
Create substrings based on character index
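A short sketch of str_sub with start and end positions; the example string is an assumption chosen for illustration:

```r
library(stringr)

v <- "spaghetti"
first_four <- str_sub(v, 1, 4)    # characters 1 through 4
last_three <- str_sub(v, -3, -1)  # negative indexes count from the end

print(first_four)
#> [1] "spag"
print(last_three)
#> [1] "tti"
```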
2 dates
2.1 as_date
From lubridate package. Converts from character to
date.
Dates in “YYYY-MM-DD” format don’t need additional information.
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
a <- as_date("2020-11-01")
print(a)
#> [1] "2020-11-01"
Dates in other formats may need the format parameter.
Run ?strptime to see the available format codes.
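As a sketch, a month/day/year string could be parsed like this, assuming a lubridate version whose as_date accepts a format argument; the input string is made up for illustration:

```r
library(lubridate)

# %m = month, %d = day, %Y = four-digit year
b <- as_date("11/01/2020", format = "%m/%d/%Y")
print(b)
#> [1] "2020-11-01"
```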
Can also convert from date to character.
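One way to go from date to character is base R's as.character or format; this is a sketch with an illustrative date, not necessarily the approach the course uses:

```r
d <- as.Date("2020-11-01")

s1 <- as.character(d)        # ISO format
s2 <- format(d, "%m/%d/%Y")  # custom format, same codes as ?strptime

print(s1)
#> [1] "2020-11-01"
print(s2)
#> [1] "11/01/2020"
```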
3 dplyr
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
3.1 pipe operator ( %>% )
Use the pipe operator to chain commands.
It is commonly used with tibbles. Note that in dplyr, you
don’t need to use quotes for column names.
The example below keeps storms from 2000 onward, groups by “status”, and summarizes the mean pressure.
storm_summary <- storms %>%
  filter(year >= 2000) %>%
  group_by(status) %>%
  summarise(pressure = mean(pressure))
storm_summary
#> # A tibble: 9 × 2
#> status pressure
#> <fct> <dbl>
#> 1 disturbance 1009.
#> 2 extratropical 992.
#> 3 hurricane 966.
#> 4 other low 1009.
#> 5 subtropical depression 1006.
#> 6 subtropical storm 998.
#> 7 tropical depression 1007.
#> 8 tropical storm 999.
#> 9 tropical wave 1009.
You can use . to specify which argument the
piped value should fill. Let’s say you want to take a sample of
size 50 from the numbers 1:100.
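A minimal sketch of that case: the value 50 is piped into the size argument rather than the first argument. The seed and the name s are assumptions added for reproducibility and illustration.

```r
library(dplyr)

set.seed(1)
# Without the dot, 50 would become the first argument of sample();
# with size = ., it fills size instead
s <- 50 %>% sample(1:100, size = .)

print(length(s))
#> [1] 50
```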
3.2 mutate
Creates a new column based on calculations you define.
set.seed(1)
tib <- tibble(a = 1:10, b = sample(1:100, 10))
tib <- tib %>% mutate(product = a * b) ## new column is product of columns a, b
tib
#> # A tibble: 10 × 3
#> a b product
#> <int> <int> <int>
#> 1 1 68 68
#> 2 2 39 78
#> 3 3 1 3
#> 4 4 34 136
#> 5 5 87 435
#> 6 6 43 258
#> 7 7 14 98
#> 8 8 82 656
#> 9 9 59 531
#> 10 10 51 510
3.3 dplyr::select
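select keeps only the columns you name, or drops columns prefixed with -. The sketch below uses a small made-up tibble. The dplyr:: prefix makes explicit which package's select is called, which helps if another loaded package defines a function with the same name.

```r
library(dplyr)

# Illustrative tibble; column names mirror the mutate example
tib <- tibble(a = 1:3, b = 4:6, product = a * b)

tib_keep <- tib %>% dplyr::select(a, product)  # keep columns a and product
tib_drop <- tib %>% dplyr::select(-b)          # drop column b

print(names(tib_keep))
#> [1] "a"       "product"
```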
set.seed(1)
tib <- tibble(a = 1:10, b = sample(1:100, 10))
tib <- tib %>% mutate(product = a * b) ## new column is product of columns a, b
tib
#> # A tibble: 10 × 3
#> a b product
#> <int> <int> <int>
#> 1 1 68 68
#> 2 2 39 78
#> 3 3 1 3
#> 4 4 34 136
#> 5 5 87 435
#> 6 6 43 258
#> 7 7 14 98
#> 8 8 82 656
#> 9 9 59 531
#> 10 10 51 510
3.4 arrange
Sorts by a column. Default is ascending order. You can also arrange by multiple columns.
set.seed(1)
tib <- tibble(a = 1:10, b = sample(1:100, 10))
tib <- tib %>% mutate(product = a * b) ## new column is product of columns a, b
## sort tib by product column
tib_sorted <- tib %>% arrange(product)
tib_sorted
#> # A tibble: 10 × 3
#> a b product
#> <int> <int> <int>
#> 1 3 1 3
#> 2 1 68 68
#> 3 2 39 78
#> 4 7 14 98
#> 5 4 34 136
#> 6 6 43 258
#> 7 5 87 435
#> 8 10 51 510
#> 9 9 59 531
#> 10 8 82 656
Use - for descending order.
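A sketch of descending order on a small made-up tibble. dplyr's desc() is an equivalent alternative to the minus sign and also works for non-numeric columns.

```r
library(dplyr)

tib <- tibble(a = 1:5, b = c(68, 39, 1, 34, 87))  # illustrative values

tib_desc  <- tib %>% arrange(-b)       # minus sign: largest b first
tib_desc2 <- tib %>% arrange(desc(b))  # same result with desc()

print(tib_desc$b)
#> [1] 87 68 39 34  1
```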
4 control structures
4.1 for loops
In a for loop, the operations run once for each item in the
object you iterate over. So if the loop starts with for(k in items), then
items supplies the values and k takes each one in turn.
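A minimal sketch of a loop over a short character vector; the names items, k, and seen are only illustrative:

```r
items <- c("x", "y", "z")
seen <- character(0)

for (k in items) {
  # k holds one element of items on each pass
  seen <- c(seen, toupper(k))
}

print(seen)
#> [1] "X" "Y" "Z"
```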
4.2 if-else
items <- sample(LETTERS, 10)
for(k in items){
  print(k)
  if(k %in% c("A", "E", "I", "O", "U")){
    print("vowel")
  } else {
    print("consonant")
  }
}
#> [1] "H"
#> [1] "consonant"
#> [1] "Q"
#> [1] "consonant"
#> [1] "Y"
#> [1] "consonant"
#> [1] "L"
#> [1] "consonant"
#> [1] "I"
#> [1] "vowel"
#> [1] "R"
#> [1] "consonant"
#> [1] "K"
#> [1] "consonant"
#> [1] "A"
#> [1] "vowel"
#> [1] "C"
#> [1] "consonant"
#> [1] "P"
#> [1] "consonant"
4.3 lapply
lapply returns objects in a list.
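A short sketch, assuming a made-up list of two integer vectors, that applies sum to each element:

```r
nums <- list(a = 1:3, b = 4:6)  # illustrative input

sums_list <- lapply(nums, sum)  # one result per element, returned as a list

print(class(sums_list))
#> [1] "list"
print(sums_list$b)
#> [1] 15
```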
4.4 sapply
sapply returns elements in a vector (when possible)
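The same kind of call with sapply, sketched on a made-up list; because every result is a single number, the output simplifies to a named vector:

```r
nums <- list(a = 1:3, b = 4:6)  # illustrative input

sums_vec <- sapply(nums, sum)   # simplified to a named vector

print(sums_vec)
#>  a  b 
#>  6 15
```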
5 sampling
5.1 sample
sample is used for picking samples from a discrete
object, like a vector.
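A sketch of two common uses, with made-up inputs: drawing without replacement (the default) and with replacement. The seed is an assumption added so the draws are repeatable.

```r
set.seed(1)

colors <- c("red", "green", "blue", "yellow")
picked <- sample(colors, 2)               # 2 distinct colors, no replacement

rolls <- sample(1:6, 10, replace = TRUE)  # 10 die rolls; values may repeat

print(length(picked))
#> [1] 2
print(length(rolls))
#> [1] 10
```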
5.2 runif
runif samples from a uniform distribution (equal
probability for all values in the defined interval)
The example below picks 5 values from a uniform distribution between 0 and 2.
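A minimal sketch of that call; the seed and the name u are assumptions added for illustration:

```r
set.seed(1)

# 5 draws from a uniform distribution on the interval from 0 to 2
u <- runif(5, min = 0, max = 2)

print(length(u))
#> [1] 5
print(all(u >= 0 & u <= 2))
#> [1] TRUE
```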
6 read/write
6.1 read/write csv’s
You can use base R's read.csv() or readr's
read_csv(). read.csv() returns a data frame; read_csv() returns a tibble.
library(readr)
f <- readr_example("mtcars.csv") # an example csv file in readr package
mtcars <- read.csv(f)
print(class(mtcars))
#> [1] "data.frame"
maize2 <- readr::read_csv(f)
#> Rows: 32 Columns: 11
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (11): mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
print(class(maize2))
#> [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
maize2
#> # A tibble: 32 × 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ℹ 22 more rows
6.2 save/load
save() and load() write and read RData files. Use the
extension .rda. You can save any R object this way (data
frames, tibbles, lists, rasters, etc.).
f <- readr_example("mtcars.csv") # an example csv file in readr package
mtcars <- read.csv(f)
save(mtcars, file = "~/mtcars.rda") ## save to your user home
When you load data, it will retain the variable name it had.
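A sketch of a save/load round trip. The data frame scores and the temporary file path are hypothetical, chosen so the example does not write to your home directory.

```r
# Hypothetical object and temporary path, used only for illustration
scores <- data.frame(id = 1:3, score = c(90, 85, 77))
f <- tempfile(fileext = ".rda")
save(scores, file = f)

rm(scores)         # remove the object from the workspace
loaded <- load(f)  # restores the object under its original name

print(loaded)      # load() returns the names of the restored objects
#> [1] "scores"
print(nrow(scores))
#> [1] 3
```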
7 table indexes
7.1 Base R
Use [ , ] notation. Row conditions (filtering) go to
the left of the comma. Column conditions (selecting columns) go to
the right.
DF <- data.frame(v1 = 1:5, v2 = 6:10)
rownames(DF) <- LETTERS[1:5]
DF
#> v1 v2
#> A 1 6
#> B 2 7
#> C 3 8
#> D 4 9
#> E 5 10
Subsetting data.
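A few subsetting sketches on the same DF; the result names (rows_gt3, col_v2, cell) are only illustrative:

```r
DF <- data.frame(v1 = 1:5, v2 = 6:10)
rownames(DF) <- LETTERS[1:5]

rows_gt3 <- DF[DF$v1 > 3, ]  # row condition left of the comma: rows D and E
col_v2   <- DF[, "v2"]       # column condition right of the comma: a vector
cell     <- DF["B", "v2"]    # single cell by row name and column name

print(rownames(rows_gt3))
#> [1] "D" "E"
print(col_v2)
#> [1]  6  7  8  9 10
print(cell)
#> [1] 7
```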
7.2 dplyr
Use filter for row conditions and
dplyr::select to select columns.
DF <- tibble(v1 = 1:5, v2 = 6:10)
rownames(DF) <- LETTERS[1:5]
#> Warning: Setting row names on a tibble is deprecated.
DF
#> # A tibble: 5 × 2
#> v1 v2
#> * <int> <int>
#> 1 1 6
#> 2 2 7
#> 3 3 8
#> 4 4 9
#> 5 5 10
Filter to rows where v1 is greater than 3.
DF_filt <- DF %>% filter(v1 > 3)
DF_filt
#> # A tibble: 2 × 2
#> v1 v2
#> * <int> <int>
#> 1 4 9
#> 2 5 10
Same as above but only show column v2.
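A sketch that chains filter and dplyr::select; the name DF_v2 is an assumption for illustration:

```r
library(dplyr)

DF <- tibble(v1 = 1:5, v2 = 6:10)

# keep rows where v1 > 3, then keep only column v2
DF_v2 <- DF %>%
  filter(v1 > 3) %>%
  dplyr::select(v2)

print(DF_v2$v2)
#> [1]  9 10
```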
7.3 slice
slice is a dplyr function to select
rows by number.
Select the second and third rows.
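A minimal sketch on a small tibble; the name DF_rows is only illustrative:

```r
library(dplyr)

DF <- tibble(v1 = 1:5, v2 = 6:10)

DF_rows <- DF %>% slice(2:3)  # rows 2 and 3 by position

print(DF_rows$v1)
#> [1] 2 3
```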
7.4 head, tail
head selects the first n rows in a data frame or
tibble. tail selects the last n rows.
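A short sketch with n = 2 on a made-up data frame (the default for both functions is n = 6):

```r
DF <- data.frame(v1 = 1:5, v2 = 6:10)

first_two <- head(DF, n = 2)  # rows 1 and 2
last_two  <- tail(DF, n = 2)  # rows 4 and 5

print(first_two$v1)
#> [1] 1 2
print(last_two$v1)
#> [1] 4 5
```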
8 table functions
8.1 cbind, rbind
8.2 joins
8.3 pivot_longer
8.4 pivot_wider
9 Other
9.1 which
Returns indices (position in vector) where a condition is true.
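A minimal sketch with a made-up numeric vector:

```r
x <- c(5, 12, 3, 12, 8)

idx <- which(x > 6)  # positions where the condition is TRUE

print(idx)
#> [1] 2 4 5
print(x[idx])        # the values at those positions
#> [1] 12 12  8
```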
9.2 which.min
Finds the index of the minimum value. Returns only the first location, even if the minimum occurs more than once.
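A sketch with a made-up vector in which the minimum appears twice:

```r
x <- c(4, 1, 7, 1)

i_min <- which.min(x)          # first position of the minimum
all_min <- which(x == min(x))  # every position of the minimum

print(i_min)
#> [1] 2
print(all_min)
#> [1] 2 4
```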
9.3 which.max
Finds the index of the maximum value. Returns only the first location, even if the maximum occurs more than once.
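A sketch with a made-up vector in which the maximum appears twice:

```r
x <- c(4, 9, 7, 9)

i_max <- which.max(x)  # first position of the maximum

print(i_max)
#> [1] 2
print(x[i_max])
#> [1] 9
```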
9.4 unique
unique filters an object to unique values
set.seed(2)
birthdays <- sample(1:365, 50, replace = TRUE) ## sample 50 birthdays
print(birthdays)
#> [1] 341 198 262 273 349 204 297 178 75 131 306 311 63 136 231 289 54 361 112
#> [20] 171 38 361 110 144 45 238 208 134 339 9 350 130 244 3 129 304 297 301
#> [39] 289 274 8 164 350 37 226 149 205 327 242 358
distinct_birthdays <- unique(birthdays)
print(distinct_birthdays)
#> [1] 341 198 262 273 349 204 297 178 75 131 306 311 63 136 231 289 54 361 112
#> [20] 171 38 110 144 45 238 208 134 339 9 350 130 244 3 129 304 301 274 8
#> [39] 164 37 226 149 205 327 242 358
print(paste0(length(distinct_birthdays), " distinct birthdays"))
#> [1] "46 distinct birthdays"