Spatial Data Analysis (450:320)

Class 9

We have learned

Fundamentals of R
Functions, packages, environments, and namespaces
Reproducibility
Getting help
Git/GitHub
Indexing, subsetting, and replacing
The *apply family

Today

Tidy data and the tidyverse
Reading and writing data, and file paths

Tidy data

Each variable is a column; each column is a variable.
Each observation is a row; each row is an observation.
Each value is a cell; each cell is a single value.

Tidy data

# A tibble: 4 × 4
  name   assignment1 assignment2 quiz1
  <chr>  <chr>       <chr>       <chr>
1 Billy  <NA>        D           C    
2 Suzy   F           <NA>        <NA> 
3 Lionel B           C           B    
4 Jenny  A           A           B

Is this table tidy?

Tidy data

# A tibble: 12 × 3
   name   assessment  grade
   <chr>  <chr>       <chr>
 1 Billy  assignment1 <NA> 
 2 Billy  assignment2 D    
 3 Billy  quiz1       C    
 4 Suzy   assignment1 F    
 5 Suzy   assignment2 <NA> 
 6 Suzy   quiz1       <NA> 
 7 Lionel assignment1 B    
 8 Lionel assignment2 C    
 9 Lionel quiz1       B    
10 Jenny  assignment1 A    
11 Jenny  assignment2 A    
12 Jenny  quiz1       B
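Going from the wide table to this long one is a single reshaping step. Here is a minimal sketch using `pivot_longer()` from tidyr. It assumes tidyr and tibble are installed. The object names `grades_wide` and `grades_long` are hypothetical, and the values are copied from the two tables above.

```r
library(tibble)
library(tidyr)

# Wide table from the slide: one column per assessment (hypothetical name)
grades_wide <- tribble(
  ~name,    ~assignment1, ~assignment2, ~quiz1,
  "Billy",  NA,           "D",          "C",
  "Suzy",   "F",          NA,           NA,
  "Lionel", "B",          "C",          "B",
  "Jenny",  "A",          "A",          "B"
)

# Stack the three assessment columns into (assessment, grade) pairs;
# missing grades are kept as NA rows by default
grades_long <- pivot_longer(
  grades_wide,
  cols = assignment1:quiz1,
  names_to = "assessment",
  values_to = "grade"
)
grades_long
```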

Tidy data

How about this one?

# A tibble: 3 × 5
  assessment  Billy Suzy  Lionel Jenny
  <chr>       <chr> <chr> <chr>  <chr>
1 assignment1 <NA>  F     B      A    
2 assignment2 D     <NA>  C      A    
3 quiz1       C     <NA>  B      B

Tidy data

# A tibble: 12 × 3
   assessment  student grade
   <chr>       <chr>   <chr>
 1 assignment1 Billy   <NA> 
 2 assignment1 Suzy    F    
 3 assignment1 Lionel  B    
 4 assignment1 Jenny   A    
 5 assignment2 Billy   D    
 6 assignment2 Suzy    <NA> 
 7 assignment2 Lionel  C    
 8 assignment2 Jenny   A    
 9 quiz1       Billy   C    
10 quiz1       Suzy    <NA> 
11 quiz1       Lionel  B    
12 quiz1       Jenny   B
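The transposed table needs the same kind of reshaping, except that this time the student names are stored in the column headers. The sketch below assumes tidyr and tibble are installed and uses hypothetical object names. It stacks every column except `assessment`.

```r
library(tibble)
library(tidyr)

# Transposed table from the slide: one column per student (hypothetical name)
grades_by_student <- tribble(
  ~assessment,   ~Billy, ~Suzy, ~Lionel, ~Jenny,
  "assignment1", NA,     "F",   "B",     "A",
  "assignment2", "D",    NA,    "C",     "A",
  "quiz1",       "C",    NA,    "B",     "B"
)

# Student names move from the column headers into a `student` column
classroom_long <- pivot_longer(
  grades_by_student,
  cols = -assessment,   # every column except `assessment`
  names_to = "student",
  values_to = "grade"
)
classroom_long
```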

Introduction to `tidyverse`

A collection of R packages for creating, processing, and manipulating tidy data. Some of its core packages:

dplyr: functions for data manipulation
tidyr: functions for reshaping data into tidy form
ggplot2: a system for creating figures
stringr: functions for working with strings

Let's visit the tidyverse website together.

`dplyr` vs base R

filter() (by condition) and slice() (by position) to subset rows
select() to subset columns
pull() to extract a single column as a vector (like double brackets [[ ]] in base R)
mutate() to add a new column (or modify an existing one)
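To make the comparison concrete, each dplyr verb can be written next to a rough base R equivalent. This is a sketch on a small hypothetical data frame `grades`. It assumes dplyr is installed. The base R lines show one common equivalent each, not the only one.

```r
library(dplyr)

# Small hypothetical data frame, for comparison only
grades <- data.frame(student = c("Billy", "Suzy", "Jenny"),
                     grade   = c("D", "F", "A"))

# filter() vs logical indexing of rows
f_dplyr <- filter(grades, grade == "A")
f_base  <- grades[grades$grade == "A", ]

# slice() vs positional indexing of rows
s_dplyr <- slice(grades, 1:2)
s_base  <- grades[1:2, ]

# select() vs column indexing (drop = FALSE keeps a data frame)
c_dplyr <- select(grades, grade)
c_base  <- grades[, "grade", drop = FALSE]

# pull() vs [[ ]]: both return a plain vector
p_dplyr <- pull(grades, student)
p_base  <- grades[["student"]]

# mutate() vs assigning a new column with $
m_dplyr <- mutate(grades, good = ifelse(grade == "A", 1, 0))
m_base  <- grades
m_base$good <- ifelse(m_base$grade == "A", 1, 0)
```

The dplyr verbs take the data frame as their first argument and return a new data frame without modifying `grades` itself. This design is what makes them easy to chain.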

Run an example

library(dplyr)
classroom

# A tibble: 12 × 3
   assessment  student grade
   <chr>       <chr>   <chr>
 1 assignment1 Billy   <NA> 
 2 assignment1 Suzy    F    
 3 assignment1 Lionel  B    
 4 assignment1 Jenny   A    
 5 assignment2 Billy   D    
 6 assignment2 Suzy    <NA> 
 7 assignment2 Lionel  C    
 8 assignment2 Jenny   A    
 9 quiz1       Billy   C    
10 quiz1       Suzy    <NA> 
11 quiz1       Lionel  B    
12 quiz1       Jenny   B
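The slides print `classroom` without showing how it was created. To follow along, you can rebuild it by hand with `tibble::tribble()`. This construction is an assumption, since the original object may have come from a file or from reshaping one of the earlier tables.

```r
library(tibble)

# Rebuild the `classroom` tibble shown above (assumed construction)
classroom <- tribble(
  ~assessment,   ~student, ~grade,
  "assignment1", "Billy",  NA,
  "assignment1", "Suzy",   "F",
  "assignment1", "Lionel", "B",
  "assignment1", "Jenny",  "A",
  "assignment2", "Billy",  "D",
  "assignment2", "Suzy",   NA,
  "assignment2", "Lionel", "C",
  "assignment2", "Jenny",  "A",
  "quiz1",       "Billy",  "C",
  "quiz1",       "Suzy",   NA,
  "quiz1",       "Lionel", "B",
  "quiz1",       "Jenny",  "B"
)
classroom
```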

# filter
good_stu <- filter(classroom, grade == "A")
good_stu

# A tibble: 2 × 3
  assessment  student grade
  <chr>       <chr>   <chr>
1 assignment1 Jenny   A    
2 assignment2 Jenny   A

first_stus <- slice(classroom, 1:2)
first_stus

# A tibble: 2 × 3
  assessment  student grade
  <chr>       <chr>   <chr>
1 assignment1 Billy   <NA> 
2 assignment1 Suzy    F
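One practical difference from base R is how missing values are handled. `filter()` keeps only rows where the condition is `TRUE` and drops rows where it is `NA`. Base R logical indexing instead turns an `NA` in the condition into a row of `NA`s. The sketch below uses a small hypothetical data frame `toy` and assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical data with one missing grade
toy <- data.frame(student = c("Billy", "Jenny"), grade = c(NA, "A"))

# filter(): the NA comparison is treated as "do not keep"
by_filter <- filter(toy, grade == "A")

# Base R: the NA in the logical index yields an all-NA row
by_base <- toy[toy$grade == "A", ]

# which() drops NAs from the index, matching filter()
by_base_which <- toy[which(toy$grade == "A"), ]
```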

Run an example

# select
students <- select(classroom, c(student, grade))
students

# A tibble: 12 × 2
   student grade
   <chr>   <chr>
 1 Billy   <NA> 
 2 Suzy    F    
 3 Lionel  B    
 4 Jenny   A    
 5 Billy   D    
 6 Suzy    <NA> 
 7 Lionel  C    
 8 Jenny   A    
 9 Billy   C    
10 Suzy    <NA> 
11 Lionel  B    
12 Jenny   B

# pull
stuts <- pull(classroom, student)
stuts

 [1] "Billy"  "Suzy"   "Lionel" "Jenny"  "Billy"  "Suzy"   "Lionel" "Jenny" 
 [9] "Billy"  "Suzy"   "Lionel" "Jenny"

Run an example

# mutate
classroom <- mutate(classroom, good = ifelse(grade == "A", 1, 0))
classroom

# A tibble: 12 × 4
   assessment  student grade  good
   <chr>       <chr>   <chr> <dbl>
 1 assignment1 Billy   <NA>     NA
 2 assignment1 Suzy    F         0
 3 assignment1 Lionel  B         0
 4 assignment1 Jenny   A         1
 5 assignment2 Billy   D         0
 6 assignment2 Suzy    <NA>     NA
 7 assignment2 Lionel  C         0
 8 assignment2 Jenny   A         1
 9 quiz1       Billy   C         0
10 quiz1       Suzy    <NA>     NA
11 quiz1       Lionel  B         0
12 quiz1       Jenny   B         0
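The `NA` values in `good` occur because `ifelse()` returns `NA` whenever the comparison `grade == "A"` is itself `NA`. If you would rather count a missing grade as "not good", dplyr's `if_else()` has a `missing` argument. Treating `NA` as 0 is an assumed choice for illustration, not something the slide prescribes. The sketch uses hypothetical data.

```r
library(dplyr)
library(tibble)

# Hypothetical data with one missing grade
toy <- tibble(student = c("Billy", "Suzy", "Jenny"),
              grade   = c("D", NA, "A"))

toy <- mutate(
  toy,
  good_ifelse  = ifelse(grade == "A", 1, 0),                # NA stays NA
  good_if_else = if_else(grade == "A", 1, 0, missing = 0)   # NA becomes 0 (assumed choice)
)
toy
```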

Revisit Task 1 from the code cracker

How many unique crop types are in the label column?
What is the name of the crop that appears first alphabetically?
What is the name of the crop that appears last alphabetically?

crop_types <- unique(crops[["label"]])
n_crops <- length(crop_types)
first_crop <- sort(crop_types)[1]
last_crop <- sort(crop_types, decreasing = TRUE)[1]

Use dplyr syntax to redo this task. Share your solution in Chat.

Pipeline

For these two lines:

crop_types <- unique(crops[["label"]])
n_crops <- length(crop_types)

we can chain all the function calls into a single pipeline with the pipe operator %>% (from the magrittr package, re-exported by dplyr):

n_crops <- crops %>% pull(label) %>% unique() %>% length()
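The `crops` data from the code cracker is not included in these slides. So the sketch below uses a hypothetical stand-in with only a `label` column, assuming dplyr is installed. It compares the nested call with the pipeline. It also shows dplyr's `n_distinct()`, which counts unique values in one step, and `first()`/`last()` for the alphabetical questions.

```r
library(dplyr)

# Hypothetical stand-in for the code cracker `crops` data
crops <- data.frame(label = c("rice", "maize", "rice", "coffee", "maize"))

# Nested base R: read from the inside out
n_nested <- length(unique(crops[["label"]]))

# Pipeline: read left to right, each result feeds the next call
n_piped <- crops %>% pull(label) %>% unique() %>% length()

# n_distinct() combines unique() and length()
n_dplyr <- crops %>% pull(label) %>% n_distinct()

# First and last crop names alphabetically
first_crop <- crops %>% pull(label) %>% unique() %>% sort() %>% first()
last_crop  <- crops %>% pull(label) %>% unique() %>% sort() %>% last()
```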

Pipeline

Written out in full, with the placeholder made explicit:

n_crops <- crops %>% pull(., label) %>% unique(.) %>% length(.)

Use . to refer to the result from the previous step. By default, the result is passed as the first argument.

Sometimes you may want to control where the result is passed in the next function call:

crops %>% pull(label) %>% unique() %>% length() %>% paste0("Crop number is: ", .)
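To see why the `.` matters here, compare the pipeline with and without it. This uses the same hypothetical toy `crops` as a stand-in for the real data and assumes dplyr is installed. Without a placeholder, the count becomes the first argument of `paste0()`, so it ends up in front of the text.

```r
library(dplyr)

# Hypothetical stand-in for the code cracker `crops` data
crops <- data.frame(label = c("rice", "maize", "rice", "coffee", "maize"))

# Default: the count is passed as the FIRST argument of paste0()
msg_default <- crops %>% pull(label) %>% unique() %>% length() %>%
  paste0("Crop number is: ")

# With `.`: the count goes where the dot is (here, the second argument)
msg_dot <- crops %>% pull(label) %>% unique() %>% length() %>%
  paste0("Crop number is: ", .)

msg_default  # "3Crop number is: "
msg_dot      # "Crop number is: 3"
```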

Your turn

Revisit Task 2 from the code cracker:

What is the maximum N (Nitrogen) value for maize?

Your task:

Get the result in one dplyr pipeline
Share your solution in Chat.
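For checking your answer afterwards, here is one possible shape of the pipeline. It is a sketch on hypothetical toy data. It assumes the real `crops` data has a `label` column containing `"maize"` and a numeric `N` column, and that dplyr is installed. The second variant returns a one-row data frame instead of a single number.

```r
library(dplyr)

# Hypothetical stand-in: only `label` and `N` are needed here
crops <- data.frame(
  label = c("maize", "rice", "maize", "maize"),
  N     = c(90, 60, 120, 75)
)

# Keep maize rows, extract N as a vector, take the maximum
max_n_maize <- crops %>% filter(label == "maize") %>% pull(N) %>% max()

# Alternative: summarise() keeps the result in a data frame
max_n_tbl <- crops %>% filter(label == "maize") %>% summarise(max_N = max(N))
```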

Your turn (10 mins)

Redo all 5 code cracker tasks using dplyr syntax one by one.
Challenge: try to finish every step in a single pipeline.
Ask questions if you get stuck.

Read and write data

read.csv() to read a CSV file into a data frame
write.csv() to save a data frame as a CSV file

# Base R
something <- read.csv("/path/to/the/file", stringsAsFactors = FALSE)
write.csv(something, "/path/for/the/new/file", row.names = FALSE)
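A quick way to check both functions is a round trip through a temporary file, which stands in for the placeholder paths above. The `grades` data frame is hypothetical. Setting `row.names = FALSE` keeps `write.csv()` from adding an extra unnamed index column. Since R 4.0.0, `stringsAsFactors = FALSE` is already the default for `read.csv()`.

```r
# A temporary file stands in for "/path/for/the/new/file"
csv_path <- tempfile(fileext = ".csv")

# Hypothetical data to save
grades <- data.frame(student = c("Billy", "Jenny"), grade = c("D", "A"))

# Save without row names so no extra index column is written
write.csv(grades, csv_path, row.names = FALSE)

# Read it back; text columns stay character
grades_back <- read.csv(csv_path, stringsAsFactors = FALSE)
str(grades_back)
```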

Paths

Interactive environment (e.g. the console):
- getwd() returns the current working directory
- setwd() sets a new working directory
- relative paths are resolved against this working directory
R Markdown: relative paths are resolved against the knit directory, which in RStudio can be set to:
- Document directory
- Project directory
- Current working directory
R script:
- Current working directory
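The sketch below shows that a relative path means nothing on its own. R resolves it against whatever the working directory is at that moment. It builds a hypothetical mini-project in a temporary folder, switches into it, reads a file by its relative path, and then restores the original working directory.

```r
# Remember where we started so we can come back
old_wd <- getwd()

# Hypothetical mini-project with a data/ subfolder in a temp location
project_dir <- file.path(tempdir(), "class9_demo")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(project_dir, "data", "toy.csv"),
          row.names = FALSE)

# "data/toy.csv" is resolved against the working directory set here
setwd(project_dir)
found_from_project <- file.exists("data/toy.csv")
toy <- read.csv("data/toy.csv")

# Restore the original working directory
setwd(old_wd)
```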

Strategies to work with paths

The here package is well worth trying.

The here() function from the here package builds file paths relative to the project root:
- here::here() returns the project root path
- here::here("path", "relative", "to", "project", "root") returns the full path to a file within your project (pass the pieces without a leading "/", since an absolute path is returned unchanged)
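A minimal sketch of `here()` in use follows. It assumes the here package is installed. The `data/crops.csv` location is a hypothetical example. here finds the project root by searching upward from the working directory for markers such as a `.here` file, an `.Rproj` file, or a `.git` folder, and it falls back to the working directory if it finds none.

```r
# Assumes the here package is installed: install.packages("here")
library(here)

# The project root that here() detected
project_root <- here()

# Build a path from pieces; "data/crops.csv" is a hypothetical location
crops_path <- here("data", "crops.csv")
crops_path

# The same path regardless of which subfolder your script or .Rmd lives in,
# e.g. read.csv(crops_path) once the file exists
```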

Homework

Finish reading Unit1-Module4.
Finish rewriting all five code cracker tasks using dplyr if you did not complete them earlier in class.
Read this online section to learn more about tidy data.
Important: Finish Homework (“Tidyversing” the crop dataset section).
