Spatial Data Analysis (450:320)

Class 10

We have learned

Fundamentals of R.
Functions, packages, environment, and namespace.
Reproducibility.
Getting help.
Git/GitHub
Indexing, subsetting, and replacing
*apply family
tidy data and tidyverse

The answers to all the in-class practices so far: code_cracker1_answer.Rmd.

I have updated answers for previous homework.

Today

Read, write data and paths
Data combining
Data reshaping and renaming
Data summaries

Directory vs. file

dir (folder):

Full path:
- /Users/leisong/sdawr
- C:\Users\lsong\ls320
Relative path (relative to the working directory):
- R
- vignettes

file:

/Users/leisong/sdawr/README.md (full path)
vignettes\unit1_module1.Rmd (relative path)

Your turn

Using your package as the example, give me four paths:

A full path of the folder “R” in your package.
A relative path of the folder “man” in your package.
A full path of the file “my_calc.R” in your package.
A relative path of the file “my_calc.md” in your package.

Post your answers in Google chat.

Paths

Interactive environment (e.g. console):
- getwd() to get the working directory
- setwd() to set the new working directory
- relative paths are relative to this working directory.
R markdown knit directory:
- Document directory
- Project directory
- Current working directory
R script:
- Current working directory
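
A minimal sketch of how a relative path is resolved against the working directory. The folder name demo_project is made up for illustration, and a temporary folder stands in for a real project:

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# Hypothetical project folder inside the session's temporary directory.
proj <- file.path(tempdir(), "demo_project")
dir.create(file.path(proj, "R"), recursive = TRUE, showWarnings = FALSE)

# Make the project folder the working directory.
setwd(proj)

# "R" is a relative path: it is resolved against the working directory.
rel_path <- "R"
rel_exists <- dir.exists(rel_path)
full_path <- normalizePath(rel_path)  # the matching full path

# Restore the original working directory.
setwd(old_wd)
```

After setwd(old_wd), the same relative path "R" points somewhere else, which is why relative paths break when the working directory changes.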

Strategies to work with paths

The here package is definitely worth trying.

The here() function from the here package builds the correct path to a file within a project:
- here::here() returns the project root path.
- here::here("a/path/relative/to/your/project/root") builds the full path to a file within your project (note: no leading slash, because the path is relative to the project root).
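
A short sketch of here() in use, assuming the here package is installed. The file docs/Crop_recommendation.csv is the one used later in these slides; here() only builds the path, so the file does not need to exist for this to run:

```r
library(here)

# Project root as detected by here (depends on where R was started).
root <- here()

# Build a path from components that are relative to the project root.
data_path <- here("docs", "Crop_recommendation.csv")

# A single string with forward slashes works as well.
data_path2 <- here("docs/Crop_recommendation.csv")
```

Both calls should give the same full path, and that path does not change when the working directory changes later in the session.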

Strategies to work with paths

Check, create, or delete directory/file
- dir.exists() and file.exists() to check if a directory or file exists.
- dir.create() to create a directory.
- file.remove() or file.rename() to delete or rename a file.
List files/folders within a directory:
- list.files() to list files; use the pattern argument to filter by name and full.names = TRUE to return full paths.
  - E.g. list.files('dir_path', pattern = '\\.csv$', full.names = TRUE).
- list.dirs() to list folders. Works similarly to list.files().
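
The functions above can be combined as in this sketch. It works in a temporary folder, and all folder and file names are made up for illustration:

```r
# Hypothetical class note folder inside the session's temporary directory.
base_dir <- file.path(tempdir(), "class_note_demo")
unlink(base_dir, recursive = TRUE)  # start from a clean state
crops_dir <- file.path(base_dir, "crops")

# Create the folder only if it does not exist yet.
if (!dir.exists(crops_dir)) {
  dir.create(crops_dir, recursive = TRUE)
}

# Create three empty files to have something to list.
file.create(file.path(crops_dir, c("rice.csv", "maize.csv", "notes.txt")))

# List only the csv files, with their full paths.
csv_files <- list.files(crops_dir, pattern = "\\.csv$", full.names = TRUE)

# Rename a file, then delete it.
file.rename(file.path(crops_dir, "notes.txt"),
            file.path(crops_dir, "readme.txt"))
file.remove(file.path(crops_dir, "readme.txt"))

# Only the two csv files remain.
remaining <- list.files(crops_dir)
```

The pattern "\\.csv$" matches names that end in ".csv", whereas "csv$" would also match a file named, say, "notcsv".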

Read and write data

read.csv to read a csv file
write.csv to save a csv file

# Base R
something <- read.csv("/path/to/the/file", stringsAsFactors = FALSE)
write.csv(something, "/path/for/the/new/file", row.names = FALSE)

The path can be either a local file path or a web (online) URL.
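
A runnable round trip, as a sketch: write a small table to a temporary csv file and read it back. The table values and the file name are made up for illustration:

```r
# A tiny made-up table (values are illustrative, not from the real data).
crops <- data.frame(label = c("rice", "maize"),
                    rainfall = c(200.5, 85.5),
                    stringsAsFactors = FALSE)

# Build the output path inside the session's temporary directory.
out_file <- file.path(tempdir(), "crops_demo.csv")

# row.names = FALSE avoids writing an extra column of row numbers.
write.csv(crops, out_file, row.names = FALSE)

# Read the file back into a new data frame.
crops_back <- read.csv(out_file, stringsAsFactors = FALSE)
```

Without row.names = FALSE, the file gets an unnamed first column that read.csv brings back as a column called X.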

file.path to construct a path, e.g.,

fname <- file.path("/Users/leisong/sdawr", "new_script.R")
fname

[1] "/Users/leisong/sdawr/new_script.R"

Practice 1

Download the crop recommendation file by right clicking here.
Put it under your class note folder.
Use code to create a new folder named crops under your class note folder.
Split the crop recommendation table into sub-tables by crop type and save them to the new crops folder as csv files.

[1] "crops/rice.csv"        "crops/maize.csv"       "crops/chickpea.csv"   
[4] "crops/kidneybeans.csv" "crops/pigeonpeas.csv"  "crops/mothbeans.csv"  
[7] "crops/mungbean.csv"    "crops/blackgram.csv"   "crops/lentil.csv"

Hints:

Packages: here, dplyr
Useful functions: read.csv, dir.create, sapply, file.path, write.csv.
Hint for the path: file.path(here('your_class_note/crops'), paste0(crop_type, '.csv'))

Practice 2

Now let’s do the reverse:

Read all the sub-tables in the crops folder, merge them back into one table, and
save it as a new file named “crop_yourname.csv” in your class note folder.
- Useful functions: list.files, read.csv, *apply, write.csv.
- The magic of do.call(), rbind and lapply.
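
To show what the do.call(), rbind and lapply combination does, here is a sketch on a made-up list of small tables rather than on the practice files. The values and the source column are illustrative only:

```r
# A list of small tables with the same columns (made-up values).
tables <- list(
  data.frame(label = "rice",  N = c(90, 85), stringsAsFactors = FALSE),
  data.frame(label = "maize", N = c(70, 60), stringsAsFactors = FALSE)
)

# lapply applies a function to every element and returns a list.
# Here each table gets a hypothetical column recording its position.
tables <- lapply(seq_along(tables), function(i) {
  tbl <- tables[[i]]
  tbl$source <- i
  tbl
})

# do.call passes the list elements as separate arguments, so this is
# the same as rbind(tables[[1]], tables[[2]]).
combined <- do.call(rbind, tables)
```

This works for a list of any length, which is what makes it handy when the number of tables is not known in advance.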

Data combining

*_join functions in the dplyr package.

Create the example data

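# Load the packages used below (here for paths, dplyr for data wrangling)
library(here)
library(dplyr)

crop_rmd <- read.csv(here('docs/Crop_recommendation.csv'))

# Mean of each variable per crop type
crop_rmd2 <- crop_rmd %>% 
    group_by(label) %>% 
    summarise_at(c('temperature', 'humidity', 'rainfall',
                   'N', 'P', 'K', 'ph'), mean) %>% data.frame()
# Weather variables, dropping the first two crops
crop_rmd_weather <- crop_rmd2 %>% 
    select(label, temperature, humidity, rainfall) %>% 
    slice(-c(1:2))
# Soil variables, keeping only the first 18 crops
crop_rmd_soil <- crop_rmd2 %>% 
    select(label, N, P, K, ph) %>% 
    slice(1:18)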

Run these lines to observe differences

left_join(crop_rmd_weather, crop_rmd_soil, by = 'label')
right_join(crop_rmd_weather, crop_rmd_soil, by = 'label')
inner_join(crop_rmd_weather, crop_rmd_soil, by = 'label')
full_join(crop_rmd_weather, crop_rmd_soil, by = 'label')
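
A smaller sketch of how the four joins differ, using two made-up tables that share only some labels. It assumes dplyr is installed, and the values are illustrative:

```r
library(dplyr)

# "weather" has labels b, c, d; "soil" has labels a, b, c (made-up values).
weather <- data.frame(label = c("b", "c", "d"),
                      temperature = c(20, 25, 30))
soil <- data.frame(label = c("a", "b", "c"),
                   ph = c(6.0, 6.5, 7.0))

# Keep every row of the left table (weather): b, c, d.
n_left <- nrow(left_join(weather, soil, by = "label"))

# Keep every row of the right table (soil): a, b, c.
n_right <- nrow(right_join(weather, soil, by = "label"))

# Keep only labels found in both tables: b, c.
n_inner <- nrow(inner_join(weather, soil, by = "label"))

# Keep labels found in either table: a, b, c, d.
n_full <- nrow(full_join(weather, soil, by = "label"))

c(left = n_left, right = n_right, inner = n_inner, full = n_full)
```

Rows without a match on the other side are kept by left, right and full joins, with NA filled in for the missing columns.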

Data reshaping

pivot_longer() in tidyr package.
pivot_wider() in tidyr package.
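
A minimal sketch of both functions on a made-up table, assuming tidyr is installed. The column names variable and value are chosen for illustration, and the numbers are illustrative:

```r
library(tidyr)

# A small wide table: one row per label, one column per variable.
wide <- data.frame(label = c("rice", "maize"),
                   N = c(90, 70),
                   P = c(42, 50),
                   stringsAsFactors = FALSE)

# Wide to long: every column except label becomes a (variable, value) pair.
long <- pivot_longer(wide, cols = -label,
                     names_to = "variable", values_to = "value")

# Long to wide: spread the pairs back into one column per variable.
wide_again <- pivot_wider(long, names_from = variable, values_from = value)
```

The round trip works here because each label appears only once. When the identifying column has repeated values, pivot_wider() cannot tell the rows apart, so a unique row identifier is usually added before reshaping.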

Data reshaping

Practice

Reshape the crop recommendation data into a longer format like this:

# A tibble: 5 × 3
  label variable    value
  <chr> <chr>       <dbl>
1 rice  N            90  
2 rice  P            42  
3 rice  K            43  
4 rice  temperature  20.9
5 rice  humidity     82.0

Then convert it back.

Renaming

rename() in dplyr package.

crop_rmd <- crop_rmd %>% rename(precipitation = rainfall)
head(crop_rmd)

   N  P  K temperature humidity       ph precipitation label
1 90 42 43    20.87974 82.00274 6.502985      202.9355  rice
2 85 58 41    21.77046 80.31964 7.038096      226.6555  rice
3 60 55 44    23.00446 82.32076 7.840207      263.9642  rice
4 74 35 40    26.49110 80.15836 6.980401      242.8640  rice
5 78 42 42    20.13017 81.60487 7.628473      262.7173  rice
6 69 37 42    23.05805 83.37012 7.073454      251.0550  rice

Summarizing

group_by() and summarise() (also spelled summarize()) in the dplyr package.

tp_mean <- crop_rmd %>% 
    group_by(label) %>% 
    summarise(temp_mean = mean(temperature),
              precp_mean = mean(precipitation))
tp_mean

# A tibble: 22 × 3
   label       temp_mean precp_mean
   <chr>           <dbl>      <dbl>
 1 apple            22.6      113. 
 2 banana           27.4      105. 
 3 blackgram        30.0       67.9
 4 chickpea         18.9       80.1
 5 coconut          27.4      176. 
 6 coffee           25.5      158. 
 7 cotton           24.0       80.4
 8 grapes           23.8       69.6
 9 jute             25.0      175. 
10 kidneybeans      20.1      106. 
# ℹ 12 more rows

Summarizing

More summarize functions.

# Another example
all_mean <- crop_rmd %>% 
    group_by(label) %>% 
    summarise_at(c('temperature', 'humidity', 'precipitation',
                 'N', 'P', 'K', 'ph'), mean)
all_mean

# A tibble: 22 × 8
   label       temperature humidity precipitation     N     P     K    ph
   <chr>             <dbl>    <dbl>         <dbl> <dbl> <dbl> <dbl> <dbl>
 1 apple              22.6     92.3         113.   20.8 134.  200.   5.93
 2 banana             27.4     80.4         105.  100.   82.0  50.0  5.98
 3 blackgram          30.0     65.1          67.9  40.0  67.5  19.2  7.13
 4 chickpea           18.9     16.9          80.1  40.1  67.8  79.9  7.34
 5 coconut            27.4     94.8         176.   22.0  16.9  30.6  5.98
 6 coffee             25.5     58.9         158.  101.   28.7  29.9  6.79
 7 cotton             24.0     79.8          80.4 118.   46.2  19.6  6.91
 8 grapes             23.8     81.9          69.6  23.2 133.  200.   6.03
 9 jute               25.0     79.6         175.   78.4  46.9  40.0  6.73
10 kidneybeans        20.1     21.6         106.   20.8  67.5  20.0  5.75
# ℹ 12 more rows
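
summarise_at() still works, but since dplyr 1.0.0 it is superseded by across(). Here is a sketch of the across() form on a made-up table, with illustrative values:

```r
library(dplyr)

# Made-up data: two observations for each of two crops.
toy <- data.frame(label = c("rice", "rice", "maize", "maize"),
                  temperature = c(20, 22, 24, 26),
                  humidity = c(80, 82, 60, 62),
                  stringsAsFactors = FALSE)

# Apply mean() to several columns at once within each group.
toy_mean <- toy %>%
    group_by(label) %>%
    summarise(across(c(temperature, humidity), mean))
toy_mean
```

The result has one row per label, sorted by label, with the same column names as the input.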

Homework

Assignment 3 due this week.
Important: Weekly homework (Spotted Hyenas Distribution).