We have learned

  • Fundamentals of R.
  • Functions, packages, environment, and namespace.
  • Reproducibility.
  • Getting help.
  • Git/GitHub.
  • Indexing, subsetting, and replacing.
  • *apply family.
  • Tidy data and tidyverse.

Answers to all the in-class practices so far: code_cracker1_answer.Rmd.

I have updated the answers for the previous homework.

Today

  • Read, write data and paths
  • Data combining
  • Data reshaping and renaming
  • Data summaries

Directory vs. file

dir (folder):

  • Full path:

    • /Users/leisong/sdawr
    • C:\Users\lsong\ls320
  • Relative path (relative to the working directory):

    • R
    • vignettes

file:

  • /Users/leisong/sdawr/README.md (full path)
  • vignettes\unit1_module1.Rmd (relative path)

Your turn

Using your package as the example, give me four paths:

  • A full path of the folder “R” in your package.
  • A relative path of the folder “man” in your package.
  • A full path of the file “my_calc.R” in your package.
  • A relative path of the file “my_calc.md” in your package.

Post your answers in Google chat.

Paths

  • Interactive environment (e.g. console):

    • getwd() to get the working directory
    • setwd() to set the new working directory
    • relative paths are relative to this working directory.
  • R markdown knit directory:

    • Document directory
    • Project directory
    • Current working directory
  • R script

    • Current working directory
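As a minimal sketch of how relative paths depend on the working directory, the snippet below switches to a temporary folder and back. The folder name "vignettes" is only illustrative, and tempdir() stands in for a real project folder.

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# Use a throwaway directory (tempdir()) as a stand-in for a project folder.
setwd(tempdir())
new_wd <- getwd()

# A relative path is resolved against the working directory;
# "vignettes" is just an illustrative folder name.
rel_path <- "vignettes"
full_path <- file.path(new_wd, rel_path)
print(full_path)

# Restore the original working directory.
setwd(old_wd)
```

The same relative path points to a different location whenever the working directory changes, which is why hard-coded setwd() calls tend to break on other machines.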

Strategies to work with paths

The here package is well worth trying.

  • The here() function in the here package builds the correct path to a file within a project:

    • here::here() returns the project root path.
    • here::here("a/path/relative/to/your/project/root") returns the full path of a file within your project.
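A short sketch of typical usage, assuming the here package is installed. The folder and file names ("docs", "Crop_recommendation.csv") are borrowed from a later example in these notes; the root that here() reports depends on where the code is run.

```r
library(here)

# Project root as detected by here (depends on where the code is run).
root <- here()

# Build a path below the root from separate components,
# so no path separator has to be typed by hand.
csv_path <- here("docs", "Crop_recommendation.csv")
print(csv_path)
```

Because the path is built from the project root rather than the working directory, the same line should work from the console, from a script, and when knitting an R Markdown file.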

Strategies to work with paths

  • Check, create, or delete directory/file

    • dir.exists() and file.exists() to check if a directory or file exists.
    • dir.create() to create a directory.
    • file.remove() or file.rename() to delete or rename a file.
  • List files/folders within a directory:

    • list.files() to list files with arguments pattern or full.names based on needs.

      • E.g. list.files('dir_path', pattern = 'csv$', full.names = TRUE).
    • list.dirs() to list folders. Works similarly to list.files().
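The functions above can be combined as in the following sketch. It works entirely inside tempdir(), and the folder and file names ("path_demo", "a.csv", "notes.txt") are made up for illustration. unlink() is used at the end to delete the demo folder.

```r
# Work inside a temporary directory so nothing in a real project is touched.
base_dir <- file.path(tempdir(), "path_demo")   # hypothetical folder name
if (!dir.exists(base_dir)) dir.create(base_dir)

# Create two small files to list and manipulate.
csv_file <- file.path(base_dir, "a.csv")
txt_file <- file.path(base_dir, "notes.txt")
writeLines("x,y\n1,2", csv_file)
writeLines("hello", txt_file)

# Check that the csv file was written.
print(file.exists(csv_file))

# List only the csv files, returning full paths.
csv_files <- list.files(base_dir, pattern = "csv$", full.names = TRUE)

# Rename, then delete, the text file.
renamed <- file.path(base_dir, "notes_old.txt")
file.rename(txt_file, renamed)
file.remove(renamed)

# Only the csv file should be left.
remaining <- list.files(base_dir)
print(remaining)

# Clean up the demo folder and everything in it.
unlink(base_dir, recursive = TRUE)
```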

Read and write data

  • read.csv to read csv file
  • write.csv to save csv file
# Base R
something <- read.csv("/path/to/the/file", stringsAsFactors = FALSE)
write.csv(something, "/path/for/the/new/file", row.names = FALSE)

For read.csv, the path can be either a local file path or a web (online) URL; write.csv needs a local file path.

  • file.path to construct a path, e.g.,
fname <- file.path("/Users/leisong/sdawr", "new_script.R")
fname
[1] "/Users/leisong/sdawr/new_script.R"
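Putting file.path, write.csv, and read.csv together, here is a minimal round-trip sketch. The table and the file name "crops_demo.csv" are made up, and tempdir() keeps the example self-contained.

```r
# Small made-up table standing in for real data.
crops <- data.frame(label = c("rice", "maize"), N = c(90, 70))

# Build the output path from its parts.
out_path <- file.path(tempdir(), "crops_demo.csv")

# row.names = FALSE avoids writing an extra column of row numbers.
write.csv(crops, out_path, row.names = FALSE)

# Read the file back; the values should match the original table.
crops_back <- read.csv(out_path, stringsAsFactors = FALSE)
print(crops_back)
```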

Practice 1

  • Download the crop recommendation file by right-clicking here.
  • Put it in your class note folder.
  • Use code to create a new folder crops under your class note folder.
  • Split the crop recommendation table into sub-tables by crop type and save them to the new crops folder as csv files.
[1] "crops/rice.csv"        "crops/maize.csv"       "crops/chickpea.csv"   
[4] "crops/kidneybeans.csv" "crops/pigeonpeas.csv"  "crops/mothbeans.csv"  
[7] "crops/mungbean.csv"    "crops/blackgram.csv"   "crops/lentil.csv"     

Hints:

  • Packages: here, dplyr
  • Useful functions: read.csv, dir.create, sapply, file.path, write.csv.
  • Hint for the path: file.path(here('your_class_note/crops'), paste0(crop_type, '.csv'))

Practice 2

Now let’s do the reverse:

  • Read all sub-tables in the crops folder and merge them back into one table.

  • Save the result as a new file named “crop_yourname.csv” in your class note folder.

    • Useful functions: list.files, read.csv, *apply, write.csv.
    • The magic of do.call(), rbind, and lapply.
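To illustrate the do.call(), rbind, and lapply pattern without touching any files, the sketch below uses a list of small made-up tables. The added N_scaled column is a hypothetical processing step, not part of the practice.

```r
# A list of small made-up tables, standing in for per-crop sub-tables.
tables <- list(
  data.frame(label = "rice",  N = c(90, 85)),
  data.frame(label = "maize", N = c(71, 61))
)

# lapply() applies a function to each element and returns a list;
# here a hypothetical step that adds a column to every table.
tables <- lapply(tables, function(tb) {
  tb$N_scaled <- tb$N / 100
  tb
})

# do.call() passes the list elements as separate arguments,
# so this is equivalent to rbind(tables[[1]], tables[[2]]).
merged <- do.call(rbind, tables)
print(merged)
```

The advantage of do.call() is that it works for a list of any length, so the number of tables does not need to be known in advance.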

Data combining

  • *_join functions in the dplyr package.

Create the example data

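library(here)
library(dplyr)

# Read the crop recommendation table from the project's docs folder
crop_rmd <- read.csv(here('docs/Crop_recommendation.csv'))

# Mean of each variable per crop type
crop_rmd2 <- crop_rmd %>% 
    group_by(label) %>% 
    summarise_at(c('temperature', 'humidity', 'rainfall',
                   'N', 'P', 'K', 'ph'), mean) %>% data.frame()
# Weather variables, dropping the first two crops
crop_rmd_weather <- crop_rmd2 %>% 
    select(label, temperature, humidity, rainfall) %>% 
    slice(-c(1:2))
# Soil variables, keeping only the first 18 crops
crop_rmd_soil <- crop_rmd2 %>% 
    select(label, N, P, K, ph) %>% 
    slice(1:18)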


Run these lines to observe differences

left_join(crop_rmd_weather, crop_rmd_soil, by = 'label')
right_join(crop_rmd_weather, crop_rmd_soil, by = 'label')
inner_join(crop_rmd_weather, crop_rmd_soil, by = 'label')
full_join(crop_rmd_weather, crop_rmd_soil, by = 'label')
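The difference between the four joins is easier to see on a tiny example. The sketch below uses two made-up tables (the values are invented) that share only some labels, and assumes dplyr is installed.

```r
library(dplyr)

# Two made-up tables that share only some labels.
weather <- data.frame(label = c("rice", "maize", "apple"),
                      temperature = c(23.7, 22.4, 22.6))
soil <- data.frame(label = c("maize", "apple", "banana"),
                   ph = c(6.2, 5.9, 6.0))

left  <- left_join(weather, soil, by = "label")   # keeps all rows of weather
right <- right_join(weather, soil, by = "label")  # keeps all rows of soil
inner <- inner_join(weather, soil, by = "label")  # keeps labels found in both
full  <- full_join(weather, soil, by = "label")   # keeps labels found in either

# Row counts summarize the difference between the four joins.
print(c(left = nrow(left), right = nrow(right),
        inner = nrow(inner), full = nrow(full)))
```

Rows without a match in the other table get NA in the columns that come from that table, for example ph for rice in the left join.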

Data reshaping

  • pivot_longer() in the tidyr package.
  • pivot_wider() in the tidyr package.
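A minimal sketch of both functions on a made-up table with one row per crop, assuming tidyr is installed. The column names "variable" and "value" are a choice made for this example.

```r
library(tidyr)

# Made-up wide table: one row per crop, one column per variable.
wide <- data.frame(label = c("rice", "maize"),
                   N = c(90, 71),
                   P = c(42, 54))

# Wide to long: stack the N and P columns into variable/value pairs.
long <- pivot_longer(wide, cols = c(N, P),
                     names_to = "variable", values_to = "value")
print(long)

# Long to wide: spread the variable/value pairs back into columns.
wide_again <- pivot_wider(long, names_from = variable, values_from = value)
print(wide_again)
```

This toy table has exactly one row per label. When a label appears in several rows, a row identifier column is needed so that pivot_wider() can tell the observations apart.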

Practice

  • Reshape the crop recommendation data into a longer format like this:
# A tibble: 5 × 3
  label variable    value
  <chr> <chr>       <dbl>
1 rice  N            90  
2 rice  P            42  
3 rice  K            43  
4 rice  temperature  20.9
5 rice  humidity     82.0
  • Then convert it back.

Renaming

  • rename() in the dplyr package, using the form new_name = old_name.
crop_rmd <- crop_rmd %>% rename(precipitation = rainfall)
head(crop_rmd)
   N  P  K temperature humidity       ph precipitation label
1 90 42 43    20.87974 82.00274 6.502985      202.9355  rice
2 85 58 41    21.77046 80.31964 7.038096      226.6555  rice
3 60 55 44    23.00446 82.32076 7.840207      263.9642  rice
4 74 35 40    26.49110 80.15836 6.980401      242.8640  rice
5 78 42 42    20.13017 81.60487 7.628473      262.7173  rice
6 69 37 42    23.05805 83.37012 7.073454      251.0550  rice

Summarizing

  • group_by() and summarise() (also spelled summarize()) in the dplyr package.
tp_mean <- crop_rmd %>% 
    group_by(label) %>% 
    summarise(temp_mean = mean(temperature),
              precp_mean = mean(precipitation))
tp_mean
# A tibble: 22 × 3
   label       temp_mean precp_mean
   <chr>           <dbl>      <dbl>
 1 apple            22.6      113. 
 2 banana           27.4      105. 
 3 blackgram        30.0       67.9
 4 chickpea         18.9       80.1
 5 coconut          27.4      176. 
 6 coffee           25.5      158. 
 7 cotton           24.0       80.4
 8 grapes           23.8       69.6
 9 jute             25.0      175. 
10 kidneybeans      20.1      106. 
# ℹ 12 more rows

Summarizing

  • More summarize functions.
# Another example
all_mean <- crop_rmd %>% 
    group_by(label) %>% 
    summarise_at(c('temperature', 'humidity', 'precipitation',
                   'N', 'P', 'K', 'ph'), mean)
all_mean
# A tibble: 22 × 8
   label       temperature humidity precipitation     N     P     K    ph
   <chr>             <dbl>    <dbl>         <dbl> <dbl> <dbl> <dbl> <dbl>
 1 apple              22.6     92.3         113.   20.8 134.  200.   5.93
 2 banana             27.4     80.4         105.  100.   82.0  50.0  5.98
 3 blackgram          30.0     65.1          67.9  40.0  67.5  19.2  7.13
 4 chickpea           18.9     16.9          80.1  40.1  67.8  79.9  7.34
 5 coconut            27.4     94.8         176.   22.0  16.9  30.6  5.98
 6 coffee             25.5     58.9         158.  101.   28.7  29.9  6.79
 7 cotton             24.0     79.8          80.4 118.   46.2  19.6  6.91
 8 grapes             23.8     81.9          69.6  23.2 133.  200.   6.03
 9 jute               25.0     79.6         175.   78.4  46.9  40.0  6.73
10 kidneybeans        20.1     21.6         106.   20.8  67.5  20.0  5.75
# ℹ 12 more rows
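In dplyr 1.0.0 and later, summarise_at() is superseded by across() used inside summarise(). The sketch below shows the across() form on a small made-up table; it assumes a dplyr version that provides across().

```r
library(dplyr)

# Made-up data with two crops and two numeric variables.
df <- data.frame(label = c("rice", "rice", "maize", "maize"),
                 temperature = c(20, 22, 18, 26),
                 humidity = c(80, 84, 60, 70))

# across() applies mean() to each selected column within each group.
means <- df %>%
  group_by(label) %>%
  summarise(across(c(temperature, humidity), mean))
print(means)
```

Both forms give one row per group and one column per summarized variable, so existing summarise_at() code keeps working.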

Homework

  • Assignment 3 due this week.
  • Important: Weekly homework (Spotted Hyenas Distribution).