The code for the live map:

library(leaflet)

m <- leaflet() %>%
    addTiles() %>%  # Add default OpenStreetMap map tiles
    addMarkers(lng = -74.435019,
               lat = 40.521036,
               popup = "<b>B266, Lucy Stone Hall</b>") # Add a clickable marker
m      # Print the map

This week

  • Data operations with base R

Today

  • Get help as a skillset
  • Simulation and sampling
  • Indexing, subsetting and replacing

Get help as a skillset

Error messages often tell you what is wrong.

  • What’s wrong?

    my_number_checker(12)
    Error in my_number_checker(12) : 
      could not find function "my_number_checker"
  • How about this one?

    ls320::my_number_checker(12, 2)
    Error in ls320::my_number_checker(12, 2) : unused argument (2)
  • Sometimes, it gets trickier

    systemfonts/libs/systemfonts.so.dSYM/Contents/Resources/DWARF/systemfonts.so: truncated gzip input: Unknown error: -1
    tar: Error exit delayed from previous errors.
    Error: file ‘/var/folders/ft/75t6c_c16yq_6yyw38w131nm0000gn/T//RtmpcMt7ru/downloaded_packages/systemfonts_1.3.1.tgz’ is not a macOS binary package
    In addition: There were 16 warnings (use warnings() to see them)
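Reading the message itself is the first step. A minimal sketch of capturing an error message so it can be inspected or searched, assuming my_number_checker is not defined in the current session (as in the first example above):

```r
# my_number_checker is deliberately left undefined, mirroring the first example
msg <- tryCatch(
  my_number_checker(12),
  error = function(e) conditionMessage(e)  # keep only the message text
)
msg

# Check whether a function with that name is visible before calling it
found <- exists("my_number_checker", mode = "function")
found

# Documentation for a function you do have (commented out: opens a help page)
# ?sample
# help("sample")
```

If exists() reports FALSE, the usual causes are a typo in the name or a package that has not been loaded.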

Simulation and sampling

  • runif: generates random numbers from a uniform distribution.
  • rnorm: generates random numbers from a normal distribution.
  • sample: draws a random sample from a vector.
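A quick sketch of the three functions side by side. The sizes and parameters are arbitrary choices for illustration, and the values differ on every run because no seed is set here:

```r
u <- runif(5, min = 0, max = 1)                 # 5 values between 0 and 1
z <- rnorm(5, mean = 0, sd = 1)                 # 5 values from a standard normal
s <- sample(c("a", "b", "c", "d"), size = 2)    # 2 items, without replacement

u
z
s
```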

Set seed

For reproducibility, you must set a seed (with set.seed()) before using any simulation or sampling function.

# run one time
runif(5, 0, 1)

# run again, get different numbers
runif(5, 0, 1)

# set seed
set.seed(10)
runif(5, 0, 1)

# run again, get the same numbers
set.seed(10)
runif(5, 0, 1)

Set seed

Different seeds generate different numbers.

# use seed 10
set.seed(10)
rnorm(5, mean = 1, sd = 2)

# use seed 11
set.seed(11)
rnorm(5, mean = 1, sd = 2)

# use seed 567
set.seed(567)
rnorm(5, mean = 1, sd = 2)

Set seed

set.seed() sets the state of the random number generator. All random functions draw from and advance this state. It is good practice to call set.seed() immediately before each random operation whose result you want to reproduce.

# Set seed
set.seed(11)
sample(1:100, 10)

sample(1:100, 10)

# Set the same seed
set.seed(11)
sample(1:100, 10)

sample(1:100, 10)

Take a guess: is it completely safe?
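One way to explore the question is to slip an extra random call between the seed and the sample. This is only a sketch; the extra runif(1) stands in for any code that happens to use random numbers:

```r
set.seed(11)
first <- sample(1:100, 10)

set.seed(11)
extra  <- runif(1)            # an extra random call advances the generator state
second <- sample(1:100, 10)

identical(first, second)      # compare the two draws
```

Because the generator state has moved on, the second draw typically no longer matches the first, even though the seed is the same.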

Create your own data

Create the following 1-d structures (1):

  • a: a random vector of integers with 10 elements drawn from 1-20:
    • Use the sample function with set.seed(10)
    • Name the elements of a with a vector of names starting with “V1” and ending with “V10”.
      • Use the paste0 function to create those names.
      • Create the identical vector of names using the paste function.
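As a hint, here is a sketch of the naming step on a small hypothetical vector x (not the a from the exercise):

```r
x <- c(5, 8, 2)                            # hypothetical vector with 3 elements

names(x) <- paste0("V", 1:3)               # paste0 joins with no separator
same_names <- paste("V", 1:3, sep = "")    # paste needs sep = "" to match

x
identical(names(x), same_names)            # TRUE
```

paste() inserts a space by default, so without sep = "" the names would be "V 1", "V 2", "V 3".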

Create your own data

Create the following 1-d structures (2):

  • b: randomly select 10 elements from letters using seed 10.
    • Think about different ways
  • d: a random numeric vector with 10 elements drawn from a normal distribution with a mean of 100 and an sd of 20:
    • Use rnorm function with set.seed(12).

Q: why do we skip c?

Create your own data

Create the following 1-d structures (3):

  • Create a list l from a, b, d.
    • Assign the names “a”, “b”, and “d” to the corresponding elements of l.

Create your own data

Create the following 2-d structures (1):

  • m: a matrix with three numeric columns named “V1”, “V2”, “V3”
    • Create each column first as its own vector, then combine
      • V1 is 1:10
      • V2 is a random sample of 10 values from 1:100 using set.seed(50)
      • V3 is drawn from a random uniform distribution (runif) between 0 and 50. Use the same seed as before.
    • Inspect the str and class of m
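A sketch of the "vectors first, then combine" pattern, using short hypothetical columns c1, c2, c3 rather than the exercise values:

```r
c1 <- 1:3                      # integer
c2 <- c(10L, 20L, 30L)         # integer
c3 <- c(0.5, 1.5, 2.5)         # double

mat <- cbind(V1 = c1, V2 = c2, V3 = c3)   # column names come from the argument names

str(mat)
class(mat)
typeof(mat)    # "double": a matrix holds one type, so the integers are promoted
```

Note that a matrix can hold only one data type, so mixing integer and double columns yields a double matrix.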

Create your own data

Create the following 2-d structures (2):

  • dat, a data.frame built from V1, V2, V3, and V4
    • V4 is a random selection of the letters A-E (Try LETTERS?), use the same seed as above.
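Unlike a matrix, a data.frame can mix column types. A minimal sketch with hypothetical columns; sampling with replace = TRUE is an assumption made here, not something the exercise specifies:

```r
# Hypothetical columns, kept short for illustration
col1 <- 1:4
col2 <- c(2.5, 3.5, 4.5, 5.5)
col3 <- sample(LETTERS[1:5], size = 4, replace = TRUE)  # random letters from A-E

df <- data.frame(V1 = col1, V2 = col2, V3 = col3)

str(df)    # one line per column, showing its type
dim(df)    # 4 rows, 3 columns
```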

Indexing

  • Vectors

    • A vector of positive number(s), e.g. x[c(1, 3, 5)]
    • A vector of negative number(s), e.g. x[-c(1:2)]
    • A vector of names, e.g. x[c('name1', 'name3')]
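    • A logical vector, e.g. x[c(F, T, T, F)]. Be careful with its length: a shorter logical vector is recycled.
    • Nothing, e.g. x[]. Returns the full vector.
    • Zero, e.g. x[0]. Returns a vector of length 0.

Be careful with out-of-bounds indices: on a vector they return NA instead of raising an error.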
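The indexing styles above can be tried on a small named vector. The vector x and its names are made up for illustration:

```r
x <- c(name1 = 10, name2 = 20, name3 = 30, name4 = 40)

x[c(1, 3)]                       # positive positions: 10 and 30
x[-c(1:2)]                       # drop the first two: 30 and 40
x[c("name1", "name3")]           # by name: 10 and 30
x[c(FALSE, TRUE, TRUE, FALSE)]   # logical: 20 and 30
x[c(TRUE, FALSE)]                # recycled to TRUE FALSE TRUE FALSE: 10 and 30
x[]                              # the full vector
length(x[0])                     # 0
x[6]                             # out of bounds: NA
```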

Indexing

  • Array/Matrix

    • Each dimension is the same as vector indexing.
    • Matrix: m[row, col]. The index before the comma selects rows, and the index after it selects columns.
    • Array: a[row, col, higher-dim], with one index per dimension.
    • E.g. m[1:3, ] or a[c(1, 3, 4), , c(T, F, F, T)].
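A sketch on a hypothetical 4 x 3 matrix and 4 x 3 x 2 array, both filled with consecutive integers:

```r
mat <- matrix(1:12, nrow = 4, ncol = 3)   # filled column by column
mat[1:3, ]                                # rows 1 to 3, all columns
mat[2, 3]                                 # row 2, column 3: 10

arr <- array(1:24, dim = c(4, 3, 2))
sub <- arr[c(1, 3, 4), , c(TRUE, FALSE)]  # rows 1, 3, 4; all columns; first slice only
dim(sub)                                  # 3 3: the single-slice dimension is dropped
```

An empty position between commas means "everything along this dimension".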

Indexing

  • List

    • If we treat each item in a list as a one-element list, all vector indexing works on lists. E.g. l[1], l[1:3], l[c(T, F, T)], l[c('name1', 'name3')].
    • But [] always returns a list.
    • To get the element itself (e.g. a simple vector), use [[]] or $. E.g. l[[1]] or l[1][[1]], l[['id']], l$id.
    • [[]] extracts only one element of a list at a time.
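The difference between [] and [[]] is easiest to see by checking the class of the result. The list lst below is hypothetical:

```r
lst <- list(id = 1:3, label = c("x", "y"), flag = TRUE)

class(lst[1])                       # "list": single brackets keep the list wrapper
class(lst[[1]])                     # "integer": double brackets return the element
length(lst[c("id", "flag")])        # 2: still a list, now with two elements

identical(lst[["id"]], lst$id)      # TRUE
identical(lst[1][[1]], lst[[1]])    # TRUE
```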

Indexing

Data.frame

  • The matrix syntax df[row, col] works for a data.frame.
  • The list syntax also works, with the columns of the data.frame playing the role of list elements. E.g. df[col], df[c('col_name')].
  • The [[]] and $ syntax for getting a simple vector also works. E.g. df[1][[1]], df[['id']], df$id.
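A sketch of the three syntaxes on a small hypothetical data.frame df:

```r
df <- data.frame(id = 1:3, score = c(90, 85, 70))

df[2, "score"]                   # matrix style: row 2 of column "score", 85
class(df["id"])                  # list style: still a "data.frame"
class(df[["id"]])                # "integer": the column as a plain vector
identical(df$id, df[1][[1]])     # TRUE
```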

Changing values

Every indexing method that extracts values can also change them: put the index expression on the left-hand side of an assignment.
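A minimal sketch of replacement by position, name, logical condition, and inside a list. The objects x and lst are hypothetical:

```r
x <- c(a = 1, b = 2, c = 3)

x[2] <- 20            # by position
x["c"] <- 30          # by name
x[x > 25] <- 0        # by logical condition: only "c" (30) qualifies
x                     # a = 1, b = 20, c = 0

lst <- list(id = 1:3)
lst$id[1] <- 99L      # indexing chains: first value of the "id" element
lst$id                # 99 2 3
```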

1-d Indexing/subsetting/replacing

  • Select the 1st, 2nd, and 10th elements from a
  • Select the elements of a named V1, V2, V3 (use the names)
  • Replace from the second to the last value of a with the word “sasquatch”
    • Use code to find the index value rather than counting by hand.

1-d Indexing/subsetting/replacing

Practice using logical for indexing:

  • Select from b the values “c”, “d”, “e”, if there are any (%in%)
    • Check how to use %in% by reading its documentation.
  • Identify the index positions in b of the values “c”, “d”, “e”, if there are any
    • Use function which
  • Select the first 5 values of d and the last 5 values of d into two separate vectors and multiply them.
  • Select from d all values > 100:
    • How many values are there?
  • Select from d the values between 95 and 105, and replace them with 100
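As a hint, here is how %in%, which, and logical conditions behave on small hypothetical vectors v and n (not the b and d from the exercise):

```r
v <- c("a", "d", "k", "c")
targets <- c("c", "d", "e")

v %in% targets             # FALSE TRUE FALSE TRUE
v[v %in% targets]          # "d" "c"
which(v %in% targets)      # 2 4

n <- c(90, 97, 104, 120)
sum(n > 100)               # 2: TRUE counts as 1
n[n >= 95 & n <= 105] <- 100
n                          # 90 100 100 120
```

Whether "between" includes the endpoints is a choice; this sketch assumes it does.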

1-d Indexing/subsetting/replacing

Repeat these steps, but do it by accessing a, b, and d from l:

  • Select the 1st, 2nd, and 10th elements from a
  • Replace from the second to the last value of a with the word “sasquatch”
    • Use code to find the index value rather than counting by hand.
  • Select from b the values “c”, “d”, “e”, if there are any (%in%)
    • Check how to use %in% by reading its documentation.
  • Select from d the values between 95 and 105, and replace them with 100

2-d Indexing/subsetting/replacing

  • Select the first 10 values from m.
  • Use a single index vector to select the value in the last row and last column of m
    • Think about different ways!
  • Replace the value selected in step 2 with -99
  • Now select row 3, columns 1:2 from m, and replace them with their values multiplied by 10
  • Do the same, but select the columns by their name, and reset the new values by dividing by 10
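As a hint, a sketch of these operations on a hypothetical 4 x 3 matrix mat with named columns:

```r
mat <- matrix(1:12, nrow = 4, dimnames = list(NULL, c("V1", "V2", "V3")))

last_a <- mat[nrow(mat), ncol(mat)]   # last row, last column: 12
last_b <- mat[length(mat)]            # a matrix is a vector underneath: also 12

mat[3, 1:2] <- mat[3, 1:2] * 10                       # by position
mat[3, c("V1", "V2")] <- mat[3, c("V1", "V2")] / 10   # by name, undoing the change
mat[3, ]                                              # back to 3, 7, 11
```

nrow(), ncol(), and length() let the code find the last position instead of hard-coding it.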

2-d Indexing/subsetting/replacing

Continue:

  • Select from dat the values of V3, and square them. Do it using index notation in different ways, such as column name in [], and $
  • Subset the first two rows and columns of dat into a new data.frame datss.
  • Replace dat rows 1:2, columns 1:2 with the values -1:-4
  • Reset the part of dat you just changed with the values in datss

Useful summary functions

These only work on numeric data. Try them on m:

  • rowSums
  • colSums
  • rowMeans
  • colMeans
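A quick sketch on a hypothetical 2 x 3 matrix, small enough to check by hand:

```r
mat <- matrix(1:6, nrow = 2)   # row 1: 1 3 5, row 2: 2 4 6

rowSums(mat)     # 9 12
colSums(mat)     # 3 7 11
rowMeans(mat)    # 3 4
colMeans(mat)    # 1.5 3.5 5.5
```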

Homework