Interactive Session 1B

1. Projects

…one small step for a programmer, one giant leap for reproducibility

Create an R Project:

In RStudio, Session > New Session (make this a frequent habit)
File > New Project…
For now, New Directory (but version control coming up soon…) > New Project
Give your project a name (e.g. eds212-day1b)
Click Browse to choose where to put it (this will create a folder on your computer)
Create Project

Discussion - what does this do? Where does it live on your computer? What does it contain?

It helps to stay organized!

Consider how you’ll want to organize all your MEDS projects / code. This is totally up to you, though one suggestion is to create a MEDS/ folder in your computer’s root directory, then a folder for each course (e.g. EDS-212), then add your R Project (e.g. eds212-day1b) inside the appropriate course folder.

Make a new Quarto doc in your project, then follow along (adding notes in the body of your Quarto doc using markdown) with the rest of the session.

There are a couple ways to create a new Quarto doc…

In the top left corner of RStudio, you should see a file with a green “+” icon. Click on that button, then choose Quarto document…. This will open a popup window which will prompt you to provide a title, and optionally, your name as the author. You can also select which output format (HTML, PDF, Word Doc) you’d like to render your Quarto document to (we recommend sticking with HTML). When you click Create, a Quarto doc with a pre-filled YAML header, along with some example text / code (you’ll want to delete this) will be opened for you to edit.
In the Files pane, click on the New Blank File drop down to select Quarto doc…. This will open a completely blank Quarto doc without a YAML header for you to edit. You can create your own YAML by typing a set of gates (---) and adding any desired YAML options between them.

2. Exponents and logs in R

Recall from lecture that logarithms ask a question:

\(log_a(b)\) asks, “to what power do I have to raise a to get a value of b?”

Some useful “base R” functions:

log() == natural log, aka:
- \(ln()\)
- \(log_e()\)
log10() == log base 10, aka:
- \(log_{10}()\)
exp() == natural exponential, aka:
- \(f(x) = exp(x)\)
- \(f(x) = e^x\)

What is “base R?”

R is distributed with some helpful base packages, meaning when you install R, a series of packages (containing functions, including those shown above) are also installed. You may hear these referred to as “base R” functions, which makes reference to the fact that they come pre-installed (i.e. you don’t need to install / load additional packages to use them).
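One way to see this for yourself is sketched below. It asks R which base packages the current session has attached, then calls a base function with its package prefix. The exact list of packages can differ depending on your setup.

```r
# List the base packages attached in the current R session
sessionInfo()$basePkgs

# log() and exp() live in the {base} package, so no library() call is needed.
# The explicit `base::` prefix is optional and shown only for illustration.
base::log(exp(1))
```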

Let’s try some! Remember, you’ll need to add a code chunk to write the following code in!

# Euler's number (e) ----
exp(1)

# all three result in the same value (e^2) ----
exp(2)
exp(1)^2
exp(1)*exp(1)

# "to what power do I have to raise `e` to get a value of `e^10.4`?" ----
# recall from lecture slide: https://eds-212-essential-math.github.io/course-materials/slides/day1.2-slides.html#/logarithms
log(exp(1)^10.4)

# "to what power do I have to raise 10 to get a value of 100?" ----
log10(100)

# "to what power do I have to raise 2 to get a value of 16?" ----
logb(x = 16, base = 2)
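
A few extra checks, as a small sketch, can help connect these functions: log() and exp() undo each other, and a log in any base can be written as a ratio of natural logs (the change-of-base rule). The specific numbers here are arbitrary examples.

```r
# log() and exp() are inverses of each other ----
log(exp(3))
exp(log(3))

# log() also accepts a `base` argument ----
log(16, base = 2)

# change of base: log_2(16) is the same as ln(16) / ln(2) ----
log(16) / log(2)

# log2() is a built-in shortcut for base 2 ----
log2(16)
```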

3. Making sequences in R

Sometimes we’ll want to create sequences of values that we can plug into a function to see how an output value changes over a range of inputs.

We can make a sequence of values, stored as a vector in R, using the seq() function. The general structure looks like this:

seq(from = start_value, to = end_value, by = increment)

For example, to create a sequence from 2 to 18 by increments of 0.3, I would use:

seq(from = 2, to = 18, by = 0.3)

 [1]  2.0  2.3  2.6  2.9  3.2  3.5  3.8  4.1  4.4  4.7  5.0  5.3  5.6  5.9  6.2
[16]  6.5  6.8  7.1  7.4  7.7  8.0  8.3  8.6  8.9  9.2  9.5  9.8 10.1 10.4 10.7
[31] 11.0 11.3 11.6 11.9 12.2 12.5 12.8 13.1 13.4 13.7 14.0 14.3 14.6 14.9 15.2
[46] 15.5 15.8 16.1 16.4 16.7 17.0 17.3 17.6 17.9

Note that the above sequence ends at 17.9 (the last complete increment), not at 18. Another option is to specify the length of the output vector instead - like “I want to have 30 values between 2 and 18, evenly spaced.” To do that, use the length = argument within the seq() function (the argument’s full name is length.out; R lets you abbreviate it to length).

seq(from = 2, to = 18, length = 30)

 [1]  2.000000  2.551724  3.103448  3.655172  4.206897  4.758621  5.310345
 [8]  5.862069  6.413793  6.965517  7.517241  8.068966  8.620690  9.172414
[15]  9.724138 10.275862 10.827586 11.379310 11.931034 12.482759 13.034483
[22] 13.586207 14.137931 14.689655 15.241379 15.793103 16.344828 16.896552
[29] 17.448276 18.000000
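
To check the two approaches against each other, the sketch below stores both sequences and inspects their lengths and spacing. The object names by_seq and len_seq are made up for this illustration.

```r
# Store both sequences (object names are just for this example) ----
by_seq  <- seq(from = 2, to = 18, by = 0.3)
len_seq <- seq(from = 2, to = 18, length.out = 30)

# How many values did `by = 0.3` produce, and where does it stop? ----
length(by_seq)
max(by_seq)

# With `length.out = 30`, the spacing is (to - from) / (length.out - 1) ----
length(len_seq)
diff(len_seq)[1]
(18 - 2) / (30 - 1)
```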

4. Make the logistic growth function. . . function

Let’s make a function of the logistic growth equation. Recall, the expression for population size at any time t following logistic growth is given by:

\[N_t=\frac{K}{1+[\frac{K-N_0}{N_0}]e^{-rt}}\]

Let’s write it out. When in doubt, parentheses! Keep in mind that you may want to make your argument names something a bit more descriptive. Always ask: What will make future me least likely to mess this up? What would make these function arguments clearest to my collaborators?

Solution

pop_logistic <- function(capacity, init_pop, rate, time_yr) {
  capacity / (1 + ((capacity - init_pop) / init_pop) * exp(-rate * time_yr))
}
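
A quick way to gain confidence in a function like this is to test it at inputs where you already know the answer. The sketch below uses made-up values (capacity of 100, initial population of 10, rate of 0.5): at time 0 the function should return the initial population, and after a very long time it should approach the carrying capacity.

```r
# Same function as above, repeated so this chunk runs on its own ----
pop_logistic <- function(capacity, init_pop, rate, time_yr) {
  capacity / (1 + ((capacity - init_pop) / init_pop) * exp(-rate * time_yr))
}

# At time 0, exp(0) = 1, so the result should equal init_pop ----
pop_logistic(capacity = 100, init_pop = 10, rate = 0.5, time_yr = 0)

# After a very long time, exp(-rate * time) is ~0, so the result approaches capacity ----
pop_logistic(capacity = 100, init_pop = 10, rate = 0.5, time_yr = 1000)
```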

Logistic population - one time

Let’s say that for a population of chipmunks in one region, the carrying capacity is 2,580 individuals, the exponential growth rate is 0.32 (yr^-1), and time is in years. If the initial population is 230 individuals, what is the estimated population size at time = 2.4 years?

Solution

pop_logistic(capacity = 2580, init_pop = 230, rate = 0.32, time_yr = 2.4)

[1] 449.4572

Logistic population - a lot of times

Now let’s say we want to predict (then plot) the estimated population over a bunch of different times. Based on what we’ve learned today, how do you expect we might do that? A sequence of values as the time input!

Let’s make a sequence of times (0 to 20 years, by 1/2 year increments), then use that vector as our time input in the logistic growth model.

# First, create the vector (a sequence of values) ----
time_vec <- seq(from = 0, to = 20, by = 0.5)

# Then, use that as your time input in the model ----
pop_logistic(capacity = 2580, init_pop = 230, rate = 0.32, time_yr = time_vec)

 [1]  230.0000  265.7962  306.4370  352.3458  403.9105  461.4584  525.2265
 [8]  595.3303  671.7323  754.2132  842.3511  935.5115 1032.8508 1133.3382
[15] 1235.7931 1338.9376 1441.4593 1542.0771 1639.6038 1733.0001 1821.4121
[22] 1904.1943 1980.9141 2051.3424 2115.4329 2173.2940 2225.1574 2271.3465
[29] 2312.2467 2348.2800 2379.8838 2407.4939 2431.5322 2452.3984 2470.4641
[36] 2486.0700 2499.5249 2511.1059 2521.0597 2529.6041 2536.9311

We want to plot those estimated population sizes - but we didn’t store the vector of outputs! Remember - if you want to store an output, use the assignment operator (<-) in R, and check that it exists in your environment.

# assign your model to an object (here, that's called `chipmunk_pop`) ----
chipmunk_pop <- pop_logistic(capacity = 2580, init_pop = 230, rate = 0.32, time_yr = time_vec)

# Then we can call `chipmunk_pop` ----
chipmunk_pop

 [1]  230.0000  265.7962  306.4370  352.3458  403.9105  461.4584  525.2265
 [8]  595.3303  671.7323  754.2132  842.3511  935.5115 1032.8508 1133.3382
[15] 1235.7931 1338.9376 1441.4593 1542.0771 1639.6038 1733.0001 1821.4121
[22] 1904.1943 1980.9141 2051.3424 2115.4329 2173.2940 2225.1574 2271.3465
[29] 2312.2467 2348.2800 2379.8838 2407.4939 2431.5322 2452.3984 2470.4641
[36] 2486.0700 2499.5249 2511.1059 2521.0597 2529.6041 2536.9311
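
Because the function returns one population estimate per input time, the stored vector lines up element-by-element with time_vec. As a small sketch of what that makes possible, the code below finds the first time step at which the predicted population exceeds half of the carrying capacity. The object name first_idx is made up for this example.

```r
# Recreate the function and inputs so this chunk runs on its own ----
pop_logistic <- function(capacity, init_pop, rate, time_yr) {
  capacity / (1 + ((capacity - init_pop) / init_pop) * exp(-rate * time_yr))
}
time_vec <- seq(from = 0, to = 20, by = 0.5)
chipmunk_pop <- pop_logistic(capacity = 2580, init_pop = 230, rate = 0.32, time_yr = time_vec)

# One output per input time ----
length(time_vec)
length(chipmunk_pop)

# Position of the first estimate above half the carrying capacity ----
first_idx <- which(chipmunk_pop > 2580 / 2)[1]

# The time (in years) at that position ----
time_vec[first_idx]
```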

5. Make a plot!

You will learn a lot more about data visualization throughout MEDS. But let’s make a first rough visualization just for fun using the {ggplot2} package, which is part of the {tidyverse} (more on this in EDS 221).

I really want to know what the {tidyverse} is now though!!

The {tidyverse} is actually a collection of R packages that make doing data science in R (think, data cleaning / wrangling / visualizing) really enjoyable. All of the {tidyverse} packages share a similar design philosophy, grammar, and data structures.

While you can install / import each of the tidyverse component packages separately (e.g. install.packages("ggplot2"), then import using library(ggplot2)), you’ll commonly see folks install the entire {tidyverse} by running:

install.packages("tidyverse")

then load all of the {tidyverse} at the top of a script / Quarto doc:

# load libraries ----
library(tidyverse)

We’ll do this together in the next exercise!

Let’s first combine our time sequence (time_vec) and predicted populations (chipmunk_pop) into a single data frame - a table of data where different vectors (we’ll think of these as variables moving forward) are stored in columns.

# combine `time_vec` and `chipmunk_pop` into a data frame ----
chipmunk_df <- data.frame(time_vec, chipmunk_pop)

# ALWAYS look ----
head(chipmunk_df)

  time_vec chipmunk_pop
1      0.0     230.0000
2      0.5     265.7962
3      1.0     306.4370
4      1.5     352.3458
5      2.0     403.9105
6      2.5     461.4584

Load the tidyverse using library(tidyverse), then follow along while we rave about the grammar of graphics to make a basic ggplot graph:

Install the {tidyverse} if you haven’t done so already

Before you’re able to load the {tidyverse} in your Quarto doc, it needs to be installed. If you haven’t already done so, you can run install.packages("tidyverse") in your RStudio Console.

# load libraries ----
library(tidyverse)

# create plot ----
ggplot(data = chipmunk_df, aes(x = time_vec, y = chipmunk_pop)) +
  geom_point()
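
As one possible next step (a sketch, not the only way to do it), you can add more layers to the same plot with +. The version below connects the points with a line and adds axis labels. It assumes {ggplot2} is installed, and the label text and the object name chipmunk_plot are just suggestions.

```r
# load {ggplot2} (also loaded by library(tidyverse)) ----
library(ggplot2)

# Recreate the data so this chunk runs on its own ----
pop_logistic <- function(capacity, init_pop, rate, time_yr) {
  capacity / (1 + ((capacity - init_pop) / init_pop) * exp(-rate * time_yr))
}
time_vec <- seq(from = 0, to = 20, by = 0.5)
chipmunk_pop <- pop_logistic(capacity = 2580, init_pop = 230, rate = 0.32, time_yr = time_vec)
chipmunk_df <- data.frame(time_vec, chipmunk_pop)

# create plot with points, a connecting line, and axis labels ----
chipmunk_plot <- ggplot(data = chipmunk_df, aes(x = time_vec, y = chipmunk_pop)) +
  geom_point() +
  geom_line() +
  labs(x = "Time (years)", y = "Estimated population (individuals)")

# print the plot ----
chipmunk_plot
```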

6. No precious objects or outputs!

Save your .qmd, which lives in your project.
Close your whole project (File > Close project)
Restart your R session (Session > Restart R) & check environment
Find wherever your project lives on your computer
Open the .Rproj file (NOT the .qmd on its own - don’t orphan your project files)
Check for clues that you’re in your project
In the Files tab of RStudio, click on the .qmd you saved
Use Command + Option + R (Mac) or Control + Alt + R (Windows / Linux) to run all code in your .qmd
Check to see that all objects and outputs are automatically reproduced
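
If you would like to check some of those clues from the Console rather than by eye, here is a minimal sketch. It assumes you opened the project via its .Rproj file, in which case the working directory should be your project folder.

```r
# Where is R currently working? This should be your project folder ----
getwd()

# Is there an .Rproj file in the working directory? ----
list.files(pattern = "\\.Rproj$")

# Which objects exist right now? (should be empty right after a restart) ----
ls()
```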

End interactive session 1B