EDS 212: Day 5, Lecture 2

Probability continued, intuition for hypothesis tests, and Boolean algebra

August 8th, 2025

Abraham Wald and survivorship bias

Hypothesis testing: building intuition, continued

You’ll learn about hypothesis testing in EDS 222. Let’s just build a bit more intuition here.

A common question: are means from two samples so different (considering data spread and sample size) that we think we have enough evidence to reject a null hypothesis that they were drawn from populations with the same mean?

Caveat, assumptions, caveat (EDS 222)…
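
As a rough sketch of the question above, base R's t.test() compares two sample means while accounting for spread and sample size. The site_a and site_b values are made up for illustration, and the test's assumptions (covered in EDS 222) are not checked here.

```r
# Hypothetical measurements from two sites (made-up values, for illustration only)
site_a <- c(10.1, 9.8, 10.4, 10.0, 9.9, 10.3)
site_b <- c(11.2, 10.9, 11.5, 11.0, 11.3, 10.8)

# Sample means: how far apart are they?
mean(site_a)
mean(site_b)

# Welch two-sample t-test (the default in t.test())
# Null hypothesis: the two samples come from populations with the same mean
result <- t.test(site_a, site_b)

# A small p-value is evidence against the null hypothesis
result$p.value
```

With these made-up values the difference in means is large relative to the spread, so the p-value comes out small.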

What is a null hypothesis?

A null hypothesis (\(H_0\)) is the claim that the effect being studied does not exist: for example, that there is no difference between the populations, or no relationship between the variables, being studied.

If the null hypothesis is true, any observed difference is due to chance alone.
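
To see "chance alone" in action, here is a minimal simulation sketch. It assumes a made-up population (normal, mean 10, standard deviation 2) and repeatedly draws two samples of 20 from that same population, so the null hypothesis is true by construction.

```r
set.seed(212)  # arbitrary seed so the simulation is repeatable

# Draw two samples from the SAME hypothetical population and
# record the difference in their means; repeat 1000 times
diffs <- replicate(
  1000,
  mean(rnorm(20, mean = 10, sd = 2)) - mean(rnorm(20, mean = 10, sd = 2))
)

mean(diffs)   # close to 0 on average
range(diffs)  # individual differences are not exactly 0
```

Even though both samples always come from the same population, the sample means rarely match exactly. A hypothesis test asks whether an observed difference is larger than this kind of chance variation.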

Boolean logic

“In mathematics and mathematical logic, Boolean algebra is the branch of algebra in which the values of the variables are the truth values true and false, usually denoted 1 and 0, respectively” (Wikipedia)

Computer think

How would a computer order the objects in the following statements?

Nothing is better than a burrito
A loaf of bread is better than nothing

Mathematically…

Nothing > burrito: TRUE

Loaf of bread > nothing: TRUE

To a computer: Loaf of bread > nothing > burrito.

In environmental data science

Conditional statements
Filtering, subsetting, searching
Checking classes and verification
Testing
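
A short sketch of what each of these uses can look like in R. The temps vector and its values are hypothetical, chosen only for illustration.

```r
# Hypothetical water temperatures in degrees Celsius (made-up values)
temps <- c(12.5, 18.2, 21.0, 9.7)

# Conditional statement: run code only if a condition is TRUE
if (max(temps) > 20) {
  status <- "at least one warm reading"
} else {
  status <- "no warm readings"
}

# Filtering / subsetting: keep only values above 15
warm <- temps[temps > 15]

# Searching: which positions hold values below 10?
cold_positions <- which(temps < 10)

# Checking classes and verification
is.numeric(temps)

# Testing: stop with an error if an expectation is not met
stopifnot(all(temps > 0))
```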

Logical operators

Logical “and”: &
Logical “or”: |
Logical “negate”: !
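
A minimal sketch of the three operators applied to two small logical vectors (a and b are illustrative names, not from the course data):

```r
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

and_result <- a & b  # TRUE only where both are TRUE:  TRUE FALSE FALSE FALSE
or_result  <- a | b  # TRUE where at least one is TRUE: TRUE TRUE TRUE FALSE
not_result <- !a     # flips each value:                FALSE FALSE TRUE TRUE
```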

Comparison operators

Is equal to? ==
Is less than? <
Is less than or equal to? <=
Is greater than? >
Is greater than or equal to? >=
Is not equal to? !=

A computer evaluates these and the outcome is either TRUE or FALSE, and proceeds accordingly.
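
For instance, with a hypothetical value x, each comparison returns TRUE or FALSE, and an if statement uses that outcome to decide what runs next:

```r
x <- 5  # hypothetical value

x == 5  # TRUE
x < 3   # FALSE
x <= 5  # TRUE
x > 7   # FALSE
x >= 5  # TRUE
x != 5  # FALSE

# The TRUE/FALSE outcome decides which branch runs
if (x >= 5) {
  outcome <- "x is at least 5"
} else {
  outcome <- "x is below 5"
}
outcome
```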

An important distinction:

==: This is…equal to?

=: This IS equal to (assignment).

5 == 4

[1] FALSE
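
A small sketch of the distinction, using a throwaway variable x. In R, a single = assigns a value (as <- does), while == asks a question and leaves the variable unchanged.

```r
x = 5    # assignment: x IS 5 now (x <- 5 does the same thing)

x == 4   # question: is x equal to 4? FALSE
x == 5   # question: is x equal to 5? TRUE

x        # asking questions did not change x; it is still 5
```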

Examples:

Elements of a vector are tested separately, and the outcome is returned in a vector:

marmot <- c(1,2,3)
marmot == 2

[1] FALSE  TRUE FALSE

pika <- c(1,2,5,9,10,15)
pika == 1 | pika >= 9

[1]  TRUE FALSE FALSE  TRUE  TRUE  TRUE
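
Building on the pika example, a brief sketch of how these logical vectors are commonly used for subsetting and counting (keep and n_match are illustrative names):

```r
pika <- c(1, 2, 5, 9, 10, 15)

# "and": values greater than 1 AND less than 10
pika > 1 & pika < 10
# [1] FALSE  TRUE  TRUE  TRUE FALSE FALSE

# Use a logical vector inside [ ] to keep only the TRUE elements
keep <- pika[pika == 1 | pika >= 9]
keep
# [1]  1  9 10 15

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches
n_match <- sum(pika >= 9)
n_match
# [1] 3
```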

Checking data classes works similarly:

More on data types & structures in EDS 221!

bear <- c(1,4,3, NA, 6) # Create a vector
is.na(bear) # Check element by element: is this value NA?

[1] FALSE FALSE FALSE  TRUE FALSE

is.numeric(bear) # Checks entire *class* of vector

[1] TRUE
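
One detail worth a quick sketch: comparing with == NA does not find missing values, because any comparison with NA returns NA. That is why is.na() is used instead.

```r
bear <- c(1, 4, 3, NA, 6)

bear == NA          # every element is NA: not TRUE, not FALSE
is.na(bear)         # FALSE FALSE FALSE TRUE FALSE

n_missing <- sum(is.na(bear))  # count the missing values: 1

class(bear)         # "numeric": the NA does not change the vector's class
is.character(bear)  # FALSE
```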

Another we’ll see often: %in%

%in%: check for matching elements (not in order)*

Example: We have two vectors of student names, and we want to know which students (values) in eds212 are also in eds223:

eds212 <- c("John", "Cara", "Will", "Zoe")
eds223 <- c("John", "Melissa", "Cara", "Joe", "Zoe")

eds212 %in% eds223

[1]  TRUE  TRUE FALSE  TRUE

*Keep this in mind: %in% asks whether each value appears anywhere in the other vector, while == compares elements position by position. The distinction is major and important.
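
A minimal sketch of that distinction, using two hypothetical rosters of equal length (roster_a and roster_b are illustrative names) so that == can compare them position by position:

```r
roster_a <- c("John", "Cara", "Will", "Zoe")
roster_b <- c("Zoe", "John", "Melissa", "Cara")

# ==: compares position by position (1st with 1st, 2nd with 2nd, ...)
same_position <- roster_a == roster_b
same_position  # all FALSE: no name sits in the same position in both

# %in%: asks whether each value in roster_a appears ANYWHERE in roster_b
in_both <- roster_a %in% roster_b
in_both        # TRUE TRUE FALSE TRUE

# Use the result to pull out the matching names
roster_a[in_both]  # "John" "Cara" "Zoe"
```

Three of the four names are shared, but == reports no matches because the order differs.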