EDS 212: Day 5, Lecture 2

Probability continued, intuition for hypothesis tests, and Boolean algebra


August 9th, 2024

Abraham Wald and survivorship bias


Hypothesis testing: building intuition, continued



You’ll learn about hypothesis testing in EDS 222. Let’s just build a bit more intuition here.


A common question: are means from two samples so different (considering data spread and sample size) that we think we have enough evidence to reject a null hypothesis that they were drawn from populations with the same mean?


Caveat, assumptions, caveat (EDS 222)…
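To make the question above concrete, here is a minimal sketch using base R's `t.test()`, which runs a Welch two-sample t-test by default. The data are simulated, and the names `site_a` and `site_b` are hypothetical, not from the course materials.

```r
# Hypothetical data: simulated stream temperatures at two sites
set.seed(212)
site_a <- rnorm(n = 30, mean = 12.0, sd = 1.5)
site_b <- rnorm(n = 30, mean = 13.0, sd = 1.5)

# Two-sample t-test; the null hypothesis is that the population means are equal
result <- t.test(site_a, site_b)

result$estimate  # the two sample means
result$p.value   # how surprising a difference this large would be if H0 were true
```

The test weighs the difference in sample means against the spread of the data and the sample sizes, which is the "considering data spread and sample size" part of the question.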

What is a null hypothesis?



A null hypothesis (\(H_0\)) is the claim that the effect being studied does not exist. It proposes that there is no true difference or relationship between the populations or variables being studied.


If the null hypothesis is true, any observed difference is due to chance alone.
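A quick simulation can show what "due to chance alone" looks like. This is only a sketch: the population values (mean 10, standard deviation 2) and the sample size of 20 are arbitrary assumptions chosen for illustration.

```r
set.seed(212)

# Both samples come from the SAME hypothetical population,
# so the null hypothesis is true by construction
diffs <- replicate(1000, {
  sample_1 <- rnorm(n = 20, mean = 10, sd = 2)
  sample_2 <- rnorm(n = 20, mean = 10, sd = 2)
  mean(sample_1) - mean(sample_2)
})

range(diffs)          # differences are rarely exactly 0, even with equal true means
mean(abs(diffs) > 1)  # proportion of simulated differences larger than 1 in magnitude
```

Every nonzero difference here comes from random sampling, because both samples were drawn from the same population.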


Boolean logic






“In mathematics and mathematical logic, Boolean algebra is the branch of algebra in which the values of the variables are the truth values true and false, usually denoted 1 and 0, respectively” (Wikipedia)

Computer think




How would a computer order the objects in the following statements?


  • Nothing is better than a burrito
  • A loaf of bread is better than nothing

Mathematically…



  • Nothing > burrito: TRUE


  • Loaf of bread > nothing: TRUE


To a computer: Loaf of bread > nothing > burrito.

In environmental data science



  • Conditional statements
  • Filtering, subsetting, searching
  • Checking classes and verification
  • Testing
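Each item in the list above can be sketched in a few lines of base R. The vector `temp_c` and its values are hypothetical, made up for illustration.

```r
# Hypothetical daily air temperatures (degrees C)
temp_c <- c(18.2, 25.1, 31.4, 22.8, 29.9)

# Conditional statement: branch on a single TRUE/FALSE
if (max(temp_c) > 30) {
  heat_flag <- "heat day recorded"
} else {
  heat_flag <- "no heat days"
}

# Filtering / subsetting: keep only elements where the condition is TRUE
warm_days <- temp_c[temp_c >= 25]

# Checking classes and verification
is.numeric(temp_c)

# Testing: stop with an error if an expectation is not met
stopifnot(all(temp_c > -90), all(temp_c < 60))
```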

Logical operators



  • Logical “and”: &
  • Logical “or”: |
  • Logical “negate”: !
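A short sketch of the three operators, first on single values and then element by element on vectors. The vectors `a` and `b` are hypothetical.

```r
TRUE & FALSE  # FALSE: "and" needs both sides to be TRUE
TRUE | FALSE  # TRUE: "or" needs at least one side to be TRUE
!TRUE         # FALSE: "negate" flips the value

# The operators work element by element on vectors
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)

a & b  # TRUE FALSE FALSE
a | b  # TRUE TRUE FALSE
!a     # FALSE FALSE TRUE
```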

Comparison operators


  • Is equal to? ==
  • Is less than? <
  • Is less than or equal to? <=
  • Is greater than? >
  • Is greater than or equal to? >=
  • Is not equal to? !=

A computer evaluates each of these comparisons to either TRUE or FALSE and proceeds accordingly.
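As a sketch, here is each comparison operator applied to one hypothetical value, `elevation_m`, which is an assumed name and number for illustration.

```r
elevation_m <- 1200  # hypothetical elevation in meters

elevation_m == 1200  # TRUE
elevation_m < 1000   # FALSE
elevation_m <= 1200  # TRUE
elevation_m > 1500   # FALSE
elevation_m >= 1200  # TRUE
elevation_m != 1200  # FALSE
```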

An important distinction:



==: This is…equal to?


=: This IS equal to.


5 == 4
[1] FALSE
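The difference is easy to see side by side. In this sketch, `x` is a hypothetical variable: a single `=` stores a value, while `==` only asks a question and leaves `x` unchanged.

```r
x = 5   # assignment: x now holds 5 (x <- 5 is the more common R style)

x == 5  # comparison: TRUE
x == 4  # comparison: FALSE, and x is still 5
```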

Examples:



Elements of a vector are tested separately, and the outcome is returned in a vector:

marmot <- c(1,2,3)
marmot == 2
[1] FALSE  TRUE FALSE


pika <- c(1,2,5,9,10,15)
pika == 1 | pika >= 9
[1]  TRUE FALSE FALSE  TRUE  TRUE  TRUE
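A logical vector like the one above is often used right away to subset or summarize. A minimal sketch reusing `pika`, where `keep` is a hypothetical helper name:

```r
pika <- c(1, 2, 5, 9, 10, 15)
keep <- pika == 1 | pika >= 9

pika[keep]      # 1 9 10 15: only elements where keep is TRUE
sum(keep)       # 4: TRUE counts as 1 and FALSE as 0
any(pika > 12)  # TRUE: at least one element passes
all(pika > 0)   # TRUE: every element passes
```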

Checking data classes works similarly:



More on data types & structures in EDS 221!


bear <- c(1,4,3, NA, 6) # Create a vector
is.na(bear) # Check element by element: is it NA?
[1] FALSE FALSE FALSE  TRUE FALSE
is.numeric(bear) # Checks entire *class* of vector
[1] TRUE
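One reason `is.na()` exists is that comparing to `NA` with `==` does not give TRUE or FALSE. A short sketch reusing `bear`:

```r
bear <- c(1, 4, 3, NA, 6)

bear == NA          # NA NA NA NA NA: comparing anything to NA returns NA
bear[!is.na(bear)]  # 1 4 3 6: combine ! and is.na() to drop missing values
class(bear)         # "numeric"
```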

Another we’ll see often: %in%



%in%: check whether each element on the left appears anywhere in the vector on the right (position does not matter)*


Example: We have two vectors of student names, and we want to know which students (values) in eds212 are also in eds223:


eds212 <- c("John", "Cara", "Will", "Zoe")
eds223 <- c("John", "Melissa", "Cara", "Joe", "Zoe")

eds212 %in% eds223
[1]  TRUE  TRUE FALSE  TRUE


*Keep this in mind - the distinction between %in% and == is major and important.
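A small sketch of that distinction. The vectors `roster_a` and `roster_b` are hypothetical and hold the same two names in different positions: `==` compares position by position, while `%in%` asks whether each value appears anywhere in the other vector.

```r
roster_a <- c("John", "Cara")
roster_b <- c("Cara", "John")

roster_a == roster_b    # FALSE FALSE: position 1 vs 1, position 2 vs 2
roster_a %in% roster_b  # TRUE TRUE: each name appears somewhere in roster_b

# %in% also works for subsetting, reusing the vectors from the example above
eds212 <- c("John", "Cara", "Will", "Zoe")
eds223 <- c("John", "Melissa", "Cara", "Joe", "Zoe")

eds212[eds212 %in% eds223]     # "John" "Cara" "Zoe"
eds212[!(eds212 %in% eds223)]  # "Will"
```

Note that `eds212 == eds223` would not answer the question at all: the vectors have different lengths, so R recycles the shorter one and compares position by position.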