class: title-slide

<br>
<br>
.right-panel[

# Simulation and Control Structures

## Dr. Mine Dogucu
]

---

class: middle

## Goals

- Probability distributions and simulating data in R
- Control structures (if/else, `for` and `while` loops, and mapping)

---

class: middle

- Go to the course organization on GitHub.
- Start a new repo called `week-07-simulate-data-username`, where `username` is your own GitHub username.

---

## Probability Distributions - Normal

```r
dnorm(x = -1.96, mean = 0, sd = 1)  # density at x
```

```
[1] 0.05844094
```

```r
pnorm(q = -1.96, mean = 0, sd = 1)  # cumulative probability P(X <= q)
```

```
[1] 0.0249979
```

```r
qnorm(p = 0.0249979, mean = 0, sd = 1, lower.tail = TRUE)  # quantile for probability p
```

```
[1] -1.96
```

```r
rnorm(n = 3, mean = 0, sd = 1)  # three random draws
```

```
[1] -1.7421388  0.2278699 -0.8605599
```

---

## Probability Distributions - Normal

.pull-left[

```r
library(ggplot2)

# Plot the standard normal density between -3 and 3
ggplot(data = data.frame(x = c(-3, 3)), aes(x)) +
  stat_function(fun = dnorm, n = 101, args = list(mean = 0, sd = 1)) +
  ylab("") +
  scale_y_continuous(breaks = NULL)
```

]

.pull-right[

<img src="07a-simulate-check_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

]

---

class: middle

## Other Probability Functions

- `dbinom()`, `pbinom()`, `qbinom()`, `rbinom()`
- `dbeta()`, `pbeta()`, `qbeta()`, `rbeta()`
- `dunif()`, `punif()`, `qunif()`, `runif()`

---

class: middle

```r
runif(1)
```

```
[1] 0.1590387
```

```r
set.seed(92697)
runif(1)
```

```
[1] 0.7773408
```

`set.seed()` fixes the starting state of the random number generator, so results that involve randomness can be reproduced.

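---

class: middle

A minimal sketch of what reproducibility means here. The seed is the one used on the previous slide; the variable names `first_draw`, `second_draw`, and `third_draw` are only illustrative. Resetting the seed restarts the random number stream, so the same call returns the same values.

```r
# Resetting the seed restarts the random number stream,
# so the same draws come out again.
set.seed(92697)
first_draw <- runif(3)

set.seed(92697)
second_draw <- runif(3)

identical(first_draw, second_draw)  # TRUE

# Without resetting the seed, the stream simply continues,
# so these draws are different from the first ones.
third_draw <- runif(3)
```

Setting the seed once at the top of a script is usually enough to make the whole script reproducible.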
---

## `while` loops

.pull-left[

```r
# Increment first, then print: prints 1 to 10
count <- 0
while(count < 10) {
  count <- count + 1
  print(count)
}
```

```
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
```

]

.pull-right[

```r
# Print first, then increment: prints 0 to 9
count <- 0
while(count < 10) {
  print(count)
  count <- count + 1
}
```

```
[1] 0
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
```

]

---

## if/else

```r
if(condition) {
  ## do something
}

## Rest of the code
```

```r
if(condition) {
  ## do something
} else {
  ## do something else
}
```

```r
if(condition) {
  ## do something
} else if(another_condition) {
  ## do something different
} else {
  ## do something else
}
```

---

```r
count <- 0
while(count < 10) {
  if(count < 5){
    print(paste(count, "small number"))
  }
  count <- count + 1
}
```

```
[1] "0 small number"
[1] "1 small number"
[1] "2 small number"
[1] "3 small number"
[1] "4 small number"
```

---

```r
count <- 0
while(count < 10) {
  if(count %% 2 == 0){
    print(paste(count, "even number"))
  }
  count <- count + 1
}
```

```
[1] "0 even number"
[1] "2 even number"
[1] "4 even number"
[1] "6 even number"
[1] "8 even number"
```

---

```r
count <- 0
while(count < 10) {
  if(count %% 2 == 0){
    print(paste(count, "even number"))
  } else {
    print(paste(count, "odd number"))
  }
  count <- count + 1
}
```

```
[1] "0 even number"
[1] "1 odd number"
[1] "2 even number"
[1] "3 odd number"
[1] "4 even number"
[1] "5 odd number"
[1] "6 even number"
[1] "7 odd number"
[1] "8 even number"
[1] "9 odd number"
```

---

## `for` loops

```r
for (i in 1:10){
  print(i)
}
```

```
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
```

---

## `for` loops

```r
sample_size <- c(30, 60, 100)

for (i in 1:3){
  print(sample_size[i])
}
```

```
[1] 30
[1] 60
[1] 100
```

```r
sample_size <- c(30, 60, 100)

# Same loop, but the number of iterations follows the length of the vector
for (i in 1:length(sample_size)){
  print(sample_size[i])
}
```

```
[1] 30
[1] 60
[1] 100
```

---

### apply: R Documentation

Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.

`apply(X, MARGIN, FUN, ..., simplify = TRUE)`

`X` an array, including a matrix.

`MARGIN` a vector giving the subscripts which the function will be applied over. E.g., for a matrix `1` indicates rows, `2` indicates columns, `c(1, 2)` indicates rows and columns. Where `X` has named dimnames, it can be a character vector selecting dimension names.

`FUN` the function to be applied: see ‘Details’. In the case of functions like `+`, `%*%`, etc., the function name must be backquoted or quoted.

`...` optional arguments to `FUN`.

---

class: middle

```r
some_matrix <- matrix(1:30, nrow = 5, ncol = 6)
some_matrix
```

```
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    6   11   16   21   26
[2,]    2    7   12   17   22   27
[3,]    3    8   13   18   23   28
[4,]    4    9   14   19   24   29
[5,]    5   10   15   20   25   30
```

```r
apply(some_matrix, 1, sum) # sum of each row
```

```
[1]  81  87  93  99 105
```

```r
apply(some_matrix, 2, sum) # sum of each column
```

```
[1]  15  40  65  90 115 140
```

---

class: middle

### lapply and sapply: R Documentation

Apply a Function over a List or Vector

Description

`lapply` returns a list of the same length as `X`, each element of which is the result of applying `FUN` to the corresponding element of `X`.

`sapply` is a user-friendly version and wrapper of `lapply` by default returning a vector, matrix or, if `simplify = "array"`, an array if appropriate, by applying `simplify2array()`. `sapply(x, f, simplify = FALSE, USE.NAMES = FALSE)` is the same as `lapply(x, f)`.

---

class: middle

.pull-left[

```r
sapply(c(0, 1, 2), exp)
```

```
[1] 1.000000 2.718282 7.389056
```

]

.pull-right[

```r
lapply(c(0, 1, 2), exp)
```

```
[[1]]
[1] 1

[[2]]
[1] 2.718282

[[3]]
[1] 7.389056
```

]

---

class: middle

.pull-left[

```r
sapply(c(0, 1, 2), exp)
```

```
[1] 1.000000 2.718282 7.389056
```

]

.pull-right[

```r
library(magrittr)  # provides %>%
lapply(c(0, 1, 2), exp) %>% unlist()
```

```
[1] 1.000000 2.718282 7.389056
```

]

---

class: middle

This week's (long) task: How does missing data (missing completely at random) affect the bias and variance of the estimates in simple linear regression?
Design a simulation to answer this question. You are in charge of developing sub-questions.
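
---

class: middle

One possible starting point, not a full answer: a sketch of a single simulation setting that combines `set.seed()`, `rnorm()`, `runif()`, and `sapply()` from the earlier slides. Everything specific here is an assumption made for illustration: the true model `y = 2 + 3x + noise`, the sample size, the proportion missing, the number of replicates, and the helper name `simulate_slope()`.

```r
set.seed(92697)

# Illustrative (assumed) settings
n_reps       <- 200   # number of simulated data sets
n            <- 100   # observations per data set
prop_missing <- 0.3   # probability that any single y is missing
true_slope   <- 3

# Hypothetical helper: simulate one data set, delete some y values
# completely at random, and return the estimated slope.
simulate_slope <- function(n, prop_missing) {
  x <- rnorm(n)
  y <- 2 + true_slope * x + rnorm(n)
  # MCAR: every y has the same chance of being missing,
  # regardless of the values of x and y
  is_missing <- runif(n) < prop_missing
  y[is_missing] <- NA
  # lm() drops incomplete rows under the default na.action (na.omit)
  coef(lm(y ~ x))[["x"]]
}

slopes <- sapply(1:n_reps, function(i) simulate_slope(n, prop_missing))

mean(slopes) - true_slope  # estimated bias of the slope
var(slopes)                # estimated variance of the slope
```

Repeating this for several values of `prop_missing` (including 0) and `n` is one way to turn the main question into sub-questions.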