class: title-slide

<br>
<br>
.right-panel[

# Visualizing Data: <br>Part I

## Dr. Mine Dogucu

]

---
class: middle

## Review

---
class: middle

[How LGBTQ+ hate crime is committed by young people against young people](https://www.bbc.com/news/uk-46543874)

[Why Time Flies](https://maximiliankiener.com/digitalprojects/time/)

[Mandatory Paid Vacation](https://www.instagram.com/p/CE1kpM5FhWR/?utm_source=ig_web_copy_link)

[Why are K-pop groups so big?](https://pudding.cool/2020/10/kpop/)

---
class: middle

Data Visualizations

- are graphical representations of data

--

- use different colors, shapes, and the coordinate system to summarize data

--

- tell a story

--

- are useful for exploring data

---
class:inverse middle

.font75[Visuals with a Single Categorical Variable]

---

## Bar plot

.pull-left[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />
]

.pull-right[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />
]

---
class:inverse middle

.font75[Visuals with a Single Numeric Variable]

---

## Box plot

.pull-left[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />
]

.pull-right[

- The horizontal line inside the box represents the median.
- The box itself represents the middle 50% of the data, with Q3 on the upper end and Q1 on the lower end.
- Whiskers extend from the box. They can extend up to 1.5 IQR away from the box (i.e., away from Q1 and Q3).
- The points are potential outliers that represent babies with really low or high birth weight.
]

---

## Histogram

.pull-left[
Bin width = 5 ounces

<img src="02a-data-viz_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />
]

.pull-right[
Bin width = 20 ounces

<img src="02a-data-viz_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />
]

---
class: middle

[Exploring Histograms Interactively](http://tinlizzie.org/histograms/)

---
class: middle center

[There is no "best" number of bins](https://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width)

---
class: middle

## Etymology

__histo__ comes from the Greek word _histos_, which literally means "anything set upright".

__gram__ comes from the Greek word _gramma_, which means "that which is drawn".

.footnote[Online Etymology Dictionary]

---

## Histogram vs. Boxplot

.pull-left[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-8-1.gif" style="display: block; margin: auto;" />

Tail tells the tale.
]

.pull-right[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-9-1.gif" style="display: block; margin: auto;" />
]

---
class: middle

.pull-left[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />
]

.pull-right[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />
]

.footnote[Note: Violin plots display densities, not counts!]

---
class: middle

.pull-left[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />
]

.pull-right[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />
]

.footnote[Note: Violin plots display densities, not counts!]
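---

## Box plot by the numbers

The box plot rules described earlier (median, Q1, Q3, whiskers of at most 1.5 IQR) can be checked by hand. The sketch below uses base R only. The vector `bwt` holds made-up birth weights in ounces, not the course data, and the names `lower_limit` and `upper_limit` are chosen here for illustration.

```r
# Hypothetical birth weights in ounces (made up for illustration)
bwt <- c(55, 96, 104, 110, 113, 117, 120, 120, 123, 128, 131, 136, 142, 176)

q1  <- quantile(bwt, 0.25)  # lower end of the box
med <- median(bwt)          # line inside the box
q3  <- quantile(bwt, 0.75)  # upper end of the box
iqr <- q3 - q1              # width of the box, same value as IQR(bwt)

# Whiskers can reach at most 1.5 IQR beyond the box
lower_limit <- q1 - 1.5 * iqr
upper_limit <- q3 + 1.5 * iqr

# Values beyond these limits are drawn as individual points
potential_outliers <- bwt[bwt < lower_limit | bwt > upper_limit]
potential_outliers
```

With these made-up values the limits are 81.5 and 159.5, so 55 and 176 would be plotted as potential outliers. Each whisker stops at the most extreme observation that is still inside its limit.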
---
class: inverse middle center

.font75[Visuals with Two Categorical Variables]

---
class: middle

## Standardized Bar Plot

<img src="02a-data-viz_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

---
class: middle

## Dodged Bar Plot

<img src="02a-data-viz_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

---
class: middle inverse

.font75[Visuals with a Single Numerical and a Single Categorical Variable]

---

## Side-by-side box plots

<img src="02a-data-viz_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" />

---
class: inverse middle

.font75[Visuals with Two Numerical Variables]

---

## Scatter plots

<img src="02a-data-viz_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

Length of gestation can **possibly** eXplain a baby's birth weight.

Gestation is the eXplanatory variable and is shown on the x-axis.

Birth weight is the response variable and is shown on the y-axis.

---
class: middle

__gg__plot is based on the __g__rammar of __g__raphics.

<img src="img/grammar_graphics.jpeg" style="display: block; margin: auto;" />

---

## Data

```r
glimpse(titanic)
```

```
Rows: 891
Columns: 6
$ survived <lgl> FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TR~
$ pclass   <chr> "Third", "First", "Third", "First", "Third", "Third", "First"~
$ sex      <fct> sex, sex, sex, sex, sex, sex, sex, sex, sex, sex, sex, sex, s~
$ age      <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58, 20, 39, 14, 55,~
$ fare     <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4583, 51.8625, 21~
$ embarked <fct> Southampton, Cherbourg, Southampton, Southampton, Southampton~
```

.footnote[The data frame has been cleaned for you.]
---
class:inverse middle

.font75[Visualizing a Single Categorical Variable]

---
class: middle

**3 Steps of Making a Basic ggplot**

1. Pick data

2. Map data onto aesthetics

3. Add the geometric layer

---
class: middle

### Step 1 - Pick Data

.pull-left[

```r
ggplot(data = titanic)
```
]

.pull-right[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" />
]

---
class: middle

### Step 2 - Map Data to Aesthetics

.pull-left[

```r
ggplot(data = titanic,
*      aes(x = pclass))
```
]

.pull-right[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" />
]

---
class: middle

### Step 3 - Add the Geometric Layer

.pull-left[

```r
ggplot(data = titanic,
       aes(x = pclass)) +
* geom_bar()
```
]

.pull-right[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-25-1.png" style="display: block; margin: auto;" />
]

---
class: middle

.panelset[

.panel[
.panel-name[Plot]

<img src="02a-data-viz_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" />
]

.panel[
.panel-name[English]

- Create a ggplot using the `titanic` data frame.
- Map the `pclass` to the x-axis.
- Add a layer of a bar plot.

]

.panel[
.panel-name[R]

```r
ggplot(data = titanic,
       aes(x = pclass)) +
  geom_bar()
```
]

]

---
class:inverse middle

.font75[Visualizing a Single Numeric Variable]

---
class: middle

.panelset[

.panel[
.panel-name[Plot]

```
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

<img src="02a-data-viz_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" />
]

.panel[
.panel-name[English]

- Create a ggplot using the `titanic` data frame.
- Map the `fare` to the x-axis.
- Add a layer of a histogram.

]

.panel[
.panel-name[R]

```r
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram()
```
]

]

---
class: middle

### Step 1 - Pick Data

.pull-left[

```r
ggplot(data = titanic)
```
]

.pull-right[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-31-1.png" style="display: block; margin: auto;" />
]

---
class: middle

### Step 2 - Map Data to Aesthetics

.pull-left[

```r
ggplot(data = titanic,
*      aes(x = fare))
```
]

.pull-right[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-33-1.png" style="display: block; margin: auto;" />
]

---
class: middle

### Step 3 - Add the Geometric Layer

.pull-left[

```r
ggplot(data = titanic,
       aes(x = fare)) +
* geom_histogram()
```
]

.pull-right[

```
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

<img src="02a-data-viz_files/figure-html/unnamed-chunk-35-1.png" style="display: block; margin: auto;" />
]

---

## What is this warning?

```
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

<img src="02a-data-viz_files/figure-html/unnamed-chunk-36-1.png" style="display: block; margin: auto;" />

---

```r
ggplot(data = titanic,
       aes(x = fare)) +
* geom_histogram(binwidth = 15)
```

<img src="02a-data-viz_files/figure-html/unnamed-chunk-37-1.png" style="display: block; margin: auto;" />

---
class: middle

.panelset[

.panel[.panel-name[binwidth = 15]

.left-panel[
]

<img src="02a-data-viz_files/figure-html/unnamed-chunk-38-1.png" style="display: block; margin: auto;" />
]

.panel[.panel-name[binwidth = 50]

<img src="02a-data-viz_files/figure-html/unnamed-chunk-39-1.png" style="display: block; margin: auto;" />
]

.panel[.panel-name[binwidth = 100]

<img src="02a-data-viz_files/figure-html/unnamed-chunk-40-1.png" style="display: block; margin: auto;" />
]

]

---
class: middle center

[There is no "best" number of bins](https://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width)

---
class: middle center

.font150[
🌈
]

Pick your favorite color(s) from the list at:
[bit.ly/colors-r](https://bit.ly/colors-r)

---

```r
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15,
*                color = "white")
```

<img src="02a-data-viz_files/figure-html/unnamed-chunk-41-1.png" style="display: block; margin: auto;" />

---

```r
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15,
*                fill = "darkred")
```

<img src="02a-data-viz_files/figure-html/unnamed-chunk-42-1.png" style="display: block; margin: auto;" />

---

```r
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15,
*                color = "white",
*                fill = "darkred")
```

<img src="02a-data-viz_files/figure-html/unnamed-chunk-43-1.png" style="display: block; margin: auto;" />

---
class: inverse middle center

.font75[Visualizing Two Categorical Variables]

---

## Stacked Bar Plot

.pull-left[

```r
ggplot(data = titanic,
       aes(x = pclass,
*          fill = survived)) +
  geom_bar()
```
]

.pull-right[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-45-1.png" style="display: block; margin: auto;" />
]

---

## Standardized Bar Plot

.pull-left[

```r
ggplot(data = titanic,
       aes(x = pclass,
           fill = survived)) +
* geom_bar(position = "fill")
```
]

.pull-right[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-47-1.png" style="display: block; margin: auto;" />
]

.footnote[Note that the y-axis no longer shows counts: it shows proportions, even though it is still labeled `count`. We will learn how to change that label later.]

---

## Dodged Bar Plot

.pull-left[

```r
ggplot(data = titanic,
       aes(x = pclass,
           fill = survived)) +
* geom_bar(position = "dodge")
```
]

.pull-right[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-49-1.png" style="display: block; margin: auto;" />
]

.footnote[Note that the y-axis shows counts again: dodging places the bars side by side without rescaling them.]
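---

## What do the bar heights mean?

The stacked, standardized, and dodged bar plots differ in what the bar heights represent. The sketch below reproduces those numbers with base R. The data frame `toy` is a made-up stand-in that only borrows the column names `pclass` and `survived` from `titanic`; its rows are hypothetical.

```r
# Hypothetical mini data set (rows made up for illustration)
toy <- data.frame(
  pclass   = c("First", "First", "First", "Second", "Second",
               "Third", "Third", "Third", "Third", "Third"),
  survived = c(TRUE, TRUE, FALSE, TRUE, FALSE,
               TRUE, FALSE, FALSE, FALSE, FALSE)
)

# Counts per class and survival status:
# the heights used by geom_bar() and by position = "dodge"
counts <- table(toy$pclass, toy$survived)

# Proportions within each class (each row sums to 1):
# the heights used by position = "fill"
props <- prop.table(counts, margin = 1)

counts
props
```

In this toy example four of the five third-class passengers did not survive, so the `FALSE` segment of the standardized third-class bar would cover 0.8 of its height.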
---

## New Data

<img src="img/penguins.png" width="667" style="display: block; margin: auto;" />

.footnote[Artwork by [@allison_horst](https://twitter.com/allison_horst)
]

---

## New Data

```r
glimpse(penguins)
```

```
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel~
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse~
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, ~
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, ~
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186~
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, ~
$ sex               <fct> male, female, female, NA, female, male, female, male~
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007~
```

---

<img src="img/penguin_bill.png" width="1036" style="display: block; margin: auto;" />

.footnote[Artwork by [@allison_horst](https://twitter.com/allison_horst)
]

---
class: middle inverse

.font75[Visualizing a Single Numerical and a Single Categorical Variable]

---
class: middle

.panelset[

.panel[
.panel-name[Plot]

```
Warning: Removed 2 rows containing non-finite values (stat_ydensity).
```

<img src="02a-data-viz_files/figure-html/unnamed-chunk-53-1.png" style="display: block; margin: auto;" />
]

.panel[
.panel-name[English]

- Create a ggplot using the `penguins` data frame.
- Map the `species` to the x-axis and `bill_length_mm` to the y-axis.
- Add a layer of a violin plot.

]

.panel[
.panel-name[R]

```r
ggplot(penguins,
       aes(x = species,
           y = bill_length_mm)) +
  geom_violin()
```
]

]

---
class: middle

.panelset[

.panel[
.panel-name[Plot]

```
Warning: Removed 2 rows containing non-finite values (stat_boxplot).
```

<img src="02a-data-viz_files/figure-html/unnamed-chunk-55-1.png" style="display: block; margin: auto;" />
]

.panel[
.panel-name[English]

- Create a ggplot using the `penguins` data frame.
- Map the `species` to the x-axis and `bill_length_mm` to the y-axis.
- Add a layer of a box plot.

]

.panel[
.panel-name[R]

```r
ggplot(penguins,
       aes(x = species,
           y = bill_length_mm)) +
  geom_boxplot()
```
]

]

---
class: inverse middle

.font75[Visualizing Two Numerical Variables]

---

.left-panel[

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm)) +
  geom_point()
```
]

.right-panel[

```
Warning: Removed 2 rows containing missing values (geom_point).
```

<img src="02a-data-viz_files/figure-html/unnamed-chunk-58-1.png" style="display: block; margin: auto;" />
]

---

## Linear Relationship

.pull-left[

```r
ggplot(babies,
       aes(x = gestation,
           y = bwt)) +
  geom_point() +
  geom_smooth(method = "lm",
              se = FALSE)
```
]

.pull-right[
<img src="02a-data-viz_files/figure-html/unnamed-chunk-60-1.png" style="display: block; margin: auto;" />
]

---
class: middle inverse

.font75[Considering More Than Two Variables]

---

.left-panel[

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           color = species)) +
  geom_point()
```
]

.right-panel[

```
Warning: Removed 2 rows containing missing values (geom_point).
```

<img src="02a-data-viz_files/figure-html/unnamed-chunk-62-1.png" style="display: block; margin: auto;" />
]

---

.left-panel[

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species)) +
  geom_point()
```
]

.right-panel[

```
Warning: Removed 2 rows containing missing values (geom_point).
```

<img src="02a-data-viz_files/figure-html/unnamed-chunk-64-1.png" style="display: block; margin: auto;" />
]

---

.left-panel[

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species)) +
  geom_point()
```
]

.right-panel[

```
Warning: Removed 2 rows containing missing values (geom_point).
```

<img src="02a-data-viz_files/figure-html/unnamed-chunk-66-1.png" style="display: block; margin: auto;" />
]

---

.left-panel[

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species)) +
  geom_point()
```
]

.right-panel[

```
Warning: Removed 2 rows containing missing values (geom_point).
```

<img src="02a-data-viz_files/figure-html/unnamed-chunk-68-1.png" style="display: block; margin: auto;" />
]

---

.left-panel[

```r
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species,
           size = body_mass_g)) +
  geom_point()
```
]

.right-panel[

```
Warning: Removed 2 rows containing missing values (geom_point).
```

<img src="02a-data-viz_files/figure-html/unnamed-chunk-70-1.png" style="display: block; margin: auto;" />
]

---

<img src="img/ggplot-summary.jpeg" width="95%" style="display: block; margin: auto;" />

---
class: middle

## Practice

Using the `babies`, `titanic`, or `penguins` data frame, ask a question that you are interested in answering. Visualize the data to get a visual answer to the question. What is the visual telling you? Note all of this down in your lecture notes.
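---

## Practice: one possible start

One way to approach the practice task is to write the question down first and then walk through the three steps of a basic ggplot. The sketch below is only an illustration: `toy_penguins` is a hypothetical data frame with made-up values that borrows two column names from `penguins`, and the question in the comment is an example, not a required one.

```r
library(ggplot2)

# Hypothetical stand-in for `penguins` (values made up for illustration)
toy_penguins <- data.frame(
  species        = c("Adelie", "Adelie", "Adelie", "Chinstrap",
                     "Chinstrap", "Gentoo", "Gentoo", "Gentoo"),
  bill_length_mm = c(39.1, 38.2, 40.3, 48.7, 50.1, 46.5, 47.8, 45.2)
)

# Question: do bill lengths differ by species?
p <- ggplot(data = toy_penguins,        # Step 1: pick data
            aes(x = species,
                y = bill_length_mm)) +  # Step 2: map data to aesthetics
  geom_boxplot()                        # Step 3: add the geometric layer

p
```

Swapping `toy_penguins` for the real `penguins` data frame should work the same way, and replacing `geom_boxplot()` with `geom_violin()` gives a different view of the same question.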