Visualizing Data:
Part I

Dr. Mine Dogucu

1 / 67

Review

2 / 67

Data Visualizations

  • are graphical representations of data

  • use different colors, shapes, and the coordinate system to summarize data

  • tell a story

  • are useful for exploring data

7 / 67

Visuals with a Single Categorical Variable

8 / 67

Bar plot

9 / 67

Visuals with a Single Numeric Variable

10 / 67

Box plot

  • The horizontal line inside the box represents the median.
  • The box itself represents the middle 50% of the data with Q3 on the upper end and Q1 on the lower end.
  • Whiskers extend from the box to the most extreme data points that lie within 1.5 × IQR of the box (i.e. of Q1 and Q3).
  • The points are potential outliers that represent babies with really low or high birth weight.
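
The box plot quantities described above can be computed by hand. This is a minimal sketch in base R; the `bwt` values are made up for illustration (they are not the babies data), and `quantile()` is used with its default settings.

```r
# Hypothetical birth weights in ounces (made-up values)
bwt <- c(55, 96, 102, 108, 113, 117, 120, 124, 129, 135, 141, 176)

# Median, quartiles, and interquartile range
med <- median(bwt)
q1  <- unname(quantile(bwt, 0.25))
q3  <- unname(quantile(bwt, 0.75))
iqr <- q3 - q1

# Fences: the farthest a whisker is allowed to reach
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences
whisker_low  <- min(bwt[bwt >= lower_fence])
whisker_high <- max(bwt[bwt <= upper_fence])

# Points beyond the fences are drawn individually as potential outliers
outliers <- bwt[bwt < lower_fence | bwt > upper_fence]
```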
11 / 67

Histogram

Bin width = 5 ounces

Bin width = 20 ounces

12 / 67

Etymology

histo: comes from the Greek word histos, which literally means "anything set upright".

gram: comes from the Greek word gramma which means "that which is drawn".

Online Etymology Dictionary

15 / 67

Histogram vs. Boxplot

Tail tells the tale.

16 / 67

Note: Violin plots display densities, not counts!

17 / 67

Visuals with Two Categorical Variables

19 / 67

Standardized Bar Plot

20 / 67

Dodged Bar Plot

21 / 67

Visuals with a Single Numerical and a Single Categorical Variable

22 / 67

Side-by-side box plots

23 / 67

Visuals with Two Numerical Variables

24 / 67

Scatter plots

Length of gestation can possibly eXplain a baby's birth weight. Gestation is the eXplanatory variable and is shown on the x-axis. Birth weight is the response variable and is shown on the y-axis.

25 / 67

ggplot is based on the grammar of graphics.

26 / 67

Data

glimpse(titanic)
Rows: 891
Columns: 6
$ survived <lgl> FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TR~
$ pclass <chr> "Third", "First", "Third", "First", "Third", "Third", "First"~
$ sex <fct> sex, sex, sex, sex, sex, sex, sex, sex, sex, sex, sex, sex, s~
$ age <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58, 20, 39, 14, 55,~
$ fare <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4583, 51.8625, 21~
$ embarked <fct> Southampton, Cherbourg, Southampton, Southampton, Southampton~

The data frame has been cleaned for you.

27 / 67

Visualizing a Single Categorical Variable

28 / 67

3 Steps of Making a Basic ggplot

1. Pick data

2. Map data onto aesthetics

3. Add the geometric layer

29 / 67

Step 1 - Pick Data

ggplot(data = titanic)

30 / 67

Step 2 - Map Data to Aesthetics

ggplot(data = titanic,
       aes(x = pclass))

31 / 67

Step 3 - Add the Geometric Layer

ggplot(data = titanic,
       aes(x = pclass)) +
  geom_bar()

32 / 67

  • Create a ggplot using the titanic data frame.
  • Map the pclass to the x-axis.
  • Add a layer of a bar plot.
ggplot(data = titanic,
       aes(x = pclass)) +
  geom_bar()
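
To see what the bar layer does with the data, the three steps can be tried on a tiny data frame. This is a sketch only: `toy` is a hypothetical stand-in for the titanic data frame, and `layer_data()` is used to read back the bar heights that ggplot2 computed.

```r
library(ggplot2)

# Hypothetical toy data (not the titanic data frame from the slides)
toy <- data.frame(
  pclass = c("First", "Third", "Third", "Second", "Third", "First")
)

# Pick data, map pclass to the x-axis, add the bar layer
p <- ggplot(data = toy,
            aes(x = pclass)) +
  geom_bar()

# geom_bar() counts the rows in each category;
# a manual tally with table() should give the same numbers
manual_counts <- table(toy$pclass)
bar_heights   <- layer_data(p)$count
```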
33 / 67

Visualizing a Single Numeric Variable

34 / 67

  • Create a ggplot using the titanic data frame.
  • Map the fare to the x-axis.
  • Add a layer of a histogram.
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
35 / 67

Step 1 - Pick Data

ggplot(data = titanic)

36 / 67

Step 2 - Map Data to Aesthetics

ggplot(data = titanic,
       aes(x = fare))

37 / 67

Step 3 - Add the Geometric Layer

ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

38 / 67

What is this warning?

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

39 / 67
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15)
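
The message appears when neither `bins` nor `binwidth` is given, so ggplot2 falls back to 30 bins. A small sketch, assuming a hypothetical `toy` data frame with made-up fares, shows that setting `binwidth` fixes the width of every bin:

```r
library(ggplot2)

# Hypothetical fares (made-up values, not the titanic data)
toy <- data.frame(fare = c(7, 8, 8, 13, 21, 26, 30, 52, 71, 80))

# No bins or binwidth: ggplot2 uses its default and prints the message
p_default <- ggplot(data = toy,
                    aes(x = fare)) +
  geom_histogram()

# Explicit bin width of 15 units
p_width <- ggplot(data = toy,
                  aes(x = fare)) +
  geom_histogram(binwidth = 15)

# Read back the bins ggplot2 computed and check their widths
bins_used  <- layer_data(p_width)
bin_widths <- bins_used$xmax - bins_used$xmin
```

Every value in `bin_widths` should equal 15, and the counts across bins should add up to the number of rows.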

40 / 67

🌈

Pick your favorite color(s) from the list at:

bit.ly/colors-r

43 / 67
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15,
                 color = "white")

44 / 67
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15,
                 fill = "darkred")

45 / 67
ggplot(data = titanic,
       aes(x = fare)) +
  geom_histogram(binwidth = 15,
                 color = "white",
                 fill = "darkred")
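
Here `color` and `fill` are set to fixed values outside `aes()`, so every bar looks the same. Inside `aes()`, the same arguments would instead be mapped to a variable. A sketch of the difference, using a hypothetical `toy` data frame:

```r
library(ggplot2)

# Hypothetical toy data (made-up values)
toy <- data.frame(
  fare   = c(7, 8, 8, 13, 21, 26, 30, 52, 71, 80),
  pclass = c("Third", "Third", "Third", "Third", "Third",
             "First", "First", "First", "First", "First")
)

# Setting: one fixed fill for all bars, written outside aes()
p_set <- ggplot(data = toy,
                aes(x = fare)) +
  geom_histogram(binwidth = 15,
                 color = "white",
                 fill = "darkred")

# Mapping: fill depends on a variable, written inside aes()
p_map <- ggplot(data = toy,
                aes(x = fare,
                    fill = pclass)) +
  geom_histogram(binwidth = 15,
                 color = "white")

# Fill colors that actually end up on the bars
fills_set <- unique(layer_data(p_set)$fill)
fills_map <- unique(layer_data(p_map)$fill)
```

`fills_set` holds the single color that was set, while `fills_map` holds one color per class, chosen by ggplot2.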

46 / 67

Visualizing Two Categorical Variables

47 / 67

Stacked Bar Plot

ggplot(data = titanic,
       aes(x = pclass,
           fill = survived)) +
  geom_bar()

48 / 67

Standardized Bar Plot

ggplot(data = titanic,
       aes(x = pclass,
           fill = survived)) +
  geom_bar(position = "fill")

Note that the y-axis now shows proportions rather than counts, even though it is still labeled "count". We will learn how to change the label later.
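
The proportions behind a standardized bar plot can be checked by hand. This is a base R sketch with a hypothetical `toy` data frame; within each class, the proportions add up to 1, which is why every bar has the same height.

```r
# Hypothetical toy data (made-up values, not the titanic data)
toy <- data.frame(
  pclass   = c("First", "First", "First",
               "Third", "Third", "Third", "Third"),
  survived = c(TRUE, TRUE, FALSE,
               TRUE, FALSE, FALSE, FALSE)
)

# Counts: what the stacked bar plot shows
counts <- table(toy$pclass, toy$survived)

# Row-wise proportions: what position = "fill" shows
props <- prop.table(counts, margin = 1)

# Each class sums to 1
class_totals <- rowSums(props)
```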

49 / 67

Dodged Bar Plot

ggplot(data = titanic,
       aes(x = pclass,
           fill = survived)) +
  geom_bar(position = "dodge")

Note that the y-axis shows counts again; the bars for each survival outcome are placed side by side rather than stacked.

50 / 67

New Data

Artwork by @allison_horst

51 / 67

New Data

glimpse(penguins)
Rows: 344
Columns: 8
$ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel~
$ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse~
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, ~
$ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, ~
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186~
$ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, ~
$ sex <fct> male, female, female, NA, female, male, female, male~
$ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007~
52 / 67

Visualizing a Single Numerical and a Single Categorical Variable

54 / 67

  • Create a ggplot using the penguins data frame.
  • Map the species to the x-axis and bill_length_mm to the y-axis.
  • Add a layer of a violin plot.
ggplot(penguins,
       aes(x = species,
           y = bill_length_mm)) +
  geom_violin()
Warning: Removed 2 rows containing non-finite values (stat_ydensity).
55 / 67

  • Create a ggplot using the penguins data frame.
  • Map the species to the x-axis and bill_length_mm to the y-axis.
  • Add a layer of a box plot.
ggplot(penguins,
       aes(x = species,
           y = bill_length_mm)) +
  geom_boxplot()
Warning: Removed 2 rows containing non-finite values (stat_boxplot).
56 / 67

Visualizing Two Numerical Variables

57 / 67
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm)) +
  geom_point()
Warning: Removed 2 rows containing missing values (geom_point).
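
The warning means that rows with a missing x or y value could not be drawn. One way to check how many rows are affected is sketched below in base R; `toy` is a hypothetical data frame with made-up values, not the penguins data.

```r
# Hypothetical measurements with some missing values
toy <- data.frame(
  bill_depth_mm  = c(18.7, 17.4, NA, 19.3, 20.6),
  bill_length_mm = c(39.1, 39.5, NA, NA, 39.3)
)

# A row cannot be plotted if either coordinate is missing
incomplete <- !complete.cases(toy[, c("bill_depth_mm", "bill_length_mm")])

# Number of rows a scatter plot would drop
n_dropped <- sum(incomplete)
```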

58 / 67

Linear Relationship

ggplot(babies,
       aes(x = gestation,
           y = bwt)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
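
With `method = "lm"`, the line drawn is a least-squares fit of y on x, the same kind of model `lm()` fits. A minimal base R sketch, assuming a hypothetical `toy` data frame with made-up values in place of babies:

```r
# Hypothetical gestation lengths (days) and birth weights (ounces)
toy <- data.frame(
  gestation = c(250, 260, 270, 280, 290),
  bwt       = c(100, 108, 115, 124, 130)
)

# Fit birth weight (response) on gestation (explanatory)
fit <- lm(bwt ~ gestation, data = toy)

# Intercept and slope of the fitted line
intercept <- coef(fit)[["(Intercept)"]]
slope     <- coef(fit)[["gestation"]]
```

The slope estimates how much the response changes, on average, for a one-unit increase in the explanatory variable.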

59 / 67

Considering More Than Two Variables

60 / 67
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           color = species)) +
  geom_point()
Warning: Removed 2 rows containing missing values (geom_point).

61 / 67
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species)) +
  geom_point()
Warning: Removed 2 rows containing missing values (geom_point).

62 / 67
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species)) +
  geom_point()
Warning: Removed 2 rows containing missing values (geom_point).

64 / 67
ggplot(penguins,
       aes(x = bill_depth_mm,
           y = bill_length_mm,
           shape = species,
           color = species,
           size = body_mass_g)) +
  geom_point()
Warning: Removed 2 rows containing missing values (geom_point).

65 / 67

Practice

Using either the babies, titanic, or penguins data frame, ask a question that you are interested in answering. Visualize the data to get a visual answer to the question. What is the visual telling you? Note all of this down in your lecture notes.

67 / 67
