Quiz
Observations
Summaries of observations
Aggregating Categorical Data
Categorical data are summarized with counts or proportions
lapd %>% count(employment_type)
# A tibble: 3 × 2 employment_type n <fct> <int>1 Full Time 146642 Part Time 1323 Per Event 28
lapd %>% count(employment_type) %>% mutate(prop = n/sum(n))
# A tibble: 3 × 3 employment_type n prop <fct> <int> <dbl>1 Full Time 14664 0.989 2 Part Time 132 0.008903 Per Event 28 0.00189
summarize(lapd, mean_base_pay = mean(base_pay))
# A tibble: 1 × 1 mean_base_pay <dbl>1 85149.
Recall that we use multiple functions such as mean()
and median
within the summarize function
summarize(lapd, mean_base_pay = mean(base_pay), med_base_pay = median(base_pay))
# A tibble: 1 × 2 mean_base_pay med_base_pay <dbl> <dbl>1 85149. 97601.
Aggregating Data by Groups
group_by()
group_by()
separates the data frame by the groups. Any action following group_by()
will be completed for each group separately.
Q. What is the median salary for each employment type?
lapd %>% group_by(employment_type)
# A tibble: 14,824 × 4# Groups: employment_type [3] job_class_title employment_type base_pay base_pay_level <fct> <fct> <dbl> <chr> 1 Police Detective II Full Time 119322. Greater than Median2 Police Sergeant I Full Time 113271. Greater than Median3 Police Lieutenant II Full Time 148116 Greater than Median4 Police Service Representative II Full Time 78677. Greater than Median5 Police Officer III Full Time 109374. Greater than Median6 Police Officer II Full Time 95002. Greater than Median# … with 14,818 more rows
lapd %>% group_by(employment_type) %>% summarize(med_base_pay = median(base_pay))
# A tibble: 3 × 2 employment_type med_base_pay <fct> <dbl>1 Full Time 97996.2 Part Time 14474.3 Per Event 4275
We can also remind ourselves how many staff members there were in each group.
lapd %>% group_by(employment_type) %>% summarize(med_base_pay = median(base_pay), count = n())
# A tibble: 3 × 3 employment_type med_base_pay count <fct> <dbl> <int>1 Full Time 97996. 146642 Part Time 14474. 1323 Per Event 4275 28
Note that n()
does not take any arguments.
Image from R for Data Science book licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License
Image from R for Data Science book licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License
Image from R for Data Science book licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License
Image from R for Data Science book licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License
artists
# A tibble: 4 × 2 name followers <chr> <dbl>1 Beyoncé 247579582 Taylor Swift 330981163 Ariana Grande 518071314 Drake 50252529
songs
# A tibble: 5 × 4 name song_name album_name song_popularity <chr> <chr> <chr> <dbl>1 Beyoncé Savage Remix (feat. Beyoncé) Savage Rem… 832 Taylor Swift cardigan folklore 853 Drake Laugh Now Cry Later (feat. Lil Durk) Laugh Now … 954 Beyoncé Halo I AM…SASHA… NA5 Ariana Grande Stuck with U (with Justin Bieber) Stuck with… NA
albums
# A tibble: 4 × 2 album_name album_release_date <chr> <date> 1 Savage Remix (feat. Beyoncé) 2020-04-29 2 I AM…SASHA FIERCE 2008-11-14 3 Stuck with U 2020-05-08 4 Laugh Now Cry Later (feat. Lil Durk) 2020-08-14
something_join(x, y)
All the join functions in dplyr package are in this fashion where x represents the first data frame and y represents the second data frame.
songs
# A tibble: 5 × 4 name song_name album_name song_popularity <chr> <chr> <chr> <dbl>1 Beyoncé Savage Remix (feat. Beyoncé) Savage Rem… 832 Taylor Swift cardigan folklore 853 Drake Laugh Now Cry Later (feat. Lil Durk) Laugh Now … 954 Beyoncé Halo I AM…SASHA… NA5 Ariana Grande Stuck with U (with Justin Bieber) Stuck with… NA
albums
# A tibble: 4 × 2 album_name album_release_date <chr> <date> 1 Savage Remix (feat. Beyoncé) 2020-04-29 2 I AM…SASHA FIERCE 2008-11-14 3 Stuck with U 2020-05-08 4 Laugh Now Cry Later (feat. Lil Durk) 2020-08-14
left_join(songs, albums, by = "album_name")
# A tibble: 5 × 5 name song_name album_name song_popularity album_release_d… <chr> <chr> <chr> <dbl> <date> 1 Beyoncé Savage Remix (… Savage Remix (… 83 2020-04-29 2 Taylor Swift cardigan folklore 85 NA 3 Drake Laugh Now Cry … Laugh Now Cry … 95 2020-08-14 4 Beyoncé Halo I AM…SASHA FIE… NA 2008-11-14 5 Ariana Grande Stuck with U (… Stuck with U NA 2020-05-08
left_join()
includes all rows from x
songs
# A tibble: 5 × 4 name song_name album_name song_popularity <chr> <chr> <chr> <dbl>1 Beyoncé Savage Remix (feat. Beyoncé) Savage Rem… 832 Taylor Swift cardigan folklore 853 Drake Laugh Now Cry Later (feat. Lil Durk) Laugh Now … 954 Beyoncé Halo I AM…SASHA… NA5 Ariana Grande Stuck with U (with Justin Bieber) Stuck with… NA
albums
# A tibble: 4 × 2 album_name album_release_date <chr> <date> 1 Savage Remix (feat. Beyoncé) 2020-04-29 2 I AM…SASHA FIERCE 2008-11-14 3 Stuck with U 2020-05-08 4 Laugh Now Cry Later (feat. Lil Durk) 2020-08-14
right_join(songs, albums, by = "album_name")
# A tibble: 4 × 5 name song_name album_name song_popularity album_release_d… <chr> <chr> <chr> <dbl> <date> 1 Beyoncé Savage Remix (… Savage Remix (… 83 2020-04-29 2 Drake Laugh Now Cry … Laugh Now Cry … 95 2020-08-14 3 Beyoncé Halo I AM…SASHA FIE… NA 2008-11-14 4 Ariana Grande Stuck with U (… Stuck with U NA 2020-05-08
right_join()
includes all rows from y
songs
# A tibble: 5 × 4 name song_name album_name song_popularity <chr> <chr> <chr> <dbl>1 Beyoncé Savage Remix (feat. Beyoncé) Savage Rem… 832 Taylor Swift cardigan folklore 853 Drake Laugh Now Cry Later (feat. Lil Durk) Laugh Now … 954 Beyoncé Halo I AM…SASHA… NA5 Ariana Grande Stuck with U (with Justin Bieber) Stuck with… NA
albums
# A tibble: 4 × 2 album_name album_release_date <chr> <date> 1 Savage Remix (feat. Beyoncé) 2020-04-29 2 I AM…SASHA FIERCE 2008-11-14 3 Stuck with U 2020-05-08 4 Laugh Now Cry Later (feat. Lil Durk) 2020-08-14
inner_join(songs, albums, by = "album_name")
# A tibble: 4 × 5 name song_name album_name song_popularity album_release_d… <chr> <chr> <chr> <dbl> <date> 1 Beyoncé Savage Remix (… Savage Remix (… 83 2020-04-29 2 Drake Laugh Now Cry … Laugh Now Cry … 95 2020-08-14 3 Beyoncé Halo I AM…SASHA FIE… NA 2008-11-14 4 Ariana Grande Stuck with U (… Stuck with U NA 2020-05-08
inner_join()
includes all rows that are in x and y
songs
# A tibble: 5 × 4 name song_name album_name song_popularity <chr> <chr> <chr> <dbl>1 Beyoncé Savage Remix (feat. Beyoncé) Savage Rem… 832 Taylor Swift cardigan folklore 853 Drake Laugh Now Cry Later (feat. Lil Durk) Laugh Now … 954 Beyoncé Halo I AM…SASHA… NA5 Ariana Grande Stuck with U (with Justin Bieber) Stuck with… NA
albums
# A tibble: 4 × 2 album_name album_release_date <chr> <date> 1 Savage Remix (feat. Beyoncé) 2020-04-29 2 I AM…SASHA FIERCE 2008-11-14 3 Stuck with U 2020-05-08 4 Laugh Now Cry Later (feat. Lil Durk) 2020-08-14
full_join(songs, albums, by = "album_name")
# A tibble: 5 × 5 name song_name album_name song_popularity album_release_d… <chr> <chr> <chr> <dbl> <date> 1 Beyoncé Savage Remix (… Savage Remix (… 83 2020-04-29 2 Taylor Swift cardigan folklore 85 NA 3 Drake Laugh Now Cry … Laugh Now Cry … 95 2020-08-14 4 Beyoncé Halo I AM…SASHA FIE… NA 2008-11-14 5 Ariana Grande Stuck with U (… Stuck with U NA 2020-05-08
full_join()
includes all rows that are in x or y
artists
# A tibble: 4 × 2 name followers <chr> <dbl>1 Beyoncé 247579582 Taylor Swift 330981163 Ariana Grande 518071314 Drake 50252529
songs
# A tibble: 5 × 4 name song_name album_name song_popularity <chr> <chr> <chr> <dbl>1 Beyoncé Savage Remix (feat. Beyoncé) Savage Rem… 832 Taylor Swift cardigan folklore 853 Drake Laugh Now Cry Later (feat. Lil Durk) Laugh Now … 954 Beyoncé Halo I AM…SASHA… NA5 Ariana Grande Stuck with U (with Justin Bieber) Stuck with… NA
albums
# A tibble: 4 × 2 album_name album_release_date <chr> <date> 1 Savage Remix (feat. Beyoncé) 2020-04-29 2 I AM…SASHA FIERCE 2008-11-14 3 Stuck with U 2020-05-08 4 Laugh Now Cry Later (feat. Lil Durk) 2020-08-14
full_join(artists, songs, by = "name") %>% full_join(albums, by = "album_name")
# A tibble: 5 × 6 name followers song_name album_name song_popularity album_release_d… <chr> <dbl> <chr> <chr> <dbl> <date> 1 Beyoncé 24757958 Savage Re… Savage Re… 83 2020-04-29 2 Beyoncé 24757958 Halo I AM…SASH… NA 2008-11-14 3 Taylor Swift 33098116 cardigan folklore 85 NA 4 Ariana Grande 51807131 Stuck wit… Stuck wit… NA 2020-05-08 5 Drake 50252529 Laugh Now… Laugh Now… 95 2020-08-14
Complete the questions provided to you in the lecture notes.
Quiz
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |