Quiz
Naming files
Three principles of naming files
(Jenny Bryan)
For the purposes of this class, an additional principle is that file names follow
README.md
A README file is the first file users read. In our case, a user might be our future self, a teammate, or (if the project is open source) anyone.
A project can have multiple README files: e.g. one for the general project folder and another for a data subfolder. A data folder's README can also contain a codebook (data dictionary).
It should be brief but detailed enough to help users navigate the project.
A README should be kept up to date (e.g. it needs to be updated as a final project moves from the proposal stage to the presentation stage).
On GitHub we use Markdown for the README file (README.md). Good news: emojis are supported.
A .gitignore file contains the list of files which Git has been explicitly told to ignore.
For instance, README.html can be git ignored.
You may consider git ignoring confidential files (e.g. some datasets) so that they are not pushed to GitHub by mistake.
A file can be git ignored either by point-and-click using RStudio's Git pane or by adding the file path to the .gitignore file. For instance, a weather.csv data file in a data folder needs to be added as data/weather.csv.
Files with certain extensions (e.g. all .log files) can also be ignored. See git ignore patterns.
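As a rough illustration, the three cases mentioned here (a rendered file, a single data file given by its path, and a whole file extension) could be written in a .gitignore file as follows. The file names are the ones used in the examples on these slides; the comments are added for explanation only.

```gitignore
# Rendered output that does not need to be tracked
README.html

# A single data file, given by its path from the project folder
data/weather.csv

# Every file ending in .log, in any folder
*.log
```

Each pattern goes on its own line, and lines starting with # are comments.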
Importing data
readr::read_csv("dataset.csv")
readxl::read_excel("dataset.xlsx")
readxl::read_excel("dataset.xlsx", sheet = 2)
library(haven)
# SAS
read_sas("dataset.sas7bdat")
# SPSS
read_sav("dataset.sav")
# Stata
read_dta("dataset.dta")
How you import data depends on where the dataset is on your computer. To avoid hard-coding paths that only work on one machine, we use the here::here() function.
This function builds file paths relative to the project folder (i.e. where the .Rproj file is). It does not change the working directory.
read_csv(here::here("data/dataset.csv"))
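A minimal sketch of what here::here() returns, assuming the here package is installed and that the project has a data folder containing dataset.csv (the same placeholder names as above). The path components can also be passed as separate arguments, which avoids typing the separator yourself.

```r
library(here)

# Build the path from the project root, one component per argument.
csv_path <- here("data", "dataset.csv")

# The result is a full path ending in data/dataset.csv; it does not
# depend on which folder the script or R Markdown file lives in.
print(csv_path)

# Read the file only if it exists, so the sketch is safe to run anywhere.
if (file.exists(csv_path)) {
  dataset <- readr::read_csv(csv_path)
}
```

Because the path is built from the project root, the same line should work for a teammate who clones the repository to a different location.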
Collaborating on GitHub
If only one collaborator could make changes at a time, the workflow would not be efficient.
1 - commit
2 - pull (very important)
3 - push
We can create an issue to keep a list of mistakes to be fixed, ideas to check with teammates, or to-do tasks. You can assign issues to yourself or to teammates.
If you are working on an issue, it makes sense to refer to the issue number in your commit message (e.g. "add first draft of alternate texts for #4"). If your commit resolves the issue, you can use keywords such as "fixes #4" or "closes #4" to close it. Issues can also be closed manually.
It is also good practice to save session information. Package versions change, and to reproduce the results of an analysis we need to know under what technical conditions it was conducted.
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] forcats_0.5.1   stringr_1.4.0   dplyr_1.0.7     purrr_0.3.4
[5] readr_2.0.2     tidyr_1.1.4     tibble_3.1.5    ggplot2_3.3.5
[9] tidyverse_1.3.1

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.1 xfun_0.26        bslib_0.3.1      haven_2.4.3
 [5] colorspace_2.0-2 vctrs_0.3.8      generics_0.1.0   htmltools_0.5.2
 [9] yaml_2.2.1       utf8_1.2.2       rlang_0.4.11     jquerylib_0.1.4
[13] pillar_1.6.3     withr_2.4.2      glue_1.4.2       DBI_1.1.1
[17] dbplyr_2.1.1     modelr_0.1.8     readxl_1.3.1     lifecycle_1.0.1
[21] cellranger_1.1.0 munsell_0.5.0    gtable_0.3.0     rvest_1.0.1
[25] evaluate_0.14    knitr_1.36       tzdb_0.1.2       fastmap_1.1.0
[29] fansi_0.5.0      highr_0.9        broom_0.7.9      Rcpp_1.0.7
[33] scales_1.1.1     backports_1.2.1  jsonlite_1.7.2   fs_1.5.0
[37] hms_1.1.1        digest_0.6.28    stringi_1.7.5    xaringan_0.22.1
[41] grid_4.1.0       cli_3.0.1        tools_4.1.0      magrittr_2.0.1
[45] sass_0.4.0       crayon_1.4.1     pkgconfig_2.0.3  ellipsis_0.3.2
[49] xml2_1.3.2       reprex_2.0.1     lubridate_1.8.0  rstudioapi_0.13
[53] assertthat_0.2.1 rmarkdown_2.11   httr_1.4.2       R6_2.5.1
[57] compiler_4.1.0
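One simple way to save this output alongside the analysis is to capture it and write it to a text file. This is a sketch using base R only; the file name session-info.txt is an assumption chosen for illustration, not a convention required by any tool.

```r
# Hypothetical output file in the current working directory.
out_file <- "session-info.txt"

# capture.output() collects what sessionInfo() would print, one string per line.
session_lines <- capture.output(sessionInfo())

# Write the captured lines to the file so it can be committed with the project.
writeLines(session_lines, out_file)
```

Re-running these lines after updating packages overwrites the file, so the Git history shows when the technical conditions changed.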
A better way to keep track of the package versions (and the R version) used in a project is renv::snapshot(). This function creates a renv.lock file and records in it the versions of the packages the project uses.
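A hedged sketch of how recording and restoring might look, assuming the renv package is installed. The two wrapper names (record_packages and reinstall_packages) are hypothetical helpers introduced here for illustration; only renv::snapshot() and renv::restore() are renv functions. The calls are wrapped in functions so that nothing in the project changes until you call them.

```r
# Hypothetical helper: write the package versions currently used by the
# project to renv.lock in the given project folder.
record_packages <- function(project = ".") {
  renv::snapshot(project = project, prompt = FALSE)
}

# Hypothetical helper: reinstall the package versions listed in renv.lock,
# e.g. after cloning the project on another computer.
reinstall_packages <- function(project = ".") {
  renv::restore(project = project, prompt = FALSE)
}
```

Calling record_packages() from the project folder after installing or updating packages keeps renv.lock current; a teammate can then call reinstall_packages() to get the same versions.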
An even better approach to reproducible environments would be to use Docker.
Quiz