This lesson covers some tips on managing your code, datasets, and other files with RStudio. The sections below give a tour of the IDE, and some of the customizations that you can do to increase your productivity.
I’ve included some comments in italics. These are just my personal observations, so feel free to set up the IDE in a way that works best for you!
View the slides for this section here.
These can be found under Tools > Global Options…
Click on Tools > Global Options…. In the General section, you will see the settings for Workspace.
Un-check the option for “Restore .RData into workspace at startup”.
We also don’t want to save the workspace to .RData on exit, so we will set this to Never.
How your General settings should look:
Under Tools > Global Options… click on Code
Under Tools > Global Options… click on Console
The console is where we can enter code directly, and where we’ll see output. We should consider the following settings:
Display? IMO, syntax highlighting makes sense pretty much everywhere
Debugging? this makes sense
Other? this is a personal preference
Under Tools > Global Options… click on Appearance
To zoom in, hold ⌘ and press + on macOS, or hold ctrl and press + on Windows.
For additional themes, see the rsthemes package.
Under Tools > Global Options… click on Pane Layout
Now you see Source, Tutorial, and Console panes on a single screen!
Keep track of all your files with RStudio project files (.Rproj).
Self-contained: using R projects keeps track of your current working directory!
Project-oriented: .Rproj files make bundling and shipping files and folders easier!
Avoid removing all the files: see the tweets below and the tidyverse article.
If the first line of your R script is
setwd("C:\Users\jenny\path\that\only\I\have")
I will come into your office and SET YOUR COMPUTER ON FIRE 🔥
If the first line of your R script is
rm(list = ls())
I will come into your office and SET YOUR COMPUTER ON FIRE 🔥
Click on ‘Project: (New)’ > ‘New Project’
Click on ‘Browse’ > ‘Create Project’
Click on ‘Project: (New)’ > ‘New Project’
Select project type
Create new folder name
Choose parent folder
Use Git (if installed)
Good folder and file organization saves time and headaches.
See the tabs below for some basic guidelines on folder structure and file naming.
separate raw and cleaned data
keep documents and code separate
keep figures separate
name files appropriately (preferably 2 digit prefix)
structure is reusable and easy to understand
Adapted from ‘Good enough practices in scientific computing’
project-name/
|-- CITATION
|-- project-name.Rproj
|-- README.md
|-- LICENSE
|-- requirements.txt
|-- data/
|   |-- raw/
|   |   |-- raw-birds-data.csv
|   |-- processed/
|   |   |-- processed-birds-data.csv
|-- doc/
|   |-- notebook.Rmd
|   |-- manuscript.Rmd
|   |-- changelog.txt
|-- results/
|   |-- summarized-results.csv
|-- code/
|   |-- 01-sightings-import.R
|   |-- 02-sightings-wrangle.R
|   |-- 03-sightings-model.R
|   |-- runall.R
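A layout like this can be created from R. The following is a minimal sketch using base R only; `create_project_skeleton()` is a hypothetical helper written for this illustration (not part of any package), and the demo runs in a temporary directory so it does not touch a real project.

```r
# Hypothetical helper: builds the folder skeleton shown above under `root`
create_project_skeleton <- function(root) {
  # Sub-folders taken from the layout above
  folders <- c("data/raw", "data/processed", "doc", "results", "code")
  for (folder in file.path(root, folders)) {
    # recursive = TRUE also creates missing parents (e.g. data/)
    dir.create(folder, recursive = TRUE, showWarnings = FALSE)
  }
  # Return the folders that now exist, relative to root
  list.dirs(root, full.names = FALSE, recursive = TRUE)
}

# Try it in a temporary directory
demo_root <- file.path(tempdir(), "project-name")
created <- create_project_skeleton(demo_root)
print(created)
```

Files such as README.md and LICENSE are left out of the sketch because their content is specific to each project.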
Basic rules to follow:
human readable -> (makes sense)
machine readable -> (regex)
sort/order well -> (ISO 8601 date)
This is a handy name:
2020-10-12-270-301-central-lab-metrics.csv
So are these:
01.0-import_270-301_central-lab-metrics.R
02.0-wrangle_270-301_central-lab-metrics.R
03.0-eda_270-301_central-lab-metrics.R
04.0-model_270-301_central-lab-metrics.R
*Adapted from Jenny Bryan’s slides
We can use regular expressions to find ‘270-301’ files!!
grepl(pattern = "270-301",
x = "2020-10-12-270-301-central-lab-metrics.csv")
#> [1] TRUE
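The same idea scales to a whole folder of files. Here is a small sketch with made-up file names (`lab_files` is hypothetical, invented for this example) that keeps the ‘270-301’ files and then uses the ISO 8601 date prefix to pick the most recent one.

```r
# Hypothetical file names following the date-first convention
lab_files <- c(
  "2020-10-12-270-301-central-lab-metrics.csv",
  "2020-11-02-270-301-central-lab-metrics.csv",
  "2020-10-12-270-302-central-lab-metrics.csv"
)

# Keep only the '270-301' files (fixed = TRUE matches the text literally)
matched <- lab_files[grepl(pattern = "270-301", x = lab_files, fixed = TRUE)]

# The ISO 8601 date is always the first 10 characters of the name
file_dates <- as.Date(substr(matched, start = 1, stop = 10))

# Pick the most recent matching file
latest <- matched[which.max(as.numeric(file_dates))]
latest
```

Because the date always sits in the same position, no regular expression is needed to pull it out.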
Logical order and underscores (_) also make it easier to sort files
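# File names from the list above
files <- c(
  "01.0-import_270-301_central-lab-metrics.R",
  "02.0-wrangle_270-301_central-lab-metrics.R",
  "03.0-eda_270-301_central-lab-metrics.R",
  "04.0-model_270-301_central-lab-metrics.R"
)
# writeLines(files)
files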
#> [1] "01.0-import_270-301_central-lab-metrics.R"
#> [2] "02.0-wrangle_270-301_central-lab-metrics.R"
#> [3] "03.0-eda_270-301_central-lab-metrics.R"
#> [4] "04.0-model_270-301_central-lab-metrics.R"
stringr::str_split_fixed(string = files, pattern = "_", 3)
#>      [,1]           [,2]      [,3]
#> [1,] "01.0-import"  "270-301" "central-lab-metrics.R"
#> [2,] "02.0-wrangle" "270-301" "central-lab-metrics.R"
#> [3,] "03.0-eda"     "270-301" "central-lab-metrics.R"
#> [4,] "04.0-model"   "270-301" "central-lab-metrics.R"
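To see why a zero-padded numeric prefix helps with ordering, here is a short sketch with made-up script names (these names are illustrative only). It compares unpadded and two-digit prefixes under a plain character sort.

```r
# Hypothetical script names with unpadded numeric prefixes
unpadded <- c("1-import.R", "2-wrangle.R", "10-model.R")

# method = "radix" sorts character vectors in the C locale, so the
# result does not depend on your system's locale settings
sorted_unpadded <- sort(unpadded, method = "radix")
sorted_unpadded
# "10-model.R" ends up between "1-import.R" and "2-wrangle.R"

# Zero-pad the prefix to two digits with sprintf()
steps  <- c(1L, 2L, 10L)
labels <- c("import", "wrangle", "model")
padded <- sprintf("%02d-%s.R", steps, labels)

sorted_padded <- sort(padded, method = "radix")
sorted_padded
# Now the scripts sort in the order they should be run
```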
Use relative rather than absolute file paths
These are specific to a system:
/project-name/data -> absolute path in macOS
\\project-name\\data -> absolute path in Windows
These are specific to a folder:
project-name/data -> relative path in macOS
project-name\\data -> relative path in Windows
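One way to avoid typing separators by hand is base R's `file.path()`. The sketch below builds a relative path and then shows what it resolves to; the file name is borrowed from the example folder structure earlier in this lesson and does not need to exist.

```r
# file.path() joins the pieces with "/", which R accepts on macOS,
# Linux, and Windows
raw_csv <- file.path("data", "raw", "raw-birds-data.csv")
raw_csv

# A relative path is resolved against the current working directory
getwd()

# normalizePath() shows the absolute path the relative one points to;
# mustWork = FALSE avoids an error when the file does not exist yet
normalizePath(raw_csv, mustWork = FALSE)
```

The relative version is the one to keep in scripts, since the absolute version only makes sense on the computer that produced it.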
This section covers some packages to help manage files and folders.
here package
The here::set_here() function solves a lot of file path problems (especially if you’re not using R projects)
here()
library(here)
#> here() starts at /Users/mjfrigaard/Documents/@BioMarin/r-meetup-tutorials
set_here(".")
This creates a .here file (similar to .Rproj files)
here::set_here(".")
#> Created file .here in /Users/mjfrigaard/Documents/@BioMarin/r-meetup-tutorials . Please start a new R session in the new project directory.
list.files(all.files = TRUE, pattern = "here")
#> [1] ".here"
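Once a project root is established, `here::here()` can build paths from that root. The following is a minimal sketch; the folder and file names are assumptions taken from the example project layout, and the base R fallback is only there so the snippet runs when the here package is not installed.

```r
if (requireNamespace("here", quietly = TRUE)) {
  # here::here() builds a path that starts at the project root
  raw_csv <- here::here("data", "raw", "raw-birds-data.csv")
} else {
  # Fallback when the here package is not installed: a plain relative path
  raw_csv <- file.path("data", "raw", "raw-birds-data.csv")
}
raw_csv
```

The path returned by `here::here()` stays the same no matter which sub-folder of the project the script is run from.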
fs package
The fs package stands for file system and is great for locating and accessing files.
View a tree layout of your files with fs::dir_tree().
fs::dir_tree("../data")
#> ../data
#> ├── 2021-11-07-NFL-TweetsRaw.rds
#> ├── AppleMobRaw.csv
#> ├── DailyShowSample.csv
#> ├── EndorseSample.csv
#> ├── FandangoSample.csv
#> ├── FasterCures.csv
#> ├── Infected.csv
#> ├── LabData.csv
#> ├── LabProc.csv
#> ├── Netflix data
#> │   ├── netflix_directors.rds
#> │   ├── netflix_duration.rds
#> │   └── netflix_series.rds
#> ├── SmallLabData.csv
#> ├── TopPharmComp.csv
#> ├── UsadaBadDates.csv
#> ├── UsadaRaw.csv
#> ├── VisitNAData.csv
#> ├── airquality.csv
#> ├── data-cache
#> │   ├── 2020-11-24-TopPharmCompRaw.csv
#> │   ├── 2020-12-11-BioTechDrugStocks.csv
#> │   ├── 2020-12-11-BmrmGoogle.rds
#> │   ├── 2020-12-20-BioTechStocks.csv
#> │   ├── 2020-12-24-BioTechStocks.csv
#> │   ├── 2020-12-29-PricesWide.csv
#> │   ├── 2020-12-30-PricesWide.csv
#> │   ├── 2021-05-02-PricesWide.csv
#> │   ├── 2021-05-02-TidyApple.csv
#> │   ├── 2021-05-02-TopUSCities.csv
#> │   ├── 2021-05-02-USCities.csv
#> │   ├── 2021-08-31-TidyApple.csv
#> │   ├── 2021-08-31-TopUSCities.csv
#> │   ├── 2021-08-31-USCities.csv
#> │   ├── 2021-09-04-TidyApple.csv
#> │   ├── 2021-09-04-TopUSCities.csv
#> │   ├── 2021-09-04-USCities.csv
#> │   ├── 2021-09-19-TidyApple.csv
#> │   ├── 2021-09-19-TopUSCities.csv
#> │   └── 2021-09-19-USCities.csv
#> ├── imdb-movies.csv
#> ├── original-starwars.csv
#> ├── palmerpenguins.csv
#> ├── penguins.csv
#> ├── starwars.rds
#> ├── wk10-dont-mess-with-texas
#> │   ├── 2021-11-21-ExecutedOffenders.csv
#> │   └── processed
#> │       └── 2021-11-21
#> │           ├── 2021-11-21-ExExOffndrshtml.csv
#> │           ├── 2021-11-21-ExExOffndrsjpg.csv
#> │           └── ExOffndrsComplete.csv
#> ├── wk11-01_intro-to-maps
#> │   └── raw
#> │       └── 2021-11-07-NFL-TweetsRaw.csv
#> ├── wk5-01-intro-to-ggp2-part-02
#> │   └── TidyApple.csv
#> ├── wk6-02_scrape-wikipedia-data
#> │   ├── 2020-11-24-TopPharmCompRaw.csv
#> │   ├── 2021-11-20-TopPharmComp.csv
#> │   └── 2021-11-20-TopPharmCompRaw.csv
#> └── wk7-01_intro-to-ggp2-part-03
#>     └── 2021-11-20-TopUSCities.csv
Get the complete path to files using fs::dir_ls().
fs::dir_ls("../data")
#> ../data/2021-11-07-NFL-TweetsRaw.rds ../data/AppleMobRaw.csv
#> ../data/DailyShowSample.csv ../data/EndorseSample.csv
#> ../data/FandangoSample.csv ../data/FasterCures.csv
#> ../data/Infected.csv ../data/LabData.csv
#> ../data/LabProc.csv ../data/Netflix data
#> ../data/SmallLabData.csv ../data/TopPharmComp.csv
#> ../data/UsadaBadDates.csv ../data/UsadaRaw.csv
#> ../data/VisitNAData.csv ../data/airquality.csv
#> ../data/data-cache ../data/imdb-movies.csv
#> ../data/original-starwars.csv ../data/palmerpenguins.csv
#> ../data/penguins.csv ../data/starwars.rds
#> ../data/wk10-dont-mess-with-texas ../data/wk11-01_intro-to-maps
#> ../data/wk5-01-intro-to-ggp2-part-02 ../data/wk6-02_scrape-wikipedia-data
#> ../data/wk7-01_intro-to-ggp2-part-03
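`fs::dir_ls()` can also search sub-folders and filter by file type. The sketch below is an illustration under assumptions: it builds a small throwaway folder (the file names are made up) so it can run anywhere, and it falls back to base R's `list.files()` when fs is not installed.

```r
# Build a small throwaway folder so the example is self-contained
demo_dir <- file.path(tempdir(), "fs-demo")
dir.create(file.path(demo_dir, "raw"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, c("penguins.csv", "starwars.rds", "raw/tweets.csv")))

if (requireNamespace("fs", quietly = TRUE)) {
  # recurse = TRUE descends into sub-folders; glob keeps only .csv files
  csv_files <- fs::dir_ls(demo_dir, recurse = TRUE, glob = "*.csv")
} else {
  # Base R fallback when fs is not installed
  csv_files <- list.files(demo_dir, pattern = "\\.csv$",
                          recursive = TRUE, full.names = TRUE)
}

# Show just the file names of the matches
csv_names <- basename(as.character(csv_files))
csv_names
```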
The reprex package was designed to help you create a reproducible example.
To copy + paste actual data in a reproducible example, try datapasta!
RStudio also comes with access to a Terminal and Help pane.
Learn a handful of command-line tools to make life easier
Know how to use cd, pwd, mkdir, rm, ls, etc.
RStudio comes with a Terminal pane for quick access to the command-line
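For comparison, several of these commands have rough base R counterparts that you can run from the Console. This is a sketch, not a replacement for learning the command line; it works inside a temporary directory, and the `scratch` folder and `notes.txt` file are made-up names. The counterpart of cd is setwd(), which is left out here because this lesson advises against relying on it in scripts.

```r
# pwd: print the current working directory
getwd()

# mkdir: create a folder (here inside a temporary directory)
scratch <- file.path(tempdir(), "scratch")
dir.create(scratch, showWarnings = FALSE)

# ls: list the contents of a folder
file.create(file.path(scratch, "notes.txt"))
list.files(scratch)

# rm: delete a file
file.remove(file.path(scratch, "notes.txt"))
remaining <- list.files(scratch)
remaining
```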
R comes with a ton of accessible help files
?read.csv
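Beyond `?`, base R has a few other ways to look things up. The sketch below shows some of them; the calls that open a help page or run examples are wrapped in `interactive()` so the snippet also runs cleanly in a script.

```r
# Argument names and defaults, without opening the help page
args(read.csv)

# The same argument names as a character vector you can inspect in code
read_csv_args <- names(formals(read.csv))
read_csv_args

if (interactive()) {
  help("read.csv")         # same as ?read.csv
  help.search("read csv")  # same as ??"read csv"
  example("mean")          # run the examples from a help page
}
```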