4 R basics and workflows
4.1 Basics of working with R at the command line and RStudio goodies
Launch RStudio/R.
You will first intall R and then. RStudio.
4.2 In Rstudio - where we will live
Notice the default panes:
- Console (entire left)
- Environment/History (tabbed in upper right)
- Files/Plots/Packages/Help (tabbed in lower right)
4.3
FYI: you can change the default location of the panes, among many other things: Customizing RStudio.
Go into the Console, where we interact with the live R process.
You can make an object by assigning a value or statement to a letter or string. We use <-
to assign objects meaning. Create and inspect the following object:
Your first analysis in R:
x <- 3 * 4
x
#> [1] 12
All R statements where you create objects – “assignments” – have this form:
objectName <- value
and in my head I hear, e.g., “x gets 12”.
You will make lots of assignments and the operator <-
is a pain to type. Don’t be lazy and use =
, although it would work, because it will just sow confusion later. Instead, utilize RStudio’s keyboard shortcut: Alt + - (the minus sign).
I want to be your friend. As a friend, I implore you, learn this:
In RStudio insert the <- assignment operator with Option + - (the minus sign)on a Mac, or Alt + - (the minus sign) on Windows.
Notice that RStudio automatically surrounds <-
with spaces, which demonstrates a useful code formatting practice. Code is miserable to read on a good day. Give your eyes a break and use spaces.
RStudio offers many handy RStudio Quick keys. Also, Alt+Shift+K brings up a keyboard shortcut reference card.
Object names cannot start with a digit and cannot contain certain other characters such as a comma or a space. You will be wise to adopt a convention for demarcating words in names, but note that best practice is to choose ONE convention and stay true to it throughout your code.
i_use_snake_case
other.people.use.periods
evenOthersUseCamelCase
Make another assignment:
this_is_a_really_long_name <- 2.5
To inspect this, try out RStudio’s completion facility: type the first few characters, press TAB, add characters until you disambiguate, then press return.
Make another assignment:
turner_rocks <- 2 ^ 3
When making assignments, it is best practice to keep the names brief, yet descriptive. For instance, while the name “this_is_a_really_long_name” is accurate, so is “long_name” and this is much more intuitive and easy to read/type over and over.
Let’s try to inspect:
turnerrocks
#> Error in eval(expr, envir, enclos): object 'turnerrocks' not found
Turner_rocks
#> Error in eval(expr, envir, enclos): object 'Turner_rocks' not found
turner_rocks
#> [1] 8
Implicit contract with the computer / scripting language: Computer will do tedious computation for you. In return, you will be completely precise in your instructions. Typos matter. Case matters. Get better at typing.
R has a mind-blowing collection of built-in functions that are accessed like so:
functionName(arg1 = val1, arg2 = val2, and so on)
Let’s try using seq()
which makes regular sequences of numbers and, while we’re at it, demo more helpful features of RStudio.
Type se
and hit TAB. A pop up shows you possible completions. Specify seq()
by typing more to disambiguate or using the up/down arrows to select. Notice the floating tool-tip-type help that pops up, reminding you of a function’s arguments. If you want even more help, press F1 as directed to get the full documentation in the help tab of the lower right pane. Now open the parentheses and notice the automatic addition of the closing parenthesis and the placement of cursor in the middle. Type the arguments 1, 10
and hit return. RStudio also exits the parenthetical expression for you. IDEs are great.
seq(1, 10)
#> [1] 1 2 3 4 5 6 7 8 9 10
The above also demonstrates something about how R resolves function arguments. You can always specify in name = value
form. But if you do not, R attempts to resolve by position. So above, it is assumed that we want a sequence from = 1
that goes to = 10
. Since we didn’t specify step size, the default value of by
in the function definition is used, which ends up being 1 in this case. For functions I call often, I might use this resolve by position for the first
argument or maybe the first two. After that, I always use name = value
.
Make this assignment and notice similar help with quotation marks.
yo <- "hello world"
If you just make an assignment, you don’t get to see the value, so then you’re tempted to immediately inspect.
y <- seq(1, 10)
y
#> [1] 1 2 3 4 5 6 7 8 9 10
This common action can be shortened by surrounding the assignment with parentheses, which causes assignment and “print to screen” to happen.
It is best practice to always attempt to “print” your assignments after creating them. This will help leviate the issue of searching 200+ lines of code for that one error causing argument.
(y <- seq(1, 10))
#> [1] 1 2 3 4 5 6 7 8 9 10
Not all functions have (or require) arguments:
date()
#> [1] "Thu Feb 23 08:02:51 2023"
Now look at your workspace – in the upper right pane. The workspace is where user-defined objects accumulate. You can also get a listing of these objects with commands:
objects()
#> [1] "this_is_a_really_long_name"
#> [2] "turner_rocks"
#> [3] "x"
#> [4] "y"
#> [5] "yo"
ls()
#> [1] "this_is_a_really_long_name"
#> [2] "turner_rocks"
#> [3] "x"
#> [4] "y"
#> [5] "yo"
If you want to remove the object named y
, you can do this:
rm(y)
To remove everything:
or click the broom in RStudio’s Environment pane.
4.4 Workspace and working directory
One day you will need to quit R, go do something else and return to your analysis later (this is a very happy day).
One day you will have multiple analyses going that use R and you want to keep them separate (a not so happy day).
One day you will need to bring data from the outside world into R and send numerical results and figures from R back out into the world (the happiest of days).
To handle these real life situations, you need to make two decisions:
- What about your analysis is “real”, i.e. will you save it as your lasting record of what happened?
- Where does your analysis “live”?
4.4.1 Workspace, .RData
As a beginning R user, it’s OK to consider your workspace “real”. Very soon, I urge you to evolve to the next level, where you consider your saved R scripts as “real”. (In either case, of course the input data is very much real and requires preservation!) With the input data and the R code you used, you can reproduce everything. You can make your analysis fancier. You can get to the bottom of puzzling results and discover and fix bugs in your code. You can reuse the code to conduct similar analyses in new projects. You can remake a figure with different aspect ratio or save is as TIFF instead of PDF. You are ready to take questions. You are ready for the future.
If you regard your workspace as “real” (saving and reloading all the time), if you need to redo analysis … you’re going to either redo a lot of typing (making mistakes all the way) or will have to mine your R history for the commands you used. Rather than [becoming an expert on managing the R history][rstudio-command-history], a better use of your time and psychic energy is to keep your “good” R code in a script for future reuse.
Because it can be useful sometimes, note the commands you’ve recently run appear in the History pane.
But you don’t have to choose right now and the two strategies are not incompatible. Let’s demo the save / reload the workspace approach.
Upon quitting R, you have to decide if you want to save your workspace, for potential restoration the next time you launch R. Depending on your set up, R or your IDE, e.g. RStudio, will probably prompt you to make this decision.
Quit R/RStudio, either from the menu, using a keyboard shortcut, or by typing q()
in the Console. You’ll get a prompt like this:
Save workspace image to ~/.Rdata?
Note where the workspace image is to be saved and then click “Save”.
Using your favorite method, visit the directory where image was saved and verify there is a file named .RData
. You will also see a file .Rhistory
, holding the commands submitted in your recent session.
Restart RStudio. In the Console you will see a line like this:
[Workspace loaded from ~/.RData]
indicating that your workspace has been restored. Look in the Workspace pane and you’ll see the same objects as before. In the History tab of the same pane, you should also see your command history. You’re back in business. This way of starting and stopping analytical work will not serve you well for long but it’s a start.
4.4.2 Working directory
Any process running on your computer has a notion of its “working directory”. In R, this is where R will look, by default, for files you ask it to load. It also where, by default, any files you write to disk will go. Chances are your current working directory is the directory we inspected above, i.e. the one where RStudio wanted to save the workspace.
You can explicitly check your working directory with:
getwd()
It is also displayed at the top of the RStudio console.
As a beginning R user, it’s OK let your home directory or any other weird directory on your computer be R’s working directory. Very soon, I urge you to evolve to the next level, where you organize your analytical projects into directories and, when working on project A, set R’s working directory to the associated directory.
Although I do not recommend it, in case you’re curious, you can set R’s working directory at the command line like so:
setwd("~/myCoolProject")
Although I do not recommend it, you can also use RStudio’s Files pane to navigate to a directory and then set it as working directory from the menu: Session > Set Working Directory > To Files Pane Location. (You’ll see even more options there). Or within the Files pane, choose “More” and “Set As Working Directory”.
But there’s a better way. A way that also puts you on the path to managing your R work like an expert.
4.5 Exercises
Create an object called “cool_object” and assign it the number 100.
Create a new object called “big_brain” and multiply the object from question one by 15.
Print both objects.
Use base R functions to return today’s date and print it.
Create a sequence of numbers counting from 10 to 100 by 2.
Identify your working directory. What is it? Change it to where you want it.
Save the R script that answers questions 1 through 5 above. Save it; clean and close Rstudio; reopen your script and run it. Make sense?