Lesson 2: Set Working Directory and Load Data
In this lecture, we move from running self-contained examples to working with real data files. To do that reliably, you need to understand where R looks for files and how to tell it where your class materials live on your computer.
The key ideas in this lecture are:
- Files live in folders on your computer
- R must be told which folder to look in
- Once the working directory is set, loading data is straightforward and reproducible
What Is a Working Directory?
The working directory is the folder on your computer where R looks for files by default.
When you ask R to load a file like this:
read.csv("data.csv")
R interprets that as:
“Look for a file named data.csv inside the current working directory.”
If R is pointed at the wrong folder, it will not find your file, even if the file exists somewhere else on your computer.
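To see what R sees, you can ask for the current working directory and list its contents. This is a minimal sketch using base R functions; data.csv is only a placeholder name, so the last line will likely return FALSE unless you happen to have a file with that name.

```r
# Print the folder R is currently pointed at
getwd()

# List the files R can see in that folder
list.files()

# Check whether a specific file is visible from the working directory.
# "data.csv" is a placeholder name; this returns FALSE if no such file is there.
file.exists("data.csv")
```

If file.exists() returns FALSE for a file you know you have, the working directory is probably pointed at a different folder.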
Finding Your Class Folder
Before setting the working directory, you need to know where your class folder is located. The folder's location is written as a file path, which serves as directions for R to find the folder.
You should create a folder called something like Regression or ECON 355. In this folder, you should keep the data and scripts for this class. Keeping everything in one folder helps you stay organized and prevents losing your work. (Your class folder should not be your Downloads folder.)
Finding Folder on a Mac
- Open Finder
- Navigate to your class folder
- Right-click (or Control-click) the folder
- Hold down the Option key
- Click Copy “[folder name]” as Pathname
This copies the full file path to your clipboard.
A Mac file path looks like this:
/Users/yourname/Documents/ECON355
Finding Folder on Windows
- Open File Explorer
- Navigate to your class folder
- Click once in the address bar at the top
- The folder path will appear
- Copy the path
A Windows file path looks like this:
C:/Users/YourName/Documents/ECON355
When you copy the file path, it will have backslashes (\), which you need to change to forward slashes (/) for the path to work in R.
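If you would rather not edit the slashes by hand, R can do the replacement for you. The sketch below assumes R 4.0.0 or later, which supports the raw-string syntax r"(...)" for pasting backslashes unchanged; the variable names win_path and r_path are just illustrative.

```r
# A path as copied from File Explorer. The r"(...)" raw-string syntax
# (R 4.0.0 and later) lets you paste backslashes without escaping them.
win_path <- r"(C:\Users\YourName\Documents\ECON355)"

# Replace every backslash with a forward slash.
# fixed = TRUE treats the pattern as plain text, not a regular expression.
r_path <- gsub("\\", "/", win_path, fixed = TRUE)

print(r_path)
```

The result is the same forward-slash path shown above, ready to paste into setwd().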
Setting the Working Directory in R
Once you have copied the folder path, you can tell R to use it as the working directory.
setwd("C:/Users/YourName/Documents/ECON355")
Or, on a Mac:
setwd("/Users/yourname/Documents/ECON355")
After running this line, R will look in that folder for all files unless told otherwise.
You can confirm your working directory by running:
getwd()
Alternatively, in the Files pane in RStudio, navigate to your class folder. Then, in the “More” drop-down menu, click “Set As Working Directory”. RStudio runs the matching setwd() command in the Console. Be sure to copy that command into your .R script so you can run it again later.
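To try setwd() and getwd() without touching your real folders, you can practice on a throwaway folder. This is a sketch only: the folder name ECON355_demo is hypothetical, and it is created inside R's temporary directory so nothing on your computer is changed permanently.

```r
# Create a stand-in for a class folder inside R's temporary directory.
# (The folder name "ECON355_demo" is hypothetical.)
demo_dir <- file.path(tempdir(), "ECON355_demo")
dir.create(demo_dir, showWarnings = FALSE)

# setwd() invisibly returns the previous working directory, so save it
old_wd <- setwd(demo_dir)

# Confirm that the working directory changed
getwd()

# Return to where you started
setwd(old_wd)
```

Saving the old working directory is optional in a class script, but it is a handy habit when you only want to switch folders temporarily.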
A Note on Reproducibility
In this course, you should:
- Keep setwd() at the top of your script
- Use relative file names (e.g., data.csv, not long paths)
- Keep scripts and data in the same folder
This makes your work easier to rerun and easier for others to understand.
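The sketch below illustrates why relative file names work once the working directory is set: the short name and the full path point to the same file. Everything here is made up for the demonstration, including the folder name ECON355_paths and the two-column file; it runs inside R's temporary directory.

```r
# Stand-in class folder (hypothetical name) inside R's temporary directory
class_dir <- file.path(tempdir(), "ECON355_paths")
dir.create(class_dir, showWarnings = FALSE)
old_wd <- setwd(class_dir)

# Write a tiny made-up CSV into the working directory
writeLines("x,y\n1,2", "data.csv")

# Build the full path to the same file
full_path <- file.path(getwd(), "data.csv")

# The relative name and the full path refer to the same file
file.exists("data.csv")
file.exists(full_path)
identical(read.csv("data.csv"), read.csv(full_path))

# Return to the original working directory
setwd(old_wd)
```

Because only setwd() contains a computer-specific path, someone else can rerun the script by changing that one line.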
Loading a CSV File
A CSV (comma-separated values) file is a common plain-text format for data.
Try reading in some data. Download this dataset, which contains county population, income, and citizenship data, and put it in your class folder.
Load the data with the code:
county <- read.csv("county_citizenship.csv")
This reads the file into R and stores it as a data frame called county.
You can check that it loaded correctly by running:
head(county) # Shows the first 6 rows
  geoid                    name tot_pop med_hh_inc total_hh
1  1001 Autauga County, Alabama   59285      69841    22523
2  1003 Baldwin County, Alabama  239945      75019    94642
3  1005 Barbour County, Alabama   24757      44290     9080
4  1007    Bibb County, Alabama   22152      51215     7571
5  1009  Blount County, Alabama   59292      61096    21977
6  1011 Bullock County, Alabama   10157      36723     3453
  total__hh_w_assistance total_hh_no_assistance citizen_born
1                   2077                  20446        57756
2                   6940                  87702       230809
3                   2071                   7009        23981
4                   1658                   5913        21872
5                   2968                  19009        56534
6                   1137                   2316         9883
  citizen_naturalized non_citizen non_citizen_share urban
1                 827         702        0.01184111     1
2                3991        5145        0.02144241     1
3                 353         423        0.01708608     0
4                  48         232        0.01047309     0
5                1015        1743        0.02939688     1
6                  63         211        0.02077385     0
summary(county) # Shows summary statistics for each variable
     geoid             name             tot_pop         med_hh_inc
 Min.   : 1001   Length:3220        Min.   :     43   Min.   : 16170
 1st Qu.:19029   Class :character   1st Qu.:  11037   1st Qu.: 54113
 Median :30020   Mode  :character   Median :  26015   Median : 63162
 Mean   :31372                      Mean   : 104236   Mean   : 65047
 3rd Qu.:46104                      3rd Qu.:  67575   3rd Qu.: 73216
 Max.   :72153                      Max.   :9848406   Max.   :178707
    total_hh       total__hh_w_assistance   total_hh_no_assistance
 Min.   :     22   Min.   :     0           Min.   :     22
 1st Qu.:   4280   1st Qu.:   548           1st Qu.:   3560
 Median :  10218   Median :  1430           Median :   8420
 Mean   :  39976   Mean   :  5248           Mean   :  34728
 3rd Qu.:  26384   3rd Qu.:  3729           3rd Qu.:  22500
 Max.   :3390254   Max.   :507314           Max.   :2882940
  citizen_born     citizen_naturalized      non_citizen      non_citizen_share
 Min.   :      0   Min.   :      0       Min.   :      0.0   Min.   :0.00000
 1st Qu.:   9706   1st Qu.:     67       1st Qu.:     75.8   1st Qu.:0.00594
 Median :  24179   Median :    271       Median :    335.0   Median :0.01440
 Mean   :  88906   Mean   :   7495       Mean   :   6824.6   Mean   :0.02597
 3rd Qu.:  64720   3rd Qu.:   1291       3rd Qu.:   1624.0   3rd Qu.:0.03296
 Max.   :6563145   Max.   :1794962       Max.   :1490299.0   Max.   :0.25818
     urban
 Min.   :0.0000
 1st Qu.:0.0000
 Median :0.0000
 Mean   :0.3146
 3rd Qu.:1.0000
 Max.   :1.0000
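A few other base R functions are also useful for checking that a file loaded as expected. The sketch below uses a tiny, invented CSV written to a temporary file so that it runs on its own; the column names echo the county data, but the values are made up for this demonstration and are not from the class dataset.

```r
# Write a tiny, made-up CSV to a temporary file
# (values are invented for this demo, not taken from the county data)
tmp_file <- tempfile(fileext = ".csv")
writeLines(c("name,tot_pop,urban",
             "County A,1000,1",
             "County B,250,0"), tmp_file)

toy <- read.csv(tmp_file)

dim(toy)      # number of rows and columns
names(toy)    # column names
str(toy)      # type of each column
head(toy, 1)  # head() takes an optional second argument: how many rows to show
```

The same calls work on county once it is loaded, for example dim(county) or str(county).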
Common Errors
Error: cannot open the connection
R cannot find the file. Most likely causes:
- Working directory is incorrect
- File name is misspelled
- Missing “.csv” at the end of the file name
Error: no such file or directory
- The folder path in setwd() is wrong
These errors are about file location, not about your data.
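To see these messages safely, you can deliberately try to read a file that does not exist. This is a sketch under the assumption that no file called no_such_file_here.csv (a hypothetical name) is in your working directory. tryCatch() captures the message so the script keeps running. The exact wording may vary with your R version and language settings, but typically the warning mentions “No such file or directory” and the error says “cannot open the connection”.

```r
# A file name that should not exist in your working directory (hypothetical)
missing_file <- "no_such_file_here.csv"

# Check before reading: FALSE means R cannot see the file from here
file.exists(missing_file)

# Capture the error that read.csv() raises for a missing file
err_msg <- tryCatch(
  suppressWarnings(read.csv(missing_file)),
  error = function(e) conditionMessage(e)
)

# Capture the warning that is issued just before that error
warn_msg <- tryCatch(
  read.csv(missing_file),
  warning = function(w) conditionMessage(w)
)

print(err_msg)
print(warn_msg)
```

Running file.exists() first is a quick way to tell a location problem apart from a problem with the data itself.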
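One nice thing about working in RStudio is that it helps you load files. For example, if you press Tab while the cursor is inside the quotation marks of read.csv(""), RStudio lists the files in your working directory.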
Exercise
- Locate your class folder on your computer
- Set your working directory in R
- Download the csv file into that folder
- Load the csv into R using read.csv()
- Run head() and summary() on the data