Lesson 2: Set Working Directory and Load Data

In this lecture, we move from running self-contained examples to working with real data files. To do that reliably, you need to understand where R looks for files and how to tell it where your class materials live on your computer.

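The key ideas in this lecture are:

  • What a working directory is and why it matters
  • How to find your class folder and set it as the working directory
  • How to load a CSV file into R and check that it loaded correctly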

What Is a Working Directory?

The working directory is the folder on your computer where R looks for files by default.

When you ask R to load a file like this:

read.csv("data.csv")

R interprets that as:

“Look for a file named data.csv inside the current working directory.”

If R is pointed at the wrong folder, it will not find your file—even if the file exists somewhere else on your computer.
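To see exactly where R will look for a file you name without a full path, you can join getwd() and the file name with file.path(). This is a minimal sketch; data.csv is only a placeholder name, so file.exists() will return FALSE unless such a file happens to be in your working directory.

```r
# A relative file name is resolved against the current working directory.
getwd()                                   # the folder R is using right now

# The full path R will actually try when you ask for "data.csv"
where_r_looks <- file.path(getwd(), "data.csv")
where_r_looks

# TRUE only if data.csv is sitting in that folder
file.exists("data.csv")
```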

Finding Your Class Folder

Before setting the working directory, you need to know where your class folder is located. The steps below show how to find the folder's full path, which you will then give to R.

Create a folder called something like Regression or ECON355, and keep all the data and scripts for this class in it. Keeping everything in one folder helps you stay organized and makes it less likely that you lose your work. (Your class folder should not be your Downloads folder.)

Finding Your Folder on a Mac

  1. Open Finder
  2. Navigate to your class folder
  3. Right-click (or Control-click) the folder
  4. Hold down the Option key
  5. Click Copy “ECON355” as Pathname (the menu shows your own folder's name inside the quotes)

This copies the full file path to your clipboard.

A Mac file path looks like this:

/Users/yourname/Documents/ECON355

Finding Your Folder on Windows

  1. Open File Explorer
  2. Navigate to your class folder
  3. Click once in the address bar at the top
  4. The folder path will appear
  5. Copy the path

A Windows file path looks like this:

C:/Users/YourName/Documents/ECON355

Important

On Windows, the copied path uses backslashes (\). R treats a backslash inside a string as the start of an escape sequence, so change each backslash to a forward slash (/) before using the path in R.
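If you would rather not edit the slashes by hand, R can do the conversion. A small sketch, assuming R 4.0.0 or later for the raw-string syntax r"(...)", which lets you paste a Windows path exactly as copied; YourName is a placeholder, as in the examples above.

```r
# A raw string r"(...)" keeps backslashes as ordinary characters,
# so a path pasted from File Explorer can go in unchanged.
win_path <- r"(C:\Users\YourName\Documents\ECON355)"

# Replace every backslash with a forward slash.
# fixed = TRUE makes "\\" match a literal backslash, not a regular expression.
r_path <- gsub("\\", "/", win_path, fixed = TRUE)
r_path   # "C:/Users/YourName/Documents/ECON355"
```

The result is the same string you would get by changing the slashes yourself, and it can be passed straight to setwd().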

Setting the Working Directory in R

Once you have copied the folder path, you can tell R to use it as the working directory.

setwd("C:/Users/YourName/Documents/ECON355")

Or, on a Mac:

setwd("/Users/yourname/Documents/ECON355")

After running this line, R will look in that folder for any file you refer to by name alone (for example, data.csv), unless you give a full path instead.

You can confirm your working directory by running:

getwd()
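setwd() also returns, invisibly, the folder you were in before the change, which makes it easy to confirm the switch and go back. A minimal sketch, using tempdir() as a stand-in for your class folder so it runs on any computer:

```r
# Save the previous working directory while switching to the new one.
# tempdir() stands in for your class folder here.
old_wd <- setwd(tempdir())

getwd()          # confirm the change
list.files()     # files R can now open by name alone

setwd(old_wd)    # switch back to the original working directory
```

list.files() is a quick way to check that your data file really is in the folder R is using.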

Alternatively, in RStudio's Files pane, navigate to your class folder, then open the More menu (the gear icon) and click “Set As Working Directory”. RStudio runs the matching setwd() command in the Console; copy that line into your .R script so you can run it again later.

A Note on Reproducibility

In this course, you should:

  • Keep setwd() at the top of your script

  • Use relative file names (e.g., data.csv, not long paths)

  • Keep scripts and data in the same folder

This makes your work easier to rerun and easier for others to understand.
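Put together, the top of a script that follows these habits might look like the sketch below. It is only an illustration: tempdir() stands in for your class folder, and mtcars, a data set that ships with R, is written out as a stand-in for a course file so the sketch runs anywhere. Replace both with your own folder path and file name.

```r
# ---- Setup: set the working directory at the top of the script ----
# Replace tempdir() with your own class folder, for example
# setwd("C:/Users/YourName/Documents/ECON355")
setwd(tempdir())

# Stand-in data so this sketch runs anywhere (mtcars ships with R)
write.csv(mtcars, "mtcars.csv", row.names = FALSE)

# ---- Load data using a relative file name ----
data_file <- "mtcars.csv"
if (!file.exists(data_file)) {
  stop("Cannot find ", data_file, " in ", getwd(),
       ". Check setwd() and the file name.")
}
my_data <- read.csv(data_file)
head(my_data)
```

Stopping early with a clear message makes a wrong folder or a misspelled file name obvious before any analysis runs.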

Loading a CSV File

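A CSV (comma-separated values) file is a common plain-text format for tabular data: the first line usually holds the column names, each following line is one row, and commas separate the values.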
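As a small sketch, the string below copies the first two rows and four columns of the county data printed later in this lesson; read.csv(text = ...) parses a string exactly as it would parse a file. A value that contains a comma, such as a county name, has to be wrapped in double quotes so the comma is not read as a column separator.

```r
# The same structure a .csv file has on disk, written as a string.
# Values copied from the first rows of the county data in this lesson.
csv_text <- 'geoid,name,tot_pop,med_hh_inc
1001,"Autauga County, Alabama",59285,69841
1003,"Baldwin County, Alabama",239945,75019'

# read.csv(text = ...) parses the string like a file
mini <- read.csv(text = csv_text)
mini
str(mini)   # numeric columns are read as integers, name as character text
```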

Try reading in some data. Download this data file, which contains county population, income, and citizenship data, and save it in your class folder.

Load the data with the code:

county <- read.csv("county_citizenship.csv")

This reads the file into R and stores it as a data frame called county.

You can check that it loaded correctly by running:

head(county) # Shows the first 6 rows
  geoid                    name tot_pop med_hh_inc total_hh
1  1001 Autauga County, Alabama   59285      69841    22523
2  1003 Baldwin County, Alabama  239945      75019    94642
3  1005 Barbour County, Alabama   24757      44290     9080
4  1007    Bibb County, Alabama   22152      51215     7571
5  1009  Blount County, Alabama   59292      61096    21977
6  1011 Bullock County, Alabama   10157      36723     3453
  total__hh_w_assistance total_hh_no_assistance citizen_born
1                   2077                  20446        57756
2                   6940                  87702       230809
3                   2071                   7009        23981
4                   1658                   5913        21872
5                   2968                  19009        56534
6                   1137                   2316         9883
  citizen_naturalized non_citizen non_citizen_share urban
1                 827         702        0.01184111     1
2                3991        5145        0.02144241     1
3                 353         423        0.01708608     0
4                  48         232        0.01047309     0
5                1015        1743        0.02939688     1
6                  63         211        0.02077385     0
summary(county) # Shows summary statistics for each variable
     geoid           name              tot_pop          med_hh_inc    
 Min.   : 1001   Length:3220        Min.   :     43   Min.   : 16170  
 1st Qu.:19029   Class :character   1st Qu.:  11037   1st Qu.: 54113  
 Median :30020   Mode  :character   Median :  26015   Median : 63162  
 Mean   :31372                      Mean   : 104236   Mean   : 65047  
 3rd Qu.:46104                      3rd Qu.:  67575   3rd Qu.: 73216  
 Max.   :72153                      Max.   :9848406   Max.   :178707  
    total_hh       total__hh_w_assistance total_hh_no_assistance
 Min.   :     22   Min.   :     0         Min.   :     22       
 1st Qu.:   4280   1st Qu.:   548         1st Qu.:   3560       
 Median :  10218   Median :  1430         Median :   8420       
 Mean   :  39976   Mean   :  5248         Mean   :  34728       
 3rd Qu.:  26384   3rd Qu.:  3729         3rd Qu.:  22500       
 Max.   :3390254   Max.   :507314         Max.   :2882940       
  citizen_born     citizen_naturalized  non_citizen        non_citizen_share
 Min.   :      0   Min.   :      0     Min.   :      0.0   Min.   :0.00000  
 1st Qu.:   9706   1st Qu.:     67     1st Qu.:     75.8   1st Qu.:0.00594  
 Median :  24179   Median :    271     Median :    335.0   Median :0.01440  
 Mean   :  88906   Mean   :   7495     Mean   :   6824.6   Mean   :0.02597  
 3rd Qu.:  64720   3rd Qu.:   1291     3rd Qu.:   1624.0   3rd Qu.:0.03296  
 Max.   :6563145   Max.   :1794962     Max.   :1490299.0   Max.   :0.25818  
     urban       
 Min.   :0.0000  
 1st Qu.:0.0000  
 Median :0.0000  
 Mean   :0.3146  
 3rd Qu.:1.0000  
 Max.   :1.0000  
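Beyond head() and summary(), a few base R functions give a quick structural check. The sketch assumes county was loaded as above; so that it also runs on its own, it builds a three-row stand-in from values printed by head() if county does not exist yet.

```r
# Stand-in so this sketch runs without the real file: if county is not
# loaded, build a tiny version from rows printed by head() above.
if (!exists("county")) {
  county <- read.csv(text = 'geoid,name,tot_pop,urban
1001,"Autauga County, Alabama",59285,1
1003,"Baldwin County, Alabama",239945,1
1005,"Barbour County, Alabama",24757,0')
}

dim(county)          # rows and columns; the full file has 3220 rows
names(county)        # column names exactly as R stored them
str(county)          # type of each column plus its first few values
table(county$urban)  # number of counties coded 0 and 1
```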

Common Errors

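Error in file(file, "rt") : cannot open the connection

  • R cannot find the file. This error usually comes with a warning such as cannot open file 'county_citizenship.csv': No such file or directory

  • Most likely causes:

    • Working directory is incorrect

    • File name is misspelled

    • Missing “.csv” at the end of the file name

Error in setwd(...) : cannot change working directory

  • The folder path in setwd() is wrong or misspelled, or the folder does not exist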

These errors are about file location, not about your data.
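When you hit a file-location error, a small check can narrow down the cause. The check_file() function below is a hypothetical helper written for this lesson, not part of base R: it reports the working directory and lists the CSV files R can see there, which makes a misspelled name or a missing .csv extension easy to spot.

```r
# Hypothetical helper: explain why a file cannot be opened before
# calling read.csv(). Returns TRUE if the file is found, FALSE otherwise.
check_file <- function(file_name) {
  if (file.exists(file_name)) {
    message("Found ", file_name, " in ", getwd())
    return(invisible(TRUE))
  }
  message("Could not find ", file_name, " in ", getwd())
  csv_files <- list.files(pattern = "\\.csv$", ignore.case = TRUE)
  if (length(csv_files) > 0) {
    message("CSV files in this folder: ", paste(csv_files, collapse = ", "))
  } else {
    message("No CSV files here; the working directory may be wrong.")
  }
  invisible(FALSE)
}

check_file("county_citizenship")       # missing the ".csv" extension
check_file("county_citizenship.csv")
```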

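RStudio can also help you load files. Inside the quotes of read.csv(""), press Tab to see and autocomplete the file names in your working directory, or use Import Dataset in the Environment pane, which writes the loading code for you.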

Exercise

  1. Locate your class folder on your computer
  2. Set your working directory in R
  3. Download the csv file into that folder
  4. Load the csv into R using read.csv()
  5. Run head() and summary() on the data