Appendix B — Downloading and Preparing the Data

To fully engage with the exercises and examples in this book, you’ll need to download the datasets provided. The data is organized in a folder named r-data, which contains all the files we’ll use throughout the chapters.

B.1 Downloading the Data

  1. Access the Data Folder

    Visit the following link to access the r-data folder on Google Drive:
    https://bit.ly/r-data-directory or https://drive.google.com/drive/folders/1ZhI-t94uZa82KD8hEN0f1WALfCiRFWCP

  2. Download the r-data Folder

    • Once you’re on the Google Drive page, you should see the r-data folder listed.
    • Right-click on the r-data folder and select Download.
    • Google Drive will compress the folder into a ZIP file before downloading it to your computer.

  3. Unzip the Folder

    • After the download is complete, locate the ZIP file on your computer (usually in your Downloads folder).
    • Extract the contents of the ZIP file:
      • Windows: Right-click the ZIP file and select Extract All, then follow the prompts.
      • macOS: Double-click the ZIP file to extract it.
      • Linux: Right-click the ZIP file and select Extract Here, or run unzip filename.zip from the command line.

  4. Verify the Contents

    • Open the extracted r-data folder to ensure all files are present.
    • You should see various datasets in formats like CSV, Excel, and others, which we’ll use in different labs.
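
If you prefer to stay inside R, steps 3 and 4 can also be done from the console. The following is a minimal sketch rather than a required part of the workflow. The helper name extract_and_list, the ZIP file name r-data.zip, and the ~/Downloads locations are assumptions, so substitute the actual name and path of the file Google Drive gave you.

```r
# Extract a ZIP archive and return the relative paths of the extracted files.
# Both arguments are placeholders; pass the real locations on your machine.
extract_and_list <- function(zip_path, extract_dir) {
  zip_path <- path.expand(zip_path)
  extract_dir <- path.expand(extract_dir)

  if (!file.exists(zip_path)) {
    message("ZIP file not found at: ", zip_path)
    return(character(0))
  }

  # unzip() ships with R (utils package) and creates exdir if needed.
  utils::unzip(zip_path, exdir = extract_dir)

  # List every extracted file, including those in subfolders.
  list.files(extract_dir, recursive = TRUE)
}

# Assumed file name and location; adjust to match your download.
extracted_files <- extract_and_list("~/Downloads/r-data.zip",
                                    "~/Downloads/r-data-extracted")
print(extracted_files)

# Count files by extension to confirm the formats you expect (e.g., csv, xlsx).
print(table(tools::file_ext(extracted_files)))
```

If the archive is not where the sketch expects it, the function prints a message and returns an empty character vector instead of stopping with an error.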

B.2 Setting Up Your Working Directory

To keep your work organized and ensure consistency across exercises, we’ll create a dedicated RStudio Project for each lab or exercise that uses data from the r-data folder. This approach helps manage your files efficiently and ensures that your working directory is correctly set for each task.

B.2.1 Creating a New RStudio Project for Each Exercise

  1. Identify the Lab or Exercise

    • Determine which lab or exercise you’re working on (e.g., Lab 2, Exercise 4.1).

  2. Create a Directory for the Project

    • On your computer, create a new folder with a meaningful name for the lab or exercise, such as Lab2_Project or Exercise4_1_Project.

  3. Copy Necessary Data Files

    • From the extracted r-data folder, copy the specific data files needed for the exercise into your new project folder.

    • Alternatively, you can copy the entire r-data folder into your project directory if multiple datasets are required.

  4. Create a New RStudio Project

    • Open RStudio.

    • Go to File > New Project.

    • Choose Existing Directory.

    • Browse to the directory you just created for the lab or exercise.

    • Select the folder and click Create Project.

  5. Organize Your Project Files

    • Within your project directory, consider creating subfolders such as data, scripts, and output to further organize your work.

      • Place your data files in the data folder.

      • Save your R scripts in the scripts folder.

      • Direct any output files (like graphs or reports) to the output folder.

  6. Working Within the Project

    • When you open the RStudio Project, your working directory is automatically set to the project’s root directory.

    • When reading or writing files, use relative paths starting from the project directory to ensure your code works on any system where the project folder is set as the working directory.

# Example: read a CSV file from the r-data folder inside your project
# (read_csv() comes from the readr package)
library(readr)
data <- read_csv("r-data/your-dataset.csv")

Note

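Use forward slashes (/) in file paths, even on Windows. In an R string, a backslash starts an escape sequence, so a path typed with single backslashes will not be read as intended.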
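
Steps 3 and 5 above, together with the note on path separators, can also be scripted. The sketch below is an illustration, not a required step: the function name set_up_project, the demo folder locations, and the file name your-dataset.csv are placeholders. It relies only on base R, so read.csv() stands in for read_csv(), and it runs in a temporary directory so nothing in your real folders changes.

```r
# Create the suggested subfolders inside a project and copy data files into data/.
# `project_dir` and `source_files` are hypothetical inputs you supply.
set_up_project <- function(project_dir, source_files = character(0)) {
  for (sub in c("data", "scripts", "output")) {
    # recursive = TRUE also creates project_dir itself if it is missing;
    # showWarnings = FALSE keeps reruns quiet when the folders already exist.
    dir.create(file.path(project_dir, sub), recursive = TRUE, showWarnings = FALSE)
  }
  # file.copy() copies each file into data/ and returns TRUE or FALSE per file.
  copied <- file.copy(source_files, file.path(project_dir, "data"), overwrite = TRUE)
  invisible(copied)
}

# Stand-in for the extracted r-data folder, holding one small placeholder dataset.
demo_source <- file.path(tempdir(), "r-data")
dir.create(demo_source, showWarnings = FALSE)
write.csv(data.frame(id = 1:3, score = c(90, 85, 77)),
          file.path(demo_source, "your-dataset.csv"), row.names = FALSE)

# Build the project skeleton and copy the dataset into its data/ subfolder.
demo_project <- file.path(tempdir(), "Lab2_Project")
set_up_project(demo_project, file.path(demo_source, "your-dataset.csv"))

# file.path() joins components with "/", which R accepts on every platform.
csv_path <- file.path(demo_project, "data", "your-dataset.csv")
lab_data <- read.csv(csv_path)
print(lab_data)
```

Inside an open RStudio Project the working directory is already the project root, so the same file would be read with the relative path "data/your-dataset.csv".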

B.2.2 Benefits of Using Separate Projects for Each Exercise

  • Organization: Keeps your work for each lab or exercise neatly contained, preventing files from different tasks from mixing.

  • Reproducibility: By maintaining all necessary files within each project, you make it easier to revisit or share your work without missing dependencies.

  • Clarity: Helps you focus on the specific objectives of each exercise without distractions from other projects.

B.3 Data Usage and Ethics

The link and datasets provided are safe to use and are intended for educational use alongside this book, to help you practice and apply the concepts covered. Please use the data responsibly and refrain from using it for any unauthorized purposes.

  • Privacy: Be mindful that while the datasets are fictional or anonymized, they may represent sensitive topics. Handle all data with respect and confidentiality.

  • Attribution: If you use the datasets in any presentations or projects outside of this book’s exercises, please acknowledge the source appropriately.

B.4 Getting Help

If you encounter any issues downloading or accessing the data:

  • Check Your Internet Connection: Ensure you have a stable connection when downloading the data.

  • Try a Different Browser: Sometimes switching browsers can resolve download issues.


By setting up the data as described, you’ll be ready to dive into the hands-on labs and fully engage with the practical exercises. Having the data organized and accessible will streamline your workflow and enhance your learning experience.

Happy analyzing!