Git + RStudio Tutorial

Imagine you have a piece of code, and you’re keen on tracking its changes without losing the original version. The conventional method involves saving scripts as new files, often labeled with indicators like ‘v0’ or a timestamp. Git offers a more seamless way to version your code without the hassle of managing different version files manually. It not only tracks changes made to your files but also equips you with tools to document those changes.


1. Git vs GitHub

Git is a version control system that runs locally on your computer. It tracks changes to files over time, allows you to document those changes, and lets you return to any previous version of your project. Git works completely offline.

GitHub is an online platform that hosts Git repositories. Platforms like it allow you to store your project remotely, collaborate with others, and access your work from anywhere.

Several platforms provide Git hosting:

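| Platform | Owner / Country |
| --- | --- |
| GitHub | Microsoft, USA |
| GitLab | GitLab Inc., USA |
| Bitbucket | Atlassian, Australia |
| Codeberg | Codeberg e.V., Germany |
| University of Amsterdam GitLab | University of Amsterdam (self-hosted), Netherlands |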

The University of Amsterdam provides a self-hosted GitLab instance, allowing you to keep data within institutional infrastructure. You can sign up here

Features of Git platforms

GitHub and GitLab extend Git with additional features:

| Feature | Description |
| --- | --- |
| Backup | Your project is stored remotely, keeping it safe even if your local machine fails. |
| Collaboration | Sharing, viewing, and editing repositories with collaborators. |
| Issue tracking | Track bugs, feature requests, and tasks. Keep track of changes that still need to be implemented. |
| Software distribution | For example, R packages can be installed directly from a Git repository. |
| Website hosting | Host static websites for documentation, tutorials, or reports. |
| Zenodo / Figshare integration | Archive releases and obtain a citable DOI. |

2. Git platforms vs Data Repositories

Git platforms (like GitHub or GitLab) and data repositories (like Zenodo or Figshare) serve different roles, but are often used interchangeably in research.

Git platforms are used for active development: they are best suited for code, documents, and small datasets that change over time, allowing you to track changes, collaborate with others, and update your work frequently.

Data repositories are used for archiving: they store final versions of data and code, support large or static datasets, and provide long-term preservation through citable DOIs.

Tip

The recommended workflow for reproducible research is to develop on GitHub/GitLab throughout the project, then archive a release on Zenodo/Figshare when submitting a paper. Cite the Zenodo/Figshare DOI in your paper rather than the GitHub/GitLab URL: a repository URL can change or disappear when a project is renamed, moved, or deleted, whereas a DOI is meant to be persistent.

Exercise: Which Tool Should You Use?

For each situation below, decide which tool is most appropriate: a Git platform (GitHub / GitLab) or a data repository (Zenodo / Figshare).

  • Writing and versioning R scripts
  • Collaborating on code with a colleague
  • Storing large raw sensor / sequencing datasets
  • Depositing data alongside a journal submission
  • Obtaining a citable DOI for your analysis
  • Distributing an R package for others to install
  • Hosting a project website or documentation

3. Setup

Before continuing, set up Git in RStudio by following the instruction manual: rstudio git setup. If everything went well, you are all set to use Git in RStudio.


4. Git Vocabulary

  • Repository (repo): The project folder + full history.
  • Remote: The online copy (e.g., GitHub, GitLab).
  • Commit: A snapshot of changes in your repository with a message.
  • Stage: Select files you want to include in a commit.
  • Push: Upload your local commits to the online repo.
  • Pull: Download commits from the online repo.
  • Branch: A parallel workspace for isolated development.
  • Merge: Combining that workspace back into the main project.
  • Merge conflict: When Git can’t auto-merge differences.

5. Creating a Project with Git

Start a New Project with Git (option 1)

With this option git is enabled. However, git is just used locally.

  1. File > New Project
  2. Choose New Directory > New Project
  3. Check Create a git repository

→ Terminal equivalent: §14.1

Clone an Existing Remote Repository (option 2)

With this option git is enabled and able to interact with the remote.

  1. Create a repository in GitHub / GitLab
  2. Copy the repo URL
  3. In RStudio: File > New Project > Version Control > Git
  4. Paste the repo URL
  5. Select a local location and click Create Project

→ Terminal equivalent: §14.1

Exercise 1: Create Your First Repository

  1. Create an empty remote git repository (GitHub / GitLab).
  2. Clone the repository locally (option 2)

6. Basic Git Workflow

Now that Git is set up for a project, we can learn how versioning with Git works. The basic workflow consists of four steps:

pull → stage → commit → push.

Pull

Click Pull in RStudio to ensure your local files are up‑to‑date. Pull downloads the latest commits from the repo.

→ Terminal equivalent: §14.2

Stage and Commit

  1. Tick the file(s) to stage
  2. Use the Diff button to inspect changes
  3. Write a clear message
  4. Click Commit

Changes are highlighted in red (removal) and green (addition).

→ Terminal equivalent: §14.2

Push

Send your commits to the remote repository.

→ Terminal equivalent: §14.2


Exercise 2: Create Your First Commit

  1. Open your previously created git-enabled project.
  2. Create a code directory and add a Quarto document named 00_data_preparation.qmd (File > New File > Quarto Document).
  3. Create a data directory.
  4. Stage → Commit → Push the new Quarto document.
  5. Go to GitHub / GitLab and check the history.

Goal: See your commit appear on GitHub / GitLab.

Exercise 3: Make Multiple Commits

  1. Create a README.md in the root of the project directory if it does not exist yet.
  2. Add the following section titles in the README.md:
    # introduction
    # directory structure
    # data and code description
    # usage
  3. Commit with a meaningful description of the change.
  4. Add placeholder text in the section “introduction”.
  5. Commit with a meaningful description of the change.
  6. Add a placeholder table in the section “data and code description”.
  7. Commit with a meaningful description of the change.
  8. Add an image in the section “introduction”.
  9. Commit with a meaningful description of the change.
  10. Push
  11. Go to GitHub / GitLab and check the history.

Goal: Become familiar with commit → push. Learn that multiple commits can be made before changes are pushed.
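
To see from the terminal how commits pile up locally before a push, here is a minimal sketch under stated assumptions: it runs in a throwaway directory, a local bare repository stands in for GitHub / GitLab, and the file name, commit messages, and user identity are placeholders.

```shell
set -eu
# Throwaway sandbox: a bare repository stands in for GitHub / GitLab.
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/remote.git"
git init -q "$sandbox/project"
cd "$sandbox/project"
git config user.name "Demo User"            # placeholder identity
git config user.email "demo@example.com"
git remote add origin "$sandbox/remote.git"

# First commit, pushed straight away.
echo "# introduction" > README.md
git add README.md
git commit -q -m "Add README with introduction heading"
git branch -M main
git push -q -u origin main

# Two more commits that stay local for now.
echo "# usage" >> README.md
git commit -q -am "Add usage heading"
echo "Placeholder text" >> README.md
git commit -q -am "Add placeholder text"

# Count and list the commits that exist locally but not on the remote.
ahead="$(git rev-list --count origin/main..HEAD)"
echo "Unpushed commits: $ahead"
git log --oneline origin/main..HEAD

# A single push uploads all of them at once.
git push -q
after="$(git rev-list --count origin/main..HEAD)"
echo "Unpushed commits after push: $after"
```

The range origin/main..HEAD selects everything reachable from your current commit that the remote-tracking branch does not have yet, which is roughly what the RStudio Git pane reports as being ahead of the remote.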

Exercise 4: View History Online

  1. Go to GitHub / GitLab → Commits tab.
  2. Inspect messages and diffs.
  3. Click the directory / folder icon on the right to explore the repository at different commits.

Goal: Understand how your work appears remotely. Understand that you have a complete snapshot of your repository for every commit.


7. The .gitignore file

The .gitignore file tells Git which files should not be tracked or shared (e.g. sensitive information, large or unnecessary files). The file is by default created at the root of the project directory.

Case study: Protecting secrets

You may need credentials to access external or private data repositories or databases. If so, never save those credentials (passwords, API keys) in your R scripts. Instead, you can create a config.yml file in the root directory.

default:
  database1:
    driver: 'RPostgres::Postgres()'
    server: 'server.address.com'
    uid: 'username'
    pwd: 'password'
    port: 5432
  APIKEY:
    key: 'api-key'
    url: 'api.access.point.com'

Then, add the config.yml to .gitignore. This ensures that credentials are not shared in scripts and not committed to GitHub / GitLab. Navigate to .gitignore at the root of your project directory and add:

config.yml
.Renviron

To read the configuration in an R script, use the config::get() function.

# read the 'database1' entry from config.yml
db_credentials <- config::get('database1')
uid <- db_credentials$uid # username
pwd <- db_credentials$pwd # password

Exercise 5: Add sensitive files to .gitignore

  1. Create the above config.yml file in the root of the project directory.
  2. Check the Git pane and see if the config.yml file appears.
  3. Edit the .gitignore file by adding config.yml and save the changes.
  4. Verify that config.yml disappears from the Git pane.
  5. Commit the changes.
  6. Add the following section to 00_data_preparation.qmd:
# read the 'database1' entry from config.yml
db_credentials <- config::get('database1')
uid <- db_credentials$uid # username
pwd <- db_credentials$pwd # password
  7. Commit the changes.

Goal: Learn how to keep credentials out of Git. Learn how to use .gitignore.


8. Version history

Git’s ability to inspect and recover older versions is one of its most powerful features. Let’s explore!

Viewing the project history

Every commit is a complete snapshot of the repository at that moment. You can explore the entire history using:

  • GitHub / GitLab → Commits tab
  • RStudio → Git pane → History

From the history view, you can inspect:

  • Added lines (green)
  • Removed lines (red)
  • Commit messages
  • Who made the change
  • When it happened
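
The same information can be inspected from the terminal. This is a small sketch in a throwaway repository; notes.txt, the commit messages, and the user identity are placeholders chosen for illustration.

```shell
set -eu
sandbox="$(mktemp -d)"
git init -q "$sandbox/project"
cd "$sandbox/project"
git config user.name "Demo User"            # placeholder identity
git config user.email "demo@example.com"

# Two commits that change the same file.
echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
echo "version 2" > notes.txt
git commit -q -am "Update notes"

# Who changed what and when, with per-file change counts.
git log --stat --format='%h %an %ad %s' --date=short

# Added and removed lines of the latest commit.
git show HEAD

# Content of a file as it was one commit ago, without checking anything out.
old="$(git show HEAD~1:notes.txt)"
echo "$old"
```

Because git show HEAD~1:notes.txt only prints the old content, the working directory stays on the latest version.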

→ Terminal equivalent: §14.3


Check out an old version

If you need to revisit a previous state of the project:

git checkout <commit-hash>

This lets you explore the repo as it was at that moment. It puts you in a “detached HEAD” state: a temporary view of the repository at that commit. You can run code and inspect files, but new commits made here do not belong to any branch.

[Figure: checking out a previous version in RStudio]

To return to normal:

git checkout main

→ Terminal equivalent: §14.3


Downloading an old version (GitHub / GitLab)

On GitHub / GitLab:

1. Go to Commits
2. Click the desired commit
3. Choose Browse files at that commit
4. Click Download ZIP

This makes retrieving old versions extremely accessible without any command-line knowledge.

Exercise 6: Explore history

  1. Open Git → History in RStudio.
  2. Pick a past commit.
  3. Inspect its differences.
  4. Find which lines changed.

Goal: Understand change tracking and how to view it in RStudio

Exercise 7: Check out older versions

  1. Copy a commit hash from the history.
  2. Run:
git checkout <hash>
  3. Inspect the project at that moment.
  4. Return to main:
git checkout main

Goal: Learn to navigate past states.


9. Tags & Releases

Tags (Marking important versions)

Tags are labels you attach to specific commits. They are well suited to marking important states of the repository. Between two such versions there may be many small intermediate commits; tags let you single out the ones that matter. A tag can be created on GitHub / GitLab or from the terminal.

git tag -a v1.0.0 -m "First stable version"
git push origin v1.0.0

→ Terminal equivalent: §14.4 Tags


Releases (Tags + Documentation)

GitHub / GitLab let you turn tags into releases, adding version notes and a clearer, more official way to share a version with the community. Releases can also trigger automatic archiving to Zenodo or Figshare once that integration is enabled.


Exercise 8: Create a tag and release

  1. Create a tag:
git tag -a v0.1.0 -m "Initial milestone"
git push origin v0.1.0
  2. Explore the tag on GitHub / GitLab.
  3. Turn the tag into a release.

Goal: Become familiar with marking important project versions and releasing an official version.


10. Daily Workflow

Warning

If you make changes directly in the remote (GitHub / GitLab web editor), make sure to pull before editing locally.
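
As an illustration of why pulling first matters, the sketch below simulates a day of work in a sandbox. It is a hedged example, not a prescribed procedure: a local bare repository stands in for GitHub / GitLab, a second clone named web plays the role of an edit made in the web editor, and all names and messages are placeholders.

```shell
set -eu
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/remote.git"    # stand-in for GitHub / GitLab

# Your local project, with one commit already pushed.
git init -q "$sandbox/mine"
cd "$sandbox/mine"
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$sandbox/remote.git"
echo "# introduction" > README.md
git add README.md
git commit -q -m "Add README"
git branch -M main
git push -q -u origin main

# A second clone plays the role of an edit made in the web editor.
git clone -q -b main "$sandbox/remote.git" "$sandbox/web"
cd "$sandbox/web"
git config user.name "Web Editor"
git config user.email "web@example.com"
echo "Edited online" >> README.md
git commit -q -am "Edit README online"
git push -q

# The daily loop, back in your local project.
cd "$sandbox/mine"
git pull -q --ff-only                       # 1. pull before editing
echo "Edited locally" >> README.md          # 2. do your work
git add README.md                           # 3. stage
git commit -q -m "Edit README locally"      # 4. commit
git push -q                                 # 5. push
git log --oneline
```

The --ff-only flag is a cautious choice made for this sketch: the pull stops with an error instead of creating a merge when local and remote history have diverged, so you notice the situation early.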

11. Branches

What Is a Branch?

Think of your project as a timeline of commits on the main branch. When you create a branch, you are spinning off a copy of that timeline where you can work freely, without touching the stable main code. Once you are happy with your changes, you merge the branch back into main.

Use branches for new features or analyses you are not sure about yet, experiments, bug fixes (issues), or collaborating without overwriting each other’s work.

Creating and Switching Branches in RStudio

In the Git pane, click the branch icon (top-right), type a short descriptive name (e.g. new-feature), ensure Sync branch with remote is checked, and click Create. The current branch is always shown in the top-right of the Git pane.

Warning

Always commit or stash your changes before switching branches. Uncommitted changes can carry over or cause conflicts.
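
Stashing is not available as a button in the RStudio Git pane, so it is done from the terminal. The following is a minimal sketch in a throwaway repository; the file analysis.R, the branch name, and the stash message are placeholders.

```shell
set -eu
sandbox="$(mktemp -d)"
git init -q "$sandbox/project"
cd "$sandbox/project"
git config user.name "Demo User"            # placeholder identity
git config user.email "demo@example.com"

echo "line 1" > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"
git branch -M main
git branch new-feature

# An unfinished edit on main that is not ready to be committed.
echo "half-finished idea" >> analysis.R

git stash push -q -m "WIP: half-finished idea"   # shelve the edit
clean="$(git status --porcelain)"                # empty: working tree is clean
git checkout -q new-feature                      # switching is now safe
# ... work on new-feature and commit there ...
git checkout -q main
git stash pop -q                                 # bring the edit back
restored="$(tail -n 1 analysis.R)"
echo "$restored"
```

git stash list shows what is currently shelved; after git stash pop the entry is removed again.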

Merging a Branch

Once your work is ready, you could merge the branch back into main. The recommended way is via a Pull Request (GitHub) or Merge Request (GitLab):

  1. Push your branch.
  2. On GitHub / GitLab, you will see a prompt to open a Pull / Merge Request.
  3. Add a title and description explaining what changed and why.
  4. Invite a colleague to review if collaborating.
  5. Click Merge and remove the branch.
  6. Back in RStudio, switch to main and click Pull to get the merged changes locally.
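  7. In RStudio, the merged branch may still be listed among the branches. To clean it up, run:
git fetch --prune                     # drop references to branches deleted on the remote
git branch -vv                        # list local branches and their upstream status
git branch -d new-function-add_one    # delete the merged local branch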

→ Terminal equivalent: §14.5 Branches


Exercise 9: Create and Merge a Branch

  1. In RStudio, create a new branch called new-function-add_one.
  2. On that branch, add a new directory functions in the code directory and an R-script functions.R.
  3. Copy the following function in the file functions.R:
add_one <- function(x) {
  y <- x + 1
  return(y)
}
  4. Commit and push the changes.
  5. Go to GitHub / GitLab and open a Pull / Merge Request.
  6. Merge the branch into main and remove the branch new-function-add_one.
  7. Back in RStudio, switch to main and pull to confirm the changes arrived.
  8. Also make sure the branch new-function-add_one is removed from the branch panel.

Goal: Understand isolated development and the full branch → merge cycle.

Merge Conflicts

A merge conflict happens when the same line was changed differently on two branches that you are trying to merge. Git cannot decide which version to keep, so it asks you to resolve it manually.

What conflict markers look like:

<<<<<<< HEAD
This is the version on main.
=======
This is the version on my branch.
>>>>>>> new-feature

Resolving a conflict:

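  1. After you open a pull request on GitHub for two conflicting branches, GitHub notifies you that they cannot be merged automatically.
  2. Open the conflicting file and find the <<<<<<<, =======, and >>>>>>> markers.
  3. Decide which version to keep (or write a new version combining both).
  4. Remove the markers, then commit the resolved file.

Tip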

Conflicts are less common if you keep branches short-lived and merge often. The longer a branch lives without syncing with main, the more likely conflicts occur.
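
A conflict can also be reproduced and resolved entirely on your own machine. The sketch below is one possible way to do so, under stated assumptions: it works in a throwaway repository, reuses the add_one function from the exercises, and resolves the conflict by keeping the version on main, whereas in practice you would usually edit the file by hand.

```shell
set -eu
sandbox="$(mktemp -d)"
git init -q "$sandbox/project"
cd "$sandbox/project"
git config user.name "Demo User"            # placeholder identity
git config user.email "demo@example.com"

printf 'add_one <- function(x) {\n  y <- x + 1\n  return(y)\n}\n' > functions.R
git add functions.R
git commit -q -m "Add add_one"
git branch -M main

# The branch changes the same line one way ...
git checkout -q -b conflict-test
printf 'add_one <- function(x) {\n  y <- x + 3\n  return(y)\n}\n' > functions.R
git commit -q -am "Add three instead of one"

# ... and main changes it another way.
git checkout -q main
printf 'add_two <- function(x) {\n  y <- x + 2\n  return(y)\n}\n' > functions.R
git commit -q -am "Rename to add_two and add two"

# The merge stops with a non-zero exit status because of the conflict.
git merge conflict-test || echo "Merge stopped: conflict in functions.R"
grep -nE '^(<<<<<<<|=======|>>>>>>>)' functions.R   # show the marker lines

# Resolve by keeping the version on main, then stage and commit.
git checkout --ours -- functions.R
git add functions.R
git commit -q -m "Resolve merge conflict in functions.R"
git log --oneline --graph
```

During a merge, --ours refers to the branch you are merging into (here main) and --theirs to the branch being merged in (here conflict-test).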


Exercise 10: Merge Conflict

  1. Create a new branch conflict-test.
  2. On main, change the function name to add_two and 1 to 2, then commit and push the changes:
add_two <- function(x) {
  y <- x + 2
  return(y)
}
  3. On the conflict-test branch, change the same function but change 1 to 3, then commit and push the changes:
add_one <- function(x) {
  y <- x + 3
  return(y)
}
  4. On GitHub, open a pull request to merge conflict-test into main.
  5. Observe the conflict markers in the file and resolve the conflict.

Goal: See conflict markers and learn how to resolve them.

→ Terminal equivalent: §14.5 Branches


12. Managing Large Datasets

The Problem with Large Files in Git

Git is designed for text files and small data files. When you add large binary files to a repository, several problems arise:

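  • The repository grows large and becomes slow to clone
  • Git keeps every committed version of a file in its history; binary files cannot be stored as compact line-by-line changes, so each modified version typically adds close to its full size
  • GitHub blocks files larger than 100 MB and warns when a file exceeds 50 MB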
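
To spot problem files before committing, a quick check along the following lines may help. It is a sketch only: the 50 MB threshold mirrors the warning limit mentioned above, the dummy files are created just for the demonstration, and the M size suffix is supported by GNU and BSD find but is not part of POSIX.

```shell
set -eu
sandbox="$(mktemp -d)"
git init -q "$sandbox/project"
cd "$sandbox/project"

# Dummy files for the demonstration: 1 KiB and 60 MiB of zeros.
head -c 1024 /dev/zero > small.csv
head -c "$((60 * 1024 * 1024))" /dev/zero > big.tif

# List files above 50 MB, skipping Git's own .git directory.
large="$(find . -path ./.git -prune -o -type f -size +50M -print)"
echo "$large"
```

Any file listed here is a candidate for one of the three strategies described next.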

There are three strategies for handling large files.


.gitignore: exclude files entirely (option 1)

The simplest solution is to exclude large files from git using .gitignore. This is appropriate when the data is also stored and backed up elsewhere.

Warning

Files listed in .gitignore are not tracked at all. There is no history of changes to them, and they are not shared with collaborators via the repository.


Git LFS: version large files (option 2)

Git Large File Storage (LFS) is an extension that keeps large files out of the repository history while still allowing you to version them. Instead of storing the file directly in git, it stores a small text pointer, while the actual file lives on the remote server (GitHub / GitLab).

Use Git LFS when you want to track versions of large files, for instance html files rendered from markdown, without bloating the repository.

Setting up Git LFS

Step 1 — Install LFS (once per machine, download from git-lfs.com):

git lfs install

Step 2 — Track file types (once per repository):

git lfs track "*.html"

This creates / updates a .gitattributes file in your repository:

*.html filter=lfs diff=lfs merge=lfs -text

Step 3 — Commit .gitattributes and proceed as normal:

git add .gitattributes
git commit -m "Set up Git LFS for website content"

From this point on, git add, git commit, and git push work exactly as before — LFS handles the large files transparently.
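
To check which files a tracking rule applies to, you can query the attributes Git reads from .gitattributes. The sketch below writes the same line that git lfs track "*.html" produces and therefore runs even without Git LFS installed; report.html and analysis.R are hypothetical file names.

```shell
set -eu
sandbox="$(mktemp -d)"
git init -q "$sandbox/project"
cd "$sandbox/project"

# The same line that `git lfs track "*.html"` writes.
echo '*.html filter=lfs diff=lfs merge=lfs -text' > .gitattributes

# Ask Git which filter applies to a given path (the file need not exist).
html_attr="$(git check-attr filter -- report.html)"
r_attr="$(git check-attr filter -- analysis.R)"
echo "$html_attr"
echo "$r_attr"
```

Once Git LFS is installed and files have been committed, git lfs ls-files lists the files that are actually stored in LFS.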

Note

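Storage limits: at the time of writing, GitHub free accounts include 1 GB of LFS storage and 1 GB/month bandwidth, and GitLab free accounts include 5 GB of LFS storage. Quotas change over time, so check your platform's current documentation. For larger needs, institutional or paid plans apply.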


External storage (option 3)

For very large, continuously growing, or highly structured datasets that cannot be managed on your personal computer, the best approach is to keep the data completely outside the repository (see table). In this case, the Git repository hosts the code, the queries used to access the datasets, and possibly small sample datasets.

Table. Examples of external storage

| Storage type | Examples |
| --- | --- |
| Cloud storage | Nextcloud (research drive), AWS S3 |
| Relational databases | PostgreSQL, MySQL, SQLite |
| Tape storage | SURF Archive |
| External hard drives | Secondary backup |

Real-world example: Birdradar

Radar continuously monitors and tracks the presence of birds. This generates large datasets, which are stored in a relational PostgreSQL database. The git repository contains the SQL queries used to extract and filter the data, along with the R scripts for analysis.

# This query script IS in git; the data it retrieves is NOT
dbname <- 'database1'
db1 <- config::get(dbname)   # credentials come from config.yml (listed in .gitignore)
con <- DBI::dbConnect(
  RPostgres::Postgres(),
  host     = db1$server,    # the key is called 'server' in config.yml
  port     = db1$port,
  dbname   = dbname,
  user     = db1$uid,
  password = db1$pwd
)

radar_data <- DBI::dbGetQuery(con, "
  SELECT track_id, classification_id, timestamp_start, timestamp_end, trajectory
  FROM tracks
  WHERE timestamp_start BETWEEN '2024-08-01' AND '2024-08-31'
")

13. Hosting a Website with Quarto + GitHub Pages

GitHub Pages lets you host a free, public website directly from a GitHub repository. Combined with Quarto it allows you to deploy a website within 5 minutes.

Exercise 11: Hosting a website on GitHub with Quarto

  1. Create a Git repository online.
  2. Clone the remote repository (← See §5).
  3. In RStudio: New Project > New Directory > Quarto Website > Create Project.
  4. Copy all content of the newly created Quarto website project directory into the cloned Git repository, which will host your website.
  5. Open _quarto.yml and set the output directory in which the site will be rendered:
project:
  type: website
  output-dir: docs
  6. Open about.qmd and click Render (this generates the website in a directory named docs).
  7. Commit and push all changes.
  8. On GitHub, in your repository, navigate to Settings > Pages. Select the branch to publish from (main) and the folder in which the website is stored (docs).
  9. Navigate to https://<USERNAME>.github.io/<REPOSITORY-NAME> to see your website online.

14. Using Git Programmatically (Terminal)

Everything covered in this tutorial can also be done via the Terminal (or RStudio’s built-in Terminal tab: Tools → Terminal → New Terminal). This is useful when you want to automate steps, work on a remote server, or simply prefer the command line.

Below is a complete reference that mirrors the RStudio workflow sections above.


14.1 Creating / Cloning a Repository

New local repo

← RStudio version (§5)

# creates a new git repo in ./my-project
git init <my-project>
cd <my-project>

Clone an existing remote repo

← RStudio version (§5)

git clone https://github.com/<user>/<repo-name>.git
cd <repo-name>

14.2 Basic Workflow: Pull → Stage → Commit → Push

Pull

Download latest changes from the remote (Section 6):

git pull

← RStudio version: §6

Stage

Select which files to include in the next commit (Section 6):

git status           # see which files changed
git add <filename>.R # stage a specific file
git add .            # stage all changed files
git diff --staged    # inspect staged changes before committing

Commit

Save a snapshot with a message (Section 6):

git commit -m "Add species count analysis"

← RStudio version: §6

Push

Upload commits to the remote (Section 6):

git push
# First push on a new branch:
git push -u origin main

← RStudio version: §6


14.3 Version History

View commit history (Section 8):

git log --oneline            # compact one-line view
git log --oneline --graph    # with branch graph

← RStudio version: §8

Check out an old version (Section 8):

git checkout <commit-hash>   # enter detached HEAD state
git checkout main            # return to normal

← RStudio version: §8


14.4 Tags

Create and push a tag (Section 9):

git tag -a v1.0.0 -m "First stable version"
git push origin v1.0.0

List existing tags:

git tag

← RStudio version: §9 Tags & Releases


14.5 Branches

← RStudio version: §11 Branches

Create a new branch and switch to it (Section 11):

git checkout -b new-feature
# also push it to the remote immediately:
git push -u origin new-feature

Switch between branches (Section 11):

git checkout main
git checkout new-feature

Merge a branch into main (Section 11):

git checkout main
git merge new-feature
git push

Delete a branch after merging:

git branch -d new-feature             # local
git push origin --delete new-feature  # remote

Resolve a merge conflict (Section 11): open the file, remove the conflict markers, then:

git add conflicted-file.R
git commit -m "Resolve merge conflict in conflicted-file.R"
git push

14.6 Quick Reference Card

| Action | Command |
| --- | --- |
| Clone repo | git clone <url> |
| Pull latest changes | git pull |
| Check status | git status |
| Stage a file | git add <file> |
| Stage everything | git add . |
| Commit | git commit -m "message" |
| Push | git push |
| Create branch | git checkout -b <name> |
| Switch branch | git checkout <name> |
| Merge branch | git merge <name> |
| View history | git log --oneline |
| Create tag | git tag -a v1.0.0 -m "msg" |
| Push tag | git push origin v1.0.0 |
| Set up LFS | git lfs install |
| Track type with LFS | git lfs track "*.tif" |