3 Version Control
These notes were originally developed for the class 971001 Software at the University of Innsbruck and are slightly adapted for these notes. See Kandolf (2024) for the original source and github pages for the notes.
A Version Control Software (VCS) is, in its most abstract form, designed to keep track of changes to a file system. This means it tracks changes in files, such as addition or deletion, but also more basic modifications to files or directories. A VCS also allows teams to collaborate and keeps everyone informed of the changes made by others. Some of the main features of a VCS are:
- Revert a file or an entire directory to a previous state.
- Compare files or directories to a previous version.
- Help resolve conflicts if multiple changes on the same resource collide with each other.
- Tag specific versions of a file for later use.
- Keep track of who changed what and when.
- Much more
In the scope of an entire file system, this is used by operating systems to minimize file loss and to allow users to revert unwanted changes. Most often, and that is also the main focus here, a VCS is used to track source code. This might be a software project or a term paper. In either case, the VCS will keep track of additions, deletions, modifications and so forth of individual lines of text within these files. We will focus on Git but there are alternatives around such as Mercurial, CVS, or SVN. All of them have individual strengths and weaknesses but, in terms of widespread use, Git is in the lead.
What is git and a bit of history
Git was developed as free and open source software by Linus Torvalds in 2005 for the development of the Linux kernel. The main goals in the development were (according to Wikipedia):
- Take Concurrent Versions System (CVS) as an example of what not to do; if in doubt, make the exact opposite decision.
- Support a distributed, BitKeeper-like workflow.
- Include very strong safeguards against corruption, either accidental or malicious.
The development of Git began on 3 April 2005. Torvalds announced the project on 6 April and became self-hosting the next day. The first merge of multiple branches took place on 18 April. Torvalds achieved his performance goals; on 29 April, the nascent Git was benchmarked recording patches to the Linux kernel tree at the rate of 6.7 patches per second. On 16 June, Git managed the kernel 2.6.12 release.
Source: wikipedia on Git, as of 9th of September 2024
Regarding the name:
Torvalds sarcastically quipped about the name git (which means “unpleasant person” in British English slang): “I’m an egotistical bastard, and I name all my projects after myself. First ‘Linux’, now ‘git’.” The man page describes Git as “the stupid content tracker”. The read-me file of the source code elaborates further:
“git” can mean anything, depending on your mood.
- Random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of “get” may or may not be relevant.
- Stupid. Contemptible and despicable. Simple. Take your pick from the dictionary of slang.
- “Global information tracker”: you’re in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
- “Goddamn idiotic truckload of sh*t”: when it breaks.
Source: wikipedia on Git, as of 9th of September 2024
In the following we try to give a pragmatic hands-on introduction to the concepts of Git and how it can be used. The notes are a mix of several sources but the main ideas are based on the git training by the UnseenWizzard.
Installing Git on your system
Linux:
- Open the terminal.
- Type git --version
- If Git is not installed, use your package manager to install it (Ubuntu: apt, Fedora: dnf, …), for example with sudo apt install git
macOS:
- Open the terminal.
- Type git --version
- If Git is not installed, install it via xcode-select --install
Windows:
- Download Git from https://git-scm.com/ and install it with the default settings (unless you really, really know what you are doing).
- Open the Git Bash and verify the installation by typing git --version
In the following we will always refer to the Terminal, by which we mean the Git Bash on Windows and the Terminal on Linux or Macs.
Additional resources
Obviously not everybody learns the same way and the concepts of Git can make your brain twist a bit so here are some additional resources:
- Ponuthorai and Loeliger (2022): Version Control with Git: Powerful Tools and Techniques for Collaborative Software Development;
- Chacon and Straub (2014): Pro Git: Everything you need to know about Git, Online
- Siessegger (2024): Git – kurz & gut; German
- Polge (2024): A Visual Git Reference, Online
- Coglan (2014): A book that tells you how to build Git on your own, Shop
- Polge (2024): Write yourself a Git; an online book that tells you how to build Git on your own, Link
- Git cheat sheet from education.github.com
3.1 Basics of Git
Git is a distributed VCS, which means the entire repository is distributed on various machines and possibly multiple remotes. This was a clear design choice that allows individual contributors to work independently of the availability of a remote but still have the full history available. The following is based on the git training by the UnseenWizzard. The main structure as well as the basic idea in the pictures follows these notes, with a few adaptations where needed.
In particular, such a setup could look something like this:
In this setup the Remote Repository is the place you send your changes to in order to have them visible for others, and in return you can get changes from them via the Remote Repository.
As the name suggests, the Local (development) environment sits on your machine. The working directory is your current version of the files contained in the repository and the Local Repository is the copy of the entire repository (with all changes) on your machine. We will learn more about these parts as we go along.
We will use the small exercise from the previous Python section as the python_ex1.py file.
3.1.1 Let us start with getting a Remote Repository
To provide a playground for this class I created a repository that we are going to use. In order to get it onto your local machine, type the following commands in the Terminal:
# Navigate to a suitable directory
git clone https://github.com/kandolfp/playground.git
This will perform the following two actions:
- Checkout the content of the remote repository into the working directory. By default the directory is named after the repository; in our case the folder playground is created and all files are put there.
- A copy of the remote repository is stored in the local repository. For all intents and purposes, it acts exactly the same as the Remote Repository, with the sole exception that it is not shared with others.
3.1.2 Adding content
With the following snippet you can view the content of the repository (the second line is the response):
> ls playground
python_ex1 README.md
As you can see there is currently only a README.md and the folder python_ex1.
Now let's add our solutions of the exercises we did in Python to the repository.
First we copy the file to the exercise directory:
cp PATH/TO/YOUR/FILE/solution_ex1.py python_ex1/{YOUR ID}.py
In my case the command reads like this:
> cp ../Exercises/reference_solution.py python_ex1/ID.py
'../Exercises/reference_solution.py' -> 'python_ex1/ID.py'
This has modified our working directory. In order to get an idea what Git thinks about this let's run git status in the working directory:
On branch main
Your branch is up to date with 'origin/main'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
python_ex1/ID.py
nothing added to commit but untracked files present (use "git add" to track)
Let's break down the output of the command. First, Git tells you which branch you are on (we will hear more about that later); second, how your local branch compares to its remote counterpart (here it is up to date with origin/main); and third, that there are untracked files.
Now, a tracked file is a file that is part of the repository and Git is keeping track of what is happening to it. An untracked file, on the other hand, is a file that is in the same directory but is not managed by Git.
Git tells you how to change the status of your untracked file into a tracked file.
We do this by running git add python_ex1/ID.py.
Now it is time to introduce another Git concept, namely the staging area. The staging area is the curious white spot between your working directory and the local repository in the above pictures. This is the place where Git collects all the changes to your files that you want to put into the local repository.
By rerunning git status we get
On branch main
Your branch is up to date with 'origin/main'.
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: python_ex1/ID.py
Now that we are confident that all our changes are in the staging area, we are ready to commit the changes to the local repository. This is done by running git commit. A text editor will open and you are able to write a message telling everybody what you just did in the repository. Usually you will also see the changes you are about to commit. The commit message is something really important and the message should be meaningful and readable as it will be later used by others to understand your action(s). There are several ways to do this and it boils down to what your team wants, but here are some links on good commit messages:
The same can be achieved directly in the Terminal by writing
> git commit -m "feat: add my solution for python exercises 1"
[main 578e48f] feat: add my solution for python exercises 1
1 file changed, 85 insertions(+)
create mode 100644 python_ex1/ID.py
In the above message you can see that your commit gets some more metadata. Specifically, it gets a SHA-1 hash, namely 578e48f.
The hash is used to keep track of your commits and is one of the groundbreaking ideas that makes Git so successful. The full hash is much longer (40 hexadecimal characters), but in most cases its first seven characters are already enough to identify a commit uniquely.
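To make the idea of the hash more concrete, the following minimal Python sketch reproduces how Git derives the identifier of a file's content (a so-called blob): it hashes a short header followed by the raw bytes with SHA-1. Commits are hashed in the same manner, only with a different header and content. The function name git_blob_hash is our own choice and not part of Git.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 identifier Git assigns to a file with this content."""
    # Git prepends the object type and the content length, separated by a
    # space and terminated by a NUL byte, before hashing.
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


full_hash = git_blob_hash(b"hello world\n")
print(full_hash)       # the full 40 character identifier
print(full_hash[:7])   # the abbreviated form shown in most Git output
```

The result can be compared with the output of git hash-object <file> for a file with the same content.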
Any changes made to a file after running git add <file> will not be part of the commit. If they should be included you need to rerun git add <file>.
By submitting an empty commit message you can abort a commit.
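The interplay of working directory, staging area and local repository can be mimicked with a small toy model in Python. This is not how Git is implemented; the class ToyRepo and its methods are hypothetical names, and each area is simply a dictionary that maps file names to content. The sketch shows why a change made after the add is not part of the next commit.

```python
class ToyRepo:
    """Toy model of Git's three local areas (an illustration, not real Git)."""

    def __init__(self):
        self.working_dir = {}   # files as they currently are on disk
        self.staging = {}       # snapshot collected for the next commit
        self.commits = []       # local repository: list of (message, snapshot)

    def add(self, name):
        # Copy the *current* content of the file into the staging area.
        self.staging[name] = self.working_dir[name]

    def commit(self, message):
        # A commit records the staging area, not the working directory.
        self.commits.append((message, dict(self.staging)))

    def unstaged_changes(self):
        # Files whose working directory content differs from the staged one
        # (in this toy model that includes files never added at all).
        return sorted(
            name for name, content in self.working_dir.items()
            if self.staging.get(name) != content
        )


repo = ToyRepo()
repo.working_dir["ID.py"] = "print('v1')"
repo.add("ID.py")
repo.working_dir["ID.py"] = "print('v2')"   # edited after the add
repo.commit("feat: add my solution")

print(repo.commits[-1][1]["ID.py"])   # print('v1')
print(repo.unstaged_changes())        # ['ID.py']
```

Only after another call of add would the second version reach the staging area and become part of a following commit.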
Now the changes are in the local repository and you can continue working. In order to share your changes with others you need to get them to the remote repository. This is done by pushing the changes. We do this by calling git push, which gives us an output similar to:
Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Delta compression using up to 16 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 1.17 KiB | 1.17 MiB/s, done.
Total 4 (delta 0), reused 0 (delta 0), pack-reused 0
To https://github.com/kandolfp/playground.git
e8f13f4..0b8a494 main -> main
Now, if you were to do this on your own, everything would work and you would be happy. Unfortunately, since we are doing this in a class and at the same time, we will encounter some difficulties. After all, this is a crash course for Git, so eventually something had to crash.
Some of you might get the following message for git push:
To https://github.com/kandolfp/playground.git
! [rejected] main -> main (non-fast-forward)
error: failed to push some refs to 'https://github.com/kandolfp/playground.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
This gives us the opportunity to talk about how to get changes from the remote repository to your local repository after the initial clone. But first, in order to do this properly, we need to talk about branches.
3.2 Branches
The word branch was mentioned several times before but not explained. The main idea is rather simple.
If you picture the commits, one after the other in a long chain, as the trunk of a tree, then a branch is exactly what it is for a tree:
In short, whenever multiple commits are based on the same commit they (and all following commits) form different branches.
By default, Git always operates on branches. When we cloned the remote repository we also cloned its branches and we started working on the main branch. You can go back and check the messages, it is always there.
Without knowing it, we created a branch. It is not visible to us but it is clear from the point of view of the remote repository.
There are several ways of integrating or merging two branches back into one.
For now we will only talk about the most elegant and simplest way, a rebase.
Naturally, every branch is based on a commit. In the above example 9a98eb2 is based on e8f13f4. As the name suggests, rebase simply changes this base. This gives us a clean way of how the entire Git commit chain is supposed to be read. We will see one way to perform a rebase in a moment. But first we need to know how to get remote changes into your local repository.
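Why a rebase produces a new commit hash can be illustrated with a toy model. In the sketch below a commit identifier is computed from the identifier of its parent and the commit message only; real Git also hashes the file snapshot, the author and the timestamps. The functions commit_id and rebase are hypothetical helpers for illustration, not Git commands.

```python
import hashlib


def commit_id(parent, message):
    """Toy commit identifier: depends on the parent id and the message."""
    return hashlib.sha1(f"{parent}\n{message}".encode()).hexdigest()[:7]


def rebase(messages, new_base):
    """Replay the given commit messages, in order, on top of new_base."""
    ids = []
    parent = new_base
    for message in messages:
        parent = commit_id(parent, message)
        ids.append(parent)
    return ids


base = commit_id("", "initial commit")
remote_tip = commit_id(base, "someone else's solution")   # pushed by a colleague
local_tip = commit_id(base, "my solution")                # our local commit

# Both commits share the same base, so the history has two branches.
# Rebasing replays our commit on top of the remote tip.
(rebased_tip,) = rebase(["my solution"], remote_tip)

print(local_tip != rebased_tip)                              # True
print(rebased_tip == commit_id(remote_tip, "my solution"))   # True
```

The content of our commit is unchanged, but because its base changed, it is a new commit with a new identifier, and the history is a single chain again.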
3.3 Integrating remote changes into your local environment
The above error message already gives us a hint on what to do, but let's make it more structured.
By running:
> git status
On branch main
Your branch and 'origin/main' have diverged,
and have 1 and 1 different commits each, respectively.
(use "git pull" to merge the remote branch into yours)
nothing to commit, working tree clean
we can see that the remote and the local repository have different commits.
With git fetch you can get changes from the remote repository into the local repository. This is the opposite direction of the git push command.
The important part here is that this does not affect your working directory, as the changes are only synchronized with the local repository, and when you try to push again you will see the same message. It does not even affect your local branches; it only makes sure that all of the remote branches are synchronized.
3.3.1 Pulling
In order to affect the working directory and your local branches, we need to pull the changes in. This is done with git pull.
As we have some conflicts we need to define a strategy how to deal with them. At the moment we only know one, so let us use:
> git pull --rebase
Successfully rebased and updated refs/heads/main.
This should have worked for everybody as all of you added different files to the repository and the Git tree looks something like this:
Next we will see what happens if we modify some files.
3.4 Modifying content in a repository
A good way to start is to link our uploaded file to the table in the README file in the repository.
With your favourite editor add the following content next to your ID (btw. this is markdown syntax):
# Playground
## List of submitted python exercises
| Name/UID | File |
| ----------- | ----------- |
| ID | [my upload](python_ex1/ID.py) |
If we check with git status we can see that README.md is modified.
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
(use "git push" to publish your local commits)
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: README.md
no changes added to commit (use "git add" and/or "git commit -a")
Of course this is only a change in our working directory and not in either of the two repositories. Before we add the changes to the local repository we can use git diff to see what we actually changed.
diff --git a/README.md b/README.md
index d9f0acb..9ac3846 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
| Name/UID | File |
| ----------- | ----------- |
-| ID | |
+| ID | [my upload](python_ex1/ID.py)|
| ID1 | |
| ID2 | |
| ID3 | |
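The format printed by git diff is the so-called unified diff format. As a rough illustration, Python's standard library module difflib can produce the same kind of output for two versions of a text. The two README snippets below are shortened stand-ins for the real file, so the line numbers in the hunk header differ from the output above.

```python
import difflib

old = [
    "| Name/UID | File |\n",
    "| ----------- | ----------- |\n",
    "| ID | |\n",
    "| ID1 | |\n",
]
new = [
    "| Name/UID | File |\n",
    "| ----------- | ----------- |\n",
    "| ID | [my upload](python_ex1/ID.py)|\n",
    "| ID1 | |\n",
]

# Lines starting with '-' were removed, lines starting with '+' were added,
# everything else is unchanged context.
diff = list(
    difflib.unified_diff(old, new, fromfile="a/README.md", tofile="b/README.md")
)
print("".join(diff), end="")
```

A modified line always shows up as a pair: the old version with a leading minus and the new version with a leading plus.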
We already know the next steps, add, commit, and push.
So let's recall: with git add README.md we move the file into the staging area. Note: If you run git diff now, the output is empty. This is because git diff only shows the changes in your working directory that are not yet staged. You can still get the diffs from your staging area with git diff --staged (some editors will show this when you type up your commit message).
Now, before we commit, we decide to modify README.md again. Maybe we made a typo or we just really want to nail this hand-in, so we change it; maybe we add
| ID | [my upload](python_ex1/ID.py) run it by calling `python3 python_ex1/ID.py`|
to make it clear we know what we are doing.
If we run git status we see the following
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
(use "git push" to publish your local commits)
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: README.md
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: README.md
which tells us that README.md has staged changes as well as further modifications that are not yet staged.
If we run git diff again we get
diff --git a/README.md b/README.md
index 9ac3846..28dba4b 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
| Name/UID | File |
| ----------- | ----------- |
-| ID | [my upload](python_ex1/ID.py)|
+| ID | [my upload](python_ex1/ID.py) run it by calling `python3 python_ex1/ID.py`|
| ID1 | |
| ID2 | |
| ID3 | |
which shows the changes in the working directory relative to the staging area. If we are satisfied with our changes, we can use git add README.md again to add the file to the staging area and finally commit it with git commit. Of course we do this with a meaningful commit message.
Depending on your timing, you might have to fetch and pull in changes to your local repository. By the way, you can directly call git pull without first calling git fetch; the fetch is done implicitly. But we should not get a conflict as everybody changed a different line.
3.5 Conflicts and how to resolve them
It will not always be this smooth, and conflicts occur, for example when two commits with the same base make changes to the same line. We simulate this by simply copying our local repository and working directory - either with a new git clone or by copying the directory.
For this, we assume that in location A we changed README.md to
| Name/UID | File |
| ----------- | ----------- |
| ID | [my upload](python_ex1/ID.py)|
and we commit and push this change to the remote repository.
A> git diff
diff --git a/README.md b/README.md
index d9f0acb..28dba4b 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
| Name/UID | File |
| ----------- | ----------- |
-| ID | |
+| ID | [my upload](python_ex1/ID.py)|
| ID1 | |
| ID2 | |
| ID3 | |
A> git add README.md
A> git commit -m "add my exercise sheet"
[main a16b809] add my exercise sheet
1 file changed, 1 insertion(+), 1 deletion(-)
A> git push
Enumerating objects: 8, done.
Counting objects: 100% (8/8), done.
Delta compression using up to 16 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 769 bytes | 769.00 KiB/s, done.
Total 6 (delta 1), reused 0 (delta 0), pack-reused 0
To https://github.com/kandolfp/playground.git
0b8a494..a16b809 main -> main
Now in location B we do not get the changes from the remote repository but modify README.md to
| Name/UID | File |
| ----------- | ----------- |
| ID | [my upload](python_ex1/ID.py) run it by calling `python3 python_ex1/ID.py`|
and we try to do the same as before:
B> git diff
diff --git a/README.md b/README.md
index d9f0acb..28dba4b 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
| Name/UID | File |
| ----------- | ----------- |
-| ID | |
+| ID | [my upload](python_ex1/ID.py) run it by calling `python3 python_ex1/ID.py`|
| ID1 | |
| ID2 | |
| ID3 | |
B> git add README.md
B> git commit -m "add my exercise sheet"
[main d9ac598] add my exercise sheet
1 file changed, 1 insertion(+), 1 deletion(-)
B> git push
To https://github.com/kandolfp/playground.git
! [rejected] main -> main (fetch first)
error: failed to push some refs to 'https://github.com/kandolfp/playground.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
This is to be expected so we pull in the changes from remote as we learned with rebase:
B> git pull --rebase
Auto-merging README.md
CONFLICT (content): Merge conflict in README.md
error: could not apply d9ac598... add my exercise sheet
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply d9ac598... add my exercise sheet
As usual, Git is quite helpful and tells you what to do. We have several options:
1. solve the conflicts, add the files and continue the rebase,
2. skip our commit d9ac598, so do not apply these changes,
3. abort the procedure.
We opt for 1. and take a look with git diff:
diff --cc README.md
index 12ee10e,28dba4b..0000000
--- a/README.md
+++ b/README.md
@@@ -4,7 -4,7 +4,11 @@@
| Name/UID | File |
| ----------- | ----------- |
++<<<<<<< HEAD
+ | ID | [my upload](python_ex1/ID.py) |
++=======
+ | ID | [my upload](python_ex1/ID.py) run it by calling `python3 python_ex1/ID.py`|
++>>>>>>> d9ac598 (add my exercise sheet)
During the rebase, HEAD is the latest commit of the branch we are rebasing onto, here the tip of the main branch on the remote repository. So we see:
- what HEAD brings in, starting after the line <<<<<<< HEAD,
- the end of these changes, marked with =======,
- and what we want to push out, ended by the line >>>>>>> d9ac598 together with the commit message.
This will be repeated for every conflict in the file.
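How the conflict markers structure a file can be made explicit with a few lines of Python. The sketch below splits a text into its conflicting sections; the function split_conflicts is a hypothetical helper, not a Git command, and it assumes the plain marker style shown above.

```python
def split_conflicts(text):
    """Return a list of (head_lines, incoming_lines) pairs, one per conflict."""
    conflicts = []
    head, incoming = [], []
    state = "outside"   # "outside", "head" or "incoming"
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            state, head, incoming = "head", [], []
        elif line.startswith("=======") and state == "head":
            state = "incoming"
        elif line.startswith(">>>>>>> ") and state == "incoming":
            conflicts.append((head, incoming))
            state = "outside"
        elif state == "head":
            head.append(line)
        elif state == "incoming":
            incoming.append(line)
    return conflicts


readme = """| Name/UID | File |
| ----------- | ----------- |
<<<<<<< HEAD
| ID | [my upload](python_ex1/ID.py) |
=======
| ID | [my upload](python_ex1/ID.py) run it by calling `python3 python_ex1/ID.py`|
>>>>>>> d9ac598 (add my exercise sheet)
| ID1 | |
"""

for head_lines, incoming_lines in split_conflicts(readme):
    print("HEAD:    ", head_lines)
    print("incoming:", incoming_lines)
```

Resolving a conflict then means replacing everything from the opening to the closing marker with the lines you actually want to keep.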
If the conflicts are more elaborate and connected it is good to use a tool to sort them out. Your favourite IDE will most likely come with some tool, or you can look at specific Git tools for conflict resolution.
For us it is simple. We just want the line to look like
| ID | [my upload](python_ex1/ID.py) run it by calling `python3 python_ex1/ID.py`|
so we make these changes and call
B> git add README.md
B> git rebase --continue
[detached HEAD 3c6b1e6] add my exercise sheet, and make a conflict resolution
1 file changed, 1 insertion(+), 1 deletion(-)
Successfully rebased and updated refs/heads/main.
Continuing the rebase prompts us to write a commit message. Here we used: add my exercise sheet, and make a conflict resolution.
3.6 Stashing
There is one more case we need to have a look at. What if we made some changes to a file, are not ready to make a commit yet, but need to pull in some changes coming from the remote? Another scenario would be that something in the repository needs urgent fixing so we need to switch back to a clean copy without losing our current work. Git gives us the possibility to deal with these situations with yet another area and the git stash command.
> git status
On branch main
Your branch is behind 'origin/main' by 1 commits (non-fast-forward).
(use "git pull" to update your local branch)
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: python_ex1/ID.py
no changes added to commit (use "git add" and/or "git commit -a")
> git diff
diff --git a/python_ex1/ID.py b/python_ex1/ID.py
index f5b74fc..22038a5 100644
--- a/python_ex1/ID.py
+++ b/python_ex1/ID.py
@@ -61,6 +61,11 @@ print(f"Accuracy of pi with N = 100: {get_accuracy( 100):8.5f}")
print(f"Accuracy of pi with N = 1000: {get_accuracy( 1000):8.5f}")
print(f"Accuracy of pi with N = 100000: {get_accuracy(100000):8.5f}")
+# ----------------------------------------------
+# Alternative Implementation for (2)
+# ----------------------------------------------
+points = np.random.uniform(0, 1, [2, N])
+
# ----------------------------------------------
# (3) Gaussian density
We have a dirty working directory as we just started to work on an alternative implementation for (2), but we are 1 commit behind the remote. With git stash push we can tell Git to put all the changes aside for us and keep them safe (an optional message can be added). After pulling the remote changes in, we can finally reapply our stashed work by calling git stash pop (this will apply the latest stash, in case we have several).
Here it is as an image and in the terminal.
> git stash push
Saved working directory and index state WIP on main: 4e76603 update list of participants
> git status
On branch main
Your branch is behind 'origin/main' by 1 commits, and can be fast-forwarded.
(use "git pull" to update your local branch)
nothing to commit, working tree clean
> git pull --rebase
Successfully rebased and updated refs/heads/main.
> git stash list
stash@{0}: WIP on main: 4e76603 update list of participants
> git stash pop
On branch main
Your branch is up to date with 'origin/main'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: python_ex1/ID.py
no changes added to commit (use "git add" and/or "git commit -a")
Dropped refs/stash@{0} (d31654c50cfffd2b3f4b931ccf07b5c8362d365a)
> git diff
diff --git a/python_ex1/ID.py b/python_ex1/ID.py
index f5b74fc..22038a5 100644
--- a/python_ex1/ID.py
+++ b/python_ex1/ID.py
@@ -61,6 +61,11 @@ print(f"Accuracy of pi with N = 100: {get_accuracy( 100):8.5f}")
print(f"Accuracy of pi with N = 1000: {get_accuracy( 1000):8.5f}")
print(f"Accuracy of pi with N = 100000: {get_accuracy(100000):8.5f}")
+# ----------------------------------------------
+# Alternative Implementation for (2)
+# ----------------------------------------------
+points = np.random.uniform(0, 1, [2, N])
+
# ----------------------------------------------
# (3) Gaussian density
With git stash list we can view the different stashes.
3.7 History
The last thing we look into is the history. Of course it is possible to look at what happened in the repository. With git log we can do this:
B> git log
commit 3c6b1e63504e5f46d80d50d7188a4d5303a7aa86 (HEAD -> main)
Author: {Your Name} <{Your email}>
Date: Fri Oct 14 08:29:36 2022 +0200
add my exercise sheet, and make a conflict resolution
commit a16b809ce95b319180373c6b0c00647f2a6539f4 (origin/main, origin/HEAD)
Author: {Your Name} <{Your email}>
Date: Fri Oct 14 08:25:14 2022 +0200
add my exercise sheet
commit 0b8a49431b40aed9903d1ec6b76c243c20613b92
Author: {Your Name} <{Your email}>
Date: Sun Oct 9 15:59:42 2022 +0200
feat: add my solution for python exercises 1
Most likely we will see way more commits here as our fellow students made some commits as well.
So this is the official log of the repository, but sometimes it is nice to see more, especially what happened when. Maybe we messed up a rebase and our changes are missing or something similar. As Git was built with fail-safes in mind, it has you covered there. What we want to look at is the reflog:
B> git reflog
3c6b1e6 (HEAD -> main) HEAD@{0}: rebase (continue) (finish): returning to refs/heads/main
3c6b1e6 (HEAD -> main) HEAD@{1}: rebase (continue): add my exercise sheet, and make a conflict resolution
a16b809 (origin/main, origin/HEAD) HEAD@{2}: pull --rebase (start): checkout a16b809ce95b319180373c6b0c00647f2a6539f4
d9ac598 HEAD@{3}: commit: add my exercise sheet
5673f78 HEAD@{4}: commit: list of student ids
0b8a494 HEAD@{5}: pull --rebase (finish): returning to refs/heads/main
0b8a494 HEAD@{6}: pull --rebase (start): checkout 0b8a49431b40aed9903d1ec6b76c243c20613b92
9a98eb2 HEAD@{7}: commit: feat: add my solution for python exercises 1
a0f8f01 HEAD@{8}: clone: from https://github.com/kandolfp/playground.git
This command shows us what happened in our local repository.
3.8 Further stuff
There is much more to see and do but this concludes the absolute basics. You will learn way more when you work with Git for some time. Some topics that you will come across are:
- The .gitignore file
- More elaborate work with branches
- Merging of branches
- Cherry picking
- Reverting commits
- git blame to find out where this line of code comes from
- and so much more