ULG 971001 - Software (VU 2) - python, git, docker

Basics of Git

Git is a distributed version control system (VCS). In this case, this means that the entire repository is distributed over various machines and possibly multiple remotes. This was a clear design choice: it allows individual contributors to work independently of the availability of a remote while still having the full history available. The following is based on the Git training by the UnseenWizzard. The main structure as well as the basic idea of the pictures follow these notes, with a few adaptations where needed.

In particular, such a setup could look something like this:

Basic setup

In this setup the Remote Repository is the place you send your changes to in order to have them visible for others, and in return you can get changes from them via the Remote Repository.

Like the name suggests the Local (development) environment sits on your machine. The working directory is your current version of the files contained in the repository and the Local Repository is the copy of the entire repository (with all changes) on your machine. We will learn more about these parts as we go along.

Let us start with getting a Remote Repository

To provide a playground for this class, I created a repository that we are going to use. In order to get it onto your local machine, type the following commands in the terminal:

#Navigate to a suitable directory
git clone https://{YOUR ID}@git.uibk.ac.at/c702169/ulg22_playground.git

This will perform the following two actions:

  1. Check out the content of the remote repository into the working directory. By default the directory is named after the repository; in our case the folder ulg22_playground is created and all files are put there.

  2. A copy of the remote repository is stored in the local repository. For all intents and purposes, it acts exactly the same as the Remote Repository, with the sole exception that it is not shared with others.

Clone a remote repository to your machine

Adding content

With the following snippet you can view the content of the repository (the second line is the response):

> ls ulg22_playground
python_ex1 README.md

As you can see there is currently only a README.md and the folder python_ex1.

Now let's add our solutions of the exercises we did in Python to the repository.

First we copy the file to the exercise directory:

cp PATH/TO/YOUR/FILE/solution_ex1.py python_ex1/{YOUR ID}.py

In my case the command reads like this:

> cp ../Exercises/reference_solution.py python_ex1/ID.py
'../Exercises/reference_solution.py' -> 'python_ex1/ID.py'

With this we have modified our working directory. In order to get an idea of what Git thinks about this, let's run git status in the working directory:

On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	python_ex1/ID.py

nothing added to commit but untracked files present (use "git add" to track)

Let's break down the output of the command. First, Git tells you which branch you are on (we will hear more about that later). Second, it tells you how your local branch compares to its remote counterpart; here it is up to date with origin/main. Third, it lists the untracked files it found.

Now, a tracked file is a file that is part of the repository, and Git keeps track of what is happening to it. An untracked file, on the other hand, is a file that is in the same directory but is not managed by Git.

Git tells you how to change the status of your untracked file into a tracked file.

We do this by running git add python_ex1/ID.py.

Now it is time to introduce another Git concept, namely the staging area. The staging area is the curious white spot between your working directory and the local repository in the above pictures. This is the place where Git collects all the changes to your files that you want to put into the local repository.

Staging area with the changes that can be moved to the repository

By rerunning git status we get

On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	new file:   python_ex1/ID.py

Now that we are satisfied that all our changes are in the staging area, we are ready to commit them to the local repository. This is done by running git commit. A text editor will open where you can write a message telling everybody what you just did in the repository. Usually you will also see the changes you are about to commit. The commit message is really important: it should be meaningful and readable, as others will later use it to understand your action. There are several conventions for writing good commit messages, and which one to follow boils down to what your team wants.

The same can be achieved directly in the Terminal by writing

> git commit -m "feat: add my solution for python exercises 1"
[main 578e48f] feat: add my solution for python exercises 1
 1 file changed, 85 insertions(+)
 create mode 100644 python_ex1/ID.py

Committing changes to the repository

In the above message you can see that your commit gets some more metadata. Specifically, it gets a SHA-1 hash, here abbreviated to 578e48f. The hash is used to keep track of your commits and is one of the groundbreaking ideas that make Git so successful. The full hash is much longer (40 hexadecimal digits), but in most cases its first seven digits are already enough to identify a commit uniquely within a repository.
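To get a feeling for where such a hash comes from, here is a minimal Python sketch of how Git names the content of a single file (a so-called blob object): it takes the SHA-1 of a small header followed by the file content. The function name git_blob_hash is made up for this illustration. A commit hash is computed in the same spirit, but over the commit object (which also contains author, date and parent), so we cannot reproduce 578e48f here.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 name Git gives to a file with this content."""
    # Git prepends the object type, the size in bytes and a NUL byte.
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


full_hash = git_blob_hash(b"hello\n")
print(full_hash)      # the full 40-digit hash
print(full_hash[:7])  # the short form, as shown in the commit output
```

If you have Git at hand, you can compare the result with the output of git hash-object for a file containing the same bytes.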

Any changes made to a file after running git add <file> will not be part of the commit. If they should be included, you need to rerun git add <file>.
By submitting an empty commit message you can abort a commit.
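The interplay of working directory, staging area and local repository can be sketched with a small toy model in Python. This is not how Git is implemented; the class ToyRepository and all of its attributes are made up for this illustration. It only shows why an edit made after git add does not end up in the commit, and that an empty message aborts the commit.

```python
class ToyRepository:
    """Toy model of working directory, staging area and local repository."""

    def __init__(self):
        self.working_dir = {}   # path -> current file content
        self.staging_area = {}  # path -> content at the time of `add`
        self.commits = []       # list of (message, snapshot) tuples

    def add(self, path):
        # Copy the file as it is right now into the staging area.
        self.staging_area[path] = self.working_dir[path]

    def commit(self, message):
        if not message.strip():
            raise ValueError("empty commit message, commit aborted")
        # A commit stores the previous snapshot plus everything staged.
        snapshot = dict(self.commits[-1][1]) if self.commits else {}
        snapshot.update(self.staging_area)
        self.commits.append((message, snapshot))
        self.staging_area.clear()


repo = ToyRepository()
repo.working_dir["python_ex1/ID.py"] = "first draft"
repo.add("python_ex1/ID.py")
repo.working_dir["python_ex1/ID.py"] = "second draft"  # edited after add
repo.commit("feat: add my solution for python exercises 1")

message, snapshot = repo.commits[-1]
print(snapshot["python_ex1/ID.py"])  # first draft
```

The committed snapshot contains the first draft; the second draft only lives in the working directory until we add it again.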

Now the changes are in the local repository and you can continue working. In order to share your changes with others you need to get them to the remote repository. This is done by pushing the changes. We do this by calling git push, which gives us an output similar to:

Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Delta compression using up to 16 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 1.17 KiB | 1.17 MiB/s, done.
Total 4 (delta 0), reused 0 (delta 0), pack-reused 0
To https://git.uibk.ac.at/c102338/ulg22_playground.git
   e8f13f4..0b8a494  main -> main
Pushing changes from the local to the remote repository

Now, if you did this on your own, everything would work and you would be happy. Unfortunately, as we are doing this in a class and all at the same time, we will run into some difficulties. After all, this is a crash course for Git; eventually something had to crash.

Some of you might get the following message for git push:

To https://git.uibk.ac.at/c102338/ulg22_playground.git
 ! [rejected]        main -> main (non-fast-forward)
error: failed to push some refs to 'https://git.uibk.ac.at/c102338/ulg22_playground.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

This gives us the opportunity to talk about how to get changes from the remote repository to your local repository after the initial clone. Unfortunately, in order to do this properly we need to talk about branches first.

Branches

The word branch was mentioned several times before but not explained. The main idea is rather simple.

If you think of one commit after the other in a long chain as the trunk of a tree, then a branch is just what it is for a tree:

A branch in a chain of commits
In short, whenever multiple commits are based on the same commit they (and all following commits) form different branches.

By default, git always operates on branches. When we cloned the remote repository we also cloned its branches and we started working on the main branch. You can go back and check the messages, it is always there.

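Now, without knowing it, we created a branch. It is not visible to us locally, but it is clear from the point of view of the remote repository: our commit and the commit somebody else pushed in the meantime are both based on the same commit.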

There are several ways of integrating or merging two branches back into one.

For now we will only talk about the most elegant and simplest way, with a rebase.

Naturally, every branch is based on a commit. In the above example 9a98eb2 is based on e8f13f4. As the name suggests, a rebase simply changes this base. This keeps the commit chain linear, which gives us a clean way of reading the entire history. We will see one way of performing a rebase shortly. But first, let's talk about how to get the remote changes in the first place.

Integrating remote changes into your local environment

The above error message already gives us a hint on what to do, but let's make it more structured.

By running a

> git status
On branch main
Your branch and 'origin/main' have diverged,
and have 1 and 1 different commits each, respectively.
  (use "git pull" to merge the remote branch into yours)

nothing to commit, working tree clean

we can see that the remote and the local repository have different commits.

With git fetch you can get changes from the remote repository into the local repository. This is the opposite direction of the git push command.

Fetching changes from the remote to the local repository

The important part here is that this does not affect your working directory, as the changes are only synchronized with the local repository; when you try to push again you will see the same message. It does not even affect your local branches; it only makes sure that your local copies of the remote branches (such as origin/main) are up to date.
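What a fetch touches, and what it leaves alone, can be sketched with a few lines of Python. This is a toy model under simplifying assumptions: a branch is just a list of commit names, and the names base, mine and theirs as well as the functions fetch and ahead_behind are made up for this illustration.

```python
def fetch(local_refs, remote_refs):
    """Update only the local copy of the remote branch."""
    local_refs["origin/main"] = list(remote_refs["main"])


def ahead_behind(local_refs):
    """Count the commits that only main or only origin/main contains."""
    mine, theirs = local_refs["main"], local_refs["origin/main"]
    common = 0
    while common < min(len(mine), len(theirs)) and mine[common] == theirs[common]:
        common += 1
    return len(mine) - common, len(theirs) - common


# The remote already contains somebody else's commit.
remote_refs = {"main": ["base", "theirs"]}
# Locally we committed "mine" on top of the state we cloned.
local_refs = {"main": ["base", "mine"], "origin/main": ["base"]}

print(ahead_behind(local_refs))  # (1, 0)
fetch(local_refs, remote_refs)
print(local_refs["main"])        # ['base', 'mine']
print(ahead_behind(local_refs))  # (1, 1)
```

After the fetch the local branch main is unchanged, but the comparison now reports one commit on each side, which corresponds to the "have diverged" message of git status.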

Pulling

In order to affect the working directory and your local branches, we need to pull the changes in. This is done with git pull.

As our local branch and the remote branch have diverged, we need to define a strategy for how to bring them together. At the moment we only know one, so let us use

> git pull --rebase
Successfully rebased and updated refs/heads/main.

This should have worked for everybody as all of you added different files to the repository and the Git tree looks something like this:

After pulling and rebasing
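What the rebase just did to our commit can be sketched in Python. This is a toy model and not Git's implementation: the class Commit and the functions history and rebase are made up for this illustration, and a commit here only knows its message and its parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    message: str
    parent: Optional["Commit"] = None


def history(tip):
    """Return the commit messages from the first commit up to tip."""
    messages = []
    while tip is not None:
        messages.append(tip.message)
        tip = tip.parent
    return list(reversed(messages))


def rebase(tip, old_base, new_base):
    """Replay the commits between old_base and tip on top of new_base."""
    to_replay = []
    commit = tip
    while commit is not None and commit is not old_base:
        to_replay.append(commit)
        commit = commit.parent
    for commit in reversed(to_replay):
        # Same change, new parent: this creates a new commit object.
        new_base = Commit(commit.message, parent=new_base)
    return new_base


shared = Commit("initial commit")
theirs = Commit("somebody else's solution", parent=shared)
mine = Commit("feat: add my solution for python exercises 1", parent=shared)

rebased = rebase(mine, old_base=shared, new_base=theirs)
print(history(rebased))
print(rebased is mine)  # False
```

The rebased commit carries the same message but has a different parent, so it is a new commit. In real Git this is the reason why a commit gets a new hash after a rebase.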

Next we will see what happens if we modify some files.

Modifying content in a repository

A good way to start is to link our uploaded file from the README contained in the repository.

With your favourite editor add the following content next to your ID (btw. this is markdown syntax):

# ulg22_playground

## List of submitted python exercises 

| Name/UID    | File        |
| ----------- | ----------- |
| ID     | [my upload](python_ex1/ID.py) |

If we check with git status we can see that README.md is modified

On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   README.md

no changes added to commit (use "git add" and/or "git commit -a")

Of course this is only a change in our working directory and not in either of the two repositories. Before we add the changes to the local repository we can use git diff to see what we actually changed.

diff --git a/README.md b/README.md
index d9f0acb..9ac3846 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
 
 | Name/UID    | File        |
 | ----------- | ----------- |
-| ID  | |
+| ID  | [my upload](python_ex1/ID.py)|
 | ID1 | |
 | ID2 | |
 | ID3 | |

We already know the next steps, add, commit, and push.

So let's recall: with git add README.md we move the file into the staging area. Note: If you run git diff now, the output is empty. This is because git diff, by default, only shows the changes in your working directory that are not yet staged. You can still get the diffs from your staging area with git diff --staged (some editors will show this while you type your commit message).

Now, before we commit, we decide to modify README.md again. Maybe we made a typo or we just really want to nail this hand-in, so we change it; maybe we add

| ID     | [my upload](python_ex1/ID.py) run it by calling `python3 python_ex1/ID.py`|

to make it clear we know what we are doing.

If we run git status we see the following

On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   README.md

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   README.md

which tells us that README.md has staged changes as well as further modifications that are not yet staged.

If we run a git diff again we get

diff --git a/README.md b/README.md
index 9ac3846..28dba4b 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
 
 | Name/UID    | File        |
 | ----------- | ----------- |
-| ID  | [my upload](python_ex1/ID.py)|
+| ID  | [my upload](python_ex1/ID.py) run it by calling `python3 python_ex1/ID.py`|
 | csad3581 | |
 | csak1512 | |
 | csak4299 | |

which shows the changes in the working directory relative to the staging area. If we are satisfied with our changes, we can use git add README.md again to add the file to the staging area and finally commit it with git commit. Of course we do this with a meaningful commit message.

Depending on your timing, you might have to pull changes into your local repository first. By the way, you can directly call git pull without first calling git fetch; the fetch is done implicitly. We should not get a conflict, though, as everybody changed a different line.

Conflicts and how to resolve them

It will not always be this smooth, and conflicts do occur, for example when two commits with the same base change the same line. We simulate this by simply copying our local repository and working directory, either with a new git clone or by copying the directory.

For this, we assume that in location A we changed README.md to

| Name/UID    | File        |
| ----------- | ----------- |
| ID     | [my upload](python_ex1/ID.py) |

and we commit and push this change to the remote repository.

A> git diff
diff --git a/README.md b/README.md
index d9f0acb..28dba4b 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
 
 | Name/UID    | File        |
 | ----------- | ----------- |
-| ID  | |
+| ID  | [my upload](python_ex1/ID.py)|
 | ID1 | |
 | ID2 | |
 | ID3 | |

A> git add README.md
A> git commit -m "add my exercise sheet"
[main a16b809] add my exercise sheet
 1 file changed, 1 insertion(+), 1 deletion(-)
A> git push
Enumerating objects: 8, done.
Counting objects: 100% (8/8), done.
Delta compression using up to 16 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 769 bytes | 769.00 KiB/s, done.
Total 6 (delta 1), reused 0 (delta 0), pack-reused 0
To https://git.uibk.ac.at/c102338/ulg22_playground.git
   0b8a494..a16b809  main -> main

Now in location B we do not get the changes from the remote repository but modify README.md to

| Name/UID    | File        |
| ----------- | ----------- |
| ID     | [my upload](python_ex1/ID.py) run it by calling `python3 python_ex1/ID.py`|

and we try to do the same as before:

B> git diff
diff --git a/README.md b/README.md
index d9f0acb..28dba4b 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
 
 | Name/UID    | File        |
 | ----------- | ----------- |
-| ID  | |
+| ID  | [my upload](python_ex1/ID.py) run it by calling `python3 python_ex1/ID.py`|
 | ID1 | |
 | ID2 | |
 | ID3 | |
B> git add README.md
B> git commit -m "add my exercise sheet"
[main d9ac598] add my exercise sheet
 1 file changed, 1 insertion(+), 1 deletion(-)
B> git push
To https://git.uibk.ac.at/c102338/ulg22_playground.git
 ! [rejected]        main -> main (fetch first)
error: failed to push some refs to 'https://git.uibk.ac.at/c102338/ulg22_playground.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

This is to be expected, so we pull in the changes from the remote with a rebase, as we learned:

B> git pull --rebase
Auto-merging README.md
CONFLICT (content): Merge conflict in README.md
error: could not apply d9ac598... add my exercise sheet
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply d9ac598... add my exercise sheet

As per usual, Git is quite helpful and tells you what to do. We have several options:

  1. solve the conflicts, add the files, and continue the rebase,

  2. skip our commit d9ac598, that is, do not apply these changes,

  3. abort the procedure.

We opt for 1. and take a look with git diff

diff --cc README.md
index 12ee10e,28dba4b..0000000
--- a/README.md
+++ b/README.md
@@@ -4,7 -4,7 +4,11 @@@
  
  | Name/UID    | File        |
  | ----------- | ----------- |
++<<<<<<< HEAD
+ | ID  | [my upload](python_ex1/ID.py) |
++=======
+ | ID  | [my upload](python_ex1/ID.py) run it by calling `python3 python_ex1/ID.py`|
++>>>>>>> d9ac598 (add my exercise sheet)

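HEAD usually refers to the commit that is currently checked out. During a rebase, Git first checks out the tip of the remote branch and then replays our commits on top of it, so here HEAD is the latest commit coming from the remote repository. So we see:

  • what HEAD brings in, starting after the line <<<<<<< HEAD,

  • the line =======, which separates the two versions,

  • and what we want to push out, ended by a line >>>>>>> d9ac598 together with the commit message.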

This will be repeated for every conflict in the file.
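The structure of these markers is simple enough to read with a few lines of Python. The following is only a sketch: the function find_conflicts is made up for this illustration, and it assumes the default marker style shown above (other conflict styles add further marker lines).

```python
def find_conflicts(text):
    """Collect both sides of every conflict block in a file's text."""
    conflicts = []
    current = None
    side = None
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            current = {"head": [], "incoming": [], "label": None}
            side = "head"
        elif line.startswith("=======") and current is not None:
            side = "incoming"
        elif line.startswith(">>>>>>> ") and current is not None:
            current["label"] = line[len(">>>>>>> "):]
            conflicts.append(current)
            current, side = None, None
        elif current is not None:
            current[side].append(line)
    return conflicts


readme = "\n".join([
    "| Name/UID    | File        |",
    "| ----------- | ----------- |",
    "<<<<<<< HEAD",
    "| ID  | [my upload](python_ex1/ID.py) |",
    "=======",
    "| ID  | [my upload](python_ex1/ID.py) run it by calling `python3 python_ex1/ID.py`|",
    ">>>>>>> d9ac598 (add my exercise sheet)",
])

for conflict in find_conflicts(readme):
    print("from HEAD:    ", conflict["head"])
    print("from", conflict["label"] + ":", conflict["incoming"])
```

Resolving the conflict then means replacing the whole block, markers included, with the lines we actually want to keep.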

If the conflicts are more elaborate and connected, it is good to use a tool to sort them out. Your favourite IDE will most likely come with some tool, or you can look at specific Git tools for conflict resolution.

For us it is simple. We just want the file to look like

| ID  | [my upload](python_ex1/ID.py) run it by calling `python3 python_ex1/ID.py`|

so we make these changes and call

B> git add README.md
B> git rebase --continue
[detached HEAD 3c6b1e6] add my exercise sheet, and make a conflict resolution
 1 file changed, 1 insertion(+), 1 deletion(-)
Successfully rebased and updated refs/heads/main.

This will prompt us to write a commit message. Let's use add my exercise sheet, and make a conflict resolution.

Stashing

There is one more case we need to have a look at. What if we made some changes to a file, are not ready to make a commit yet, but need to pull in some changes coming from the remote? Another scenario would be that something in the repository needs urgent fixing, so we need to switch back to a clean copy without losing our current work. Git gives us the possibility to deal with these situations with yet another area and the git stash command.

> git status
On branch main
Your branch is behind 'origin/main' by 1 commits (non-fast-forward).
  (use "git pull" to update your local branch)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   python_ex1/ID.py

no changes added to commit (use "git add" and/or "git commit -a")

> git diff
diff --git a/python_ex1/ID.py b/python_ex1/ID.py
index f5b74fc..22038a5 100644
--- a/python_ex1/ID.py
+++ b/python_ex1/ID.py
@@ -61,6 +61,11 @@ print(f"Accuracy of pi with N =    100:   {get_accuracy(   100):8.5f}")
 print(f"Accuracy of pi with N =   1000:   {get_accuracy(  1000):8.5f}")
 print(f"Accuracy of pi with N = 100000:   {get_accuracy(100000):8.5f}")
 
+# ----------------------------------------------
+# Alternative Implementation for (2)
+# ----------------------------------------------
+points = np.random.uniform(0, 1, [2, N])
+
 
 # ----------------------------------------------
 # (3) Gaussian density

We have a dirty working directory, as we just started to work on an alternative implementation for (2), but we are 1 commit behind the remote. With git stash push we can tell Git to put all the changes aside for us and keep them safe (an optional message can be added). After pulling in the remote changes we can finally reapply our stashed work by calling git stash pop (this will apply the latest stash, in case we have several).

Here it is as a picture and in the terminal.

Stashing changes and pulling in from remote, see numbers for order

> git stash push
Saved working directory and index state WIP on main: 4e76603 update list of participants

> git status
On branch main
Your branch is behind 'origin/main' by 1 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)

nothing to commit, working tree clean

> git pull --rebase
Successfully rebased and updated refs/heads/main.

> git stash list
stash@{0}: WIP on main: 4e76603 update list of participants

> git stash pop
On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   python_ex1/ID.py

no changes added to commit (use "git add" and/or "git commit -a")
Dropped refs/stash@{0} (d31654c50cfffd2b3f4b931ccf07b5c8362d365a)

> git diff
diff --git a/python_ex1/ID.py b/python_ex1/ID.py
index f5b74fc..22038a5 100644
--- a/python_ex1/ID.py
+++ b/python_ex1/ID.py
@@ -61,6 +61,11 @@ print(f"Accuracy of pi with N =    100:   {get_accuracy(   100):8.5f}")
 print(f"Accuracy of pi with N =   1000:   {get_accuracy(  1000):8.5f}")
 print(f"Accuracy of pi with N = 100000:   {get_accuracy(100000):8.5f}")
 
+# ----------------------------------------------
+# Alternative Implementation for (2)
+# ----------------------------------------------
+points = np.random.uniform(0, 1, [2, N])
+
 
 # ----------------------------------------------
 # (3) Gaussian density

With git stash list we can view different stashes.
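The behaviour we just saw, where the most recent stash is stash@{0} and git stash pop takes it back out, is that of a stack. A toy model in Python could look like this; the class ToyStash and its methods are made up for this illustration, and the working changes are reduced to a plain dictionary.

```python
class ToyStash:
    """Toy model of the stash: a stack with the newest entry first."""

    def __init__(self):
        self.entries = []  # list of (message, saved changes)

    def push(self, working_changes, message="WIP on main"):
        # Put the changes aside and leave a clean working directory.
        self.entries.insert(0, (message, dict(working_changes)))
        working_changes.clear()

    def pop(self, working_changes):
        # Reapply the latest entry and drop it from the stash.
        message, saved = self.entries.pop(0)
        working_changes.update(saved)

    def list_entries(self):
        return [
            f"stash@{{{index}}}: {message}"
            for index, (message, _) in enumerate(self.entries)
        ]


changes = {"python_ex1/ID.py": "alternative implementation for (2)"}
stash = ToyStash()

stash.push(changes)
print(changes)               # {}
print(stash.list_entries())  # ['stash@{0}: WIP on main']

stash.pop(changes)
print(changes)               # the stashed change is back
print(stash.list_entries())  # []
```

Between push and pop the working directory is clean, which is exactly the window we used for git pull --rebase.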

History

The last thing we look into is the history. Of course it is possible to look at what happened in the repository. We can do this with git log:

B> git log
commit 3c6b1e63504e5f46d80d50d7188a4d5303a7aa86 (HEAD -> main)
Author: {Your Name} <{Your email}>
Date:   Fri Oct 14 08:29:36 2022 +0200

    add my exercise sheet, and make a conflict resolution

commit a16b809ce95b319180373c6b0c00647f2a6539f4 (origin/main, origin/HEAD)
Author: {Your Name} <{Your email}>
Date:   Fri Oct 14 08:25:14 2022 +0200

    add my exercise sheet

commit 0b8a49431b40aed9903d1ec6b76c243c20613b92
Author: {Your Name} <{Your email}>
Date:   Sun Oct 9 15:59:42 2022 +0200

    feat: add my solution for python exercises 1

Note: Most likely we will see way more commits here as our fellow students make some commits as well.

So this is the official log of the repository, but sometimes it is nice to see more, especially what happened when. Maybe we messed up a rebase and our changes are missing, or something similar. As Git was built with fail-safes in mind, it has you covered there. What we want to look at is the reflog:

B> git reflog
3c6b1e6 (HEAD -> main) HEAD@{0}: rebase (continue) (finish): returning to refs/heads/main
3c6b1e6 (HEAD -> main) HEAD@{1}: rebase (continue): add my exercise sheet, and make a conflict resolution
a16b809 (origin/main, origin/HEAD) HEAD@{2}: pull --rebase (start): checkout a16b809ce95b319180373c6b0c00647f2a6539f4
d9ac598 HEAD@{3}: commit: add my exercise sheet
5673f78 HEAD@{4}: commit: list of student ids
0b8a494 HEAD@{5}: pull --rebase (finish): returning to refs/heads/main
0b8a494 HEAD@{6}: pull --rebase (start): checkout 0b8a49431b40aed9903d1ec6b76c243c20613b92
9a98eb2 HEAD@{7}: commit: feat: add my solution for python exercises 1
a0f8f01 HEAD@{8}: clone: from https://git.uibk.ac.at/c102338/ulg22_playground.git

This command shows us what happened in our local repository.

Further stuff

There is much more to see and do, but this concludes the absolute basics. You will learn way more when you work with Git for some time. Some topics that you will come across are:

  • The .gitignore file

  • More elaborate work with branches

  • Merging of branches

  • Cherry picking

  • Reverting commits

  • git blame to find out where this line of code comes from

  • and so much more

CC BY-NC-SA 4.0 Peter Kandolf. Last modified: January 19, 2024. Website built with Franklin.jl and the Julia programming language.