What is Git?
So, in a nutshell, what is Git? This is an important section to absorb, because if you understand what Git is and the fundamentals of how it works, using Git effectively will be a lot easier. As you learn Git, try to clear your mind of what you know about other VCSs, such as CVS, Subversion, or Perforce; doing so will help you avoid subtle confusion when using the tool. Even though Git’s user interface is fairly similar to that of these other VCSs, Git stores and thinks about information in a very different way, and understanding these differences will help you avoid confusion while using it.
The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data. Conceptually, most other systems store information as a list of file-based changes. These other systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they store as a set of files and the changes made to each file over time (this is commonly described as delta-based version control).
Git does not think of or store its data this way. Instead, Git treats its data as a series of snapshots of a miniature filesystem. Every time you commit, or save the state of your project, Git takes a snapshot of what all your files look like at that moment and stores a reference to that snapshot. To save space, if a file has not changed, Git does not store it again; it only stores a link to the previous identical file it has already stored.
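The snapshot idea can be sketched in a few lines of Python. This is a toy model, not Git's actual storage format: the SnapshotStore class and its method names are made up for illustration, and it simply keys each file's content by a SHA-1 digest so that unchanged files are stored once and shared between snapshots.

```python
import hashlib


class SnapshotStore:
    """Toy model of snapshot-based storage (not Git's real on-disk format)."""

    def __init__(self):
        self.blobs = {}      # content hash -> file content, stored only once
        self.snapshots = []  # each snapshot maps file name -> content hash

    def commit(self, files):
        """Record a snapshot of every file in `files` (name -> bytes)."""
        snapshot = {}
        for name, content in files.items():
            key = hashlib.sha1(content).hexdigest()
            # Unchanged content hashes to the same key, so nothing new is stored
            self.blobs.setdefault(key, content)
            snapshot[name] = key
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1

    def checkout(self, snapshot_id):
        """Rebuild the full set of files recorded in one snapshot."""
        snapshot = self.snapshots[snapshot_id]
        return {name: self.blobs[key] for name, key in snapshot.items()}


store = SnapshotStore()
first = store.commit({"README": b"hello\n", "main.py": b"print('v1')\n"})
second = store.commit({"README": b"hello\n", "main.py": b"print('v2')\n"})

# README did not change, so both snapshots link to the same stored content
print(store.snapshots[first]["README"] == store.snapshots[second]["README"])
# Three distinct contents were stored across the two snapshots, not four
print(len(store.blobs))
```

Each snapshot here is complete on its own: checking out either one needs no replay of earlier changes, which is the practical difference from a delta-based design.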
This is an important distinction between Git and nearly every other VCS. It makes Git reconsider almost every aspect of version control that most other systems copied from the previous generation. This makes Git more like a mini filesystem with some very powerful tools built on top of it than simply a version control system. We’ll explore some of the benefits of thinking about your data this way when we cover Git branching in Git Branching.
Git Has Integrity
Everything in Git is checksummed before it is stored, and is then referred to by that checksum. This means it is impossible to change the contents of any file or directory without Git knowing about it. This functionality is built into Git at the lowest levels and is integral to its design. You can’t lose information in transit or suffer file corruption without Git being able to detect it.
The mechanism that Git uses for this checksumming is called a SHA-1 hash. This is a 40-character string composed of hexadecimal characters (0–9 and a–f), calculated from the contents of a file or directory structure in Git. The following is an example of a SHA-1 hash:
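As a rough illustration, a hash of this form can be produced with Python's standard hashlib module. The sketch below assumes Git's blob object format, in which the content is prefixed with a short header ("blob", the content length, and a NUL byte) before hashing; that detail is not covered in this section, and the blob_hash name is made up for the example.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Return the SHA-1 that Git would assign to `content` stored as a blob.

    Assumption: the blob header is b"blob <size>\\0", as in Git's object format.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = blob_hash(b"hello world\n")
tampered = blob_hash(b"hello world!\n")

print(original)              # a string of hexadecimal characters
print(len(original))         # 40
print(original != tampered)  # True: any change to the content changes the hash
print(blob_hash(b""))        # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (Git's empty blob)
```

Because the hash is derived from the content itself, a file that is altered or corrupted no longer matches the checksum it was stored under, which is how the change gets noticed.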
You will see these hash values all over the place in Git because it uses them so much. In fact, Git stores everything in its database not by file name but by the hash value of its contents.
Git Generally Only Adds Data
Almost every operation you perform in Git only adds data to the Git database. It is hard to get the system to do anything that cannot be undone or to make it erase data in any way. As with any VCS, you can lose or mess up changes you haven’t committed yet, but once you commit a snapshot into Git, it is very difficult to lose, especially if you regularly push your database to another repository.
This makes using Git enjoyable, because you can experiment without the danger of severely breaking things. For a more in-depth look at how Git stores its data and how you can recover data that seems lost, see Undoing Things.
The Three States
Pay close attention now, because this is the main thing to remember about Git if you want the rest of your learning process to go smoothly. Git has three main states that your files can be in: modified, staged, and committed:
- Modified means that you have changed the file but have not committed it to your database yet.
- Staged means that you have marked a modified file in its current version to go into your next commit snapshot.
- Committed means that the data is safely stored in your local database.
This leads us to the three main sections of a Git project: the working tree, the staging area, and the Git directory.
The working tree is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify.
The staging area is a file, generally contained in your Git directory, that stores information about what will go into your next commit. Its technical name in Git parlance is the “index”, but the phrase “staging area” works just as well.
The Git directory is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.
The basic Git workflow goes something like this:
- You modify files in your working tree.
- You selectively stage just those changes you want to be part of your next commit, which adds only those changes to the staging area.
- You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.
If a particular version of a file is in the Git directory, it’s considered committed. If it has been modified and was added to the staging area, it is staged. And if it has been changed since it was checked out but has not been staged, it is modified. In Git Basics, you’ll learn more about these states and how you can either take advantage of them or skip the staged part entirely.
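The workflow and the three states can be modelled with a small Python sketch. This is a deliberately simplified illustration: the ToyRepo class and its methods are hypothetical, it stores whole file contents in plain dictionaries, and it reports a single state per file, whereas real Git can show a file as both staged and modified and also tracks other conditions such as untracked files.

```python
class ToyRepo:
    """Toy model of the working tree, staging area, and Git directory."""

    def __init__(self):
        self.working_tree = {}  # file name -> content currently on disk
        self.index = {}         # staging area: content for the next commit
        self.commits = []       # Git directory: list of committed snapshots

    def stage(self, name):
        """Copy the current working-tree version of a file into the index."""
        self.index[name] = self.working_tree[name]

    def commit(self):
        """Store the staging area as a new snapshot."""
        self.commits.append(dict(self.index))

    def state(self, name):
        """Classify a file as modified, staged, or committed."""
        last_commit = self.commits[-1] if self.commits else {}
        if self.working_tree.get(name) != self.index.get(name):
            return "modified"   # changed on disk but not staged
        if self.index.get(name) != last_commit.get(name):
            return "staged"     # staged but not yet committed
        return "committed"      # matches the latest snapshot


repo = ToyRepo()
states = []

repo.working_tree["notes.txt"] = "draft"      # 1. modify a file
states.append(repo.state("notes.txt"))

repo.stage("notes.txt")                       # 2. stage the change
states.append(repo.state("notes.txt"))

repo.commit()                                 # 3. commit the staged snapshot
states.append(repo.state("notes.txt"))

repo.working_tree["notes.txt"] = "draft 2"    # edit again after committing
states.append(repo.state("notes.txt"))

print(states)  # ['modified', 'staged', 'committed', 'modified']
```

Note that commit() reads only from the index, never from the working tree, which mirrors the point that a commit records the files as they are in the staging area.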