📝 Git & GitHub

How Git stores data 💾

Author: 04e5cc8b-58ac-4bdc-bdee-661bbb
Published: 06.05.2026
Reading time: 3 min
Level: Beginner

Have you ever wondered how Git works so fast? How it stores the entire project history while taking up so little space? Let’s look under the hood!

The magic of Git: snapshots, not diffs

Many other version control systems store history as a list of changes (deltas) to each file:

File v1: "Hello"
Change 1: +6 characters " World"
Change 2: +1 character "!"

Git works differently. It takes snapshots of the entire project:

Commit 1: full project snapshot
Commit 2: full project snapshot (with changes)
Commit 3: full project snapshot

How does it work?

1. Hashing (SHA-1)

Git computes a SHA-1 hash (checksum) of each file's contents and uses it as the ID of the stored object:

# File: hello.txt contains "Hello World" followed by a newline
# Git computes the SHA-1 hash of a short header plus the contents:
557db03de997c86a4a028e1ebd3a1ceb225be238

If the file hasn't changed, the hash is identical.
If even one character changes, the hash is completely different.
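
To make the hashing step concrete, here is a minimal Python sketch of how a blob ID can be computed by hand. It assumes that hello.txt ends with a newline and that Git prepends a header of the form "blob <size>" plus a NUL byte before hashing; the function name git_blob_hash is made up for this illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 ID that Git assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_hash(b"Hello World\n")  # assumes a trailing newline
changed = git_blob_hash(b"Hello World!\n")  # one extra character

print(original)  # expected to match the hash quoted above
print(changed)   # a completely different 40-character hash
print(original == git_blob_hash(b"Hello World\n"))  # True: same content, same hash
```

If Git is installed, running git hash-object hello.txt on the same file should print the same ID.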

2. Git objects

Git stores 4 types of objects:

1. Blob (Binary Large Object)
- The file contents
- Pure data, no filename attached

2. Tree
- A directory in the filesystem
- A list of files (blobs) and subdirectories (trees)

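3. Commit
- A snapshot of the project at a point in time
- Points to a tree (the root directory)
- Points to its parent commit(s): none for the first commit, two or more for a merge
- Contains author, date, message

4. Tag
- An annotated tag: a named label for a commit, with its own author, date, and message
- For example, “v1.0.0”
- Lightweight tags are not objects, just refs that point at a commit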

Example: how Git stores a commit

Consider a simple project:

my-project/
├── README.md
└── src/
    └── main.py

What Git creates:

BLOB for README.md
  hash: abc123...
  content: "# My Project\n..."

BLOB for main.py
  hash: def456...
  content: "print('Hello')"

TREE for src/
  hash: ghi789...
  main.py -> def456...

TREE for root
  hash: jkl012...
  README.md -> abc123...
  src -> ghi789...

COMMIT
  hash: mno345...
  tree: jkl012...
  parent: previous commit
  author: "Vasya <vasya@example.com>"
  date: "2026-04-10 15:00:00"
  message: "Add README"
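
The same structure can be sketched in Python. This is an illustration under assumptions, not Git's own code: the file contents, the UTC timezone, and the helper names hash_object and tree_body are made up here, and the entry sorting is simplified (Git has a special rule for directory names that this tiny example does not need).

```python
import hashlib
from datetime import datetime, timezone


def hash_object(kind: str, body: bytes) -> str:
    """Hash an object as '<kind> <size>' + NUL + body, using SHA-1."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_id) entries as '<mode> <name>' + NUL + 20 raw bytes."""
    body = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        body += f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(hex_id)
    return body


# Blobs: contents only, no filenames (the contents are illustrative).
readme_blob = hash_object("blob", b"# My Project\n")
main_blob = hash_object("blob", b"print('Hello')\n")

# Trees: names and modes live here, pointing at blobs and other trees.
src_tree = hash_object("tree", tree_body([("100644", "main.py", main_blob)]))
root_tree = hash_object(
    "tree",
    tree_body([("100644", "README.md", readme_blob), ("40000", "src", src_tree)]),
)

# Commit: points at the root tree. A "parent <id>" line would follow the
# tree line for any commit that is not the first one.
when = int(datetime(2026, 4, 10, 15, 0, tzinfo=timezone.utc).timestamp())
person = f"Vasya <vasya@example.com> {when} +0000"
commit_body = (
    f"tree {root_tree}\n"
    f"author {person}\n"
    f"committer {person}\n"
    "\n"
    "Add README\n"
).encode("utf-8")
commit_hash = hash_object("commit", commit_body)

print("root tree:", root_tree)
print("commit:   ", commit_hash)
```

Note that the commit stores only one hash for the whole project (the root tree), and each tree stores hashes for its direct children.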

Saving space: deduplication

The clever part: if a file hasn’t changed between commits, Git does NOT create a new copy!

Commit 1:
  README.md -> blob abc123

Commit 2 (only main.py changed):
  README.md -> blob abc123 (THE SAME blob!)
  main.py -> blob xyz999 (new blob)

Result: an unchanged file costs no extra blob storage, only a reference to the blob that already exists.
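
Deduplication falls out of content addressing almost for free. The toy store below is a sketch, not Git's implementation: ObjectStore and put_blob are hypothetical names, and the file contents are made up.

```python
import hashlib


class ObjectStore:
    """A toy content-addressed store: identical content maps to one entry."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put_blob(self, content: bytes) -> str:
        header = b"blob " + str(len(content)).encode("ascii") + b"\0"
        key = hashlib.sha1(header + content).hexdigest()
        # setdefault keeps the existing entry if this content was stored before.
        self.objects.setdefault(key, content)
        return key


store = ObjectStore()

# Commit 1: both files are new.
commit1 = {
    "README.md": store.put_blob(b"# My Project\n"),
    "main.py": store.put_blob(b"print('Hello')\n"),
}

# Commit 2: only main.py changed; README.md has the same content as before.
commit2 = {
    "README.md": store.put_blob(b"# My Project\n"),
    "main.py": store.put_blob(b"print('Hello, Git')\n"),
}

print(commit1["README.md"] == commit2["README.md"])  # True: the same blob is reused
print(len(store.objects))  # 3 objects hold all 4 file versions
```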

Compression and pack files

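Each object is zlib-compressed when it is stored. Over time, Git also combines objects into pack files:

  • Similar objects are grouped so they compress well together
  • Many objects are stored as deltas (diffs) against a similar object
  • This happens automatically (for example, during git gc, push, and fetch)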
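
Why does grouping similar versions help? The sketch below uses plain zlib as a loose analogy: real pack files use a dedicated delta encoding between objects, which this example does not reproduce. The file contents are made up.

```python
import zlib

# Two versions of a file that differ by a single appended line.
v1 = "".join(f"line {i} of the file\n" for i in range(200)).encode("utf-8")
v2 = v1 + b"one more line\n"

# Compress each version on its own, then both in one stream.
separately = len(zlib.compress(v1)) + len(zlib.compress(v2))
together = len(zlib.compress(v1 + v2))

print("raw bytes:            ", len(v1) + len(v2))
print("compressed separately:", separately)
print("compressed together:  ", together)

# Compression is lossless: the original bytes come back unchanged.
assert zlib.decompress(zlib.compress(v1)) == v1
```

When both versions share one stream, the second version can mostly be expressed as references back to the first, so the combined size is smaller than the sum of the two separate sizes.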

Advantages of Git’s approach

✅ Speed

Most operations are local (only commands such as fetch, pull, and push talk to a server):
- Viewing history — instant
- Switching branches — seconds
- Comparing versions — fast

✅ Integrity

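Every object is identified by the hash of its contents:
- Changing anything in the past changes the hashes of all later commits, so it cannot go unnoticed
- Data corruption is detected whenever Git verifies an object against its hash (for example, with git fsck)
- Each commit hash covers its tree and parent hashes, so one hash vouches for the whole history behind it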
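
The integrity check itself is simple: recompute the hash and compare it with the ID the object is stored under. A minimal sketch, with the hypothetical helper names object_id and verify:

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Compute an object's ID from its type and contents."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def verify(expected_id: str, kind: str, body: bytes) -> bool:
    """Recompute the hash and compare it with the ID the object is stored under."""
    return object_id(kind, body) == expected_id


body = b"print('Hello')\n"
stored_id = object_id("blob", body)

corrupted = b"print('Hellp')\n"  # a single changed byte

print(verify(stored_id, "blob", body))       # True
print(verify(stored_id, "blob", corrupted))  # False: the corruption is detected
```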

✅ Compactness

Thanks to deduplication and compression:
- Many versions of files take up little space
- You can store the full project history

✅ Distribution

Every regular (non-shallow) clone is a complete copy:
- Full project history
- All branches
- All tags

Where does Git store data?

Everything lives in the .git/ directory:

.git/
├── objects/      # Blob, tree, commit, and tag objects
├── refs/         # Pointers to branches and tags
├── HEAD          # Pointer to the current branch (or commit)
├── index         # Staging area
└── config        # Configuration
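
Inside objects/, an object that has not been packed yet lives in its own zlib-compressed file: the first two hex characters of its ID name a subdirectory and the remaining 38 name the file. The sketch below writes and reads such a file in a temporary stand-in directory rather than a real repository; the function names are hypothetical.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(git_dir: Path, kind: str, body: bytes) -> str:
    """Store an object as a zlib-compressed file under objects/<2 hex>/<38 hex>."""
    raw = f"{kind} {len(body)}".encode("ascii") + b"\0" + body
    object_id = hashlib.sha1(raw).hexdigest()
    path = git_dir / "objects" / object_id[:2] / object_id[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return object_id


def read_loose_object(git_dir: Path, object_id: str) -> tuple[str, bytes]:
    """Read a loose object back and split it into its type and body."""
    path = git_dir / "objects" / object_id[:2] / object_id[2:]
    raw = zlib.decompress(path.read_bytes())
    header, _, body = raw.partition(b"\0")
    kind, _size = header.decode("ascii").split(" ")
    return kind, body


with tempfile.TemporaryDirectory() as tmp:
    fake_git_dir = Path(tmp) / ".git"  # a stand-in directory, not a real repository
    oid = write_loose_object(fake_git_dir, "blob", b"Hello Git\n")
    kind, body = read_loose_object(fake_git_dir, oid)
    print("directory:", oid[:2], "file:", oid[2:])
    print(kind, body)
```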

Practical example

# Create a file
echo "Hello Git" > test.txt

# Add to staging
git add test.txt

# Git created a blob object!
# You can inspect its contents (git hash-object prints the blob's ID):
git cat-file -p "$(git hash-object test.txt)"

# Make a commit
git commit -m "Add test"

# Git now has:
# - the blob for test.txt (already created by git add)
# - a tree for the root
# - a commit object

Interesting facts

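🔍 SHA-1 collisions:
- Accidental collisions are so unlikely that they can be ignored in practice
- A deliberately constructed collision was demonstrated in 2017 (the SHAttered attack); Git has since shipped with a SHA-1 implementation that detects such inputs
- Git also supports SHA-256 repositories, and a gradual transition is under way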

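📦 Size of the .git directory:
- Roughly 10–30% of the project size in many cases, though it depends heavily on the length of the history and the kinds of files stored
- Linux kernel (approximate figures): ~3 GB of code, ~1.5 GB of .git
- That is 20+ years of history in roughly 1.5 GB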

🚀 Speed:
- git log — instant (local database)
- svn log — seconds (server request)

Takeaways

Git is smart because:

  1. ✅ It stores snapshots, not deltas
  2. ✅ It uses hashing for identification
  3. ✅ It deduplicates unchanged files
  4. ✅ It compresses data automatically
  5. ✅ It works locally (fast!)

Now you understand why Git is so fast and efficient! 🚀
