Mini-git, Understanding How Files Are Stored in Git Objects

Published on 2024-08-24

Yesterday, I set out to implement one of Git's core features on my own: how files are stored, what Git objects are, and how hashing and compression work. It took me about 4 hours to build, and in this article I'll walk you through my thought process and approach.

What Happens When You Commit a File?

When you commit a file in Git, several important steps occur under the hood:

File Compression:

The content of the file is compressed with zlib (which implements the DEFLATE algorithm) to reduce its size. This compressed content is what gets stored in the Git object database.
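
As a rough illustration of the compression step, here is a minimal Python sketch using the standard zlib module. The sample content is made up for the demo, and this is not the code from my project.

```python
import zlib

# Made-up sample content; repetitive data compresses well.
content = b"hello mini-git\n" * 100

compressed = zlib.compress(content)

# Decompressing restores the original bytes exactly.
restored = zlib.decompress(compressed)

print(len(content), "bytes before,", len(compressed), "bytes after")
```

Compression is lossless, so the stored object can always be turned back into the original file content.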

Hash Calculation:

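A SHA-1 hash is generated for the file, and it is effectively unique to that content. Note that Git itself computes this hash from the uncompressed content, prefixed with a short header ("blob", a space, the content size in bytes, and a null byte), rather than from the compressed bytes. This hash serves as the identifier for the file in the Git object database.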
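
For reference, here is a small Python sketch of how Git itself derives the ID of a file (a "blob"): SHA-1 over a header plus the uncompressed content. The function name git_blob_hash is my own choice for illustration.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 that Git assigns to a blob with this content."""
    # Header format: "blob <size in bytes>" followed by a null byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The empty blob has a well-known ID in every Git repository.
print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the hash depends only on the content, two files with identical content share a single object.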

Storing the Object:

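The object file is stored under the .mygit/objects directory, in a subdirectory named after the first two characters of the hash; the file itself is named after the remaining 38 characters. This structure keeps any single directory from growing too large and makes objects easy to look up by hash.

Updating Commit Information: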

To demonstrate how files are stored in Git, I implemented commit functionality for a single file:

  1. For every file, I calculated its hash.
  2. Inside the objects folder, a new folder is created, named after the first two characters of the hash.
  3. A file is created inside that folder, named after the rest of the hash. This file stores the compressed form of the committed file.
  4. Changes are detected by comparing the newly calculated hash with the last calculated hash of the file.
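
The steps above can be sketched in Python roughly as follows. This is not the code from my repo: the names store_object and has_changed are hypothetical, and the sketch assumes Git's convention of hashing a "blob" header plus the uncompressed content.

```python
import hashlib
import zlib
from pathlib import Path
from typing import Optional

def store_object(content: bytes, repo_dir: Path) -> str:
    """Store content as a compressed object under repo_dir and return its hash."""
    # Step 1: hash the header plus the raw content (Git's blob convention, assumed here).
    data = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    digest = hashlib.sha1(data).hexdigest()

    # Step 2: folder named after the first two characters of the hash.
    folder = repo_dir / "objects" / digest[:2]
    folder.mkdir(parents=True, exist_ok=True)

    # Step 3: file named after the remaining characters, holding the compressed data.
    (folder / digest[2:]).write_bytes(zlib.compress(data))
    return digest

def has_changed(new_hash: str, last_hash: Optional[str]) -> bool:
    """Step 4: a file changed if its hash differs from the last recorded one."""
    return new_hash != last_hash
```

Calling store_object(b"hello\n", Path(".mygit")) would create .mygit/objects/xx/yyyy..., where xx is the first two characters of the returned hash.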

Detecting Changes

I implemented this algorithm based on my own approach; Git itself uses more efficient diff algorithms for these operations (the Myers algorithm by default).

  1. Extracted an array of lines from both oldContent and newContent.
  2. Created a Map to store each line as the key and its index as the value.
  3. Created two new arrays to store the indexes of the common lines in oldContent and newContent.
  4. The indexes missing from those arrays are the changed lines. For example, if the old common array is [0, 3], then the deleted lines are [1, 2].
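
Here is a minimal Python sketch of that idea. It is a simplified reconstruction rather than my exact code: the name diff_lines is hypothetical, a dict stands in for the Map, and duplicated or reordered lines are not handled carefully.

```python
def diff_lines(old_content: str, new_content: str):
    """Return (deleted_old_indexes, added_new_indexes) using common-line matching."""
    old_lines = old_content.splitlines()
    new_lines = new_content.splitlines()

    # Map each old line to its index (a repeated line keeps its last index).
    old_index = {line: i for i, line in enumerate(old_lines)}

    # Indexes of lines that appear in both versions.
    old_common, new_common = set(), set()
    for j, line in enumerate(new_lines):
        if line in old_index:
            old_common.add(old_index[line])
            new_common.add(j)

    # Anything not common was deleted (old side) or added (new side).
    deleted = [i for i in range(len(old_lines)) if i not in old_common]
    added = [j for j in range(len(new_lines)) if j not in new_common]
    return deleted, added

# Old lines a, b, c, d; new lines a, d, e: common old indexes are 0 and 3.
print(diff_lines("a\nb\nc\nd", "a\nd\ne"))  # ([1, 2], [2])
```

The lookup table makes each line check constant time, but unlike a real diff this approach ignores line order, so a moved line is reported as unchanged.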

GitHub Repo
Linkedin

Thanks a lot for your time.

This article is reproduced from: https://dev.to/keerthivardhan1/mini-git-understanding-how-files-are-stored-in-git-objects-5bfb?1