Merge pull request #913 from rpjday/objects
Tweaks (grammar/sentence structure) to first part of "Git Objects"
ben authored Nov 1, 2017
2 parents 3cf6874 + 1c5fa82 commit 3f0242e
Showing 1 changed file with 18 additions and 15 deletions.
33 changes: 18 additions & 15 deletions book/10-git-internals/sections/objects.asc
@@ -5,10 +5,11 @@ Git is a content-addressable filesystem.
Great.
What does that mean?
It means that at the core of Git is a simple key-value data store.
-You can insert any kind of content into it, and it will give you back a key that you can use to retrieve the content again at any time.
-To demonstrate, you can use the plumbing command `hash-object`, which takes some data, stores it in your `.git/objects` directory (the _object database_), and gives you back the key the data is stored as.
+What this means is that you can insert any kind of content into a Git repository, for which Git will hand you back a unique key you can use later to retrieve that content.

-First, you initialize a new Git repository and verify that there is nothing in the `objects` directory:
+As a demonstration, let's look at the plumbing command `git hash-object`, which takes some data, stores it in your `.git/objects` directory (the _object database_), and gives you back the unique key that now refers to that data object.
+
+First, you initialize a new Git repository and verify that there is (predictably) nothing in the `objects` directory:

[source,console]
----
@@ -23,18 +24,20 @@ $ find .git/objects -type f
----

Git has initialized the `objects` directory and created `pack` and `info` subdirectories in it, but there are no regular files.
-Now, store some text in your Git database:
+Now, let's use `git hash-object` to create a new data object and manually store it in your new Git database:

[source,console]
----
$ echo 'test content' | git hash-object -w --stdin
d670460b4b4aece5915caf5c68d12f560a9fe3e4
----

-The `-w` tells `hash-object` to store the object; otherwise, the command simply tells you what the key would be.
-`--stdin` tells the command to read the content from stdin; if you don't specify this, `hash-object` expects a file path at the end.
-The output from the command is a 40-character checksum hash.
-This is the SHA-1 hash – a checksum of the content you're storing plus a header, which you'll learn about in a bit.
+In its simplest form, `git hash-object` would take the content you handed to it and merely return the unique key that _would_ be used to store it in your Git database.
+The `-w` option then tells the command to not simply return the key, but to write that object to the database.
+Finally, the `--stdin` option tells `git hash-object` to get the content to be processed from stdin; otherwise, the command would expect a filename argument at the end of the command containing the content to be used.
+
+The output from the above command is a 40-character checksum hash.
+This is the SHA-1 hash -- a checksum of the content you're storing plus a header, which you'll learn about in a bit.
Now you can see how Git has stored your data:

[source,console]
@@ -43,13 +46,13 @@ $ find .git/objects -type f
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
----

-You can see a file in the `objects` directory.
-This is how Git stores the content initially as a single file per piece of content, named with the SHA-1 checksum of the content and its header.
+If you again examine your `objects` directory, you can see that it now contains a file for that new content.
+This is how Git stores the content initially -- as a single file per piece of content, named with the SHA-1 checksum of the content and its header.
The subdirectory is named with the first 2 characters of the SHA-1, and the filename is the remaining 38 characters.

-You can pull the content back out of Git with the `cat-file` command.
+Once you have content in your object database, you can examine that content with the `git cat-file` command.
This command is sort of a Swiss army knife for inspecting Git objects.
-Passing `-p` to it instructs the `cat-file` command to figure out the type of content and display it nicely for you:
+Passing `-p` to `cat-file` instructs the command to first figure out the type of content, then display it appropriately:

[source,console]
----
@@ -78,7 +81,7 @@ $ git hash-object -w test.txt
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a
----

-Your database contains the two new versions of the file as well as the first content you stored there:
+Your object database now contains both versions of this new file (as well as the first content you stored there):

[source,console]
----
@@ -88,7 +91,7 @@ $ find .git/objects -type f
.git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
----

-Now you can revert the file back to the first version
+At this point, you can delete your local copy of that `test.txt` file, then use Git to retrieve, from the object database, either the first version you saved:

[source,console]
----
@@ -106,7 +109,7 @@ $ cat test.txt
version 2
----

-But remembering the SHA-1 key for each version of your file isn't practical; plus, you aren't storing the filename in your system just the content.
+But remembering the SHA-1 key for each version of your file isn't practical; plus, you aren't storing the filename in your system -- just the content.
This object type is called a _blob_.
You can have Git tell you the object type of any object in Git, given its SHA-1 key, with `cat-file -t`:
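
As a cross-check of the text in this diff, here is a minimal Python sketch of how the key and the on-disk path for `test content` could be derived. It assumes Git's default SHA-1 object format and a header of the form `blob <size>\0` (the header is only mentioned, not spelled out, in the changed text). The helper names `git_blob_key` and `loose_object_path` are hypothetical and exist only for this illustration.

```python
import hashlib


def git_blob_key(content: bytes) -> str:
    """Return the SHA-1 key for a blob, assuming a 'blob <size>\\0' header."""
    # Assumption: the header is the object type, a space, the content length
    # in bytes as decimal ASCII, and a terminating NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(key: str) -> str:
    """Map a 40-character key to its path under .git/objects."""
    # The first 2 characters name the subdirectory; the remaining 38 name the file.
    return f".git/objects/{key[:2]}/{key[2:]}"


# `echo 'test content'` emits the text followed by a newline.
key = git_blob_key(b"test content\n")
print(key)
print(loose_object_path(key))
```

Run as a script, this should print the same key and path that appear in the console listings above, which is a quick way to confirm that the key depends only on the content and its header, not on a filename.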
