A Universally Unique Identifier (UUID) is a 128-bit label used in computer systems to identify information uniquely. UUIDs are designed to be unique across space and time, allowing them to be generated independently without a central authority, minimising the risk of duplication.
UUIDs serve various purposes, including:
- Identifying records in databases.
- Tagging objects in distributed systems.
- Serving as primary keys in applications where uniqueness is critical.
Real-world Use Cases
- Databases: UUIDs serve as primary keys in relational databases so that every record is uniquely identified.
- Microservices: UUIDs give requests and resources unique identifiers that services can pass to one another.
- IoT Devices: UUIDs identify each device in a network, so data from multiple sources can be aggregated without conflicts.
Advantages and Disadvantages of UUIDs
Advantages:
- Global Uniqueness: UUIDs are extremely unlikely to collide, making them suitable for distributed systems where multiple nodes generate identifiers independently.
- No Central Authority Required: They can be generated without coordination, which simplifies operations in distributed environments.
- Scalability: They work well in systems that require scaling across multiple servers or services.
Disadvantages:
- Storage Size: UUIDs take 128 bits (16 bytes), compared with 32 or 64 bits for typical integer IDs, and 36 characters when stored as text. This increases the size of tables and indexes.
- Performance Issues: Random UUIDs (such as version 4) scatter inserts across a database index instead of appending to its end, and the larger keys mean fewer entries fit in each index page. Both effects can make inserts and lookups slower than with sequential IDs.
- User Unfriendliness: UUIDs are hard to read, remember, or type when they appear in user interfaces.
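To make the storage trade-off concrete, the following minimal sketch uses Python's standard `uuid` module to compare the textual and binary forms of one UUID. The sample value is only an illustration.

```python
import uuid

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

text_form = str(u)       # canonical string with hyphens
binary_form = u.bytes    # raw big-endian bytes

print(len(text_form))    # 36 characters
print(len(binary_form))  # 16 bytes (128 bits)

# The binary form carries the same information as the text form.
round_tripped = uuid.UUID(bytes=binary_form)
print(round_tripped == u)
```

Storing the 16-byte form rather than the 36-character string is a common way to reduce the size penalty, where the database supports it.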
The Standard
The standard textual representation of a UUID consists of 32 hexadecimal digits divided into five groups, separated by hyphens, following the format 8-4-4-4-12. This gives a total of 36 characters (32 hexadecimal digits plus 4 hyphens).
The UUID format can be visualized as follows:
xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx
Where:
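- M indicates the UUID version (the full hex digit, for example 4 for version 4).
- N indicates the variant, which determines how the rest of the UUID is laid out. Only the most significant one to three bits of N encode the variant; for the common RFC 4122 variant, N is 8, 9, a, or b.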
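As a rough illustration of the layout, the sketch below checks a string against the 8-4-4-4-12 pattern and returns the M and N digits. The names `UUID_PATTERN` and `version_and_variant_digits` are made up for this example.

```python
import re

# Canonical 8-4-4-4-12 textual form; hex digits may be upper or lower case.
UUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def version_and_variant_digits(text: str) -> tuple[str, str]:
    """Return the M and N hex digits of a canonical UUID string."""
    if not UUID_PATTERN.fullmatch(text):
        raise ValueError(f"not a canonical UUID string: {text!r}")
    # M is the first digit of the third group (index 14),
    # N is the first digit of the fourth group (index 19).
    return text[14], text[19]


m, n = version_and_variant_digits("550e8400-e29b-41d4-a716-446655440000")
print(m, n)  # 4 a
```

This only inspects the text; it does not check that the version or variant values are ones the standard defines.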
Components of a UUID
The field names below come from the time-based (version 1) layout. Other versions keep the same positions and sizes but fill them with different data.
- TimeLow: 4 bytes (8 hex characters) representing the low field of the timestamp.
- TimeMid: 2 bytes (4 hex characters) representing the middle field of the timestamp.
- TimeHighAndVersion: 2 bytes (4 hex characters) that include the version number (top 4 bits) and the high field of the timestamp.
- ClockSequence: 2 bytes (4 hex characters) holding the variant in the top bits and a sequence value that helps avoid collisions, for example when the system clock is set backwards or the node ID changes.
- Node: 6 bytes (12 hex characters), typically the MAC address of the generating node in version 1 UUIDs.
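The field boundaries can be inspected with the attributes that Python's standard `uuid.UUID` class exposes. This is a small sketch; the sample UUID is version 4, so the values show only where each field sits, not real timestamp or MAC data.

```python
import uuid

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

# Format each field as zero-padded hex so it lines up with the text form.
fields = {
    "TimeLow": f"{u.time_low:08x}",
    "TimeMid": f"{u.time_mid:04x}",
    "TimeHighAndVersion": f"{u.time_hi_version:04x}",
    # Python splits the clock sequence into two one-byte attributes.
    "ClockSequence": f"{u.clock_seq_hi_variant:02x}{u.clock_seq_low:02x}",
    "Node": f"{u.node:012x}",
}

for name, value in fields.items():
    print(f"{name:20} {value}")
```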
Types of UUIDs
Version 1: Time-based UUIDs that combine the current timestamp with the MAC address of the generating node. Uniqueness depends on the node ID being unique and on the clock sequence covering cases where the clock moves backwards.
Version 2: DCE Security UUIDs. Similar to version 1, but part of the timestamp and clock sequence is replaced by a local domain and identifier (such as a POSIX user or group ID). It is rarely used because this leaves far fewer distinct values per node.
Version 3: Name-based UUIDs generated using an MD5 hash of a namespace identifier and a name.
Version 4: Randomly generated UUIDs. Of the 128 bits, 122 are random; the remaining 6 bits hold the version and variant.
Version 5: Like version 3 but uses SHA-1 for hashing. RFC 4122 prefers it over version 3 when backward compatibility is not a concern.
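Python's standard `uuid` module can generate versions 1, 3, 4, and 5 (it has no generator for version 2). The short sketch below uses "example.com" as an arbitrary name for the name-based versions.

```python
import uuid

name = "example.com"  # arbitrary name chosen for illustration

v1 = uuid.uuid1()                          # timestamp + node ID
v3 = uuid.uuid3(uuid.NAMESPACE_DNS, name)  # MD5 of namespace + name
v4 = uuid.uuid4()                          # random
v5 = uuid.uuid5(uuid.NAMESPACE_DNS, name)  # SHA-1 of namespace + name

for u in (v1, v3, v4, v5):
    print(u.version, u)

# Name-based UUIDs are deterministic: same namespace and name, same UUID.
print(v5 == uuid.uuid5(uuid.NAMESPACE_DNS, name))
```

Versions 1 and 4 produce a new value on every call, while versions 3 and 5 always map the same input to the same UUID.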
Variants
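The variant field in a UUID determines its layout and interpretation. It occupies the most significant bits of the ninth byte (the first hex digit of the fourth group). The defined variants are:
- Variant 0 (bit pattern 0xxx): Reserved for NCS backward compatibility.
- Variant 1 (bit pattern 10xx): The standard RFC 4122 layout used for most UUIDs, including DCE Security (version 2) UUIDs.
- Variant 2 (bit pattern 110x): Reserved for Microsoft backward compatibility.
- Variant 3 (bit pattern 111x): Reserved for future definitions.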
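The variant can be read from the top bits of the ninth byte. The sketch below follows the bit patterns given in RFC 4122 (0xxx, 10xx, 110x, 111x); `classify_variant` is a made-up helper name, and it returns the descriptive constants defined by Python's `uuid` module.

```python
import uuid


def classify_variant(ninth_byte: int) -> str:
    """Map the ninth byte of a UUID to one of the uuid module's variant constants."""
    if ninth_byte & 0x80 == 0x00:   # 0xxx xxxx
        return uuid.RESERVED_NCS
    if ninth_byte & 0xC0 == 0x80:   # 10xx xxxx
        return uuid.RFC_4122
    if ninth_byte & 0xE0 == 0xC0:   # 110x xxxx
        return uuid.RESERVED_MICROSOFT
    return uuid.RESERVED_FUTURE     # 111x xxxx


u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
print(classify_variant(u.bytes[8]))
print(classify_variant(u.bytes[8]) == u.variant)
```

In practice the `variant` attribute of `uuid.UUID` already does this; the manual version only shows where the bits live.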
Example
For Version 4, a UUID might look like this:
550e8400-e29b-41d4-a716-446655440000
Here:
- The leading 4 in 41d4 indicates that this is a version 4 UUID.
- The leading a in a716 (binary 1010) starts with the bits 10, which marks the common RFC 4122 ("Leach-Salz") variant.
How UUIDs are Calculated
- Version 1 (Time-based):
  - The timestamp is a 60-bit count of 100-nanosecond intervals since 00:00 UTC on October 15, 1582 (the date of the Gregorian calendar reform).
  - The node is typically the MAC address of the machine generating the UUID.
  - The clock sequence helps ensure uniqueness when the clock may have moved backwards (e.g., after a clock adjustment or a restart that loses the generator's state).
- Version 3 and Version 5 (Name-based):
  - A namespace UUID (such as the predefined DNS or URL namespace) is concatenated with a name (like a domain name or URL) and hashed.
  - The first 128 bits of the hash (MD5 for version 3, SHA-1 for version 5) are then structured into a UUID, with the version and variant fields overwritten.
- Version 4 (Random-based):
  - Random or pseudo-random numbers are generated for 122 bits of the UUID.
  - The remaining 6 bits hold the version (0100) and variant (10), as the standard requires.
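Two of these calculations can be reproduced by hand in a few lines. The sketch below decodes a version 1 timestamp and builds name-based UUIDs from a hash. The helper names `v1_timestamp_to_datetime` and `name_based_uuid` are hypothetical, and the code assumes the name is encoded as UTF-8, which is what Python's own `uuid3` and `uuid5` do.

```python
import hashlib
import uuid
from datetime import datetime, timedelta, timezone

# Start of the UUID timestamp epoch.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def v1_timestamp_to_datetime(u: uuid.UUID) -> datetime:
    """Convert the 60-bit timestamp of a version 1 UUID to a UTC datetime."""
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    # u.time counts 100-nanosecond intervals; ten of them make one microsecond.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


def name_based_uuid(namespace: uuid.UUID, name: str, version: int) -> uuid.UUID:
    """Build a version 3 (MD5) or version 5 (SHA-1) UUID by hand."""
    data = namespace.bytes + name.encode("utf-8")
    if version == 3:
        digest = hashlib.md5(data).digest()
    elif version == 5:
        digest = hashlib.sha1(data).digest()
    else:
        raise ValueError("version must be 3 or 5")
    raw = bytearray(digest[:16])                # keep the first 128 bits
    raw[6] = (raw[6] & 0x0F) | (version << 4)   # overwrite the version nibble
    raw[8] = (raw[8] & 0x3F) | 0x80             # set the variant bits to 10
    return uuid.UUID(bytes=bytes(raw))


print(v1_timestamp_to_datetime(uuid.uuid1()))
manual = name_based_uuid(uuid.NAMESPACE_DNS, "example.com", 5)
print(manual == uuid.uuid5(uuid.NAMESPACE_DNS, "example.com"))
```

Comparing the hand-built result with `uuid.uuid5` is a quick check that the version and variant bits were set in the right places.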
UUIDv4 Calculation Example
Step 1: Generate 128 Random Bits
Let's assume we generate the following 128-bit random value:
11001100110101101101010101111010101110110110111001011101010110110101111011010011011110100100101111001011100101100011110010100101
Written as 16 bytes in hexadecimal, this is:
cc d6 d5 7a bb 6e 5d 5b 5e d3 7a 4b cb 96 3c a5
Step 2: Apply UUIDv4 Version and Variant
Version: Replace the four most significant bits of the 7th byte (the 13th hex character) with 0100 (for UUID version 4).
Original: 0101 becomes 0100, so the byte 5d (01011101) becomes 4d (01001101).
Variant: Replace the two most significant bits of the 9th byte (bits 6-7, counting from the least significant bit) with 10 (for the RFC 4122 variant).
Original: 01 becomes 10, so the byte 5e (01011110) becomes 9e (10011110).
Step 3: Format into Hexadecimal
Convert the 128-bit binary into 5 hexadecimal groups:
- 32-bit group: 11001100110101101101010101111010 → ccd6d57a
- 16-bit group: 1011101101101110 → bb6e
- 16-bit group: 0100110101011011 → 4d5b (with 0100 for version 4)
- 16-bit group: 1001111011010011 → 9ed3 (with 10 for the variant)
- 48-bit group: 011110100100101111001011100101100011110010100101 → 7a4bcb963ca5
Step 4: Combine the Groups
The final UUID would look like this:
ccd6d57a-bb6e-4d5b-9ed3-7a4bcb963ca5
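The same procedure can be sketched in Python. The helper `uuid4_from_bytes` is a hypothetical name, and the fixed 16-byte input is an arbitrary value chosen so that the result is reproducible; real generators draw the bytes from a secure random source such as `os.urandom`.

```python
import os
import uuid


def uuid4_from_bytes(random_bytes: bytes) -> uuid.UUID:
    """Turn 16 random bytes into a version 4, RFC 4122 variant UUID."""
    if len(random_bytes) != 16:
        raise ValueError("exactly 16 bytes are required")
    raw = bytearray(random_bytes)
    raw[6] = (raw[6] & 0x0F) | 0x40  # top nibble of byte 7 -> 0100 (version 4)
    raw[8] = (raw[8] & 0x3F) | 0x80  # top two bits of byte 9 -> 10 (variant)
    return uuid.UUID(bytes=bytes(raw))


# Fixed input for a reproducible result (illustrative value only).
fixed = bytes.fromhex("ccd6d57abb6e5d5b5ed37a4bcb963ca5")
print(uuid4_from_bytes(fixed))  # ccd6d57a-bb6e-4d5b-9ed3-7a4bcb963ca5

# Typical use: fresh random bytes on every call.
print(uuid4_from_bytes(os.urandom(16)))
```

In everyday code `uuid.uuid4()` does all of this in one call; the manual version only makes the two bit-masking steps visible.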