I’ve been thinking about versioning as a concept and arrived at a sufficiently concise explanation that seemed worth sharing. Let’s start with some simple definitions:
Definition 1
Version tags are compressed references to objects within a given semantic category, where the category is often (but not necessarily) scoped to a name or identifier.
As an example, "Big Sur" and "Monterey" are version tags in the category "macOS", whereas 95 and 98 are version tags in the category "Windows". You can compare, say, ("macOS", "Big Sur") to ("Windows", 98), but comparing "Monterey" to 95 without reference to the respective operating systems would be like comparing apples and oranges.
System designers may assert additional semantics around version tags. For instance, they may assert that version tags are always numeric, and that a larger number references a newer object within the semantic category. (As a side note, “newer according to whom?” is a good question to ask in any distributed system.)
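To make the scoping concrete, here is a minimal Python sketch of numeric version tags that can only be compared within a single category. The names `VersionRef` and `is_newer` are made up for illustration, and "larger number means newer" is exactly the kind of extra semantics a designer would have to assert.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionRef:
    """A version tag together with the category that scopes it."""
    category: str  # e.g. "Windows"
    tag: int       # assumed numeric, larger means newer


def is_newer(a: VersionRef, b: VersionRef) -> bool:
    """Return True if `a` is newer than `b`; refuse cross-category comparisons."""
    if a.category != b.category:
        raise ValueError("version tags from different categories are not comparable")
    return a.tag > b.tag
```

Under these assumptions, `is_newer(VersionRef("Windows", 98), VersionRef("Windows", 95))` holds, while comparing a "macOS" tag against a "Windows" tag raises an error instead of returning a meaningless answer.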
Version tags are compressed references in the sense that the uncompressed alternative would be to use the referenced object in its entirety. For instance, if you had the luxury of always being able to perform bit-by-bit comparisons of the objects referenced by two version tags, you would no longer need version tags.
Version tags may be assigned through a process that conforms to either (or both) of the following forms:
- 1. The version tag is derived from the referenced object.
- 2. The version tag is derived from contextual state.
In the first scheme (ex: SHA256 checksums as version tags), the system may enforce either one (or both) of the following invariants.
- 1A. Distinct objects likely result in distinct version tags. (Why only likely? Because there are no collision-free lossy hashing schemes.)
- 1B. Distinct version tags reference distinct objects.
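As a rough sketch of the first scheme, the tag below is the SHA-256 hex digest of the object's bytes, computed with Python's standard `hashlib`. The helper name `content_tag` is an assumption for illustration.

```python
import hashlib


def content_tag(content: bytes) -> str:
    """Derive a version tag purely from the object's content."""
    return hashlib.sha256(content).hexdigest()


foo_tag = content_tag(b"foo")
bar_tag = content_tag(b"bar")
foo_again_tag = content_tag(b"foo")  # same content, so the same tag as foo_tag
```

Because the tag is a deterministic function of the content, two different tags can only come from two different objects (1B). The converse (1A) holds only with very high probability, since a fixed-size digest cannot be collision-free.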
In the second scheme (ex: monotonically increasing version numbers), the system may enforce the following invariant.
- 2A. There is a total ordering relationship (representing a notion of newness) across the set of version tags associated with objects in the given semantic category.
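The second scheme could look something like the following sketch, where the tag comes from a counter (contextual state) rather than from the object itself. `CounterTagger` is a hypothetical name, and a single in-process counter is assumed; a distributed system would need to agree on the counter somehow.

```python
import itertools


class CounterTagger:
    """Hands out monotonically increasing version tags, ignoring object content."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)  # yields 1, 2, 3, ...

    def next_tag(self) -> int:
        return next(self._counter)


tagger = CounterTagger()
first = tagger.next_tag()
second = tagger.next_tag()
```

Since every tag is an integer issued in sequence, any two tags are comparable and the later one is always larger, which is the total ordering that 2A asks for.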
Interestingly, 1A and 2A may be entirely compatible. For instance, the version tag may be derived from the previous version tag supplied within the contents of the target object (scheme 1), but confirmed by checking that no new version tags have been concurrently created (scheme 2).
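One way to picture this combination is an optimistic write check. In the sketch below, the object carries the tag it was based on (`previous_tag`, an assumed field name), the new tag is derived from that field, and the hypothetical `Store` accepts the write only if nobody else has created a newer tag in the meantime.

```python
class Store:
    """Toy single-process store; all names here are illustrative assumptions."""

    def __init__(self) -> None:
        self.latest_tag = 0
        self.objects = {}

    def put(self, obj: dict) -> int:
        # Scheme 1: the proposed tag is derived from the object's own contents.
        proposed = obj["previous_tag"] + 1
        # Scheme 2: confirm against contextual state that no newer tag exists.
        if obj["previous_tag"] != self.latest_tag:
            raise RuntimeError("stale write: a newer version tag already exists")
        self.objects[proposed] = obj
        self.latest_tag = proposed
        return proposed
```

A writer that read tag 1 and submits `{"previous_tag": 1, ...}` gets tag 2, but a second writer that also started from tag 1 is rejected rather than silently assigned a conflicting tag.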
On the other hand, 1B and 2A are not compatible unless you’re willing to rewrite history in the process of creating new objects. To see an example of this incompatibility, imagine that you have an object of type Object that you’ve chosen to version using the first scheme. Initially, the object does not exist, and [Step 1] you create a new one with content "foo" (version tag = 1), which you later [Step 2] update to "bar" (version tag = 2). If you later decide to [Step 3] update the content to "foo" once again, the system would need to make a choice between the two invariants, either setting the version tag to 1 (maintaining 1B) or setting it to 3 (maintaining 2A).
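The three steps can be replayed in a small sketch that makes the choice explicit. The function `replay` and its flag are hypothetical; the flag simply decides whether previously seen content gets its old tag back (favouring 1B) or always receives a fresh, larger tag (favouring 2A).

```python
def replay(updates, reuse_tag_for_known_content):
    """Return the (tag, content) pair assigned at each update, in order."""
    tag_by_content = {}
    history = []
    next_tag = 1
    for content in updates:
        if reuse_tag_for_known_content and content in tag_by_content:
            # Favour 1B: the same content keeps the tag it was first given.
            tag = tag_by_content[content]
        else:
            # Favour 2A: every update gets a new, strictly larger tag.
            tag = next_tag
            next_tag += 1
            tag_by_content.setdefault(content, tag)
        history.append((tag, content))
    return history


keep_1b = replay(["foo", "bar", "foo"], reuse_tag_for_known_content=True)
keep_2a = replay(["foo", "bar", "foo"], reuse_tag_for_known_content=False)
# keep_1b == [(1, "foo"), (2, "bar"), (1, "foo")]  -> newest tag is not the largest
# keep_2a == [(1, "foo"), (2, "bar"), (3, "foo")]  -> tags 1 and 3 name the same content
```

Either way one invariant gives: in the first history the tags no longer reflect newness, and in the second two distinct tags reference identical content.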
On the other hand, if you are willing to rewrite history, you can drop the reference to "bar" altogether, and thus not violate either invariant. The problem with this approach is that these references may have been published to external systems, and it’s not easy to have everyone rewrite history the same way.
However, there is one other way of rewriting history that might be more “principled”. That is to retain exactly one previous version in your history. At Step 3 in the example above, you would drop "foo" with version tag 1, retain "bar" with version tag 2, and create "foo" with version tag 3. This scheme offers compatibility with 1A, 1B and 2A, but requires that you commit to tracking only one previous version!
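Here is a minimal sketch of that bounded history, assuming every update actually changes the content (two identical updates in a row would still put the same content under two tags). The function name `apply_update` is made up for illustration.

```python
def apply_update(history, content):
    """history: list of (tag, content) pairs, oldest first, at most two long."""
    next_tag = history[-1][0] + 1 if history else 1
    retained = history[-1:]  # keep only the current version as the one "previous"
    return retained + [(next_tag, content)]


history = []
history = apply_update(history, "foo")  # Step 1
history = apply_update(history, "bar")  # Step 2
history = apply_update(history, "foo")  # Step 3
# history == [(2, "bar"), (3, "foo")]
```

After Step 3 the pair (1, "foo") is gone, so the two surviving tags are ordered by newness and still point at different content.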
That’s all for today, folks! 🖖