Garrett Martin

September 24, 2024

Ubik Devlog #4

Welcome to the fourth devlog! If you're new here, Ubik is a distributed bug tracker/CI tool that I've been working on. In this week's installment, I'll be sharing a couple of small features and a quick recap of Git Merge.

Issue labels


Labels are pretty standard fare when it comes to issue tracking software. Ubik now allows you to add labels to an issue and filter issues by their label (as well as their status).

If you're familiar with the Bubble Tea TUI framework, you can add this kind of custom filtering logic without creating your own list component. Here's the snippet that makes it work:

// the method that satisfies the list.Item interface from
// github.com/charmbracelet/bubbles/list

func (i Issue) FilterValue() string {
	labels := strings.Join(i.Labels, " ")
	return fmt.Sprintf("%s\n%s\n%s", i.Title, labels, i.Status)
}

// the custom filter that wraps the default filter with some
// extra logic, matches the list.FilterFunc signature
func LabelFilter(term string, targets []string) []list.Rank {
	var labelTargets []string
	labelTerm := strings.TrimPrefix(term, "label:")

	for _, t := range targets {
		labelsPart := strings.Split(t, "\n")[1]
		labelTargets = append(labelTargets, labelsPart)
	}

	return list.DefaultFilter(labelTerm, labelTargets)
}

// then you can set the custom filter on the list model:

issueList.Filter = LabelFilter

I've thought about adding first-class support for something like projects, but I'm using labels as a lightweight alternative in the meantime.

Distributed shortcodes


Centralized services like GitHub assign incrementing integers to each issue. This makes them simple to reference: "#45 should fix this bug", for example. With no centralized server, Ubik can't take the same approach; if you and I both created issue #5 in our respective clients without each other's knowledge, those issues would have conflicting identifiers.

Ubik takes a different approach:

  1. Assign a UUID to each record. This is the primary key that Ubik uses under the hood.
  2. Base64 encode the first six bytes of the UUID, producing eight URL-safe characters.
  3. Take the first six characters of the base64-encoded string and use that as the "shortcode" that can be referenced like, "#uU0nuZ should fix this bug".
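The three steps above can be sketched in Go roughly like this. It's a minimal sketch, not Ubik's actual code: `newUUID` and `Shortcode` are hypothetical names, and the UUID is assumed to be a random (version 4) one built by hand so the example only needs the standard library.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// newUUID returns a random (version 4) UUID as raw bytes.
// Hypothetical helper: it stands in for whatever UUID library is really used.
func newUUID() ([16]byte, error) {
	var id [16]byte
	if _, err := rand.Read(id[:]); err != nil {
		return id, err
	}
	id[6] = (id[6] & 0x0f) | 0x40 // set the version field to 4
	id[8] = (id[8] & 0x3f) | 0x80 // set the RFC 4122 variant bits
	return id, nil
}

// Shortcode base64url-encodes the first six bytes of the UUID
// (six bytes always encode to eight characters, with no padding)
// and keeps the first six characters.
func Shortcode(id [16]byte) string {
	encoded := base64.RawURLEncoding.EncodeToString(id[:6])
	return encoded[:6]
}

func main() {
	id, err := newUUID()
	if err != nil {
		panic(err)
	}
	fmt.Printf("#%s\n", Shortcode(id))
}
```

Only the first 36 bits of the UUID end up in the shortcode, so the collision estimate further down assumes those bits are uniformly random.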

I'm making some trade-offs here: six-character shortcodes are not as readable and easy to use as incrementing integers, but they're much better than using raw UUIDs to reference records. UUIDs are extremely resistant to collisions; the shortcodes sacrifice quite a bit of that collision resistance in the name of ergonomics.

Just how likely is a collision between shortcodes? When I researched the question, I came across the "birthday problem". It's a famous problem in probability theory that asks, "how many people need to be in a room before there's a 50% chance that two of them share the same birthday"?

Turns out, it's only 23! 23 people make 253 possible pairs ((23*22)/2). Each pair has a 364/365 chance of not matching. If we treat the pairs as independent (an approximation, but a close one), the probability that no pair matches is (364/365)^253 ≈ 50%.

So, let's turn this into the "shortcode problem": a six-character string in base64 gives us 64^6 = 68,719,476,736 unique possibilities. For the sake of this example, let's say we have 100,000 records with shortcodes - what's the chance that we get a shortcode collision? Here's some pseudo-code:

m = 64^6 # the number of possible combinations
n = 100000

1 - ((m - 1) / m) ^ (n * (n - 1) / 2)

# which comes out to

1 - ((68719476736 - 1) / 68719476736) ^ 4999950000

So there's about a 7% chance that two shortcodes collide in a collection of 100,000 issues. That's not great, but I'm comfortable enough with that number for now since I don't think I'll create that many issues for any of my repos. For reference, the official Rails codebase has about 17,000 total issues.

If an instance of Ubik is dealing with 100k+ records, I can easily add some logic to take advantage of the two remaining characters in the base64-encoded string, which would just create longer shortcodes for new records. I think it's fair to trade a bit of readability for collision resistance at that point.

If you're reading this and thinking, "there is a much better and easier way to do this" or "your math is wrong", please let me know!

Git Merge



I flew to Berlin this week to attend Git Merge, a small Git-focused conference. I learned a ton about how Git works under the hood, and that there are some really cool projects whose goals overlap with Ubik's, like Radicle, GerritForge, Sapling, and gittuf. What the conference made clear to me is that the scope for creating good tools on top of Git is massive, especially when it comes to the user experience of code review. That's going to be a big part of Ubik, and I have a ton of work to do to understand the existing options outside of GitHub's pull request model, which I'm not satisfied with.

I met many fantastic programmers who were generous with their time and advice (thanks especially to Scott, Caleb, Lars, Alexis, and Aditya). It was intimidating, but the whole experience has me excited to continue working on Ubik.