Ntwali B.

Reflections on command line interface tools

Despite the ubiquity of the graphical user interface, the command line interface is still very much in use and clearly not going anywhere. It behoves us, then, to design and implement command line tools so that they are easy and intuitive to use. This post elaborates on some of my thoughts on how to accomplish this.

Published

02 October 2025

Introduction
Core concepts
Design principle
Implementation details
- Core concepts implementation
- Shell completion
Beyond commands: natural language
Conclusion

Introduction

While the graphical user interface (GUI) is the default user interface on most consumer computer systems, the command line interface (CLI) is still the default user interface on most server systems. The CLI is also still very much used by programmers and those who need to automate certain processes.

It is therefore important that CLI tools are designed and implemented in such a way that:

They are intuitive: prior knowledge of a few commands should transfer to new commands.
They are predictable: related commands should behave similarly.
They are reliable: commands should protect against errors and the system should provide ways to recover from errors.

Over time, I noticed many inconsistencies across the command line tools I used, until I started working more with modern tools such as Kubernetes, Git, and Docker. To illustrate the three criteria above, let’s take kubectl as an example.

`kubectl` is intuitive

When I was learning Kubernetes, I listed pods by typing kubectl get pods. To create a pod, I wrote a YAML manifest describing it and then called kubectl create -f. To get details about a pod, you may have guessed it: kubectl describe pod. This is clearly intuitive.

`kubectl` is predictable

When I then learned about nodes, deployments, services, and so on, the same command structure carried over. To create a deployment, I call kubectl create -f. To list deployments: kubectl get deployments. To get details about a deployment: kubectl describe deployment. What the user already knows about pods transfers to working with deployments. In fact, predictability also flattens the learning curve: when users realize they can get and describe nodes but cannot create them, it serves as a learning aid about the architecture of Kubernetes itself! In short, kubectl is predictable.

`kubectl` is (somewhat) reliable

Where possible, kubectl tries to prevent the user from making errors. For instance, if one doesn’t want an existing pod to be updated, one uses kubectl create -f: if the pod specified in the manifest already exists, the command fails. If, on the other hand, one wants to create or update, kubectl apply -f will do the trick. Any resource created with kubectl create can be removed with kubectl delete, allowing users to recover from erroneous creations. Since Kubernetes deployments are versioned, the user can also roll a deployment back to an earlier revision. Finally, Kubernetes provides audit logs that show what happened and when, which helps recover from errors. I say kubectl is only somewhat reliable because recovering from errors can be a tad involved: there is no single undo subcommand that rolls back the last action.

The purpose of this blog post is to share my thoughts about ways to build CLI tools as good as kubectl, and possibly even better.

We will start with the core concepts needed to build reliable CLI tools, then move on to ways of building intuitive and predictable commands. Next, we will discuss implementation details, using Python’s Click library. Finally, we will look at what modern language models can do to make our users’ lives easier where and when possible.

Core concepts

Our starting point is to look to functional programming for inspiration: we require that each command be referentially transparent. Adopting ideas from functional programming is not new; Nushell takes the same approach.

Referential transparency is a worthy goal: if you know that a command always gives the same output for a given set of inputs, things like caching come almost for free. Unfortunately, this goal is not feasible: the output of kubectl get pods depends on whether kubectl create -f was run before or after it. So in reality, we cannot have true referential transparency.

We will therefore relax the condition to idempotency: we require that the same command, invoked twice in a row with the same inputs, returns exactly the same output.

Formally we call this condition strong idempotency and we define it as:

Definition: Let $c$ be a command. Let $(x_1, x_2, \dots, x_n)$ be the list of inputs passed to the command. The command $c$ is strongly idempotent if $c(x_1, x_2, \dots, x_n) \circ c(x_1, x_2, \dots, x_n) = c(x_1, x_2, \dots, x_n)$.

This notation simply means that if we execute $c(x_1, x_2, \dots, x_n)$ twice in a row, the result is still $c(x_1, x_2, \dots, x_n)$, as if it had been executed only once.

Idempotency in mathematics vs our definition

Note that this is not the definition of idempotency used in mathematics, which relies on function composition: there, a function $f$ is idempotent if $f(f(x)) = f(x)$, or in other words $f \circ f = f$. In our definition, $\circ$ denotes running the command twice in a row rather than composing it with itself; we merely borrowed the word.

For example, kubectl get pods called twice in a row should return the same list of pods.

But once more, this condition is too strong. Consider the following scenario: we call kubectl get pods, someone else calls kubectl create -f, and when we call kubectl get pods again, we get a different result. Instead, we will pursue a weaker form of idempotency. To make this happen, we require that each command return a reference to its output along with the output itself.

Formally, we define weak idempotency as follows:

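Definition: Let $c$ be a command. Let $r$ be a reference to the output of $c$, that is, $c(x_1, x_2, \dots, x_n) = r$. We say that $c$ is weakly idempotent if $r^2 = r$, that is, if resolving the reference $r$ twice in a row yields the same result as resolving it once.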

In retrospect this is obvious: if we think of $r$ as an identifier of the result obtained from $c(x_1, x_2, \dots, x_n)$, there is no reason for $r$ to reference any result other than the one it was initially assigned to.

This solves our problem of interleaving create and get: we can use the reference to get previous results.
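To make the interleaving concrete, here is a minimal in-memory sketch. It assumes a toy system whose only state is a list of pod names. `ToySystem`, `run`, and `resolve` are hypothetical names and are not part of kubectl or any library. Each call to `run` stores its output under a fresh reference. Resolving an old reference therefore returns the old output, even after the state has changed.

```python
import uuid


class ToySystem:
    """Hypothetical toy backend whose only state is a list of pod names."""

    def __init__(self):
        self.pods = []
        self.results = {}  # reference -> output, written once and never modified

    def run(self, command, *args):
        """Run a command and return (reference, output)."""
        if command == "create":
            self.pods.append(args[0])
            output = f"pod/{args[0]} created"
        elif command == "get":
            output = list(self.pods)  # snapshot, so later creates do not alter it
        else:
            raise ValueError(f"unknown command: {command}")
        ref = uuid.uuid4().hex[:8]
        self.results[ref] = output
        return ref, output

    def resolve(self, ref):
        """Return the stored output for ref without re-running the command."""
        return self.results[ref]


system = ToySystem()
before, _ = system.run("get")   # no pods yet
system.run("create", "web")     # someone else creates a pod in between
after, _ = system.run("get")    # the live state has changed
print(system.resolve(before))   # []
print(system.resolve(after))    # ['web']
```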

In practice, we can still recover strong idempotency from weak idempotency. To do so, we compute the reference deterministically from the current input list. When the computed reference matches an existing one, we refuse to run the command again and return the stored output instead.
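Deriving the reference from the inputs could look like the following sketch. It assumes the reference is a truncated SHA-256 hash of the command name and its input list (serialized as JSON), and that outputs are kept in memory. `reference_for` and `IdempotentRunner` are hypothetical names. A command whose reference already exists is not run again; the stored output is returned instead.

```python
import hashlib
import json


def reference_for(command, args):
    """Hypothetical: derive a deterministic reference from the command and its inputs."""
    payload = json.dumps([command, list(args)])
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


class IdempotentRunner:
    """Hypothetical runner that refuses to re-run a command whose reference exists."""

    def __init__(self, handler):
        self.handler = handler  # function (command, args) -> output
        self.results = {}       # reference -> stored output

    def run(self, command, *args):
        ref = reference_for(command, args)
        if ref in self.results:
            # Same inputs as before: do not run again, return the stored output.
            return ref, self.results[ref]
        output = self.handler(command, args)
        self.results[ref] = output
        return ref, output


calls = []


def handler(command, args):
    calls.append((command, args))
    return f"{command} {' '.join(args)}"


runner = IdempotentRunner(handler)
runner.run("get", "pods")
runner.run("get", "pods")
print(len(calls))  # 1
```

With this scheme, get pods keeps returning its first result no matter what happens afterwards. That is exactly the strong idempotency defined above, along with the staleness problem that motivated the weak form.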

The real value of using references lies in two aspects:

It enables non-blocking commands: instead of forcing the user to wait for a command to finish or to run it in the background, the command can immediately return a reference against which the output can be checked later.
It enables histories, which will be key to reliability. A history allows undoing previous commands, just as kubectl rollout undo deployment allows us to undo deployment rollouts. A history also allows traceability: if results are stored and linked to their references, it becomes easy to pull up the outputs of previously executed commands. Finally, a history enables auditing: if the implementation uses something like a hash list for the history, it becomes possible to know which commands were run, and in what order, without relying on the shell to keep track of that history. All of this contributes to building reliable CLI tools.
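One way to realize the hash list idea is a hash chain. The sketch below assumes an in-memory, append-only history; `HashChainHistory` is a hypothetical class. Each entry stores the SHA-256 hash of its predecessor. As a result, editing or reordering earlier entries is detectable when the chain is verified.

```python
import hashlib
import json


def _digest(body):
    """Hash an entry body deterministically (sorted keys)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class HashChainHistory:
    """Hypothetical append-only history; each entry embeds the hash of the previous one."""

    GENESIS = "0" * 64  # stands in for the "previous hash" of the first entry

    def __init__(self):
        self.entries = []

    def append(self, argv, ref):
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"argv": argv, "ref": ref, "prev": prev}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute the chain; False if an entry was altered, reordered, or removed before the last."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {"argv": entry["argv"], "ref": entry["ref"], "prev": entry["prev"]}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True


history = HashChainHistory()
history.append(["create", "-f", "pod.yaml"], "a1b2")
history.append(["get", "pods"], "c3d4")
print(history.verify())  # True
history.entries[0]["argv"] = ["delete", "pod", "web"]  # tamper with the first entry
print(history.verify())  # False
```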

To obtain reliability, we need the ability to perform three types of actions:

rollbacks: the ability to undo a previously executed command.
playbacks: the ability to execute commands from a point in time up to another point.
dry-runs: the ability to see what the outcome would be without committing the results to the backing system.

We explore those three types of actions in detail below.

Rollbacks

It does happen, if not often, that a command performing a write operation (one that changes the state of the system) was poorly written, or was simply the wrong command, and needs to be undone. This ability is scarcely present in CLI tools but generally expected in GUI applications.

That the ability to undo a command is so rarely present in CLI tools seems like a travesty to me. For instance, the GitHub blog has a post on how to undo (almost) anything with Git (Wehner, 2015), and Stack Overflow is full of questions asking how to undo specific operations in other CLI tools. It is therefore important that the new breed of CLI tools think from the get-go about how to make this functionality available to users.

The question now is how this should be implemented. In general, two types of undo have been identified in the field of human-computer interaction: flip undo and backtrack undo (Mancini, 1997). Flip undo is not applicable here because it degenerates into a toggle, due to the strong reflexive property of undo under such a circumstance (see the reference above).

We are therefore left with backtrack undo, which amounts to undoing commands one step at a time, going backwards through the history. The reader will note, however, that many tools, such as git, allow undoing changes at practically any point in the history. So the undo subcommand can be allowed to take a reference to the command to roll back.
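A minimal sketch of backtrack undo follows. It assumes a toy key-value store in which every write command records the value it overwrote. `UndoableStore`, `set`, and `undo` are hypothetical names. Calling `undo()` reverts the last command. Calling `undo(ref)` backtracks one step at a time until it has reverted the command identified by `ref`. Undoing everything back to `ref`, rather than only that single command, is a deliberate simplification: it sidesteps the consistency issues that undoing a command in the middle of the history could cause.

```python
_MISSING = object()  # marks keys that did not exist before a write


class UndoableStore:
    """Hypothetical key-value store whose writes can be undone by backtracking."""

    def __init__(self):
        self.data = {}
        self.history = []  # (ref, key, previous value or _MISSING), oldest first
        self._counter = 0

    def set(self, key, value):
        """Write command: record what it overwrites, then apply it."""
        self._counter += 1
        ref = f"cmd-{self._counter}"
        self.history.append((ref, key, self.data.get(key, _MISSING)))
        self.data[key] = value
        return ref

    def undo(self, ref=None):
        """Undo the last command, or backtrack until the command `ref` is undone."""
        if not self.history:
            raise LookupError("nothing to undo")
        if ref is not None and all(entry[0] != ref for entry in self.history):
            raise LookupError(f"unknown reference: {ref}")
        while self.history:
            entry_ref, key, previous = self.history.pop()
            if previous is _MISSING:
                del self.data[key]
            else:
                self.data[key] = previous
            if ref is None or entry_ref == ref:
                break


store = UndoableStore()
first = store.set("replicas", 1)
store.set("replicas", 3)
store.set("image", "web:v2")
store.undo()           # reverts the image change only
print(store.data)      # {'replicas': 3}
store.undo(first)      # backtracks through both replica changes
print(store.data)      # {}
```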

Implementing this capability can of course come with complications, depending on the complexity of the system. One example is making sure that neither the history nor the system itself is left in an inconsistent state. This becomes especially important if playbacks are also supported.

I will therefore not elaborate much further than the above. Instead, I invite the reader to consider implementing such a capability in their next project.

Playbacks

Let us give a more or less formal definition of a playback so that we are on the same page. Consider previously executed commands between the state $s'$ of the system at time $t-\Delta t$ and the state $s$ at time $t$. A playback of these commands undoes them from time $t$ back to time $t-\Delta t$, effectively returning the system to state $s'$. It then runs those commands again up to time $t$, reaching state $s$ again.
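The definition above can be sketched as follows, assuming each write command returns a function that undoes it. `set_key`, `execute`, and `playback` are hypothetical helpers operating on an in-memory dictionary. `playback` undoes the last `steps` commands to return to $s'$. It then re-runs them in their original order and checks that the final state equals $s$.

```python
import copy


def set_key(state, key, value):
    """Hypothetical write command; returns a function that undoes it."""
    existed, previous = key in state, state.get(key)
    state[key] = value

    def inverse():
        if existed:
            state[key] = previous
        else:
            del state[key]

    return inverse


def execute(state, log, command, *args):
    """Run a command and append it, with its inverse, to the log."""
    log.append({"command": command, "args": args, "inverse": command(state, *args)})


def playback(state, log, steps):
    """Undo the last `steps` commands (s -> s'), then re-run them (s' -> s)."""
    if not 0 <= steps <= len(log):
        raise ValueError(f"steps must be between 0 and {len(log)}")
    expected = copy.deepcopy(state)          # state s at time t
    replay = log[len(log) - steps:]
    for entry in reversed(replay):           # from time t back to time t - Δt
        entry["inverse"]()
    snapshot = copy.deepcopy(state)          # state s' at time t - Δt
    for entry in replay:                     # re-run in the original order
        entry["inverse"] = entry["command"](state, *entry["args"])
    if state != expected:
        raise RuntimeError("playback did not reproduce the original state")
    return snapshot


state, log = {}, []
execute(state, log, set_key, "replicas", 1)
execute(state, log, set_key, "image", "web:v1")
execute(state, log, set_key, "replicas", 3)
print(playback(state, log, steps=2))  # {'replicas': 1}
print(state)                          # {'replicas': 3, 'image': 'web:v1'}
```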

This ability can be useful for reproducing a particular state of the system, for instance during audits or to fulfill stability requirements.

Dry-runs

A dry-run consists of running a command and getting a result without changing the state of the system. This allows users to validate that the command they are about to run is safe before actually committing it to the system. Note that dry-runs are not written to the history, since they make no change to the underlying system; they can, however, be logged.

For instance, the kubectl create subcommand has a --dry-run option (--dry-run=client or --dry-run=server). It lets the user verify what resource would be created before it is actually persisted by the API server.
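In Click, a dry-run can be offered as a boolean flag. The sketch below assumes a hypothetical `create pod` subcommand backed by in-memory `RESOURCES` and `HISTORY` containers. With `--dry-run`, it only reports what would be created and writes nothing to either container, in line with the rule that dry-runs stay out of the history. Like kubectl create, it refuses to overwrite an existing resource.

```python
import click

# Hypothetical in-memory backend and history; a real tool would use an API or a database.
RESOURCES = {}
HISTORY = []


@click.group()
def cli():
    """Command group"""


@cli.group()
def create():
    """Create resources."""


@create.command("pod")
@click.argument("resource_name")
@click.option("--image", required=True, help="Container image for the pod.")
@click.option("--dry-run", is_flag=True, help="Show what would be created without creating it.")
def create_pod(resource_name, *, image, dry_run):
    """Create a pod named RESOURCE_NAME."""
    manifest = {"kind": "Pod", "name": resource_name, "image": image}
    if dry_run:
        # Nothing is written to RESOURCES or HISTORY: the outcome is only shown.
        click.echo(f"(dry run) would create: {manifest}")
        return
    if resource_name in RESOURCES:
        raise click.ClickException(f"pod {resource_name!r} already exists")
    RESOURCES[resource_name] = manifest
    HISTORY.append(("create", "pod", resource_name))
    click.echo(f"pod/{resource_name} created")
```

Installed under a hypothetical name such as `mytool`, it would be invoked as `mytool create pod web --image nginx --dry-run`.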

In conclusion, rollbacks, playbacks, and dry-runs are useful building blocks for reliable CLI tools: they allow users to prevent, debug, and recover from errors, intentional or not.

Design principle

There are many ways to design CLI tools, but I’ve come to favor the design of kubectl, and I see other tools adopting the same design philosophy. It goes without saying that this section is quite opinionated, but I hope it is useful.

A command is structured as follows:

command <verb> <resource-type> <resource-name> [options]

A command is structured around resources: each subcommand (verb) manipulates a particular resource. In general, CRUD operations are expected to be supported for any resource one may be working with. In fact, we are essentially reusing the design philosophy of REST APIs.

With that in mind, I propose the following verbs (or their variants):

get: show a summary view of a resource.
describe: show a detailed view of a resource.
list: list resources of the given type.
create: create a new resource.
delete: delete the given resource.

Not all resources will support all those verbs, but they form a baseline of what users can expect. More verbs can of course be added, such as exec if the underlying system provides the ability to execute tasks.

Each command, whether it failed or succeeded, should return a reference. That reference can later be used to get the status of the command that produced it (success, failure, pending, etc.), to get its results upon successful completion, or to get the logs associated with that particular execution.

There is no requirement that each command return the reference immediately as part of its result, but the reference should always appear in the history.
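The lookups a reference should support can be sketched with a small record type. `Status`, `CommandRecord`, and `History` are hypothetical names, and the history is kept in memory; a real tool would persist it. Here `start` returns the reference immediately, although it would be enough for the reference to appear in the history.

```python
import enum
import uuid
from dataclasses import dataclass, field


class Status(enum.Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class CommandRecord:
    """Hypothetical history entry: everything a reference gives access to."""

    argv: list
    ref: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    status: Status = Status.PENDING
    result: object = None
    logs: list = field(default_factory=list)


class History:
    def __init__(self):
        self.records = {}

    def start(self, argv):
        record = CommandRecord(argv=argv)
        self.records[record.ref] = record
        return record.ref

    def finish(self, ref, result=None, error=None):
        record = self.records[ref]
        if error is None:
            record.status, record.result = Status.SUCCEEDED, result
        else:
            record.status = Status.FAILED
            record.logs.append(f"error: {error}")

    # The three lookups a reference should support: status, results, logs.
    def status(self, ref):
        return self.records[ref].status

    def result(self, ref):
        record = self.records[ref]
        if record.status is not Status.SUCCEEDED:
            raise LookupError(f"{ref} has no result (status: {record.status.value})")
        return record.result

    def logs(self, ref):
        return list(self.records[ref].logs)


history = History()
ok = history.start(["get", "pods"])
history.finish(ok, result=["web"])
bad = history.start(["create", "-f", "missing.yaml"])
history.finish(bad, error="file not found")
print(history.status(ok), history.result(ok))  # Status.SUCCEEDED ['web']
print(history.logs(bad))                       # ['error: file not found']
```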

This effectively makes command line tools a lot more intuitive and predictable to use since the user will focus more on learning about resources and how to work with them and less on particular idiosyncrasies of the tool.

Implementation details

There are many ways to implement the different verbs depending on the programming language and the library used to implement the tool. We will use Python’s Click library to accomplish our goal.

The general structure would be as follows:

import click

# Base CLI group: the tool itself
@click.group()
def cli():
    """Command group"""

# Verb: a group nested under the base CLI
@cli.group()
def verb():
    """Verb group"""

# Resource type under verb, invoked as: cli verb resource-type <resource-name> [options]
@verb.command("resource-type")
@click.argument("resource_name")
@click.option("-o", "--option", default="default", help="Option")
def resource_type(resource_name, *, option):
    """Verb implementation for resource-type given resource_name"""
    # Required values are positional arguments; optional ones are keyword-only options.
    click.echo(f"Verb called with resource_type: resource-type, resource_name: {resource_name}, option: {option}")

if __name__ == "__main__":
    cli()

Basic structure of a command: a minimal skeleton that puts the design principle into practice.

The most important part is to write the actual subcommands (@verb.command()) so that positional arguments are separated from keyword arguments. As much as possible, positional arguments should be required arguments. If a value is not required, I have found it best to make it an option, which appears as a keyword-only parameter in the function signature. Moreover, note the absence of **kwargs: this ensures that the function signature clearly shows what the subcommand expects.

Core concepts implementation

Click’s context object might be used to implement the history in a clean way.
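As one possible approach, the history can live on the root context as `ctx.obj`. Click's `Context.ensure_object` creates it once, child contexts inherit `obj` from their parent, and `click.pass_obj` hands it to each subcommand. `History` and its `record` method are hypothetical, and the command output is a placeholder for a real lookup.

```python
import hashlib
import json

import click


class History:
    """Hypothetical in-memory history; a real tool would persist entries to disk."""

    def __init__(self):
        self.entries = []

    def record(self, argv, output):
        # Derive a short reference from the entry's position and the command line.
        payload = json.dumps([len(self.entries), argv])
        ref = hashlib.sha256(payload.encode()).hexdigest()[:12]
        self.entries.append({"ref": ref, "argv": argv, "output": output})
        return ref


@click.group()
@click.pass_context
def cli(ctx):
    """Command group"""
    # Create the history once on the root context; subcontexts inherit ctx.obj.
    ctx.ensure_object(History)


@cli.group()
def get():
    """Get resources."""


@get.command("pod")
@click.argument("resource_name")
@click.pass_obj
def get_pod(history, resource_name):
    """Show a summary of the pod RESOURCE_NAME."""
    output = f"pod/{resource_name}"  # placeholder for a real lookup
    ref = history.record(["get", "pod", resource_name], output)
    click.echo(output)
    click.echo(f"ref: {ref}")
```

`ensure_object` only creates a `History` when none is present. An existing history can therefore be injected with `cli.main([...], obj=history)` or `CliRunner.invoke(cli, [...], obj=history)`, which is convenient for loading a persisted history or for testing.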

Shell completion

Click provides a way to enable shell completion.
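In Click 8, completion is activated through an environment variable read by the program itself. For a tool installed under the hypothetical name `mytool`, Bash users could add `eval "$(_MYTOOL_COMPLETE=bash_source mytool)"` to their shell configuration; `zsh_source` and `fish_source` serve Zsh and Fish. This only works when the tool is installed as an entry point, not when run via `python script.py`. Subcommand and option names complete automatically. The `shell_complete` parameter can additionally suggest argument values, as in the sketch below, where `KNOWN_PODS` is a hypothetical stand-in for a query to the backend.

```python
import click

# Hypothetical list of known pods; a real tool would query its backend instead.
KNOWN_PODS = ["api", "web", "worker"]


def complete_pod_names(ctx, param, incomplete):
    """Return the pod names that start with what the user has typed so far."""
    return [name for name in KNOWN_PODS if name.startswith(incomplete)]


@click.group()
def cli():
    """Command group"""


@cli.group()
def describe():
    """Show detailed views of resources."""


@describe.command("pod")
@click.argument("resource_name", shell_complete=complete_pod_names)
def describe_pod(resource_name):
    """Describe the pod RESOURCE_NAME."""
    click.echo(f"pod/{resource_name}: details would go here")
```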

Beyond commands: natural language

Developers work with a variety of CLI tools on a day-to-day basis, so it is natural to forget some commands, the particular options of a command, and so on. I believe one good use of small language models will be to fine-tune them (or use RAG) on a particular tool and ship that model with it. Users could then switch between structured commands and natural language when they quickly need to know how to do something, without reading the documentation or asking online.

I have been exploring this using a tool I call CLAI. Feedback is welcome!

Conclusion

This blog post is mostly my attempt to write down some thoughts I’ve had while developing and using CLI tools, so that the next time I need to write one, I can make it a bit better.

Wehner, J. (2015). How to undo (almost) anything with Git. https://github.blog/open-source/git/how-to-undo-almost-anything-with-git/
Mancini, R. (1997). Modelling Interactive Computing by Exploiting the Undo (Number IX-97-5) [Dottorato di Ricerca in Informatica]. Università degli Studi di Roma "La Sapienza".