20 March 2025

The Cost of Abstraction or: How to Write Modular Code in Assembly
-----------------------------------------------------------------

INTRODUCTION

Typical advice for writing "clean" code often emphasises creating more
abstractions. Consider the following examples:

  * "Decompose functions until they each do one thing"
    means creating more functional/procedural abstractions.

  * "Pass state explicitly rather than using global variables"
    requires grouping state into abstract 'Context' data structures.

  * "Encapsulate your datatypes"
    means designing abstract interfaces for interacting with your datatype.

I acknowledge that abstraction is a powerful tool, but I argue that it has
associated costs which are too often overlooked by developers trying to
write clean code according to the conventional advice. In this article I shall
make explicit four of these costs and discuss some illustrative examples for
each. By designing abstractions with these potential costs in mind, I believe
that their effects can be mitigated. And although it is a very sweeping claim
to make, I believe that most software invents too many abstractions. I think
that if developers were more aware of the downsides of creating new
abstractions, then there would be fewer of them.


1. EXPRESSIVENESS

An abstraction can be thought of as a binary relation from a set of objects
in the abstract space to a set of objects over which we are abstracting (the
base space). If the relation is not surjective, then expressiveness is lost by
incompleteness, since not every base object has an abstract representation. If
the relation does not have a retraction when we consider its codomain to be
limited to the image, then expressiveness is lost by ambiguity, because an
exact representation in the base space can't be uniquely identified by the
abstract space.
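
As a sketch, the two conditions can be written out symbolically. The names A
(abstract objects), B (base objects) and R (the relation) are notation
introduced here only for illustration, and the second line is one reading of
the ambiguity case: some abstract object is related to more than one base
object.

```latex
Let $R \subseteq A \times B$, where $a \mathrel{R} b$ means
``$a$ is an abstract representation of $b$''.

\begin{align*}
\text{Incompleteness:}\quad
  & \exists\, b \in B.\ \forall\, a \in A.\ \neg (a \mathrel{R} b) \\
\text{Ambiguity:}\quad
  & \exists\, a \in A.\ \exists\, b, b' \in B.\
    b \neq b' \wedge a \mathrel{R} b \wedge a \mathrel{R} b'
\end{align*}
```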

To make this clearer, let's use the C programming language as an example.
Consider a binary relation from the set of all valid C compilation unit
source texts to the set of all valid object files. There is an arrow iff the
source text could compile into the object file according to the C standard.
There exist object files for which there is no corresponding C program [1],
so expressiveness is lost by incompleteness. And object files can't be uniquely
expressed with a C source program [2], so expressiveness is lost by ambiguity.
There are some aspects of an object file over which the C programmer has no
control. When you are designing an abstraction, be aware of which properties of
the base object you have lost to ambiguity or incompleteness.
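
A small sketch of the ambiguity case, assuming nothing beyond standard C. The
three functions below (hypothetical names, chosen for illustration) return the
same value for every unsigned input, and the standard does not say whether a
compiler emits a multiply, a shift, an add, or something else for any of them.
The choice of instruction is a property of the object file which the source
text can't pin down.

```c
#include <assert.h>
#include <limits.h>

/* Three spellings of "double x". Unsigned arithmetic wraps modulo
 * 2^N in C, so all three return the same value for every input. */
unsigned twice_mul(unsigned x)   { return x * 2u; }
unsigned twice_shift(unsigned x) { return x << 1; }
unsigned twice_add(unsigned x)   { return x + x; }

/* Returns 1 if the three spellings agree on x, 0 otherwise. */
int spellings_agree(unsigned x)
{
    return twice_mul(x) == twice_shift(x)
        && twice_shift(x) == twice_add(x);
}

/* Spot-check a few inputs, including the wrap-around case. */
void check_spellings(void)
{
    assert(spellings_agree(0u));
    assert(spellings_agree(21u));
    assert(spellings_agree(UINT_MAX));
}
```

Which instruction each function becomes can only be seen by inspecting the
compiler's output, for example the assembly listing produced by `cc -S`.
Nothing in the source decides it.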


2. OBSCURITY


3. PERFORMANCE


4. BOILERPLATE


SUMMARY

NOTES

[1] In x86_64, the INT 3 instruction, which raises a breakpoint trap (delivered
    as SIGTRAP on Unix-like systems), is used by debuggers to set breakpoints.
    The C standard has no way to write this instruction, so any object file
    containing it can't be expressed in C due to incompleteness.
[2] For example, because you could always insert no-op instructions to get
    another object file corresponding to the same C source.


OOP is the exemplar.
Only make an abstraction when the benefit outweighs this cost.
In assembly the cost is so great that you will be going in circles if you stick
to the advice.
The costs of abstraction are:
  * expressiveness (markdown over html, glfw tablet input)
  * obscurity (Concise shell script or verbose python script, rust macros, OOP)
  * performance (High level languages, XML Parser, if two cstrings match then print their length, xlib)
  * boilerplate (OOP, Asm)
Building interfaces & abstractions is hard. A few good abstractions are better
than many shit ones.