20 March 2025

The Cost of Abstraction
or: How to Write Modular Code in Assembly
-----------------------

INTRODUCTION

Typical advice for writing "clean" code often emphasises creating more
abstractions. Consider the following examples:

  * "Decompose functions until they each do one thing" means creating
    more functional/procedural abstractions.
  * "Pass state explicitly rather than using global variables" requires
    grouping state into abstract 'Context' data structures.
  * "Encapsulate your datatypes" means designing abstract interfaces to
    interact with your datatype.

I acknowledge that abstraction is a powerful tool, but I argue that it
has associated costs which are too often overlooked by developers trying
to write clean code by following the conventional advice. In this
article I shall make four of these costs explicit and discuss some
illustrative examples of each. By designing abstractions with these
potential costs in mind, I believe that their effects can be mitigated.

And although it is a very sweeping claim to make, I believe that most
software invents too many abstractions. I think that if developers were
more aware of the downsides of creating new abstractions, there would be
fewer of them.

1. EXPRESSIVENESS

An abstraction can be thought of as a binary relation from a set of
objects in the abstract space to a set of objects over which we are
abstracting (the base space). If the relation is not surjective, then
expressiveness is lost by incompleteness, since not every base object
has an abstract representation. If the relation has no retraction once
its codomain is restricted to its image, then expressiveness is lost by
ambiguity, because the abstract space cannot uniquely identify an exact
representation in the base space.

To make this clearer, let's use the C programming language as an
example. Consider a binary relation from the set of all valid C
compilation-unit source texts to the set of all valid object files.
There is an arrow iff the source text could compile into the object file
according to the C standard. There exist object files for which there is
no corresponding C program [1], so expressiveness is lost by
incompleteness. And an object file cannot be uniquely specified by a C
source program [2], so expressiveness is lost by ambiguity: there are
some aspects of an object file over which the C programmer has no
control.

When you are designing an abstraction, be aware of which properties of
the base object you have lost to ambiguity or incompleteness.

2. OBSCURITY

3. PERFORMANCE

4. BOILERPLATE

SUMMARY

NOTES

[1] On x86_64, the INT 3 breakpoint instruction (which raises SIGTRAP on
    Linux) is used by debuggers to set breakpoints. The C standard has
    no way to write this instruction, so any object file containing it
    cannot be expressed in C, due to incompleteness.

[2] For example, a compiler could always insert no-op instructions to
    produce another object file corresponding to the same C source.

OOP is the exemplar. Only make an abstraction when the benefit outweighs
this cost. In assembly the cost is so great that you will be going in
circles if you stick to the advice.

The costs of abstraction are:

  * expressiveness (markdown over html, glfw tablet input)
  * obscurity (concise shell script or verbose python script, rust
    macros, OOP)
  * performance (high-level languages, XML parser, if two cstrings match
    then print their length, xlib)
  * boilerplate (OOP, asm)

Building interfaces & abstractions is hard. A few good abstractions are
better than many shit ones.
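
As a rough illustration of section 1, the two kinds of lost
expressiveness can be checked mechanically on a toy relation. This is
only a sketch under stated assumptions: the relation is finite and stored
as (abstract, base) pairs, "ambiguity" is read the way note [2] uses it
(one abstract object related to several base objects), and every name
below (main.c, obj_plain, and so on) is made up for the example.

```python
# Toy model of an abstraction as a set of (abstract, base) pairs.
# All names and data are hypothetical, chosen only for illustration.

def unreachable(relation, base_space):
    """Base objects with no abstract representation (incompleteness)."""
    image = {base for _, base in relation}
    return base_space - image

def ambiguous(relation):
    """Abstract objects related to more than one base object (ambiguity)."""
    targets = {}
    for abstract, base in relation:
        targets.setdefault(abstract, set()).add(base)
    return {a for a, bases in targets.items() if len(bases) > 1}

# Three imaginary object files; only two can come from the C source.
base_space = {"obj_plain", "obj_with_nop", "obj_with_int3"}
relation = {
    ("main.c", "obj_plain"),
    ("main.c", "obj_with_nop"),
}

print(unreachable(relation, base_space))  # {'obj_with_int3'}
print(ambiguous(relation))                # {'main.c'}
```

Here obj_with_int3 plays the role of note [1] (no source reaches it) and
main.c plays the role of note [2] (it does not pin down a single object
file).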