Building a Lisp-style format in JavaScript, Part 2: The Implementation

In Part 1 of this series, we laid the groundwork for our project by defining a clear specification for a JavaScript-based, Lisp-inspired format function. We defined our core principles and the exact behavior of our initial set of directives: ~a, ~%, ~~, ~{...~}, and ~[...~].

Now, it’s time to bring that specification to life. In this post, we’ll dive deep into the implementation, focusing not just on making it work, but on building it in a clean, modular, and extensible way.

Architecting for Extensibility

It would be easy to write our function with a giant switch statement, but that would quickly become a maintenance nightmare as we add more features. Instead, we’ll adopt a more robust architecture based on a few key concepts:

A Parser State Object: We’ll bundle the entire state of our parsing process—the string, the arguments, our current position, and the result—into a single state object. This keeps our function signatures clean.
A Directive Handler Registry: This is the core of our extensible design. We’ll create a simple JavaScript object that maps directive characters (like 'a' or '{') to the functions that handle them. To add a new directive, we’ll just add a new entry to this object.
A Central Engine Loop: The main format function will be a simple loop. Its only job is to walk through the string, find a tilde (~), look up the corresponding handler in our registry, and delegate the work to it.

This “pluggable” architecture separates what each directive does (the handlers) from how the format string is scanned (the engine loop). The same dispatch-table pattern is common in small interpreters and parsers.
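
As a rough illustration of the registry idea, independent of the real engine built later in this post, here is a minimal sketch. The names handlers and expand are hypothetical and exist only for this example, and it supports just ~% and ~~ so the dispatch mechanism stays visible.

```javascript
// Hypothetical mini-registry: maps a directive character to a function
// that returns the text to emit. Not the article's engine, just the pattern.
const handlers = {
    '%': () => '\n',
    '~': () => '~',
};

// Walk the string; on '~', look up the next character in the registry.
function expand(str) {
    let out = '';
    for (let i = 0; i < str.length; i++) {
        // Plain text, or a trailing '~' with nothing after it.
        if (str[i] !== '~' || i + 1 >= str.length) {
            out += str[i];
            continue;
        }
        const ch = str[++i];
        const handler = handlers[ch];
        // Unknown directives are echoed unchanged.
        out += handler ? handler() : '~' + ch;
    }
    return out;
}

console.log(JSON.stringify(expand('50~~ done~%'))); // "50~ done\n"

// Extending the format means adding an entry; the loop is untouched.
handlers['t'] = () => '\t';
console.log(JSON.stringify(expand('a~tb')));        // "a\tb"
```

The point of the sketch is the last two lines: supporting a new directive is a data change, not a change to the scanning code.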

A Note on State: The Engine’s Logic Flow

The name ParserState is deliberate, even though the code below simply calls the object state. Our function behaves like a simple state machine, but it’s not a formal Finite State Machine (FSM) with explicit, named states (e.g., READING_TEXT, SAW_TILDE). Instead, the state object acts as a single context object: a snapshot of the entire process at any given moment.

The logic is a simple loop that makes a decision based on the current character. Here is a diagram illustrating that flow:

                      +-------------------+
                      |   Start `format`  |
                      | (Create `state`)  |
                      +-------------------+
                               |
                               v
                      +-------------------+
                      | Loop: Has string  |
                      |   ended? (Y/N)    |----+
                      +-------------------+    |
                          | (No)               | (Yes)
                          v                    |
            +---------------------------+      |
            | Read char at `state.i`    |      |
            | Is it a tilde '~'? (Y/N)  |      |
            +---------------------------+      |
                  | (Yes)        | (No)        |
                  v              v             |
  +--------------------------+  +------------+ |
  | Look up directive char   |  | Append char| |
  |   in `directiveHandlers` |  | to `result`| |
  +--------------------------+  +------------+ |
                  |                  |         |
                  v                  |         |
      +------------------------+     |         |
      | Found? -> Call Handler |-----+         |
      | Not Found? -> Print ~  |               |
      +------------------------+               |
                                               v
                                         +-----------+
                                         | End Loop  |
                                         | Return    |
                                         | `result`  |
                                         +-----------+

As the diagram shows, each iteration of the engine does one of two things: it appends a character of literal text, or it handles a directive. After either branch, control returns to the loop check at the top. The state object is passed to the handlers, which can modify it (e.g., by consuming arguments or advancing the string pointer) before returning control to the main loop. An unrecognized directive is copied to the output unchanged, tilde included.
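
For contrast, here is a hedged sketch of what a more formal FSM with the named states mentioned earlier (READING_TEXT, SAW_TILDE) might look like. The tokenize function and its token shapes are hypothetical and not part of the engine; the sketch only splits a format string into text and directive tokens, and it ignores modifiers and nesting.

```javascript
// Hypothetical explicit-FSM tokenizer, shown only for contrast with the
// context-object approach used in this post.
const READING_TEXT = 'READING_TEXT';
const SAW_TILDE = 'SAW_TILDE';

function tokenize(str) {
    const tokens = [];
    let mode = READING_TEXT;
    let text = '';

    for (const ch of str) {
        if (mode === READING_TEXT) {
            if (ch === '~') {
                mode = SAW_TILDE;
            } else {
                text += ch;
            }
        } else {
            // SAW_TILDE: this character names the directive.
            if (text) tokens.push({ type: 'text', value: text });
            text = '';
            tokens.push({ type: 'directive', value: ch });
            mode = READING_TEXT;
        }
    }
    // A trailing '~' with no directive character is kept as literal text.
    if (mode === SAW_TILDE) text += '~';
    if (text) tokens.push({ type: 'text', value: text });
    return tokens;
}

console.log(tokenize('Hi ~a!~%'));
```

An explicit FSM like this gets harder to follow once structural directives such as ~{...~} enter the picture, which is one reason to prefer a single context object that handlers mutate directly.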

The Code: The Engine and its Helpers

Let’s start with the skeleton of our format function and the helper that finds matching delimiters for our structural directives.

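/**
 * Finds the matching closing delimiter for a structural directive (e.g., ~{...~}).
 * Returns the index of the '~' that starts the closing directive, or -1 if none.
 * The depth counter is what makes nested structures resolve correctly.
 */
function findMatchingDelimiter(state, openChar, closeChar) {
    let depth = 1;
    // Start searching from the character *after* the opening directive
    for (let j = state.i + 1; j < state.str.length; j++) {
        if (state.str[j] !== '~') continue;

        const tildeIndex = j;
        j++; // Move to the character after the '~'
        // Step over a ':' modifier, as the engine does, so a nested
        // opener such as ~:[ is still counted.
        const hasModifier = state.str[j] === ':';
        if (hasModifier) j++;

        const directiveChar = state.str[j];
        if (directiveChar === openChar) {
            depth++;
        } else if (directiveChar === closeChar && !hasModifier) {
            depth--;
            if (depth === 0) {
                return tildeIndex; // Found it!
            }
        }
        // The loop's j++ now skips the directive character itself
    }
    return -1; // No match found
}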
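
To see what the depth counter buys us, here is a small self-contained sketch of the same scanning idea. It takes the string and a start index directly instead of a state object and ignores modifiers; findClose is a hypothetical name used only in this example.

```javascript
// Standalone variant of the depth-counting scan (hypothetical helper).
// `from` is the index just after the opening directive character.
function findClose(str, from, openChar, closeChar) {
    let depth = 1;
    for (let j = from; j < str.length; j++) {
        if (str[j] !== '~') continue;
        const next = str[j + 1];
        if (next === openChar) {
            depth++;
        } else if (next === closeChar) {
            depth--;
            if (depth === 0) return j; // index of the closing '~'
        }
        j++; // skip the directive character
    }
    return -1;
}

const s = '~{[~{~a ~}]~}!';
// The outer body starts at index 2, right after "~{".
const end = findClose(s, 2, '{', '}');
console.log(end);                 // 11
console.log(s.substring(2, end)); // [~{~a ~}]
console.log(findClose('~{oops', 2, '{', '}')); // -1
```

Without the counter, the scan would stop at the inner ~} (index 8) and cut the outer loop body short.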

/**
 * The main `format` engine.
 */
function format(formatString, ...args) {
    const state = {
        str: formatString, // The string being parsed
        args: args,        // The arguments list
        i: 0,              // Current index in the format string
        argi: 0,           // Current index in the arguments list
        result: "",        // The accumulated output string
        modifier: null,    // The active modifier; only ':' is parsed so far ('@' is not yet handled)
    };

    while (state.i < state.str.length) {
        if (state.str[state.i] === '~') {
            state.i++; // Move past the '~'
            state.modifier = null;

            // Check for modifiers like ':'
            if (state.str[state.i] === ':') {
                state.modifier = ':';
                state.i++;
            }

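            // `char` is undefined if the string ends right after '~' or '~:'
            const char = state.str[state.i];
            const handler = char === undefined ? undefined : directiveHandlers[char];

            if (handler) {
                handler(state); // Delegate to the registered handler
            } else {
                // Unrecognized directive, so we print it literally.
                state.result += '~';
                if (state.modifier) state.result += state.modifier;
                // Guard against appending the string "undefined" at end of input
                if (char !== undefined) state.result += char;
            }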
        } else {
            state.result += state.str[state.i];
        }
        state.i++;
    }

    return state.result;
}

The Directive Handler Registry

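This is where we define the behavior for each directive. Each function takes the state object and is responsible for appending to state.result and updating the string and argument pointers (state.i and state.argi). A handler is entered with state.i on the directive character, and the engine increments state.i once after the handler returns, so a handler that consumes more text must leave state.i on the last character it consumed.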

const directiveHandlers = {
    /** ~a - Aesthetic: Print argument as a human-readable string. */
    'a': (state) => {
        if (state.argi < state.args.length) {
            state.result += String(state.args[state.argi++]);
        }
    },

    /** ~% - Newline. */
    '%': (state) => {
        state.result += '\n';
    },

    /** ~~ - Literal tilde. */
    '~': (state) => {
        state.result += '~';
    },

    /** ~{...~} - Iteration. */
    '{': (state) => {
        const end = findMatchingDelimiter(state, '{', '}');
        if (end === -1) {
            state.result += '~{'; // Unmatched, print literally.
            return;
        }

        const loopBody = state.str.substring(state.i + 1, end);
        const list = state.args[state.argi++];
        if (Array.isArray(list)) {
            // For each item in the list, recursively call `format`.
            // The item itself becomes the sole argument for that execution.
            for (const item of list) {
                state.result += format(loopBody, item);
            }
        }
        // Advance parser past the entire ~{...~} block
        state.i = end + 1;
    },

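    /** ~[...~] and ~:[...~] - Conditional. */
    '[': (state) => {
        const end = findMatchingDelimiter(state, '[', ']');
        if (end === -1) {
            // Unmatched, print literally (keeping any modifier).
            state.result += '~' + (state.modifier || '') + '[';
            return;
        }

        const clausesStr = state.str.substring(state.i + 1, end);
        // Split on top-level ~; only, so nested conditionals keep their separators.
        const clauses = [];
        let depth = 0;
        let clauseStart = 0;
        for (let k = 0; k < clausesStr.length; k++) {
            if (clausesStr[k] !== '~') continue;
            const tildeIndex = k++;
            const hasModifier = clausesStr[k] === ':';
            if (hasModifier) k++;
            const c = clausesStr[k];
            if (c === '[') {
                depth++;
            } else if (c === ']' && !hasModifier) {
                depth--;
            } else if (c === ';' && !hasModifier && depth === 0) {
                clauses.push(clausesStr.substring(clauseStart, tildeIndex));
                clauseStart = k + 1;
            }
        }
        clauses.push(clausesStr.substring(clauseStart));

        const selector = state.args[state.argi++];
        let clauseIndex = -1;

        // With ':', a false-like selector picks clause 0; anything else picks clause 1.
        if (state.modifier === ':') {
            clauseIndex = (selector === false || selector === null || selector === undefined) ? 0 : 1;
        } else if (Number.isInteger(selector)) {
            clauseIndex = selector;
        }

        if (clauseIndex >= 0 && clauseIndex < clauses.length) {
            // Recursively format the chosen clause with the rest of the arguments.
            // The recursive call has its own state, so arguments it consumes
            // do not advance state.argi here.
            state.result += format(clauses[clauseIndex], ...state.args.slice(state.argi));
        }

        // Advance parser past the entire ~[...~] block
        state.i = end + 1;
    }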
};

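Notice the recursion: for both iteration and conditionals, we call format again on a substring. That is what lets directives nest, for example a ~{...~} inside a ~[...~] clause, without any extra machinery in the engine loop. One caveat: each recursive call builds its own state, so arguments consumed inside a conditional clause do not advance the outer argument pointer.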
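
Because the conditional handler recurses through the public format function, the inner call works on a fresh state, and any ~a inside the chosen clause leaves the outer state.argi where it was. With the handlers above, format('~[none~;one: ~a~] then ~a', 1, 'x', 'y') would therefore print x twice. One possible way to address this, sketched below under simplifying assumptions, is an internal runner that returns the updated argument index along with the text. The names run and formatSketch are hypothetical, and the sketch supports only ~a and a non-nested ~[...~].

```javascript
// Hypothetical sketch: an internal runner that reports where its argument
// pointer ended up, so the caller can adopt it. Supports only ~a and a
// non-nested ~[...~] to keep the bookkeeping visible.
function run(str, args, argi) {
    let out = '';
    for (let i = 0; i < str.length; i++) {
        if (str[i] !== '~' || i + 1 >= str.length) {
            out += str[i];
            continue;
        }
        const ch = str[++i];
        if (ch === 'a') {
            if (argi < args.length) out += String(args[argi++]);
        } else if (ch === '[') {
            const end = str.indexOf('~]', i + 1);
            if (end === -1) {
                out += '~['; // unmatched, print literally
                continue;
            }
            const clauses = str.substring(i + 1, end).split('~;');
            const selector = args[argi++];
            if (Number.isInteger(selector) && selector >= 0 && selector < clauses.length) {
                // Recurse on the same args array, starting at our pointer...
                const inner = run(clauses[selector], args, argi);
                out += inner.text;
                argi = inner.argi; // ...and adopt the pointer it returns.
            }
            i = end + 1; // land on ']' so the loop's i++ moves past it
        } else {
            out += '~' + ch;
        }
    }
    return { text: out, argi };
}

// Public wrapper with the same call shape as format (name is illustrative).
function formatSketch(str, ...args) {
    return run(str, args, 0).text;
}

console.log(formatSketch('~[none~;one: ~a~] then ~a', 1, 'x', 'y'));
// one: x then y
```

The trade-off is that handlers can no longer call the public entry point directly; they need access to the internal runner instead.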

The Final Result

With all the pieces assembled, we now have a fully functional and extensible format engine. It meets all the requirements of our specification and is ready for future enhancements. For instance, adding a directive for hexadecimal output (~x) is now as simple as adding a new three-line function to the directiveHandlers object.
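
As a sketch of what that might look like, here is one possible ~x handler. Its exact behavior is an assumption, since ~x is not part of the specification from Part 1: integers are printed in base 16 and anything else falls back to plain string conversion. To keep the example runnable on its own, a local handlers object stands in for directiveHandlers.

```javascript
// Stand-in for the article's directiveHandlers registry, so this sketch
// runs on its own. In the real engine you would add the entry there.
const handlers = {};

/** ~x - Hexadecimal: print an integer argument in base 16 (assumed behavior). */
handlers['x'] = (state) => {
    if (state.argi < state.args.length) {
        const value = state.args[state.argi++];
        // Non-integers fall back to plain string conversion.
        state.result += Number.isInteger(value) ? value.toString(16) : String(value);
    }
};

// Exercise the handler with a hand-built state object.
const state = { args: [255, 'n/a'], argi: 0, result: '' };
handlers['x'](state);
handlers['x'](state);
console.log(state.result); // ffn/a
```

Note that the handler follows the same contract as ~a: it consumes one argument, appends to the result, and leaves state.i alone.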

This journey shows that by investing a little time in architecture, we can turn a simple tool into a powerful and maintainable engine.

Get the Full Code

For those who want to dive in, experiment, and use this engine in their own projects, we’ve prepared the complete source code. The extensible JavaScript version discussed in this post is available now; a fully typed TypeScript version is not available yet.

JavaScript Implementation: format.js
TypeScript Implementation: not yet

Feel free to use the code, learn from it, and extend it to fit your needs. Happy formatting!

