Experimental Compiler Toolchain & Language

Temporarily named Viper

Purpose

We wish to create a compiler toolchain for an experimental systems programming language. The main purpose of the project is to experiment with compiler toolchain technologies while also exploring potentially useful language design ideas for the compiled language itself.

Some examples of the kinds of ideas being pursued are:

How can we express the syntax of the compiled language so that it is easy to change and extend in the compiler while maintaining acceptable runtime performance?
- Is it possible (and desirable) to decouple the parser from the tokenizer?
- How effective is single-pass tokenizing vs lazy tokenizing?
  - This can affect the parser (somewhat), so what are the tradeoffs here?
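To make the single-pass versus lazy question concrete, here is a minimal C++ sketch of both styles. The names Token, LazyLexer, and tokenize_all are hypothetical and are not taken from the toolchain; the token rules (runs of word characters, one-character symbols) are simplifying assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical, deliberately tiny token type: just the matched text.
struct Token {
    std::string_view text;
};

// Lazy tokenizer: produces one token per call to next(), on demand.
class LazyLexer {
public:
    explicit LazyLexer(std::string_view source) : source_(source) {}

    // Returns the next token, or std::nullopt at end of input.
    std::optional<Token> next() {
        // Skip whitespace between tokens.
        while (pos_ < source_.size() && is_space(source_[pos_])) ++pos_;
        if (pos_ >= source_.size()) return std::nullopt;

        const std::size_t start = pos_;
        if (is_word(source_[pos_])) {
            // Identifier, keyword, or number: consume a run of word characters.
            while (pos_ < source_.size() && is_word(source_[pos_])) ++pos_;
        } else {
            // Assumption: anything else is a one-character symbol.
            ++pos_;
        }
        return Token{source_.substr(start, pos_ - start)};
    }

private:
    static bool is_space(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
    static bool is_word(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
    }

    std::string_view source_;
    std::size_t pos_ = 0;
};

// Single-pass tokenizer: drains the lazy lexer into a buffer up front.
inline std::vector<Token> tokenize_all(std::string_view source) {
    std::vector<Token> tokens;
    LazyLexer lexer(source);
    while (auto token = lexer.next()) tokens.push_back(*token);
    return tokens;
}
```

With the buffered form, a parser can index backwards and forwards freely; with the lazy form, it has to manage its own lookahead, which is one place where the choice leaks into the parser.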

Building the toolchain

Building the toolchain from source requires CMake.

This is currently being built on macOS with C++20, but we have working builds on Ubuntu and WSL as well. Theoretically, this should compile on Windows, but it is untested, and that assumption has failed me too many times to be confident.

We recommend using the utility script for building/running any targets.

./go build --<debug | release> toolchain

This will build the toolchain in either release mode or with debug symbols.

Running a built toolchain

As with building, you can run:

./go run --<debug | release> toolchain

This will run the toolchain.

To make life easier, you can create a symlink to the built binary:

# Symlink the debug build as ./viper
ln -s build_debug/toolchain/toolchain viper
# Or symlink the release build as ./viper
ln -s build_release/toolchain/toolchain viper

Then, rather than using the go utility script, you can run the binary directly:

./viper ...

Commands

To run the lexer:

./viper lex -f input.viper

or

./viper lex --file input.viper

To run the tests:

./viper test

About the toolchain

The compiler toolchain architecture is largely inspired by the Carbon Compiler.

One of the main goals of this experiment is to see how we can build the pieces of this toolchain so that they express the desired compiled language well. For example, an ideal pattern for expressing syntax tokens would be:

// CRTP base: each concrete token passes itself as Derived
template <typename Derived>
class Token
{
    ...
};

class EqualToken : public Token<EqualToken> {...};
class PlusToken : public Token<PlusToken> {...};

// Done at compile time
TokenMap()
    .add_token<EqualToken>("=")
    .add_token<PlusToken>("+")
    ...

Here, to add a new token, you just write another class and add it to the map, and the lexer handles the rest. This allows for ergonomic tweaking of the language without sacrificing runtime performance.

For comparison, the Carbon toolchain uses the X-macro pattern for this behavior.

However, this can mean that several places in the source code have to change whenever the language changes.

At times, this is very pleasant to work with, but it can also obscure what is actually happening when defining a new type (or whatever else).
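For readers unfamiliar with the X-macro pattern, here is a minimal, generic C++ sketch of it. The VIPER_TOKENS list and the spelling function are hypothetical illustrations; this is not Carbon's code or this toolchain's code.

```cpp
#include <string_view>

// Hypothetical token list. Each X(Name, Spelling) entry describes one token.
#define VIPER_TOKENS(X) \
    X(Equal, "=")       \
    X(Plus, "+")        \
    X(Minus, "-")

// Expansion 1: generate the enumerators from the list.
enum class TokenKind {
#define X(Name, Spelling) Name,
    VIPER_TOKENS(X)
#undef X
};

// Expansion 2: generate the spelling lookup from the same list.
constexpr std::string_view spelling(TokenKind kind) {
    switch (kind) {
#define X(Name, Spelling) \
    case TokenKind::Name: \
        return Spelling;
        VIPER_TOKENS(X)
#undef X
    }
    return "";
}

// Both expansions stay in sync because they share one list.
static_assert(spelling(TokenKind::Plus) == "+");
```

Adding a token means adding one line to the list, which is the pleasant part. The cost is that the enum and the switch are never written out anywhere, so a reader has to expand the macros mentally to see what a new entry actually defines.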

Right now, we have something that works well:

static constexpr TokenSpec spex = TokenSpec::specify()
    .addKeyword(TokenSpecInfo("let", TokenKind::Let))
    .addKeyword(TokenSpecInfo("return", TokenKind::Return))
    .identifierCanStartWith('_')
    .identifierCanStartWithLower()
    .identifierCanStartWithUpper()
    .identifierCanInclude('_')
    .identifierCanIncludeLower()
    .identifierCanIncludeUpper()
    .identifierCanIncludeNumeric()
    .addSymbol(TokenSpecInfo("->", TokenKind::MinusGreater))
    .addSymbol(TokenSpecInfo("(", TokenKind::LeftParen))
    .addSymbol(TokenSpecInfo(")", TokenKind::RightParen))
    .addSymbol(TokenSpecInfo("==", TokenKind::EqualEqual))
    .addSymbol(TokenSpecInfo("+=", TokenKind::PlusEqual))
    .addSymbol(TokenSpecInfo("/=", TokenKind::ForwardSlashEqual))
    .addSymbol(TokenSpecInfo("*=", TokenKind::AsteriskEqual))
    .addSymbol(TokenSpecInfo("*", TokenKind::Asterisk))
    ...
;

This works well for defining the valid characters and patterns that symbols, keywords, and identifiers should have. This is only a start though.

If there is a way to have an easily expressible system for tokenizing that relies heavily on compile-time execution while being flexible, we can hopefully make a more C++-like pattern rather than using macros. (Or maybe this is all just coping).
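As a rough illustration of how a chainable specification can be evaluated entirely at compile time without macros, here is a minimal C++ sketch. It is not the toolchain's TokenSpec implementation: KeywordSpec, KeywordEntry, add_keyword, classify, kSpec, and the reduced TokenKind enum are all hypothetical names chosen for this example, and the fixed capacity is an assumption made to keep it short.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Reduced, hypothetical enum for this sketch only.
enum class TokenKind { Identifier, Let, Return };

struct KeywordEntry {
    std::string_view spelling;
    TokenKind kind = TokenKind::Identifier;
};

// Hypothetical fixed-capacity spec with a chainable constexpr builder.
template <std::size_t Capacity>
class KeywordSpec {
public:
    // Returns a copy of the spec with one more keyword registered.
    constexpr KeywordSpec add_keyword(std::string_view spelling,
                                      TokenKind kind) const {
        KeywordSpec next = *this;
        next.entries_[next.count_++] = KeywordEntry{spelling, kind};
        return next;
    }

    // Classifies a word: a registered keyword kind, otherwise Identifier.
    constexpr TokenKind classify(std::string_view word) const {
        for (std::size_t i = 0; i < count_; ++i) {
            if (entries_[i].spelling == word) return entries_[i].kind;
        }
        return TokenKind::Identifier;
    }

private:
    std::array<KeywordEntry, Capacity> entries_{};
    std::size_t count_ = 0;
};

// The whole table is built during constant evaluation.
inline constexpr auto kSpec = KeywordSpec<8>{}
    .add_keyword("let", TokenKind::Let)
    .add_keyword("return", TokenKind::Return);

// The lookup is usable at compile time as well as at run time.
static_assert(kSpec.classify("let") == TokenKind::Let);
static_assert(kSpec.classify("letter") == TokenKind::Identifier);
```

Each builder call returns a new value rather than mutating in place, which is what lets the chain initialize a constexpr variable. A linear scan is used here for brevity; a real lexer would likely want a faster lookup structure.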

Concept Language

The language itself is not fleshed out, and because it is experimental it will vary greatly over time.

Here are base-level concepts for the language as of now:

Structures:

struct Point
{
    x: i32;
    y: i32;
}

Functions:

define my_function() -> Point
{
    return Point {
        x: 0,
        y: 0
    };
}

Entry point to program:

// Entry function
@entry
define main() -> i32
{
    return 0;
}

Modifiers:

#[printable, debug_printable]
struct Point
{
    x: i32;
    y: i32;
}

define test() -> void
{
    let p: Point = Point {
        x: 0,
        y: 0
    };

    p.print();
}

// This is a pure function
define my_function(p: Point&) -> void
{
    // Will not compile
    p.method_that_mutates();

    // Non-modifying, produces no side-effects. Compiles fine
    p.print();
}

// This is an impure function
@impure
define my_function_2(p: Point&) -> void
{
    // Marked impure, compiles fine now
    p.method_that_mutates();

    // Non-modifying, produces no side-effects. Compiles fine
    p.print();
}

Unions function similarly to Rust's tagged unions (enums):

union Token
{
    Keyword(std::string),
    Equal,
    Plus,
    Minus,
    Slash,
    ...
}
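For readers coming from C++, the closest standard-library analogue of a tagged union is std::variant. The sketch below is only an analogy for the semantics, not a statement about how Viper unions will be implemented; the Keyword, Equal, and Plus structs and the describe function are hypothetical.

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical C++ counterparts of some of the union's alternatives.
struct Keyword { std::string text; };
struct Equal {};
struct Plus {};

// The variant stores exactly one alternative plus a tag saying which.
using Token = std::variant<Keyword, Equal, Plus>;

// Dispatch on the active alternative, much like a Rust `match`.
inline std::string describe(const Token& token) {
    return std::visit(
        [](const auto& alt) -> std::string {
            using T = std::decay_t<decltype(alt)>;
            if constexpr (std::is_same_v<T, Keyword>) {
                return "keyword " + alt.text;  // payload-carrying case
            } else if constexpr (std::is_same_v<T, Equal>) {
                return "=";
            } else {
                return "+";
            }
        },
        token);
}
```

For instance, describe(Token{Keyword{"let"}}) takes the payload-carrying branch, while describe(Token{Plus{}}) takes the last one.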

Other language concepts:

Explicit mutability
- All variables are immutable by default, and mutability is explicit in the syntax (via the mut keyword).
Trait-like system
- This is not fleshed out, but the trait system in Rust is great at defining shared behaviors without inheritance. As this language is largely designed with data-driven models in mind, we want to know what we can take from that system to make data-driven work easy and ergonomic.
Pure functions by default
- We are experimenting with all procedures being pure by default. To make an impure function you use the explicit @impure directive (syntax subject to change).
- The goal is to see what is achievable with this pattern, and whether it is useful at all.
- Obviously, a pure function cannot call an impure function and still guarantee that no side effects occur, so we have a function-flagging problem here. We wish to see what the effects of this are.
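The function-flagging problem can be pictured as a check over the call graph. Below is a minimal C++ sketch of such a check, assuming the compiler already knows each function's impurity flag and its direct callees. FunctionInfo and find_purity_violations are hypothetical names, and nothing like this is claimed to exist in the toolchain yet.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical summary of a function as a checker might see it.
struct FunctionInfo {
    std::string name;
    bool impure = false;               // true when marked @impure
    std::vector<std::string> callees;  // names of directly called functions
};

// Returns "caller -> callee" for every pure function calling an impure one.
inline std::vector<std::string> find_purity_violations(
    const std::vector<FunctionInfo>& functions) {
    // Index the impurity flag by function name.
    std::unordered_map<std::string, bool> is_impure;
    for (const auto& fn : functions) is_impure[fn.name] = fn.impure;

    std::vector<std::string> violations;
    for (const auto& fn : functions) {
        if (fn.impure) continue;  // impure functions may call anything
        for (const auto& callee : fn.callees) {
            const auto it = is_impure.find(callee);
            if (it != is_impure.end() && it->second) {
                violations.push_back(fn.name + " -> " + callee);
            }
        }
    }
    return violations;
}
```

Because only direct calls need checking, one pass over the functions is enough: a pure function that calls another pure function is fine as long as that callee passes the same check itself. The practical consequence is that marking one low-level function @impure forces the flag up through every caller.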
