In the previous issue we began discussing left joins. Today I’d like to talk about the spectrum between narrow and wide tables, and how the two kinds of tables are connected via left joins.

Add WebAssembly, get performance. Is that how it really works?

The incredibly unsatisfying answer is: It depends. It depends on oh-so-many factors, and I’ll be touching on some of them here.

With all the talk of using Rust to reduce memory unsafety bugs, such as Android using Rust in the Android Open Source Project, there’s a lot of extremely reasonable concern about the high cost of “rewriting it all in Rust” (or any other safer language), as it’s often phrased. Operating systems, web browsers, complex online services, and so on can be implemented with tens of millions of lines of C/C++ code. (Sometimes more.) Rewriting all that seems prohibitively expensive, and exacerbates what Alex Gaynor aptly calls grief — people stay in the denial stage longer when struck by the enormity of the memory unsafety problem.

Thankfully, replacing C/C++ with code in a safer language is not an all-or-nothing task. We can do it gradually; some parts we might never need to replace. Most safer languages can link in the same address space as C and/or C++, and call into and be called by C/C++. You can also normalize data structures such that the safe code handles arbitrary inputs, and the C/C++ code can focus on a single, simpler grammar.

People appear to enjoy the blog posts about me porting different compilers to OpenBSD. What I would like to do for the next couple of posts beginning with this one is to take a step back and de-complexify these programs. Both the D compiler and the GNU Modula-2 compiler are highly complex pieces of software. But at their core they are the exact same thing: a program that can create programs. We need not explore something so complex in order to learn how to create a program of our own that creates programs. In this series of blog posts, we will create two programs that will help us demystify programs that create programs: first, in this blog post, we will create a disassembler, or a program that reads a program and produces a higher-level representation (assembly); second, in a couple of subsequent blog posts, we will create an assembler, a program that understands that higher-level assembly language and produces a program from it.

The LLVM project is a modular set of tools that makes designing and implementing a compiler significantly easier. The best-known part of LLVM is its intermediate representation, or IR for short. LLVM’s IR is an extremely powerful tool, designed to make optimization and targeting many architectures as easy as possible. Many tools use LLVM IR; the Clang C++ compiler and the Rust compiler (rustc) are both notable examples. However, despite this unified architecture, code generation can still vary wildly between implementations, depending on how the IR is used. Some time ago, I stumbled upon this tweet discussing Rust’s implementation of clamping compared to C++…
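For readers unfamiliar with the operation, clamping restricts a value to a closed range. The following is a minimal C++ sketch of that operation only, not the code from the tweet: `std::clamp` is the C++17 standard library function from `<algorithm>`, while `clamp_manual` and `clamp_std` are hypothetical names introduced here for comparison.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical hand-written clamp, for illustration only.
// Returns lo if v < lo, hi if v > hi, and v otherwise.
float clamp_manual(float v, float lo, float hi) {
    assert(lo <= hi);  // the range must not be inverted
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

// Thin wrapper around the C++17 standard library version.
float clamp_std(float v, float lo, float hi) {
    return std::clamp(v, lo, hi);
}
```

Both functions should agree for ordinary inputs; how each one is lowered to IR and machine code is a separate question that depends on the compiler and its flags.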

As developers, there is so much we can learn to improve our skills, ranging from deep theory, to small practical tidbits of information. It can be overwhelming at times and we are forced to pick what we want to learn deeply and what we just want to gloss over. You can’t learn it all!

The rise of a new generation of low-level programming languages like Rust, Go and Zig has caused C and its primitive type system to fall into some disrepute. Nonetheless, with sufficient creativity it is possible to achieve surprisingly sophisticated results in C. One such result is generic data structures. This post reviews two techniques for implementing generic data structures in C: unsafely using raw memory and pointer casts, and safely using code generation through macros.
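To make the two techniques concrete, here is a rough sketch of each, written in C-style C++ (the `<c...>` headers and `std::` qualifiers are the main differences from plain C). It is not taken from the post: `RawStack`, `raw_push`, `raw_pop` and `DEFINE_STACK` are hypothetical names, and the fixed capacities are arbitrary assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Technique 1 (unsafe): one stack for any element type, stored as raw bytes.
// Nothing stops a caller from pushing an int and popping it as a double.
struct RawStack {
    unsigned char bytes[64];  // arbitrary fixed capacity for the sketch
    std::size_t elem_size;    // size of one element, set by the caller
    std::size_t len;          // number of elements currently stored
};

inline bool raw_push(RawStack* s, const void* elem) {
    assert(s->elem_size > 0);
    if ((s->len + 1) * s->elem_size > sizeof s->bytes) return false;  // full
    std::memcpy(s->bytes + s->len * s->elem_size, elem, s->elem_size);
    ++s->len;
    return true;
}

inline bool raw_pop(RawStack* s, void* out) {
    if (s->len == 0) return false;  // empty
    --s->len;
    std::memcpy(out, s->bytes + s->len * s->elem_size, s->elem_size);
    return true;
}

// Technique 2 (safe): a macro generates a separate, fully typed stack
// for each element type, so the compiler checks every push and pop.
#define DEFINE_STACK(T, Name, Cap)                                  \
    struct Name { T items[Cap]; std::size_t len; };                 \
    inline bool Name##_push(Name* s, T v) {                         \
        if (s->len == (std::size_t)(Cap)) return false;             \
        s->items[s->len++] = v;                                     \
        return true;                                                \
    }                                                               \
    inline bool Name##_pop(Name* s, T* out) {                       \
        if (s->len == 0) return false;                              \
        *out = s->items[--s->len];                                  \
        return true;                                                \
    }

DEFINE_STACK(int, IntStack, 4)
```

The trade-off is visible in the signatures: `raw_push` accepts any `const void*`, whereas `IntStack_push` only accepts an `int`.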

The story behind this article is very simple: I wanted to learn about the new C++20 language features and to have a brief summary of all of them on a single page. So I decided to read all the proposals and create this “cheat sheet” that explains and demonstrates each feature. This is not a “best practices” kind of article; it serves a purely demonstrational purpose. Most examples were inspired by or directly taken from the corresponding proposals; all credit goes to their authors and to the members of the ISO C++ committee for their work. Enjoy!

Overview of design decisions.

In this paper I won’t shy away from bold statements. This necessarily means that I am relying on informal concepts; I hope to make these concepts clear enough to be modeled mathematically.

None of these ideas are my own; I have no idea. This is just a general direction in which I can see things moving, and my attempt at finding what their convergence would look like at the limit.

To that end data lisp is work that will forever be in progress. The concept of a “data interchange format” is clearly fundamental and hopefully understood by all. Any solution is therefore necessarily incomplete and must be designed for adaptation.

The largest inspiration and fundamental primitive is the idea of a “propagator network” (henceforth “prop-net”). This is an idea heralded by Sussman (Scheme, Art of the Propagator, SICP, Designing Software for Flexibility), Arntzenius (who created datafun, the direct inspiration for this project, along with statebox, where I learned everything about concurrency) and Kmett (Haskell lens library, guanxi, various Haskell interpretations of category theory), as well as others.

We will start from fundamental assumptions and work towards an implementation.

Thoughts on the Elixir language and its Phoenix framework after two years of professional work.

I am a seasoned web developer (working primarily with Ruby on Rails before) and someone who got an opportunity to work on a commercial Elixir project. During the past 2 years, I wrote a lot of Elixir, which I had to learn from scratch. I always found this kind of personal post interesting, so I figured I would write one for you.

Many people must have heard this quote (by Phil Karlton) many times: “There are only two hard things in Computer Science: cache invalidation and naming things.” Two days ago, Nick Tierney mentioned it again in his post “Naming Things”. Since he said he was not sure what cache invalidation meant, and I have a tiny bit of experience here, I want to write this short post to explain, from my experience, why cache invalidation is hard.

I have never contributed to nginx. My C skills are 1/10. But downloading the source, hacking it up, compiling it, and running it doesn’t scare me. This post is to help you overcome your own fears about doing so. Not necessarily because you should be running out-of-tree diffs in production but because I see a lot of developers never even consider looking at the source of a big tool or dependency they use.

Most of all, studying mature software projects is one of the best ways to grow as a programmer.

Since C++11, we have a && in the language, and it can take some time to understand its meaning and all the consequences this can have on your code.

We’ve been through a detailed explanation of lvalues, rvalues and their references, which covers a lot of ground on this topic.

But there is one aspect that we have to talk about: what do auto&&, X&&, or even int&& mean in code?
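As a quick orientation, the sketch below shows the distinction in standard C++17: `&&` after a concrete type is an rvalue reference, while `&&` after a deduced type (`auto&&`, or `T&&` in a template) is a forwarding reference that binds to both lvalues and rvalues. The names `X`, `kind`, `forward_kind`, `auto_ref_demo` and `rvalue_ref_demo` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

struct X {};

// X&& names a concrete type, so it is an rvalue reference:
// overload resolution picks it only for rvalue arguments.
const char* kind(X&)  { return "lvalue"; }
const char* kind(X&&) { return "rvalue"; }

// T&& with a deduced T is a forwarding reference: it accepts both
// lvalues and rvalues, and std::forward restores the original category.
template <typename T>
const char* forward_kind(T&& x) {
    return kind(std::forward<T>(x));
}

// auto&& deduces exactly like T&&.
bool auto_ref_demo() {
    X x;
    auto&& a = x;    // initialised from an lvalue: a is X&
    auto&& b = X{};  // initialised from an rvalue: b is X&&
    static_assert(std::is_same_v<decltype(a), X&>);
    static_assert(std::is_same_v<decltype(b), X&&>);
    assert(&a == &x);  // a is just another name for x
    // A named reference is itself an lvalue expression, even when its
    // declared type is X&&, so the X& overload is selected here.
    return std::string(kind(b)) == "lvalue";
}

// int&& is an rvalue reference to int: it binds to the temporary 41,
// whose lifetime is extended to match the reference.
int rvalue_ref_demo() {
    int&& r = 41;
    r += 1;  // r can be modified like any named variable
    return r;
}
```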

Since its creation, C++ has become one of the most widely used programming languages in the world. Well-written C++ programs are fast and efficient. The language is more flexible than other languages: It can work at the highest levels of abstraction, and down at the level of the silicon. C++ supplies highly optimized standard libraries. It enables access to low-level hardware features, to maximize speed and minimize memory requirements. Using C++, you can create a wide range of apps. Games, device drivers, and high-performance scientific software. Embedded programs. Windows client apps. Even libraries and compilers for other programming languages get written in C++.

One of the original requirements for C++ was backward compatibility with the C language. As a result, C++ has always permitted C-style programming, with raw pointers, arrays, null-terminated character strings, and other features. They may enable great performance, but can also spawn bugs and complexity. The evolution of C++ has emphasized features that greatly reduce the need to use C-style idioms. The old C-programming facilities are there when you need them, but with modern C++ code you should need them less and less. Modern C++ code is simpler, safer, more elegant, and still as fast as ever.
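The contrast can be sketched in a few lines. The example below is an illustrative assumption rather than code from the original overview: `join_c_style`, `join_modern`, `sum`, `Widget` and `make_widget` are hypothetical names, while `std::string`, `std::vector` and `std::unique_ptr` are the standard library facilities that replace null-terminated strings, raw arrays and owning raw pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// C-style: manual allocation and null-terminated strings.
// The caller must remember to delete[] the result.
char* join_c_style(const char* a, const char* b) {
    const std::size_t n = std::strlen(a) + std::strlen(b) + 1;
    char* out = new char[n];
    std::strcpy(out, a);
    std::strcat(out, b);
    return out;
}

// Modern: std::string owns its memory and frees it automatically.
std::string join_modern(const std::string& a, const std::string& b) {
    return a + b;
}

// Modern: std::vector carries its own length, and a range-based for
// loop removes the index arithmetic of a raw array.
int sum(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) total += v;
    return total;
}

// Modern: std::unique_ptr expresses single ownership; the Widget is
// destroyed when the pointer goes out of scope.
struct Widget { int id; };

std::unique_ptr<Widget> make_widget(int id) {
    assert(id >= 0);  // assumption for this sketch: ids are non-negative
    auto w = std::make_unique<Widget>();
    w->id = id;
    return w;
}
```

Only `join_c_style` requires cleanup at the call site; the other three functions leave nothing for the caller to free.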

The following sections provide an overview of the main features of modern C++. Unless noted otherwise, the features listed here are available in C++11 and later. In the Microsoft C++ compiler, you can set the /std compiler option to specify which version of the standard to use for your project.

Copy elision is a C++ compiler optimization that, as its name suggests, eliminates extra copy and move operations. It is similar to the classical copy propagation optimization, but specifically performed on C++ objects that may have non-trivial copy and move constructors. In this post, I’ll walk through an example where an obvious optimization you might expect from your compiler doesn’t actually happen in practice.
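To see when elision does and does not apply, one option is to count constructor calls. The sketch below is not the example from the post: `Tracer` and the three factory functions are hypothetical, and the comments describe C++17 rules, where elision is guaranteed for prvalues, optional for named locals (NRVO), and not permitted when returning a by-value function parameter.

```cpp
#include <cassert>

// Hypothetical type that counts how often it is copied or moved.
struct Tracer {
    static inline int copies = 0;
    static inline int moves = 0;
    Tracer() = default;
    Tracer(const Tracer&) { ++copies; }
    Tracer(Tracer&&) noexcept { ++moves; }
};

// Returns a prvalue: since C++17 the object is constructed directly
// in the caller's storage, with no copy or move.
Tracer make_prvalue() { return Tracer{}; }

// Returns a named local: the compiler may elide (NRVO), but is not
// required to. Without NRVO, the move constructor runs once.
Tracer make_named() {
    Tracer t;
    return t;
}

// Returns a by-value parameter: elision is not allowed here, so the
// result is move-constructed from t.
Tracer pass_through(Tracer t) { return t; }

// Checks the guaranteed case: the counters must not change.
void check_guaranteed_elision() {
    const int before = Tracer::copies + Tracer::moves;
    Tracer t = make_prvalue();
    (void)t;
    assert(Tracer::copies + Tracer::moves == before);
}
```

The `pass_through` case is one where a reader might expect the move to disappear, yet the language rules require it to happen.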

C has many misnomers among its keywords. Here I give a table of possible keywords and convenient macro names that might replace them. New keywords in future C standards usually start with an underscore and a capital letter. Macros in new header files may be just convenient names. (For example, consider the introduction of _Bool and bool in C99.)

This is not completely serious, but it might give you an idea of the different concepts. It also should emphasize the fact that none of these keywords is ignored by a conforming C compiler. If you have any ideas of other keywords or other possible namings, please let me know.

“signed char lotte” is a computer program written by Brian Westley and the winner of the “Best Layout” award in the 1990 International Obfuscated C Code Contest. The cleverness of the text is staggering. Superficially it reads as an epistolary exchange between two (possibly former) lovers, Charlotte and Charlie. At the same time, it is an executable piece of code whose action is thematically related to its story.

It has been argued that code is not literature, and that it cannot be “read” in a straightforward way. “signed char lotte” is not a counterexample to this. The text is essentially a palimpsest, with the lovers’ storyline written over and obscuring the code that governs the executable behavior. The former can be “read”, but not the latter.
