A common misconception about Bash is that function names must follow the same rules that variables do. The Bash manual even suggests this:

“A word consisting solely of letters, numbers, and underscores, and beginning with a letter or underscore. Names are used as shell variable and function names. Also referred to as an identifier.”

In fact, Bash function names can contain almost any printable character. For instance, I can define my own pre-increment unary function:

So anybody who tells you that “consume is complicated” is wrong. Consume is not complicated. They’ve just chosen the wrong model to describe it.

Look, memory ordering pretty much is the rocket science of CS, but the C standards committee basically made it a ton harder by specifying “we have to make the rocket out of duct tape and bricks, and only use liquid hydrogen as a propellant”.

APL is an array-oriented programming language that will change the way you think about problems and data. With a powerful, concise syntax, it lets you develop shorter programs that enable you to think more about the problem you’re trying to solve than how to express it to a computer.

Programming languages typically make a distinction between normal program actions and erroneous actions. For Turing-complete languages we cannot reliably decide offline whether a program has the potential to execute an error; we have to just run it and see.

In a safe programming language, errors are trapped as they happen. Java, for example, is largely safe via its exception system. In an unsafe programming language, errors are not trapped. Rather, after executing an erroneous operation the program keeps going, but in a silently faulty way that may have observable consequences later on. Luca Cardelli’s article on type systems has a nice clear introduction to these issues. C and C++ are unsafe in a strong sense: executing an erroneous operation causes the entire program to be meaningless, as opposed to just the erroneous operation having an unpredictable result. In these languages erroneous operations are said to have undefined behavior.
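The distinction can be illustrated in C, where signed integer overflow is undefined behavior. The following is a minimal sketch and is not taken from the article: the hypothetical helper `checked_add` tests its operands first and reports failure rather than executing the erroneous operation, which is roughly the trapping a safe language performs for you.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper, not from the article: adds a and b only when the
   result is representable in an int. It returns false instead of
   overflowing, because signed overflow in C is undefined behavior. */
bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;   /* the addition would overflow: refuse to perform it */
    }
    *sum = a + b;       /* safe: the result is known to fit in an int */
    return true;
}
```

A caller that ignores the return value is back in unsafe territory, which is the point: in C the check is opt-in, whereas a safe language applies it on every operation.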

I just spent more than two hours troubleshooting a seemingly simple HTML problem. When I copied and pasted a small section of HTML, the web browser displayed the newly-pasted section differently from the original. The horizontal spacing between some of the elements was slightly different, causing the whole page to look wrong. But how could this be? The two HTML sections were identical – the new one was literally a copy of the old one.

This simple-sounding problem defied all my attempts to explain it. I came up with lots of great theories: problems with my CSS classes, or with margins and padding. Mismatched HTML tags. Browser bugs. I tried three different browsers and got the same results in all of them.

Feeling very confused, I looked again at the two sections of HTML in the WordPress editor (text view), and confirmed they were exactly identical. Then I tried Firefox’s built-in web developer tools to view the page’s rendered elements, and compared all their CSS properties. Identical – yet somehow they rendered differently. I used the developer tools to examine the exact HTML received from my web server, checked the two sections again, and verified they were character-for-character identical. Firefox’s “page source” tool also confirmed the two sections were exactly identical.

Build systems were developed to simplify and automate running the compiler and linker and are an essential part of modern software development. This blog post is a precursor to future posts discussing our experiences refactoring the training projects to use the CMake build generator.

Like them or not, null-terminated strings are essential to C, and working with them is necessary in all but the most trivial programs. While C-style strings are a fundamental part of using the language, manipulating them is a common source of security bugs and lost performance. One of the most common operations is copying a string from one buffer to another, and there are a variety of string functions that claim to do this in C. Anecdotally, however, there is much confusion about what they actually do, and many people desire a string copying function with the following properties:

  1. The function should accept a null-terminated source string, a destination buffer, and an integer representing the size of the destination buffer.

  2. Upon return the function should ensure that the destination buffer points to a null-terminated string containing a prefix of the source string when possible (specifically, when the destination buffer has a non-zero size) to avoid issues in the future with unterminated strings. (While string truncation has its own issues, it is often a fairly reasonable fallback.)

  3. The function should indicate how many characters it copied from the source, as well as indicate if an overflow occurred. (This allows for dealing with the overflow, if desired.)

  4. The function should be efficient, and it should not read or write memory that it does not have to. These go partially hand-in-hand: the function should run in a single pass, not write to the destination buffer past the NUL byte it places, or read characters from the source string once it’s determined that it has filled the destination buffer. Ideally, the implementation would be vectorizable (relaxing some of the previous constraints slightly to within platform alignment guarantees).

  5. The function should be standardized, so that it may be used portably across systems. Conformance to ISO C or POSIX.1 is generally the most desirable.

That is, what is often necessary is the function below, which we’ll call strxcpy:
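The listing itself is not part of this excerpt. As a rough illustration of properties 1 through 4, here is a minimal sketch. The result struct, its field names, and the exact signature are assumptions made for this example and may differ from the original article's definition. Property 5 (standardization) cannot be met by any hand-written function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type, invented for this sketch. */
struct strxcpy_result {
    size_t copied;    /* characters copied, not counting the terminating NUL */
    bool truncated;   /* true if the source did not fit in the destination */
};

/* Copies src into dst, which has room for `size` bytes including the NUL.
   Runs in a single pass, writes nothing past the NUL it places, and reads
   at most one source byte beyond what it copies (to detect truncation). */
struct strxcpy_result strxcpy(char *dst, const char *src, size_t size)
{
    struct strxcpy_result result = {0, false};

    if (size == 0) {
        /* No room even for a terminator: report overflow, touch nothing. */
        result.truncated = true;
        return result;
    }

    /* Copy until the source ends or only the terminator slot is left. */
    while (result.copied < size - 1 && src[result.copied] != '\0') {
        dst[result.copied] = src[result.copied];
        result.copied++;
    }

    dst[result.copied] = '\0';
    /* If the next source byte is not NUL, part of the source was dropped. */
    result.truncated = (src[result.copied] != '\0');
    return result;
}
```

Returning a small struct keeps the count and the overflow flag separate, so a caller never has to decode a sentinel value. This byte-at-a-time loop is not vectorized; it only shows the intended semantics.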

In January 2020, I told two members of Racket’s core team that I would no longer be contributing to Racket or participating in the Racket community. Why? Because of a history of intentional, personalized abuse and bullying directed at me by another member of the Racket core team: Matthias Felleisen.

In the first article in this series on developing for Apple Silicon Macs using assembly language, I built a simple framework AsmAttic to use as the basis for developing ARM assembly language routines. In that, I provided a short and simple demonstration of calling an assembly routine and getting its result. This article starts to explain the mechanics of writing your own routines, by explaining the register architecture of ARM64 processors.

Haskell offers ample opportunities for “ah-ha!” moments, where figuring out just how some function or feature works can unlock a whole new way of thinking about how you write programs. One great example of an ah-ha moment comes when you first start to understand fixed points, why you might want to use them, and how exactly they work in Haskell. In this post, you’ll work through the fixed point function in Haskell, building several examples along the way. At the end of the post you’ll come away with a deeper understanding of recursion and how Haskell’s lazy evaluation changes the way you can think about writing programs.

If you already have some experience with Haskell, you may want to skip the first section and jump directly into learning about fix.

There are about six major conceptualizations of memory, which I’m calling “memory models”, that dominate today’s programming. Three of them derive from the three most historically important programming languages of the 1950s (COBOL, LISP, and FORTRAN), and the other three derive from the three historically important data storage systems: magnetic tape, Unix-style hierarchical filesystems, and relational databases.

These models shape what our programming languages can or cannot do at a much deeper layer than mere syntax or even type systems. Mysteriously, I’ve never seen a good explanation of them — you pretty much just have to absorb them by osmosis instead of having them explained to you — and so I’m going to try now. Then I’m going to explain some possible alternatives to the mainstream options and why they might be interesting.

While others may see Rust and Go as competing programming languages, neither the Rust nor the Go teams do. Quite the contrary, our teams have deep respect for what the others are doing, and see the languages as complementary, with a shared vision of modernizing the state of software development industry-wide.

In this article, we will discuss the pros and cons of Rust and Go and how they supplement and support each other, and our recommendations for when each language is most appropriate.

Companies are finding value in adopting both languages and in how the two complement each other. To shift from our opinions to hands-on user experience, we spoke with three such companies, Dropbox, Fastly, and Cloudflare, about their experience in using Go and Rust together. There will be quotes from them throughout this article to give further perspective.

Recently I had to parse some command line output inside a C++ program. Executing a command and getting just the exit status is easy using std::system, but also getting the output is a bit harder and OS-specific. By using popen, a POSIX C function, we can get both the exit status and the output of a given command. On Windows I’m using _popen, so the code should be cross-platform. This article starts off with a Stack Overflow example that gets just the output of a command, and builds on that to a safer version (with null-byte handling) that returns both the exit status and the command output. It also goes into a lot of detail on fread vs fgets and how to handle binary data.
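The approach described above can be sketched in plain C on a POSIX system. This is a minimal illustration, not the article's code: the names `run_command` and `struct command_result` are invented here, and the Windows `_popen` variant is left out. It reads with fread so that embedded null bytes survive, and takes the exit status from the value returned by pclose.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical result type, invented for this sketch. */
struct command_result {
    char *output;      /* malloc'd buffer; the caller frees it */
    size_t length;     /* bytes read; the data may contain embedded NULs */
    int exit_status;   /* exit code, or -1 if the command did not exit normally */
};

/* Runs `command` through the shell and captures its standard output.
   Returns 0 on success, -1 if the pipe or the buffer could not be set up. */
int run_command(const char *command, struct command_result *result)
{
    result->output = NULL;
    result->length = 0;
    result->exit_status = -1;

    FILE *pipe = popen(command, "r");
    if (pipe == NULL) {
        return -1;
    }

    size_t capacity = 4096;
    size_t length = 0;
    char *buffer = malloc(capacity);
    if (buffer == NULL) {
        pclose(pipe);
        return -1;
    }

    for (;;) {
        /* Keep one spare byte for a trailing NUL; grow when full. */
        if (length + 1 >= capacity) {
            char *grown = realloc(buffer, capacity * 2);
            if (grown == NULL) {
                free(buffer);
                pclose(pipe);
                return -1;
            }
            buffer = grown;
            capacity *= 2;
        }
        /* fread returns a byte count, so NUL bytes in the data are kept. */
        size_t n = fread(buffer + length, 1, capacity - length - 1, pipe);
        if (n == 0) {
            break;   /* end of output or read error */
        }
        length += n;
    }
    buffer[length] = '\0';   /* convenience terminator for text output */

    /* pclose waits for the command and returns its wait status. */
    int status = pclose(pipe);
    if (status != -1 && WIFEXITED(status)) {
        result->exit_status = WEXITSTATUS(status);
    }
    result->output = buffer;
    result->length = length;
    return 0;
}
```

Callers that expect binary output should rely on `length` rather than on strlen, since the trailing NUL only helps when the output is plain text.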

This might seem an odd article: every tutorial on the internet teaches you that three point perspective is just the art term for “regular 3D”, where you set up a camera, tweak its distance, FOV, and zoom, and you’re done. The vanishing points that you use when using pen and paper correspond to where the X, Y, and Z axes intersect your clipping plane, and that’s all she wrote… Except that’s not “true” three point perspective. That’s the easy-for-computer-graphics version of three point perspective: the strict version is quite a bit trickier.

The thing that makes it tricky is that in a strict implementation of three point perspective, your vanishing points have to literally be vanishing points: they don’t represent intersections of axes that run to infinity and a clipping plane somewhere off in the distance relative to your camera, the vanishing points are the exact points where all parallel lines to infinity converge. Which is a problem for computer graphics because that means we’re not dealing with linear space, which means we can’t use linear algebra to compute nice “3D world coordinates to 2D screen coordinates” using matrix operations. Which is a slight problem given that that’s the fundamental approach that allows efficient 3D computer graphics on pretty much any modern hardware.

So let’s look at what makes this so crazy, and how we can implement it anyway.

It was 2005, and I felt like I was in the eye of a hurricane. I was an independent performance consultant and Sun Microsystems had just released DTrace, a tool that could instrument all software. This gave performance analysts like myself X-ray vision. While I was busy writing and publishing advanced performance tools using DTrace (my open source DTraceToolkit and other DTrace tools, aka scripts), I noticed something odd: I was producing more DTrace tools than were coming out of Sun itself. Perhaps there was some internal project that was consuming all their DTrace expertise?

Undefined behavior ranks among the most baffling and perilous aspects of popular programming languages. This installment of Drill Bits clears up widespread misconceptions and presents practical techniques to banish undefined behavior from your own code and pinpoint meaningless operations in any software—techniques that reveal alarming faults in software supporting business-critical applications at Fortune 500 companies.

Early in the history of programming languages, two schools of thought diverged. Quicksort inventor C.A.R. Hoare summarized one philosophy in his Turing Award lecture [7]: The behavior of every syntactically correct program should be completely predictable from its source code. For the sake of safety, security, and programmer sanity, it must be impossible for a program to “run wild.” Ensuring well-defined behavior imposes runtime overheads (e.g., array bounds checks), but predictability justifies the cost. Today, “safe” languages such as Java embody Hoare’s advice.

The Unix shell is a powerful, ubiquitous, and reviled tool for managing computer systems. The shell has been largely ignored by academia and industry. While many replacement shells have been proposed, the Unix shell persists. Two recent threads of formal and practical research on the shell enable new approaches. We can help manage the shell’s essential shortcomings (dynamism, power, and abstruseness) and address its inessential ones. Improving the shell holds much promise for development, ops, and data processing.

This paper describes the development of the programming language Erlang during the period 1985-1997.

Erlang is a concurrent programming language designed for programming large-scale distributed soft real-time control applications.

The design of Erlang was heavily influenced by ideas from the logic and functional programming communities. Other sources of inspiration came from languages such as Chill and Ada which are used in industry for programming control systems.

Postgres has had “JSON” support for nearly 10 years now. I put JSON in quotes because, well, 10 years ago when we announced JSON support we kinda cheated: we checked that the JSON was valid and then put it into a standard text field. Two years later, in 2014, with Postgres 9.4 we got more proper JSON support with the JSONB datatype. My colleague @will likes to state that the B stands for better. In Postgres 14, the JSONB support is indeed getting way better.

I’ll get into this small but pretty incredible change in Postgres 14 in just a minute. First, though, a quick summary of the difference between JSON and JSONB is worthwhile. JSON still exists within Postgres, and if you run CREATE TABLE foo (id serial, mycolumn JSON); you’ll get a JSON datatype. This datatype will ensure you insert valid JSON into it, but will store it as text. This is quite useful if you don’t want to index most of the JSON and just want to quickly insert a ton of it (a great example use case is recording API/log input where you may want to replay requests).

JSONB, unlike JSON, compresses the data down and does not preserve whitespace. JSONB also comes with better indexing ability through GIN indexes. While you can index JSON, you have to index each path. From here on I’ll use “JSON” and “JSONB” interchangeably, but please, in your app, mostly use JSONB unless you explicitly mean the more simplistic JSON text format.
