In a class, a few of my friends and I were perusing old entries from the International Obfuscated C Code Contest (IOCCC). It’s really neat to see all the clever ways people come up with to make it borderline impossible to work out what their code does just by reading it, which is the whole aim of the competition. I personally like the flight simulator whose source code is shaped like an aeroplane. And it got us thinking: what if we wrote our own programming language with the intention of making it hard to follow?

Precedent

Esoteric programming languages are a tradition in the programming community. They come in a variety of forms and serve a variety of purposes; here are a few I like.

Brainfuck

This one’s a classic – it’s a minimalist language, and only takes these eight characters:

< > + - . , [ ]

Any other characters are ignored. Its operation is a little complicated, but it essentially defines a big long ‘memory’ full of bytes (not bits), all starting at zero. The command characters then let you move a pointer along the byte array (< and >), increment or decrement the byte the pointer is at (+ and -), output that byte (.), or read a byte of input into it (,). The square brackets form a loop that repeats for as long as the current byte is non-zero. There’s a little more to it than that, but that’s the basic idea. Here’s a hello world program in Brainfuck:

++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.

This code is actually not that complicated; it’s basically just adding to bytes and moving around.
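
To make the description above a bit more concrete, here’s a rough sketch of a Brainfuck interpreter in Python. The function name run_brainfuck, the 30,000-cell tape, and the choice to wrap bytes around at 256 are my own assumptions (different implementations disagree on those details), so treat it as an illustration rather than a reference implementation.

```python
# The hello world program quoted above.
HELLO_WORLD = (
    "++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]"
    ">>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++."
)


def run_brainfuck(code: str, input_data: str = "", tape_size: int = 30000) -> str:
    """Run a Brainfuck program and return everything it printed."""
    # Pair up the brackets first so loops can jump in one step.
    # Assumption: the brackets in `code` are balanced.
    jump = {}
    stack = []
    for i, ch in enumerate(code):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            j = stack.pop()
            jump[i], jump[j] = j, i

    tape = [0] * tape_size   # the big long 'memory' of bytes
    ptr = 0                  # data pointer into the tape
    pc = 0                   # position in the code
    inp = iter(input_data)
    out = []

    while pc < len(code):
        ch = code[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == ",":
            # Assumption: running out of input stores a zero byte.
            tape[ptr] = ord(next(inp, "\0")) % 256
        elif ch == "[" and tape[ptr] == 0:
            pc = jump[pc]    # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jump[pc]    # go back to the start of the loop
        # Any other character is simply ignored.
        pc += 1

    return "".join(out)


print(run_brainfuck(HELLO_WORLD), end="")  # Hello World!
```

Matching the brackets up front keeps the main loop to one small branch per command, which is about as simple as an interpreter gets.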

Malbolge

This one was created with the intention of being ridiculously complex. So complex, in fact, that it took two years before anyone produced a working program for it, and even then they used a computer to search for the code! Not only does the language work in base 3 (instead of binary), but the effect of a command changes depending on where the code pointer is, and the code rewrites itself as the program runs, so the interpreter reacts differently to the same command over time. Here’s another hello world program.

 (=<`#9]~6ZY32Vx/4Rs+0No-&Jk)"Fh}|Bcy?`=*z]Kw%oG4UUS0/@-ejc(:'8dc

TrumpScript

This one is very tongue-in-cheek. Most of its features have humorous connections to Trump’s crazy image. To quote a few from its GitHub page:

-All numbers must be greater than 1 million. The small stuff is inconsequential to us.

-The language is completely case insensitive.

-Assignments are done via the “make” keyword, e.g. “make America Great”

And so on.

Whitespace

This is honestly the best idea: the language only pays attention to whitespace characters (space, tab, and linefeed) and ignores everything else. This means that Whitespace programs can be hidden inside programs written in languages that ignore whitespace. Genius!
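
As a small illustration of why the hiding trick works, here’s a sketch in Python of what a Whitespace interpreter would ‘see’ in a piece of ordinary source code. The function name whitespace_view and the S/T/L labels are just my own shorthand for space, tab, and linefeed; this only filters characters and doesn’t run anything.

```python
def whitespace_view(source: str) -> str:
    """Keep only space, tab, and linefeed, shown as S, T, and L."""
    labels = {" ": "S", "\t": "T", "\n": "L"}
    return "".join(labels[ch] for ch in source if ch in labels)


# An innocent-looking line of C with a tab and some spaces in it.
print(whitespace_view("int x;\t// hi\n"))  # STSL
```

Everything visible in the C line vanishes, and only the four whitespace characters are left for the Whitespace program to be made of.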

So what’s our silly idea?

Essentially, the language is built around using only the straight-line characters on the keyboard, such as these:

/ \ | - _ [ ] < >

Maybe some others, if we’re feeling fancy. The language is called Linear C (a combination of the ancient script Linear B, the language C, and the fact that the characters are all straight lines).

To be honest, we’re still not sure what kind of language we want this to be. We could go with something like Brainfuck and move a pointer along a data array, we could try to make a fully fledged Python- or C-like interpreter, or any number of other things. Seeing as I’m lazy, it’ll probably be the simplest one.

What’s the syntax gonna be like?

Seeing as we still don’t even know what kind of language we’re attempting to make, it’s hard to get overly detailed, but here are some ideas we’ve been throwing around. (Keep in mind that the point of the language is to be overly complex, so any obfuscatory syntax is a bonus. Because we can.)

Moving around

We’re probably going to use a Brainfuck-style pointer system: a code pointer that moves along the code one character at a time, and a data pointer for manipulating a byte array.

One idea is to use the forward slash / to tell the pointer to go up a line (to the character in the code directly above the current one), and the backslash \ to tell it to go down a line.

The angle brackets < > would tell it to move one step backwards or forwards along the byte array.
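
Here’s a rough Python sketch of how that movement could work. None of this is decided, so the details are assumptions on my part: the code pointer normally reads left to right, a slash or backslash jumps it to the character directly above or below (which then gets executed), < and > move the data pointer back and forward, and running off the edge of the code ends the program. The name trace and the max_steps guard are made up for the example.

```python
def trace(source: str, max_steps: int = 1000):
    """Follow the code pointer; return the characters visited and the data pointer."""
    rows = source.split("\n")
    row, col = 0, 0      # code pointer starts at the top-left character
    data_ptr = 0         # position along the byte array
    visited = []

    # max_steps stops a / directly below a \ from bouncing forever.
    for _ in range(max_steps):
        # Assumption: leaving the code on any side halts the program.
        if not (0 <= row < len(rows) and 0 <= col < len(rows[row])):
            break
        ch = rows[row][col]
        visited.append(ch)
        if ch == "/":
            row -= 1     # go to the character directly above
        elif ch == "\\":
            row += 1     # go to the character directly below
        else:
            if ch == ">":
                data_ptr += 1
            elif ch == "<":
                data_ptr -= 1
            col += 1     # carry on reading to the right

    return "".join(visited), data_ptr


# Two lines of code: the backslash drops the pointer onto the second line.
program = ">>\\\n  >>"
print(trace(program))
```

In this example the pointer reads two > characters, drops down a line at the backslash, and reads two more before falling off the end, so the data pointer finishes four steps along.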

Inputting information

Square brackets tell it to input a byte, written as a series of hyphens and underscores, where hyphens are 1 and underscores are 0. E.g.

[- -_- -_ _-] is 11011001, and it gets written to the byte at the current pointer.
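
Decoding one of these bracket literals is simple enough to sketch in Python. The function name parse_byte_literal is made up, and I’m assuming that spaces inside the brackets are ignored and that a literal must contain exactly eight hyphens or underscores; we haven’t actually settled either rule.

```python
def parse_byte_literal(literal: str) -> int:
    """Turn something like '[- -_- -_ _-]' into a number from 0 to 255."""
    if not (literal.startswith("[") and literal.endswith("]")):
        raise ValueError("literal must be wrapped in square brackets")

    bits = ""
    for ch in literal[1:-1]:
        if ch == "-":
            bits += "1"      # hyphen is a 1
        elif ch == "_":
            bits += "0"      # underscore is a 0
        elif ch.isspace():
            continue         # assumption: spaces are just for readability
        else:
            raise ValueError(f"unexpected character {ch!r}")

    # Assumption: one literal is exactly one byte.
    if len(bits) != 8:
        raise ValueError("expected exactly 8 hyphens/underscores")
    return int(bits, 2)


print(parse_byte_literal("[- -_- -_ _-]"))  # 217, i.e. 0b11011001
```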

This is all mostly in jest at the moment, but it would be kinda neat if we could make it.

 

 

 
