Sunday, March 18, 2007

Part Tatlo - Looping, wonderful looping

(This is the third part in my Diving into INTERCAL series. If you have read the first two parts then don't feel cocky because the following will probably not make any more sense to you than to a total INTERCAL beginner.)

In this excursion into the magical world of INTERCAL programming we will try to do the almost impossible: create a program with a while loop. If you have programmed in any imperative programming language, such as C, Java, Python or Commodore BASIC, you will know that loops are usually a straightforward construct. However, INTERCAL is designed from the ground up to be like no other language, so let's see if we can actually get a loop to work.

Having taken nearly two hours to write 'Hello, world' last week, I decided to set my sights fairly low for this week. I want to implement the following Python program in INTERCAL:
x = 0
while(x != 1):
    x = int(raw_input("Please input a number\n"))
    print x
This is by no means a useful program - it simply asks the user repeatedly to type in a number, printing it out each time, until the user enters the number 1.
Anyone with any amount of experience knows that it would be folly to attempt to convert this program to INTERCAL in one shot. It is much wiser to perform the conversion in small steps. Here is my first iteration:
1         DON'T WORRY, BE HAPPY
2 (1) PLEASE NOTE THAT WE ARE NOT PROMPTING
3 DO WRITE IN .1
4 DO READ OUT .001
5 DO GIVE UP
The line numbers are not part of the program. (This is not BASIC.) Here is a line by line analysis of the program:

1 DON'T...: Any statement which starts with DO NOT, PLEASE NOT, DON'T or PLEASE DON'T will simply be ignored by INTERCAL. The rest of the statement does not need to be valid INTERCAL syntax: since you don't want it done, the compiler doesn't have to understand it.
2 PLEASE NOTE...: Because of the rule above, many INTERCAL programs include 'PLEASE NOTE' as a way of including comments. INTERCAL only cares about the first three letters of NOT, so PLEASE NOTCH MY BELT would also be ignored.
Line two also introduces the idea of a label, (1), which in this iteration is not actually used. Labels are given numbers, not names.
In keeping with good programming practices, the comment 'PLEASE NOTE THAT WE ARE NOT PROMPTING' is 100% accurate. If you have read 'Hello, world' earlier, you will know that writing character output is just not worth the effort. Usability is overrated.
3 DO WRITE IN .1: In this statement we are asking INTERCAL to accept numeric input from the user, storing the input in the 16-bit variable .1. 16-bit variables are represented with the 'spot' (.), whilst 32-bit variables are represented with the 'two spot' (:). In INTERCAL you have 65535 16-bit variables and the same number of 32-bit variables, from .1 to .65535 and :1 to :65535. Note that .1 and .001 are the same variable, but .1 and :1 are not.
4 DO READ OUT .001: In this statement we are asking INTERCAL to send the value of variable .001 (which is the same as .1, .01, etc) to the standard output. For your convenience, the number is output in Roman numerals.
5 DO GIVE UP: Ends the program.

If I compile and run the above program it will accept my input (e.g. ONE TWO THREE), output the number (e.g. CXXIII) and then exit. Too easy.
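
For the curious, the numeric output step can be approximated in Python. This is only a sketch of ordinary Roman numeral conversion, not C-INTERCAL's actual routine. The function name to_roman is my own invention, and I am assuming values from 1 to 3999 only.

```python
def to_roman(n: int) -> str:
    """Convert an integer in 1..3999 to a Roman numeral string."""
    if not 0 < n < 4000:
        raise ValueError("this sketch only handles 1..3999")
    # Largest values first, including the subtractive pairs (CM, XC, IV...).
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    parts = []
    for value, symbol in numerals:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


print(to_roman(123))  # CXXIII
```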

If at line 5 we insert the statement 'PLEASE DO (1) NEXT', the program will jump to our label (1) instead of exiting. The program will continue looping forever, which is certainly a step in the right direction of converting our Python program to INTERCAL.
The program continues looping until the user types in invalid input. For example if I type in the input 'WHAT IS UP DOC?', the program will respond:

ICL579I WHAT BASE AND/OR LANGUAGE INCLUDES WHAT?
ON THE WAY TO 4
CORRECT SOURCE AND RESUBNIT

I am not going to try to add input validation or error handling to this program because life is simply too short. But the program still falls short of the Python original - it does not exit if the input is equal to 1. We need an IF statement.

By now we should guess that INTERCAL will not have an IF statement, and we would be correct. In keeping with the 'unlike any other language' theme, there is nothing that really approximates an IF statement at all. It turns out that implementing an IF is quite a difficult task in INTERCAL.

Here is the next iteration of our program, trying to get around INTERCAL's lack of an IF:
1         DON'T WORRY, BE HAPPY
2 DO (2) NEXT
3 (1) PLEASE NOTE THAT WE ARE NOT PROMPTING
4 DO WRITE IN .1
5 DO READ OUT .001
6 PLEASE RESUME .1
7 (2) DO (1) NEXT
8 DO GIVE UP
Explaining the additional lines:

2 DO (2) NEXT: When we added the DO (1) NEXT line earlier, you may have assumed that it was a BASIC-style GOTO. Unlike with BASIC's GOTO, the position of the statement following the NEXT statement (in this case line 3) is pushed onto a 'NEXT' stack. This NEXT stack can be manipulated by the RESUME or FORGET statements, discussed later.
7 (2) DO (1) NEXT: This is the statement which the previous statement jumped to. We simply jump straight back to label (1). Note that this statement will also push the position of line 8, DO GIVE UP, onto the NEXT stack. At this point the NEXT stack will look like this:

Line 8
Line 3

6 PLEASE RESUME .1: This is the most important line in the program. When we reach this point we have the user's input in the variable .1 and have just sent it to the output. We need to decide whether to continue looping. The RESUME statement will evaluate the expression, in this case the value of .1, and pop that many items off the NEXT stack. It will then jump to the last item popped.
So if the user types in 'ONE' the program will pop the top item from the NEXT stack and then jump to Line 8, the end of the program. This is exactly what we wanted.
Alternatively, if the user types in 'TWO' (or maybe 'DALAWA') then two items will be popped and the program will jump to line 3. This is also exactly what we wanted, because the user will then continue the loop at line 3.
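
To convince myself of the RESUME behaviour described above, here is a small Python sketch of the NEXT stack. It is a model of my understanding, not of the compiler's internals: the names resume and NextStackRupture are hypothetical, the stack is a plain list with its top at the end, and I only consider counts of 1 or more.

```python
class NextStackRupture(Exception):
    """Raised when RESUME asks for more entries than the stack holds."""


def resume(next_stack, count):
    """Pop `count` entries and return the last one popped (the jump target)."""
    if count < 1:
        raise ValueError("this sketch only models counts of 1 or more")
    if count > len(next_stack):
        raise NextStackRupture("THE NEXT STACK RUPTURES")
    target = None
    for _ in range(count):
        target = next_stack.pop()
    return target


# Line 2 pushed "line 3", then line 7 pushed "line 8" on top of it.
print(resume([3, 8], 1))  # 8: jump to DO GIVE UP
print(resume([3, 8], 2))  # 3: jump back to the top of the loop
```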

So we are finished, right?

Um, no. If the user types anything other than ONE or TWO (or any number at all on the second iteration) then they will receive the following message:

ICL632I THE NEXT STACK RUPTURES. ALL DIE. OH, THE EMBARRASSMENT!
ON THE WAY TO 8
CORRECT SOURCE AND RESUBNIT
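
The failure is easier to see if we trace the whole program for a series of inputs. The following Python sketch is my own simplified model of the eight-line program above, assuming every input is a positive number. The function name trace and the result strings are made up for illustration.

```python
def trace(inputs):
    """Follow the eight-line program for a list of positive numeric inputs."""
    # Line 2 pushes "line 3", then line 7 pushes "line 8" (top is the end).
    next_stack = [3, 8]
    printed = []
    for value in inputs:
        if value < 1:
            raise ValueError("this sketch assumes positive inputs")
        printed.append(value)        # lines 4 and 5: WRITE IN, READ OUT
        if value > len(next_stack):  # line 6: RESUME wants too many entries
            return printed, "ICL632I"
        for _ in range(value):
            target = next_stack.pop()
        if target == 8:              # line 8: DO GIVE UP
            return printed, "GIVE UP"
        # Otherwise target is line 3 and we go around the loop again.
    return printed, "waiting for more input"


print(trace([1]))     # ([1], 'GIVE UP')
print(trace([3]))     # ([3], 'ICL632I')
print(trace([2, 1]))  # ([2, 1], 'ICL632I')
```

Note how the second pass through the loop always fails: after RESUME pops both entries the NEXT stack is empty, and nothing ever pushes onto it again.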

Oh dear, it seems we still have a bit more learning to do before we will have mastered the IF construct in INTERCAL. Unfortunately I am very sleepy, and also incredibly confused, so the completion of this program will need to wait for another day. Stay tuned.

Are you an INTERCAL guru? Please feel free to post the solution as a comment. (It will sure save me from figuring it out.)

Desperately waiting to see how this program is written in INTERCAL? Subscribe to the 'Diving into INTERCAL' feed and you will be the first to know.


Sunday, March 11, 2007

Part Dalawa - Still trying to write 'Hello, world'

For those few of you who read the first part of this tutorial series, "Part Isa - Introduction, Manners & 'Hello, world'", you would know that we are roughly 8% of our way to completing the classic program 'Hello, world' in INTERCAL. In short, we had managed to write the character 'h' to our standard output but we were not entirely sure how we had achieved this. In this tutorial we will boldly attempt to finish writing our first INTERCAL program and possibly try to understand how the program actually works.

INTERCAL has very good support for the input and output of numbers. In accepting numeric input the following would be allowed:

EIGHT OH NINE
EIGHT ZERO NINER
WALO WALA SIYAM

:where all of these inputs mean '809' (the last being written in the Tagalog language). The 'WRITE IN' statement accepts numbers written in English, Sanskrit, Basque, Tagalog, Classical Nahuatl, Georgian, Kwakiutl, and Volapük. INTERCAL will output numbers in Roman numerals, so in the example above we could use 'READ OUT' to output the value 809 as DCCCIX.
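
The digit-by-digit input format is simple enough to sketch in Python. This is an illustration of the idea only: the DIGIT_WORDS table is deliberately partial (English plus the Tagalog words used in this series), and parse_number is a hypothetical name, not anything from the C-INTERCAL source.

```python
# Partial word table, assumed for illustration only.
DIGIT_WORDS = {
    "ZERO": 0, "OH": 0, "ONE": 1, "TWO": 2, "THREE": 3, "FOUR": 4,
    "FIVE": 5, "SIX": 6, "SEVEN": 7, "EIGHT": 8, "NINE": 9, "NINER": 9,
    # Tagalog words that appear in this series
    "WALA": 0, "ISA": 1, "DALAWA": 2, "TATLO": 3, "WALO": 8, "SIYAM": 9,
}


def parse_number(text: str) -> int:
    """Read a number spelled out one digit word at a time."""
    value = 0
    for word in text.split():
        if word.upper() not in DIGIT_WORDS:
            raise ValueError("unknown digit word: " + word)
        value = value * 10 + DIGIT_WORDS[word.upper()]
    return value


print(parse_number("EIGHT OH NINE"))    # 809
print(parse_number("WALO WALA SIYAM"))  # 809
```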

Character input

To write our 'Hello, world' program, we are much more interested in character output rather than Roman numeric output. To understand character output in INTERCAL, you must first understand character input. When speaking of character input, the INTERCAL manual says:

The programmer desiring to handle input on a character basis should consider using another language.

Indeed, character input in INTERCAL is handled in a fashion that is significantly different from all other programming languages and, as we will soon see, character output is even more unusual. To understand text input and output in INTERCAL we need to understand the Turing Text Model. It is possibly best described using the following diagram which I drew on the back of an envelope:



We can imagine that INTERCAL has a circular input tape with all of the 256 available characters printed on it. INTERCAL also has an 'input head' which is positioned at the location of the last character entered by the user. The input head starts at position 0 (ASCII 0) when your INTERCAL program starts. If the user types 'g', as in the diagram, the input head will be moved to the 'g' on the input tape and the decimal value 103 (g is ASCII 103) will be stored in the first position of your array.

So far, so good. It is when the user keeps on typing that things get tricky. If the user finishes typing the word 'goat', by typing 'oat', the following will result:

o: The input head will move to the right by 8 positions to reach the 'o' from its initial position of 'g', so the decimal value 8 will be stored in the second position of your array.

a: The input head can only travel to the right. So to reach the letter 'a', it must travel past the end of the 256 available characters and keep traveling until it reaches 'a'. To do this, it must travel 242 positions. So the decimal value 242 will be stored in the third position of the array.

t: As with the simple case of 'o', the input head travels from 'a' to 't', storing the decimal value 19 in the fourth position of the array.

Simple.
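
The 'goat' walk-through can be checked with a few lines of Python. This is a sketch of the input tape as described above, assuming 8-bit characters. The function name input_tape_values is my own.

```python
def input_tape_values(text: str) -> list:
    """Return the values stored in the array as the input head moves right."""
    head = 0  # the input head starts at position 0
    values = []
    for ch in text:
        code = ord(ch)
        # The head only moves right, wrapping around the 256-character tape.
        values.append((code - head) % 256)
        head = code
    return values


print(input_tape_values("goat"))  # [103, 8, 242, 19]
```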

Character Output

When it comes to character output, there is some good news and some bad news. The good news is that the input tape and output tape (and their corresponding heads) are independent, which is much simpler than if they were connected. The bad news is that the previous sentence is the only good news.

As with input in INTERCAL, described in the previous section, there is an output tape with all 256 ASCII characters printed on it and an output head. The tape travels in the same direction as the input tape, but the output head is on the inside of the tape. This results in two subtle differences:

1. The numbers required to move the head from one position to another are different because the output head is effectively moving in the opposite direction to the corresponding input head.
2. Because the output head is on the inside of the tape, it sees the binary representation of the ASCII characters printed on it backwards. For example, to print the ASCII character 'b', binary 0110 0010, you would need to move the output head to the ASCII 'F', binary 0100 0110.
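
The bit reversal in the second point is easy to verify in Python. A minimal sketch, assuming 8-bit characters (reverse_bits is a name I made up):

```python
def reverse_bits(byte: int) -> int:
    """Reverse the order of the 8 bits in a byte."""
    # Format as 8 binary digits, flip the string, and parse it back.
    return int(format(byte, "08b")[::-1], 2)


print(chr(reverse_bits(ord("b"))))  # F
```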

As with the input head, the output head starts at position zero. We can calculate the required head moves for the string 'Hello, world' in the following table:



Head position   Required output   Required binary   Reverse binary   Required head position   Move head by
0               H                 0100 1000         0001 0010        18                       238
18              e                 0110 0101         1010 0110        166                      108
166             l                 0110 1100         0011 0110        54                       112
54              l                 0110 1100         0011 0110        54                       0


:and so on. Continuing on these calculations you end up with the program:

DO ,1 <- #13
PLEASE ,1SUB#1 <- #238
DO ,1SUB#2 <- #108
DO ,1SUB#3 <- #112
DO ,1SUB#4 <- #0
DO ,1SUB#5 <- #64
PLEASE ,1SUB#6 <- #194
PLEASE ,1SUB#7 <- #48
DO ,1SUB#8 <- #22
DO ,1SUB#9 <- #248
DO ,1SUB#10 <- #168
DO ,1SUB#11 <- #24
DO ,1SUB#12 <- #16
DO ,1SUB#13 <- #214
DO READ OUT ,1
PLEASE GIVE UP

:which when compiled and run gives the enormously satisfying output:

Hello, world
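
If you would rather not fill in the table by hand, the calculation can be sketched in Python. This follows the output model described in this post (reverse the bits, then count how far the head must move in the opposite direction to the input head) and assumes 8-bit characters. The function names are hypothetical.

```python
def reverse_bits(byte: int) -> int:
    """Reverse the order of the 8 bits in a byte."""
    return int(format(byte, "08b")[::-1], 2)


def output_tape_values(text: str) -> list:
    """Return the array values needed to make READ OUT print `text`."""
    head = 0  # the output head starts at position 0
    values = []
    for ch in text:
        target = reverse_bits(ord(ch))
        # Opposite direction to input: current position minus the target.
        values.append((head - target) % 256)
        head = target
    return values


# The thirteenth element of the program above is the trailing newline.
print(output_tape_values("Hello, world\n"))
# [238, 108, 112, 0, 64, 194, 48, 22, 248, 168, 24, 16, 214]
```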

I have written 'Hello, world' in many different programming languages over the past many years but I have never felt the sense of achievement that writing 'Hello, world' in INTERCAL has given me. Imagine the pride I would feel if I built a small operating system using INTERCAL!

'Hello, world' has only scratched the surface of the power and flexibility of INTERCAL. The astute reader will note that our program is quite linear, running from start to finish without branching or looping. In the next tutorial in my INTERCAL series we will explore some of the options that INTERCAL provides us with to create more complex programs.


Friday, March 9, 2007

Part Isa - Introduction, Manners & 'Hello, world'

A good programmer constantly strives to learn new techniques, better ways of building a mouse-trap. Anyone who is happy to stick with the tools that they know is likely to become stagnant and bored. There are many in the IT industry who I would call 'band-wagon hoppers' who are always looking to jump from one latest fad to the next. Java one day, Ruby the next, Haskell on Saturday, Groovy on Sunday.

A language that has stood the test of time is INTERCAL, created by Donald Woods and James Lyon in 1972. As a programming language, INTERCAL remains every bit as useful as it was over thirty years ago.

Despite its longevity, INTERCAL has never seen the peaks of popularity that more 'mainstream' languages like C, C++, Java and even Visual Basic have experienced. (This is despite it being superior to Visual Basic in nearly every respect.) But in the same way that learning a functional language like Haskell can make you a better Java programmer, I suggest that you learn at least some rudimentary INTERCAL to expand your field of knowledge of alternative programming techniques.

For my examples, I will be using the C-INTERCAL compiler which can be downloaded on the official site. To build the compiler you will need to jump through the standard './configure' and 'make' hoops. (Microsoft Windows users should upgrade to Linux or Mac OS X before completing this step.)

After building the compiler, create a new file called 'hello.i' with the following contents:

DO ,1 <- #1
DO ,1SUB#1 <- #234
DO READ OUT ,1
DO GIVE UP

I don't expect you to understand this code at the moment; all will be explained in time. Let's just attempt to compile this file using the command 'ick hello.i'. You should receive the following output:

ICL079I PROGRAMMER IS INSUFFICIENTLY POLITE
ON THE WAY TO 4
CORRECT SOURCE AND RESUBNIT(1)

Taken from the revised INTERCAL manual: "INTERCAL was inspired by one ambition: to have a compiler language which has nothing at all in common with any other major language."

Already we have encountered one way that INTERCAL differs from other languages - it doesn't like being bossed around. You can't just tell it to 'DO, DO, DO'. If you want it to respond nicely you will have to say 'PLEASE' at least 1/5th of the time. However INTERCAL will not stand for brown-nosing either, so don't say PLEASE more than 1/3rd of the time. To assist in INTERCAL programming there is an intercal.el Emacs mode included with the distribution that randomly expands 'DO ' to 'PLEASE DO ' 1/4th of the time. Apart from good manners, there is no semantic difference between 'DO', 'PLEASE DO' or simply 'PLEASE'.
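
The politeness rule can be sketched as a quick Python check. This is a simplified model based only on the 1/5th and 1/3rd thresholds quoted above, so the real compiler's exact test may well differ. It counts only lines that begin with DO or PLEASE (after an optional label), and the function name politeness is my own.

```python
def politeness(source: str) -> str:
    """Classify a program as too rude, too polite, or acceptable."""
    total = 0
    polite = 0
    for line in source.splitlines():
        words = line.split()
        # Skip an optional leading label such as (1).
        if words and words[0].startswith("("):
            words = words[1:]
        if not words or words[0] not in ("DO", "PLEASE"):
            continue
        total += 1
        if words[0] == "PLEASE":
            polite += 1
    if polite * 5 < total:
        return "insufficiently polite"
    if polite * 3 > total:
        return "overly polite"
    return "ok"


rude = "DO ,1 <- #1\nDO ,1SUB#1 <- #234\nDO READ OUT ,1\nDO GIVE UP"
print(politeness(rude))  # insufficiently polite
```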

So correcting our source:

DO ,1 <- #1
DO ,1SUB#1 <- #234
DO READ OUT ,1
PLEASE GIVE UP

:we will then be able to compile with no error(2). If you run your compiled program, you will find that it outputs the single letter 'h' then exits. Congratulations, you are 8% of the way to writing 'hello, world!'.

Let's dissect our program, line by line. The statement 'DO ,1 <- #1' dimensions an array called ',1' so that it holds one element. Constants are prefixed by a 'mesh' (#) and 16-bit arrays are prefixed by a 'tail' (,). (Of course 32-bit arrays are prefixed by a 'hybrid' (;).)

On line two we have a similar-looking command 'DO ,1SUB#1 <- #234' which takes another constant, 234, and places it into the first (and only) element of array ,1 using the SUB syntax. (Surprisingly, INTERCAL arrays start at index 1.)

On line three we have the command 'DO READ OUT ,1' which will send the text in our array ,1 to the standard output (which I will assume is a dot-matrix printer in your environment).

On line four we have the command 'PLEASE GIVE UP' which of course ends our program.

Worth noting is the fact that the constant '234' has no obvious correlation to the character output by our program 'h'. I discuss the powerful INTERCAL output algorithm in 'Diving into INTERCAL - Part Dalawa'.

I hope I have sparked your interest in the wonderful world of INTERCAL programming. In future installments we will look at the Turing Text Model and other language features such as FORGET, REMEMBER, ABSTAIN, IGNORE and COME FROM.

(1): Yes, the C-INTERCAL compiler always misspells the word RESUBMIT

(2): You may receive a 'random compiler bug' that will appear at run-time. This seems to happen about 10% of the time.

PS. A massive prize for the first person who leaves a comment explaining why this post was called 'Diving into INTERCAL - Part Isa'
