Remarkable history of formal languages and parsing

I found this history fascinating. It begins … really really retro (the 4th century BCE) and proceeds to the modern day. The author is an expert who has developed a language processing framework called Marpa that is arguably the culmination of the entire 2400 year history.


I’m only half-way through, but I too am finding it fascinating. It’s almost a year-by-year account, and very clear on the theoretical and technical challenges that have perhaps fallen out of view since practice became adequate to the task.

I think hardware made more of a difference: the character data type for parsing, and base registers for run-time use, both became general usage in the 1970s.
With modern tools, you get the hidden code and data needed by the language run time.
This prevents real cross compilers, say between a word-based machine and a character-based machine. I can’t say all things have been an improvement; I hate the ; in modern languages.
It is a separator character rather than a stop character.
FORTRAN was nice in that blanks were ignored: big num, bigger num, over sized number
made for easy reading vs big$num …

Some of this I knew, much of it is new to me; however, I think at least one area is poorly reported. He talks about Alick Glennie’s work in the early 50’s, but much of what he ascribes to Glennie was the work of Tony Brooker. He somewhat glosses over Brooker’s parsing algorithm and doesn’t recognise (or at least emphasise) that PEG parsing is basically the rediscovery of Brooker’s parsing algorithm. He suggests that Glennie’s work was not used much, but that type of parser was pretty much the underlying mechanism for all the parsers written at Edinburgh until the late 90’s, when Edinburgh’s compiler-writing efforts started to fizzle out.

He doesn’t say much about the Compiler Compiler, or the difference between parser generators and extensible languages, or the close correlation between the work of Brooker, Glennie, and Ned Irons - who were all working on similar projects. And I think he entirely missed the issue of the trivial conversion of a procedural parser into a table-driven one - a process that makes the distinction almost irrelevant.

One other minor issue is that he doesn’t quite grasp that “autocode” is basically just the word that was used in the 50’s for “compiler” - before that term was used for this process, the conversion of a high-level representation of an arithmetic statement into machine code was described as “automatic coding”. In fact a “compiler” in the early days specifically referred to compiling a library of procedures and (he does mention this) was what we would now call a “linker”.

But those few minor niggles aside (which might be improved in a later revision), I do think this is a good paper that brings together a vast, sprawling field fairly neatly.


PS I have a personal interest in this area, as over the years I’ve been refining my own attempt at a Brooker/Edinburgh-style parser based on what I learned as an undergrad. The current version is at GitHub - gtoal/uparse (a Unicode parser generator, similar to PEG parsing: top-down, table-driven, arbitrary lookahead, right-recursive), should you want to try it. It may especially interest you if you’re into 60’s programming languages, as I included an Algol60 parser I hacked up as a demo - the download may actually be of more interest to Algol buffs than to parser hackers.


Thank you for making all of us aware of Brooker; I was not. I’m tempted to try to contact Kegler, who seems willing to make updates to his history, but I’m not sure I should be the one to do it. Since Kegler has published a CPAN module for his Marpa parser generator, metacpan lists an email address, which I’ll distort as “jkegl cpan org” should you like to send him an email along these lines.

EDIT: there’s a nice history of Tony Brooker’s work at the University of Manchester’s site (30 page MS Word document):


An excellent document! (It’s a PDF, in fact, which is handy for me.)

Of all the software associated with the Ferranti Atlas computer, two systems were destined to be remembered as landmarks. One was the Supervisor, which was the first multitasking, multi-user operating system. The other was the Compiler Compiler which, when provided with a syntactic and semantic description of a programming language, would automatically generate a compiler for that language. Both the Supervisor and the Compiler Compiler (CC) were conceived and developed at the University of Manchester between about 1960 to 1964, in the spirit of making a high-performance computer more efficient and user-friendly. The Compiler Compiler was the idea of R A (Tony) Brooker.

This article begins by describing the pre-history of the CC and then goes on to discuss its implementation and use on Atlas.
