The Future Of Editing: Literate Programming / Reproducible Research

yeti · May 17, 2020, 11:00am

We should be able to do better than just edit today:

Knuth’s TeX and METAFONT, written this way (“Literate Programming”), were a start, and the method was later extended with the ability to execute code and capture its output without leaving the environment (“Reproducible Research”: Jupyter Notebook, Emacs/Org/Babel). This holds all the pieces together, and less switching between writing code, documenting, running, and reviewing logs and test output makes it easier to stay focused, too.

I’m trying to get into this with Emacs/Orgmode/Babel.

…and folding comes back “naturally” with Org/Babel too.
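For anyone else trying the Org/Babel route: the step that turns such a document back into plain source files is called tangling (in Emacs, `org-babel-tangle`, driven by a `:tangle FILE` header argument on a `#+begin_src` block). The Python below is only a rough sketch of the idea, not Org’s implementation. The function name `tangle` is made up for illustration, and it ignores most of Org’s rules (property inheritance, `:tangle yes`, noweb references and so on).

```python
import re
from collections import defaultdict

# An Org source-block header that carries a ":tangle FILE" header argument.
BEGIN = re.compile(r"^\s*#\+begin_src\s+\S+.*?:tangle\s+(\S+)", re.IGNORECASE)
END = re.compile(r"^\s*#\+end_src\b", re.IGNORECASE)


def tangle(org_text: str) -> dict[str, str]:
    """Collect source-block bodies, grouped by their :tangle target file.

    Rough approximation only (hypothetical helper, not org-babel-tangle):
    blocks without :tangle, or with ":tangle no", stay in the document
    for the reader and are not written to any file.
    """
    files: dict[str, list[str]] = defaultdict(list)
    target = None  # file the current block tangles to, or None outside one
    for line in org_text.splitlines():
        if target is None:
            m = BEGIN.match(line)
            if m and m.group(1) != "no":
                target = m.group(1)
        elif END.match(line):
            target = None
        else:
            files[target].append(line)
    return {name: "\n".join(body) + "\n" for name, body in files.items()}


EXAMPLE = """\
* Greeting
Prose explaining why the function exists.
#+begin_src python :tangle hello.py
def greet(name):
    return "hello, " + name
#+end_src
A block meant only for the reader:
#+begin_src python
print("not tangled")
#+end_src
"""

if __name__ == "__main__":
    for name, source in tangle(EXAMPLE).items():
        print(f"--- {name}\n{source}", end="")
```

Only `hello.py` comes out; the prose and the untangled block remain for the human reader.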

EdS · May 17, 2020, 12:00pm

Bit of background from Wikipedia:

Literate programming was first introduced by Knuth in 1984. The main intention behind this approach was to treat a program as literature understandable to human beings. This approach was implemented at Stanford University as a part of research on algorithms and digital typography. This implementation was called “WEB” by Knuth since he believed that it was one of the few three-letter words of English that hadn’t already been applied to computing.

jhi · May 19, 2020, 10:17am

Sorry, in cynical mood, but “better editing” seems to be in the same category as flying cars and practical nuclear fusion, always “just a few more years” to the future.

jhi · May 19, 2020, 2:06pm

To balance my cynicism: I like literate programming.

whartung · May 19, 2020, 2:44pm

The most recent version of Literate Programming that I’ve had any real contact with is Jonesforth.

This is a large assembly language program with very extensive comments within it.

I think it’s an interesting technique, but limited in modern systems simply because most modern systems are not composed of a single, monolithic file. Now, in theory, they COULD be. But most systems simply aren’t understandable from the top down. Most systems of any complexity are better attacked from the top, from the bottom, from the side, etc.

Several books have been written where software is described and grown organically over time: individual parts are explained in detail, changed in situ, etc. But even then, the actual presentation of the code is a bit lousy. For example, many concepts are better explained at a higher level, without the minutiae of the implementation details, so the code is presented in an elided form that isn’t suitable as-is for final compilation. The descriptive code is not the running code; the running code is typically presented later, after other concepts have been described.

So, literate programming I think has its place but only for robust examples.

elb · May 19, 2020, 3:27pm

I think jonesforth is the perfect example of literate programming.

I agree that this sort of small-but-complex system is the ideal scenario for a good literate program, but I think its scalability to larger programs and more complex source code organizations is heavily dependent on tooling. In something like jonesforth, where no document processing language is used and no particular machine-navigable structure is present, more than just a few files would be overwhelming. Projects using more robust markup like WEB or org-mode scale much farther.
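Much of that tooling comes down to named chunks: WEB and its descendants (noweb, for example) let you write code fragments in whatever order suits the explanation and reassemble them in compile order, which is one answer to the “descriptive code is not the running code” problem. Below is a minimal Python sketch, assuming noweb-style syntax (`<<name>>=` opens a chunk, a line containing only `@` returns to prose, and `<<name>>` alone on a line references another chunk). `parse_chunks` and `expand` are hypothetical names; this is not noweb’s actual implementation, and it only handles whole-line references.

```python
import re

DEF = re.compile(r"^<<(.+)>>=\s*$")      # "<<name>>=" starts a chunk definition
REF = re.compile(r"^(\s*)<<(.+)>>\s*$")  # "<<name>>" alone on a line references one


def parse_chunks(text: str) -> dict[str, list[str]]:
    """Collect chunk bodies; repeated definitions of one name are appended."""
    chunks: dict[str, list[str]] = {}
    current = None  # chunk being defined, or None while in prose
    for line in text.splitlines():
        m = DEF.match(line)
        if m:
            current = m.group(1)
            chunks.setdefault(current, [])
        elif line.strip() == "@":
            current = None
        elif current is not None:
            chunks[current].append(line)
    return chunks


def expand(chunks: dict[str, list[str]], name: str,
           indent: str = "", seen: tuple[str, ...] = ()) -> list[str]:
    """Recursively splice referenced chunks in, keeping their indentation."""
    if name in seen:
        raise ValueError(f"cyclic chunk reference: {name}")
    out: list[str] = []
    for line in chunks[name]:
        m = REF.match(line)
        if m:
            out += expand(chunks, m.group(2), indent + m.group(1), seen + (name,))
        else:
            out.append(indent + line)
    return out


# Explanation order differs from compile order: the loop is shown first.
DOC = """\
The interesting part is the loop, so it comes first.
<<print words>>=
for w in words:
    print(w)
@
The setup has to precede the loop in the real program.
<<main>>=
words = ["literate", "programming"]
<<print words>>
@
"""

if __name__ == "__main__":
    print("\n".join(expand(parse_chunks(DOC), "main")))
```

Expanding `main` yields the setup line followed by the loop, even though the document explains the loop first.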

NoLand · May 19, 2020, 5:09pm

Another example would have been Friedrich A. Kittler (1943 – 2011), a German literary scholar and media theorist (he dubbed his field media science, always used with a silent exclamation mark). Kittler wasn’t only writing about the intersection of culture and technology; he was also programming (there’s an essay, “Protected Mode”, on the perceived scandal of institutional structures invading the very heart of computing, thereby depriving non-institutional users of access to the bare metal), and he gave introductory lessons in C to his students. Apparently, his source code was richly annotated, not only describing the algorithms but also providing historical background and putting them into perspective. Since Kittler was also an inspiring writer, after his death in 2011 some of his former students felt that the texts enclosed in the source code were well worth preserving, which gave rise to the beginnings of a project, “source code as text”. (I was asked whether I could provide a web platform for this, which wouldn’t only run the code in emulation but would also somehow link the running code to the texts in the source code via debugger traps.) Sadly, there was no funding, and the project eventually fizzled out…

whartung · May 19, 2020, 6:13pm

Indeed. I think the point I was trying to get across is that I feel that the presentation of code differs from the execution of code. The Jonesforth example works simply because, essentially, there’s but a single path through the code (most specifically NEXT, the organization of the dictionary, and the compiler), followed by a litany of individual words which (mostly) can be taken as isolated context.

Most complicated systems have many disparate paths through them. This is where you get the “impedance mismatch” between a “reference guide” and a “user’s guide”.

The reference guide covers the many components in isolation, whereas the user’s guide gives a higher-level view of how the components are stitched together. Those two documents, while related, are fundamentally different in their approach.

Now, again, I don’t have any vast experience with examples of Literate Programming, but I find it hard to see a single document filling both roles.

The other example to cite is the Software Tools books by Kernighan and Plauger. These books go into the detail of implementing non-trivial programs while also serving as a vehicle for their software “tools” philosophy. The programs are presented as isolated routines, and some routines evolve during the presentation. In the end you get the entire program, but not in a single throw, and not necessarily in the proper order.

Mind, this is not Literate Programming. You can’t run the book through a tool and get working code, but it’s pretty close. With some effort, it could be made “literate” with little more than the robust use of language comments à la Jonesforth.

But in the end, I think, most developers (me, notably) want the code with “notes” more so than “literate programming”, in that I’d rather read the code and ask questions (i.e. refer to notes) than read the English prose and then the code.

An interesting compromise in this is Forth and Shadow Screens.

Consider the following (I appreciate this is wide):

\ 16 bit Stack Operations                             22Aug83map | \ 16 bit Stack Operations                             02AUG83HHL
CODE SP@     (S -- n )                                           | SP@
   SP AX MOV   1PUSH END-CODE                                    |      Return the address of the next entry on the parameter stack
CODE SP!     (S n -- )                                           | SP!  ( Warning, this is different from FIG Forth )
   SP POP   NEXT END-CODE                                        |      Sets the parameter stack pointer to the specified value.
CODE RP@     (S -- addr )                                        | RP@
   RP AX MOV   1PUSH END-CODE                                    |      Return the address of the next entry on the return stack.
CODE RP!     (S n -- )                                           | RP!  ( Warning, this is different from FIG Forth )
   RP POP   NEXT END-CODE                                        |      Sets the return stack pointer to the specified value.
                                                                 |
                                                                 |
                                                                 |
                                                                 |
                                                                 |
                                                                 |
                                                                 |

In the original Forth, code was organized in 1K blocks called screens. Screens are numbered, and they are typically just mapped to raw disk blocks. A 360K floppy would have, basically, 360 screens for code.

A shadow screen scheme is where some range of screens is considered code, and another range, at a specific offset, is considered documentation. So, on a 360K floppy, you might have the first 180 screens as code and the latter 180 as documentation: screen 1 is code, and screen 1 + 180 (181) is its documentation.
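To make the numbering concrete: a screen is one 1024-byte block, displayed as 16 lines of 64 characters, and 360 such blocks fill a 360K (368,640-byte) floppy. The Python below sketches the arithmetic, assuming the half-disk split described above, screens numbered from 0, and a 1:1 mapping of screens onto disk blocks. `shadow_of`, `byte_offset` and `screen_lines` are illustrative names, not words from any particular Forth.

```python
BLOCK_SIZE = 1024    # one screen is one 1K disk block
COLS, ROWS = 64, 16  # displayed as 16 lines of 64 characters (64 * 16 = 1024)


def shadow_of(screen: int, total: int = 360) -> int:
    """Shadow (documentation) screen for a code screen, with a half-disk split."""
    half = total // 2
    if not 0 <= screen < half:
        raise ValueError(f"code screens are 0..{half - 1} on a {total}-screen disk")
    return screen + half


def byte_offset(screen: int) -> int:
    """Start of a screen on the raw disk, assuming screens map 1:1 onto blocks."""
    return screen * BLOCK_SIZE


def screen_lines(block: bytes) -> list[str]:
    """Split a raw block (no newlines inside) into its 16 display lines."""
    text = block.decode("ascii", errors="replace").ljust(BLOCK_SIZE)
    return [text[i:i + COLS].rstrip() for i in range(0, BLOCK_SIZE, COLS)]


if __name__ == "__main__":
    print(shadow_of(1), byte_offset(shadow_of(1)))  # 181 185344
    print(360 * BLOCK_SIZE)  # 368640 bytes, i.e. a 360K floppy
```

Because a block has no line terminators, the 64-column split is what turns 1024 raw bytes into the 16-line screen you see in an editor.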

The example above demonstrates this concept, with the code screen displayed alongside its shadow screen.

So, to the point, a nice effect of this is that you can look at the code simply as code. If you have any questions about the code, you can just pop over to the shadow screen and, ideally, get some more insight. It’s nice because the documentation can be expressive and general, or it can be very specific (<-- THIS LINE HERE DOES XXX).

In Java, the convention is to document code with bodies of text as headers to the code. As a developer, if I’m using a code-folding editor, I hide all of this. By that point I’m editing, not learning, and the documentation is, literally (wait, did I say that?), in my way. I have to scroll/page through it all. It’s just bulky and in the way. If I’m not using a code-folding editor, I’m just flat stuck scrolling.

The shadow screen concept is nice simply because the documentation is actually side by side with the code, but not IN the code. It’s still limited, naturally; there’s only so much you can do in a 64x16 character window. It would be interesting to see something akin to this for other code.
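Nothing about the side-by-side layout is tied to Forth blocks. As a rough sketch of what something akin to this could look like elsewhere, the Python below renders any code listing next to its notes in the same two-column style as the screen listing above, clipping both panes to a fixed 64x16 window. `fit` and `side_by_side` are hypothetical helpers; a real editor would keep two live panes scrolling in sync rather than printing text.

```python
COLS, ROWS = 64, 16  # the classic Forth screen size; any window size would do


def fit(lines: list[str]) -> list[str]:
    """Clip or pad a pane to exactly ROWS lines of exactly COLS characters."""
    rows = [line[:COLS].ljust(COLS) for line in lines[:ROWS]]
    return rows + [" " * COLS] * (ROWS - len(rows))


def side_by_side(code: list[str], notes: list[str], sep: str = " | ") -> str:
    """Render code and its notes as two aligned columns, one row per line."""
    rows = (c + sep + n for c, n in zip(fit(code), fit(notes)))
    return "\n".join(row.rstrip() for row in rows)


if __name__ == "__main__":
    code = ["CODE SP@     (S -- n )", "   SP AX MOV   1PUSH END-CODE"]
    notes = ["SP@", "     Return the address of the next entry on the parameter stack"]
    print(side_by_side(code, notes))
```

Padding the code pane to a fixed width is what keeps each note aligned with the line it describes; the price is the same hard window limit the shadow screens had.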