OK, I think I see what he’s getting at. He never comes straight out and says it in the simple terms anyone else would use, so it’s very difficult to say for sure that I understand him. So I’m just throwing this out for discussion. I hope someone else will make an effort to watch some of the videos and comment on whether this is right. Everything here should be considered a tentative interpretation of Ivan’s words.
As you observe, the Mill system has an MMU (he calls it the “TLB”) and does paging. So programs do everything in their own virtual addresses, and those virtual addresses are mapped to physical addresses the program can’t know, just like on any modern system.
On most computers, every program uses the same virtual addresses. When the linker links your program, it assigns virtual addresses for code starting at some address “Code0” and assigns addresses for static data at “Data0”. And the stack starts growing from some address “Stack0”. These addresses Code0, Data0, and Stack0 are the same in every process, because there’s no reason to do anything more complicated. So my code starts at, say, address 0x1000, my static data follows it, and my stack starts growing downward from, say, 0xFFFF0000, and these virtual addresses are the same in every program linked for the system.
(More recently, these addresses may be randomized (not by the linker, but by the OS at load time) to make various types of malware attacks a little more difficult; this is called Address Space Layout Randomization, ASLR. I don’t think this is relevant here, so let’s ignore it for now and imagine a 1990s world in which Code0, Data0, and Stack0 are all fixed virtual addresses.)
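To make the conventional picture concrete, here’s a tiny C program (my illustration, not anything from the videos). In the idealized fixed-address world just described, every process running this binary would print the same three values:

    #include <stdio.h>

    int static_data = 42;            /* placed at/near "Data0" by the linker */

    void some_function(void) {}      /* placed at/near "Code0" by the linker */

    int main(void) {
        int on_stack;                /* lives just below "Stack0" */
        printf("code:  %p\n", (void *)some_function);
        printf("data:  %p\n", (void *)&static_data);
        printf("stack: %p\n", (void *)&on_stack);
        return 0;
    }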
At runtime, each contiguous, fixed-size chunk of virtual addresses (“page”) that the program actually touches is assigned to a like-sized chunk of physical addresses so the program can actually run: this is paging.
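As a rough sketch of that translation (assuming 4 KB pages and a toy flat page table, both of which are my assumptions, not anything specific to the Mill):

    #include <stdint.h>

    #define PAGE_SHIFT 12                            /* 4 KB pages */
    #define PAGE_MASK  ((1ULL << PAGE_SHIFT) - 1)

    /* Map a virtual address to a physical one: the virtual page number
       indexes the page table; the offset within the page is unchanged. */
    uint64_t translate(const uint64_t *page_frame_of, uint64_t virt) {
        uint64_t vpn    = virt >> PAGE_SHIFT;        /* virtual page number */
        uint64_t offset = virt &  PAGE_MASK;         /* byte within the page */
        return (page_frame_of[vpn] << PAGE_SHIFT) | offset;
    }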
I think what Ivan is saying is that he wants to assign every process in the system a unique range of virtual addresses. One can imagine that when it’s time to prepare a program for execution, the program loader (not the compile-time linker) goes through and adds an offset to every fixed address in the program. The offset grows over time as more and more processes execute. So at runtime, every process has a unique Code0, Data0, and Stack0, and every address used in every program is correspondingly changed.
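A hypothetical sketch of that loader step (my reading of the scheme, with invented names): the system hands each new process the next unused slice of the shared space, and the loader rebases every fixed address by it:

    #include <stdint.h>

    #define SLICE_SIZE (1ULL << 32)   /* say, 4 GB of virtual space per process */

    static uint64_t next_base = 0;    /* grows monotonically; never reused here */

    /* Called once per new process by the loader. */
    uint64_t allocate_process_base(void) {
        uint64_t base = next_base;
        next_base += SLICE_SIZE;
        return base;
    }

    /* Every link-time address (Code0, Data0, Stack0, ...) becomes base-relative. */
    uint64_t rebase(uint64_t link_time_addr, uint64_t base) {
        return base + link_time_addr;
    }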
I think this is what he means by “… the 60-bit space”. It’s a space of program virtual addresses that is big enough to accommodate every running process having a unique range of programmer-visible virtual addresses. So if I were to run a pipeline with two copies of a program, and both programs print the numerical value of, say, a pointer to static data, the printed values would differ.
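That pipeline experiment is easy to picture (again, my illustration): under the conventional model, two concurrent copies of this program print the same value; under my reading of Ivan’s scheme, each copy is rebased and prints a different one:

    #include <stdio.h>

    static int marker;    /* any piece of static data will do */

    int main(void) {
        printf("%p\n", (void *)&marker);   /* conventional: identical across copies;
                                              Ivan's scheme: unique per process */
        return 0;
    }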
Now I can create a sort of “bundle” of {start address, number of bytes, permission} and pass it to another process. When the target process receives it, it can just use that virtual address and voilà! The page is already mapped if the sending process had “faulted it in” before sending. Or the receiving process can fault it in; it doesn’t matter: the virtual address used by both processes hits the same single page mapping (TLB entry), so you get sharing.
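A hypothetical sketch of such a bundle (the field names are mine, not Mill terminology). Because both processes live in the same shared virtual space, the receiver can use start directly:

    #include <stddef.h>
    #include <stdint.h>

    enum perm { PERM_READ = 1, PERM_WRITE = 2, PERM_EXEC = 4 };

    struct grant {
        uintptr_t start;     /* virtual address, meaningful in BOTH processes */
        size_t    length;    /* number of bytes covered by the grant */
        uint8_t   perms;     /* OR of enum perm bits */
    };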
How new and different is this? Well, I hate to say it, but modern Intel CPUs support assigning each active process a context identifier (PCID) that is stored in the TLB alongside each entry. You can think of the PCID as roughly 12 extra high-order address bits, so this is most of Ivan’s concept.
It means that the MMU (TLB) doesn’t have to be “flushed” when the OS switches from one process to another. The new process may force the previous process’s entries out of the TLB, but it all happens gracefully; none of “its” TLB entries are tagged with “my” PCID, so there’s never a collision or ambiguity; my pages are mine, yours are yours. We can still collide on a TLB slot, but that just causes replacement, which is not a problem. When the previous process runs again, it will still have the same PCID, and some of its entries might still be in the TLB.
With the Intel architecture, though, only one PCID can be “active” on a CPU at a time (or maybe N, if the CPU is N-way hyperthreaded; again, let’s ignore that complexity). Ivan is generalizing this idea of PCIDs and saying they can, in effect, be part of an address that is passed from process to process.
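A rough illustration of that analogy (my toy layout, not real hardware): fold the 12-bit PCID in as if it were extra high-order bits of the tag the TLB matches on, so the same virtual page in two processes yields two distinct tags:

    #include <stdint.h>

    #define PAGE_SHIFT 12

    /* Toy TLB tag: 12-bit PCID above the virtual page number. Two processes
       touching the same virtual address produce different tags, so their
       entries coexist without ambiguity. */
    uint64_t tlb_tag(uint16_t pcid, uint64_t virt_addr) {
        uint64_t vpn = virt_addr >> PAGE_SHIFT;
        return ((uint64_t)(pcid & 0xFFF) << 52) | (vpn & ((1ULL << 52) - 1));
    }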
The other thing Ivan is adding is a layer of permissions checks for variable-sized chunks. Most systems associate the permissions with the pages, so if processes share memory it has to be on a page-by-page basis. Ivan’s idea involves permissions down to the byte level, so this would require a separate level of checking.
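That extra layer might look something like this check (a sketch under my assumptions, mirroring the struct grant fields sketched earlier): the access is validated against a granted byte range rather than a page table entry:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Does [addr, addr + len) fall inside the grant, with all needed rights?
       Written to avoid overflow in the range arithmetic. */
    bool access_allowed(uintptr_t g_start, size_t g_length, uint8_t g_perms,
                        uintptr_t addr, size_t len, uint8_t needed) {
        return addr >= g_start
            && len <= g_length
            && addr - g_start <= g_length - len
            && (g_perms & needed) == needed;
    }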
To me, this all seems entirely “doable”, and I think it would add value. As someone points out in one of the videos, managing the 60-bit virtual space that all the processes share could become complicated (recycling a space of variable-sized chunks). But if it were 64 bits, and it probably could be, it would allow 2^32 ~= 4 billion processes, each with 4 GB of memory. People do routinely run processes much larger than 4 GB these days; in my last job, we ran hundreds of 32 GB processes that absorbed logs and allowed text search. But 64 bits is still a lot, and the split between process count and per-process size isn’t fixed. It seems doable.
The flight into fantasy starts when he begins criticizing operating system designs and saying how mechanisms like the one he proposes could help. The funny thing is, his criticisms are completely valid, and his remedies make complete sense. The fantasy is to imagine that anyone is going to deeply restructure the permissions model of, say, the Linux kernel, because some mechanism specific to a particular machine architecture would be helpful. This is ridiculous. Thousands and thousands of engineering years have gone into the Linux kernel. Ivan is a good computer architect, but I don’t think he understands “capital investment”.
For the record, Agner Fog has also designed a completely new architecture. He is brilliant and very practical. He’s got a complete FPGA softcore that implements the architecture, so he seems to be much further along than the Mill, after all these years. And there are some really, really good ideas in his architecture. And it has generated even less interest than the Mill. You can read about it here: https://www.forwardcom.info/