Very interesting! Slightly related:
Doug McIlroy’s list of most surprising Unix programs
and
Lorinda Cherry 1944-2022 (bc, dc, eqn, libplot, …)
While I was working in dictionary computer support, and therefore professionally interested in spell-checking, I read an article suggesting that as the database grew in size, its usefulness was reduced.
This tends to happen with words that occur at low frequency yet are still correctly spelled. For example, a common typo for the conjunction whether is wether, which is a rare but correctly spelled noun: a term for a castrated male sheep. So the bigger the word list, the greater the chance of false positives. I hope checkers take word frequency into account now, but I don’t know.
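To illustrate, here is a toy C sketch of what frequency-aware checking could look like: a word absent from the list is flagged as usual, but a word that is present yet far below some frequency threshold gets flagged for review too. The word list, the frequencies, and the RARE_THRESHOLD cut-off are all invented for the example; I don’t know whether any real checker works exactly this way.

```c
/* Hypothetical sketch: a frequency-aware check that flags valid-but-rare
 * words (like "wether") instead of silently accepting them as typos for
 * common words (like "whether").  Word list, frequencies, and threshold
 * are all made up for illustration. */
#include <stdio.h>
#include <string.h>

struct entry {
    const char *word;
    long freq;          /* occurrences per (say) million words of text */
};

/* Tiny stand-in for a real dictionary with corpus frequencies. */
static const struct entry dict[] = {
    { "whether", 980 },
    { "weather", 410 },
    { "wether",    1 },   /* valid, but rare: a castrated male sheep */
};

#define RARE_THRESHOLD 5   /* arbitrary cut-off for "suspiciously rare" */

static const struct entry *lookup(const char *w)
{
    for (size_t i = 0; i < sizeof dict / sizeof dict[0]; i++)
        if (strcmp(dict[i].word, w) == 0)
            return &dict[i];
    return NULL;
}

int main(void)
{
    const char *sample[] = { "whether", "wether", "wheter" };

    for (size_t i = 0; i < 3; i++) {
        const struct entry *e = lookup(sample[i]);
        if (!e)
            printf("%s: not in list, flag as misspelling\n", sample[i]);
        else if (e->freq < RARE_THRESHOLD)
            printf("%s: in list but rare, flag for review\n", sample[i]);
        else
            printf("%s: ok\n", sample[i]);
    }
    return 0;
}
```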
(How spell checking performed adequately on 8-bit computers, I’ll never know. I have a UX anecdote about how spell-checking a PhD thesis was better on an 8-bit Amstrad PCW than on an 80486 PC, but this isn’t the thread for it.)
This point, and some related ones, come up in the very interesting section Choosing The Word List of the linked paper, Development of a Spelling List (by Douglas McIlroy, 1982):
The spelling checker ran twice as fast and questioned fewer good words than it had when the traditional dictionary three times as large was used as a standard. The new list was nevertheless incomplete…
…
… the field work was indispensable, and produced a thousand unquestionably useful words that don’t occur in traditional lists.
…
The stop list made possible the use of reckless derivation rules that would otherwise allow likely misspellings to slip by.
Re. 8-bit spell checkers: a word processor I used on my 8-bit Atari 130XE, called AtariWriter Plus, had a spell checker that worked great. I used it for writing most of my papers in college. The word processor operated in 128K of memory, part of which was reserved for the document being checked. The spell checker had a dictionary disk. The way it operated was that the document resided in memory, and the spell checker read through the dictionary on disk to check it. As I remember, the process didn’t take that long for something like a 16-page document, maybe 5 minutes.

I haven’t looked at an implementation of it, but it seems like it must’ve used a hashing algorithm to quickly find the most relevant part of the disk for each word. Since the 130XE used bank selection for the extended memory, it’s possible the dictionary was divided up into 16K chunks on disk, which would’ve enabled the spell checker to load a chunk into a bank and search through it in memory. I imagine this would’ve been faster than just searching off the disk. Word frequency could’ve also been used to organize words within the chunks, leading to more “hits” with fewer chunks being swapped.
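For what it’s worth, here is a rough C sketch of the scheme I’m imagining, not the actual AtariWriter Plus code. It hashes each word to one of eight 16K chunks of the dictionary file, pulls that chunk into a buffer (standing in for a bank of extended RAM), and scans it there. The file layout, newline-separated words pre-bucketed by the same hash, is my own invention for illustration.

```c
/* Sketch only: hash a word to one of eight 16K "banks" of a dictionary
 * file, load that chunk into memory, and scan it there.  The on-disk
 * layout (newline-separated words, grouped by the same hash) is assumed,
 * not taken from any real product. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE (16 * 1024)   /* one bank-sized slice of the dictionary */
#define NUM_CHUNKS 8             /* 8 * 16K = 128K, roughly one floppy side */

/* Cheap hash to pick which chunk a word belongs to. */
static unsigned chunk_for(const char *word)
{
    unsigned h = 0;
    while (*word)
        h = h * 31 + (unsigned char)*word++;
    return h % NUM_CHUNKS;
}

/* Load the chunk for this word and scan it for a match. */
static int word_in_dictionary(FILE *dict, const char *word)
{
    static char bank[CHUNK_SIZE + 1];   /* stands in for a 16K RAM bank */

    if (fseek(dict, (long)chunk_for(word) * CHUNK_SIZE, SEEK_SET) != 0)
        return 0;
    size_t n = fread(bank, 1, CHUNK_SIZE, dict);
    bank[n] = '\0';

    /* Linear scan of newline-separated entries; a real checker would
     * keep each chunk sorted and binary-search it. */
    for (char *line = strtok(bank, "\n"); line; line = strtok(NULL, "\n"))
        if (strcmp(line, word) == 0)
            return 1;
    return 0;
}

int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s dictionary.dat word...\n", argv[0]);
        return 1;
    }
    FILE *dict = fopen(argv[1], "rb");
    if (!dict) { perror(argv[1]); return 1; }

    for (int i = 2; i < argc; i++)
        printf("%s: %s\n", argv[i],
               word_in_dictionary(dict, argv[i]) ? "ok" : "not found");

    fclose(dict);
    return 0;
}
```

Bucketing by hash means each lookup costs at most one 16K read; keeping the last-loaded chunk resident and ordering entries by frequency is how I’d expect to get the “more hits with fewer swaps” effect mentioned above.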
The word processor was designed for the Atari 1050 disk drive, which used a format called “enhanced density,” storing 127K on one side of a disk. So the dictionary had to fit in 127K at most, though I don’t remember whether the spell checker resided on the dictionary disk as well.
I brought this up in a conversation at a conference last week, when someone had the temerity to question how useful a 16-bit Unix was. How very dare they!