Computer Language Magazine

Michael_Barry · February 12, 2023, 12:33am

Maybe it’s just a matter of getting the proper software tools in order. I don’t mind investing some time trying to digitize and upload what I have … it’s the cash expenditure that’s more difficult to justify for something of dubious value like this.

I have several other similar dusty collections, all slowly oxidizing away. It would be a minor shame to lose them entirely without at least attempting to save them from the eventual landfill.

EdS · February 12, 2023, 9:40am

I think the usual approach is to cut the spines off and feed into a sheet-fed scanner, which is effective but perhaps a bit brutal.

For the sake of having something rather than nothing, I’ve ‘scanned’ a book or two just using a handheld smartphone, some propping-up, and a sheet of glass from a photoframe. Results are variable, but legible and good enough for OCR. Example: 550 pages. It would be possible to then batch-process the images for exposure and colour correction, but as you see, I didn’t. I did a little cropping and straightening. My phone is visible as a faint reflection in some cases, and one page was lost to an auto-focus mishap, but I replaced it with another scan. For me, this whole thing was good enough, and certainly better than nothing.

The propping up in a V shape, and having a sheet of glass or similar, is crucial, but also quite cheap.

Michael_Barry · February 12, 2023, 10:05am

Thanks, Ed. So, my scanner should be more than adequate. I don’t mind cutting off the spines to get the best results, as long as I know that my efforts won’t be wasted. Forgive my ignorance and/or laziness, but do you have any pointers or links to instructions about how exactly to upload? I’m talking about dpi, format, file type, metadata, etc. When does the OCR happen? Can I immediately add to an existing collection, or do I have to stay in my own space, and let a moderator merge at a later time?

I’m the legal owner of these collections, but am I exposing myself to potential legal trouble if I share them thusly?

EdS · February 12, 2023, 10:31am

The uploading is very easy, and the OCR happens afterwards as a batch post-processing job. It’s also easy to re-upload if, for example, you have an improved scan or have fixed some misaligned pages. You need an account, which means you need to give an email: I used a throwaway email, which means I can’t have a proper two-way conversation with support, but otherwise hasn’t had any downsides. There is a copyright issue, which I chose to ignore, including ignoring that I had to declare I had the right to upload. The Internet Archive themselves will deal with any takedown request. My feeling is that any IP owner is very very unlikely ever to try to find the uploading party. But this is not legal advice!

So, once you have a bunch of JPEGs, just pop them in a zip file with the correct name, and upload away! They might or might not get added to a collection at some point - it might be that there’s a way to request that.
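The zipping step can be sketched with Python's standard library. This is only an illustration: the thread does not say what the "correct name" for the zip file is, so the archive name is left as a parameter, and the function name `zip_page_images` and the zero-padded page file names in the usage note are assumptions. Check the Internet Archive's own upload documentation for the naming convention it expects.

```python
import zipfile
from pathlib import Path


def zip_page_images(image_dir: str, archive_path: str) -> list[str]:
    """Pack every .jpg/.jpeg in image_dir into archive_path, sorted by file name.

    Returns the names written, in order, so the page order can be checked.
    """
    pages = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in {".jpg", ".jpeg"}
    )
    # JPEGs are already compressed, so store them unchanged instead of deflating.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for page in pages:
            # arcname keeps the archive flat: just the file name, no directories.
            zf.write(page, arcname=page.name)
    return [p.name for p in pages]
```

Because the pages are sorted by name, zero-padded numbering (for example `page_0001.jpg`, `page_0002.jpg`, hypothetical names) keeps page 10 from sorting ahead of page 2.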

(You can at a later point remove the data from the archive entry, including the metadata, but you can’t remove the entry itself, or modify the URL you chose.)

oldben · February 12, 2023, 5:18pm

pdf format from bitsavers

The PDF Document Format

Documents here are kept in a minimal subset of PDF format, just using it as a
container for lossless Group 4 fax compression (ITU-T recommendation T.6) images.
Contributions are normally post-processed by tools to put them in exactly this format.

Documents were scanned using a Ricoh IS520 400dpi 30ppm B&W duplex production scanner
from the late 90’s through 2007.

Conversion to higher-performance Kodak DS 2500D scanning occurred in July, 2007.
The 2500D is an OEM version of the Panasonic KV-S2055 scanner.

In 2008, the Kodak was replaced by a Panasonic KV-S3065W, which
is capable of duplex color 600dpi scanning, and has the capability to scan
sheets 100 inches long.

Post-processing is done using Lemkesoft’s
TIFF to PDF conversion is done using Eric Smith’s tumble
A final OCR step is done with Acrobat Pro.
I’ve continued to use tumble since it is MUCH faster than Acrobat for tif to pdf conversion.

The preferred form for any contributed text scan is as a collection of lossless
Group 4 fax compression (ITU-T recommendation T.6) images saved as TIFF
files with a minimum scan resolution of 400 dpi.

Lower scan resolutions produce noticeable artifacts if a page needs to be
straightened in post-processing.
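To put the 400 dpi figure in concrete terms, here is a small arithmetic sketch. The page size (US Letter, 8.5 x 11 inches) and both function names are assumptions chosen for illustration; the text above does not state the page size of any particular document.

```python
def scan_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_bilevel_bytes(width_px: int, height_px: int) -> int:
    """Uncompressed size of a 1-bit-per-pixel image, each row padded to a whole byte."""
    return ((width_px + 7) // 8) * height_px


# Assumed page size: US Letter, 8.5 x 11 inches.
width_400, height_400 = scan_pixels(8.5, 11, 400)   # 3400 x 4400 pixels
width_300, height_300 = scan_pixels(8.5, 11, 300)   # 2550 x 3300 pixels
size_400 = raw_bilevel_bytes(width_400, height_400)  # 1,870,000 bytes before compression
```

At 400 dpi the page has 16/9 as many pixels as at 300 dpi, which leaves more detail to work with when a page is rotated slightly during straightening. The raw size is the figure before Group 4 compression is applied.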

Lossy compression formats, such as JPEG, should NEVER be used to save pages
of text, since the compression format destroys edge resolution and contrast.

OCR

OCR has been part of the post-processing of scans for many years now
and is slowly being applied to older pdf files. It is a slow process and
it will take many years to complete.

jecel · February 12, 2023, 5:34pm

There are newer scanners which are basically a smartphone camera held in an interesting position above your table, with some lights at nice angles. These can deal with non-flat materials by projecting a laser line onto the page and noting how it is distorted.

Some of the Youtubers who have reviewed such scanners include Curious Marc, Retro Mac Cave and Tech Tangents.

Michael_Barry · February 16, 2023, 4:50pm

Well, I decided to try April 1987, because it was only held together by staples. My scanner software crashed on page 66, complaining about “insufficient memory”, and I lost it all. I’ll snoop around for some “better” free software and try again later.

EdS · February 16, 2023, 4:55pm

Aargh! I feel bad enough when I lose a one-paragraph draft!

Michael_Barry · February 27, 2023, 4:32pm

Well, it was a bumpy path, but I finally got April 1987 uploaded. It’s yellowed around the edges, and the edges have also shrunk a bit, giving the pages a wavy appearance that I can’t quite eliminate, even with a rather heavy weight. Are there any constructive criticisms before I discard the original and move on to the next?

https://archive.org/details/computer-language-vol-4-no-4-1987-04-miller-freeman-publications-us/mode/1up

EdS · February 27, 2023, 4:58pm

Well done - looks very good to me. Just double check that you haven’t missed or duplicated a page! (I think it’s fine)
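One way to automate part of that double check is to look at the page numbers in the scan file names. This is a minimal sketch, assuming each file name ends in its page number just before the extension (for example `scan_0007.jpg`, a hypothetical scheme); the function name `check_page_sequence` is also made up for illustration. It only catches numbering gaps and repeats: it cannot tell whether pages are missing after the last file, or whether the same page was imaged twice under two different numbers.

```python
import re
from collections import Counter


def check_page_sequence(filenames, first_page=1):
    """Return (missing, duplicated) page numbers found in the file names.

    The page number is taken to be the run of digits right before the extension.
    """
    numbers = []
    for name in filenames:
        match = re.search(r"(\d+)(?=\.[^.]+$)", name)
        if match:
            numbers.append(int(match.group(1)))
    counts = Counter(numbers)
    if not counts:
        return [], []
    # Gaps between the first expected page and the highest number seen.
    missing = [n for n in range(first_page, max(counts) + 1) if n not in counts]
    # Numbers that appear on more than one file.
    duplicated = sorted(n for n, c in counts.items() if c > 1)
    return missing, duplicated
```

For example, calling it on `["scan_0001.jpg", "scan_0002.jpg", "rescan_0002.jpg", "scan_0004.jpg"]` reports page 3 as missing and page 2 as duplicated.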

andyh2o · February 27, 2023, 9:11pm

Great job, well done.

The advert on Page 3 for Btrieve reminds me that we were still using it on part of our NetWare 4 estate in the early 2000s. It was my responsibility as the Novell Admin to update the license for it…

Michael_Barry · March 2, 2023, 4:42am

May 1987 is now available. I won’t be pinging this thread for further individual issues unless asked to do so (there are at least 67 remaining, but I don’t have an estimate for how long it will take to do all of them).

Computer Language Vol 4 No 5 ( 1987 05)( Miller Freeman Publications)( US) : Free Download, Borrow, and Streaming : Internet Archive

EdS · March 6, 2023, 10:02pm

Nice! A couple of articles stood out for me:

Pick - a multilingual operating system

The six languages in the Pick environment provide a rich toolset for data base application development. These languages are Pick assembler; Data/BASIC, a compiled, highly enhanced version of BASIC; Access, a data base query language; PROC, a sort of job control language, like batch files in MS-DOS or shell scripts in UNIX; Terminal Control Language (TCL), the command language used at the operating system prompt; and Correlatives and Conversions, the data description language used in the data dictionary of the Pick relational data base management system.

Conversing with Tiny ELIZA

Listing 1 shows the fundamentals of a general-purpose symbolic pattern matcher and applies it to a Tiny ELIZA. ELIZA falls into the category of first efforts and is essentially a language template. It was written in a language called SLIP (Symbolic List Processor) by Joseph Weizenbaum at MIT

Subsequently and elsewhere I came across this:

the road from language model to existential crisis is short indeed. Joseph Weizenbaum, who created ELIZA, the first chatbot, in 1966, spent most of the rest of his life regretting it. The technology, he wrote ten years later in Computer Power and Human Reason, raises questions that “at bottom … are about nothing less than man’s place in the universe.” The toys are fun, enchanting, and addicting, and that, he believed even 47 years ago, will be our ruin: “No wonder that men who live day in and day out with machines to which they believe themselves to have become slaves begin to believe that men are machines.”

Michael_Barry · March 25, 2023, 8:24pm

I didn’t realize until now, but I think I can just link you all to my upload page, which should enable you to keep track of my progress at your leisure:

https://archive.org/details/@michael_barry538