Accessing the modern Web from retro machines

NoLand · June 21, 2020, 11:51pm

You may know the scenario: a retro machine, maybe some kind of network connection, even a fairly capable browser like one of the Netscape variety, but no way to access the modern web.

Now Topi Talvitie has provided a solution to the problem in the form of a (local) proxy service: render the website on the proxy, slice it into images, and provide a browsable low-end version with the help of a bit of (retro-compatible) JavaScript.

Browservice: Browser as a Service

A web “proxy” server that enables browsing the modern web on historical browsers. It works by rendering the browser viewport into images, which are then shown by a JavaScript application running on the client browser.

In retrocomputing, the modern web is inaccessible, because up-to-date web browsers are not available for old operating systems, and old browsers do not support modern web standards. Furthermore, old operating systems and browsers should not be connected directly to the Internet because they typically have unpatched security vulnerabilities. Browservice circumvents these issues by offloading the web rendering to a proxy server running an up-to-date web browser; the actual client browser connecting to the proxy only needs to show the images sent by the proxy server and forward the user input back to the proxy.

This idea of using a proxy to render the browser view into images has been used before by WRP (Web Rendering Proxy). Browservice differs from WRP in that it uses JavaScript on the client browser to animate the browser view and gather user input events, while in WRP, the user has to use web forms and image maps to provide the input, and the page has to be reloaded for every update in the view. Thus Browservice gives the user a more immersive web browsing experience, but also requires a newer client browser and more powerful hardware. While WRP can run on browsers as old as NCSA Mosaic 2.0, the earliest supported client browsers for Browservice are from the late 90s and early 00s.

Compatible OSes include Windows for Workgroups 3.11, OS/2 Warp 4.52, Win 95, and newer. (I guess, anything capable of running a Netscape or Mosaic browser.)

Via HN: “Show HN: Browservice – Browse the modern web on historical browsers”

whartung · June 22, 2020, 3:38am

Opera did essentially this to implement a browser for the early iPhones. Basically doing “server side rendering” and sending images with clickable zones in them.

The other side of the coin is something akin to modern RDP (Remote Desktop Protocol), which was promoted early on by the company Citrix. For Windows, they essentially replaced the graphics DLL and streamed the events and commands over the wire.

I’ve used this on crummy 286s over a 9600 baud modem. And it worked really well – until you got those horrid monster bitmap “splash” screens that were all the rage back then.

If you have the bandwidth to ship and display full screen images, then something like this can work. But you still have to be careful, especially on the modern web with all of its animations and such. You can send differential bitmaps to the client, but on a slow Win 3.11 box, it could still be pretty taxing.
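To make the differential-bitmap idea concrete, here is a minimal Python sketch. It assumes frames are held as lists of equal-length byte rows (one byte per pixel) and uses an 8×8 tile size; the name `dirty_tiles` and the frame layout are assumptions for illustration, not how Citrix or Browservice actually work.

```python
def dirty_tiles(prev, curr, tile=8):
    """Yield (x, y) origins of tile-by-tile blocks that differ between two frames.

    Assumption: both frames are lists of equal-length rows of bytes.
    Only these blocks would need to be re-sent to the client.
    """
    height, width = len(curr), len(curr[0])
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            # A tile is dirty if any of its row slices changed.
            if any(prev[r][x:x + tile] != curr[r][x:x + tile]
                   for r in range(y, min(y + tile, height))):
                yield (x, y)
```

Even with such a scheme the client still has to decode and blit every changed tile, so a page full of animations keeps a slow machine busy.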

Windows 95 machines were more powerful, and probably better able to keep up.

NoLand · June 22, 2020, 4:15am

The nice thing here is that you have full control over the proxy, because it’s you who is running the service.
Citrix was quite popular; I’ve even known some businesses that were inclined to move all browsing, etc., to Citrix connections. (If I remember correctly, there were security issues with Citrix later, so this strategy would have backfired quite a bit.)
BTW, another way (which is, I believe, also mentioned in the HN comments) may be running a modern browser on a remote machine, provided there’s an X server for your system. However, I’m afraid that might be even more resource-hungry.

EdS · June 22, 2020, 9:22am

It would be interesting to hear of - or see - some suitably retro machines accessing the web, by whichever chain of intermediaries.

Hackaday has a retro edition, “a lo-fi version of Hackaday without CSS or Javascript or any other cruft. It’s hand-written HTML (assembled by a script) of the first ten thousand or so Hackaday posts.”

Here’s Matthias Koch’s photo of his PC3 (“Atari’s PC compatible with an 8088 running at 8Mhz, 640k of RAM and a 20 MB hard drive”) showing the retro site:

I’d love to see a retro non-PC like an Archimedes or Amiga showing this page! There is a Psion 3A example…

Another old-web gateway mentioned in the HN discussion is http://theoldnet.com/ which re-serves pages from The Internet Archive in a relatively satisfactory way.

dasteph · June 24, 2020, 9:27am

I’ve thought for a while that a ‘simple mark-up language’ would be nice. Something easy enough for 8-bit computers to handle that could be paired with a kind of client-side BBS rendering engine.

Don’t know how practical that would be, but limited to text only with minimal formatting tags it needn’t require much on top of a basic telnet client to render.

whartung · June 24, 2020, 2:18pm

HTML is a simple markup language. Especially when you simply ignore the elements that you don’t support. For most simple rendering purposes, a large number of the tags behave identically.

Tags are either blocks, or they’re inline. If it’s a block tag, you break and make a new paragraph. If it’s an inline tag, you simply ignore it (or you can apply a terminal effect like changing the color, bold, or underline, depending on what your terminal supports).

CSS can be ignored.

Next you have simple formatting tasks. Consider the BLOCKQUOTE tag. That can be as simple as preceding each line with a | and a blank.
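As a rough illustration of the block/inline rule and the BLOCKQUOTE prefix, here is a minimal Python sketch built on the standard `html.parser` module. The set of block tags, the 40-column default width, and the names `PlainRenderer` and `render` are assumptions for illustration; lists, tables, and entities inside attributes are not handled.

```python
from html.parser import HTMLParser
import textwrap

BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "blockquote", "pre"}  # assumed subset
SKIP_TAGS = {"script", "style"}  # content that is ignored entirely


class PlainRenderer(HTMLParser):
    def __init__(self, width=40):
        super().__init__()
        self.width = width
        self.lines = []        # finished output lines
        self.words = []        # words of the paragraph being collected
        self.quote_depth = 0   # nesting level of BLOCKQUOTE
        self.skipping = 0      # nesting level of script/style

    def flush(self):
        """End the current paragraph: fill it and prefix quoted lines with '| '."""
        if self.words:
            prefix = "| " * self.quote_depth
            self.lines += textwrap.wrap(" ".join(self.words), self.width,
                                        initial_indent=prefix,
                                        subsequent_indent=prefix)
            self.lines.append("")
            self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skipping += 1
        elif tag in BLOCK_TAGS:
            self.flush()                      # block tag: break the paragraph
            if tag == "blockquote":
                self.quote_depth += 1
        # any other (inline or unknown) tag is simply ignored

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skipping = max(0, self.skipping - 1)
        elif tag in BLOCK_TAGS:
            self.flush()
            if tag == "blockquote":
                self.quote_depth = max(0, self.quote_depth - 1)

    def handle_data(self, data):
        if not self.skipping:
            self.words.extend(data.split())


def render(html, width=40):
    renderer = PlainRenderer(width)
    renderer.feed(html)
    renderer.close()
    renderer.flush()
    return "\n".join(renderer.lines).rstrip()
```

Note that this buffers one paragraph at a time, not the whole document, so memory use is bounded by the longest paragraph.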

Next, you have the list and bullet elements. More paragraph filling with negative indents to place the list markers.

Tables are the biggest problem. They’re challenging to format in the first place, but on a restricted screen space of an 80x25 terminal, it gets even more challenging.

The final problem is simply size. To render them properly, some of the elements will need a couple of passes. A simple example is a numbered list. If you have 9 elements, you can just stream it simply. But when that 10th element arrives, you need to “go back”, redo the earlier elements, and pad the list item with a space. Obviously, this happens with every magnitude change (10, 100, 1000).

What should happen is the formatter should look ahead in the file to count the elements BEFORE it starts to render. Which is fine, but now you run in to resource constraints.

What if you’re trying to render a 100K byte HTML file? What if this particular list of elements is more than free memory?

Issues like this make it difficult to render HTML (or any markup language with aspects such as this) in a “streamable” fashion. That is, start at the top, format as you go, and render it on the way. Many formatting cases may require several passes. And on a large element, well, you have a problem.
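The look-ahead for numbered lists can be sketched in a few lines of Python. The function name `render_numbered` is made up for illustration, and the sketch assumes the whole list fits in memory, which is exactly the constraint discussed above.

```python
def render_numbered(items):
    """Two-pass rendering: count the items first, then pad every number.

    Assumption: `items` is a complete list held in memory, so nothing
    can be output until the last item has been seen.
    """
    width = len(str(len(items)))  # pass 1: digits in the largest number
    # pass 2: right-align each number in a field of that width
    return [f"{n:>{width}}. {text}" for n, text in enumerate(items, 1)]
```

With nine items the numbers need no padding; with ten, every earlier line gains a leading space, which is why the output cannot be streamed.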

The easiest solution is simply “don’t do that”. Don’t render large documents, don’t support rendering of large elements, etc. Consider them edge cases and simply let the cards fall where they may. Because they are edge cases. Sure, lists with 100+ elements exist. But they’re rare. Why throw the baby out with the bathwater?

If you can stream the formatting, then you can render any size document.

In the end, to me, the singular problem to solve is TABLE rendering, as it manifests pretty much all of the issues. (For example, you could internally convert a bunch of LI (list) elements into table rows and cells, and toss it to the table rendering code to do the actual work.)

jecel · June 24, 2020, 3:23pm

What about Markdown? It can even be used in posts on this forum.

NoLand · June 24, 2020, 5:01pm

Incidentally, I was just thinking of implementing a simple HTML renderer for the Kyocera portables (TRS-80 Model 100, etc).

Something like HTML 0.9 is really simple, and as we control the preprocessor, it may be even simpler. E.g., from the HEAD section, all we may care about is probably the title, so we may drop the HEAD section, omit the BODY-tags and have the title as an optional first element.
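A preprocessor along these lines might look like the following Python sketch, using the standard `html.parser` module. The set of tags kept, the output convention (title on the first line), and the names `Simplifier` and `simplify` are assumptions for illustration only.

```python
from html import escape
from html.parser import HTMLParser

KEEP = {"p", "h1", "h2", "h3", "ul", "ol", "li", "pre", "blockquote", "a"}  # assumed subset


class Simplifier(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = []        # text found inside <title>
        self.body = []         # simplified markup
        self.in_title = False
        self.skip = 0          # depth inside script/style

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag in ("script", "style"):
            self.skip += 1
        elif tag in KEEP and not self.skip:
            # Attributes are dropped, except the link target of <a>.
            href = dict(attrs).get("href") if tag == "a" else None
            self.body.append(f'<a href="{escape(href)}">' if href else f"<{tag}>")
        # html, head, body, meta, link and all other tags are omitted

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag in KEEP and not self.skip:
            self.body.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_title:
            self.title.append(data)
        elif not self.skip:
            self.body.append(escape(data, quote=False))


def simplify(html):
    """Return the title (if any) as the first line, followed by the reduced body."""
    parser = Simplifier()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title).split())
    body = "".join(parser.body).strip()
    return (title + "\n" if title else "") + body
```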

Regarding tables: as we may prerender them to terminal dimensions in the preprocessor, there isn’t necessarily much of a difference to a PRE section. We could even convert the former to the latter. (Especially, if there’s just a single font available and therefore no visual distinction on the terminal side.)

Forms are probably the most challenging elements, if there’s no built-in provision for them in the terminal, since the functionality has to be implemented on the client side.

Modern CSS layouts are probably the biggest obstacle to converting a web page to a simple stream, because of the various regions, their placement, which may be reordered depending on the client, etc. Starting with a screen reader as the input for the preprocessor may be a good idea, ARIA-tags and accessibility hints may also help with streamlining this to a simple text stream.

whartung · June 24, 2020, 5:44pm

At its core, Markdown is no less complicated to render than HTML. It suffers the same issues with lists and nesting etc as the rest. Original Markdown doesn’t support tables, but others do.

elb · June 24, 2020, 5:54pm

It does have the advantage that it prohibits arbitrarily deep nesting of most structures, and I think is closer to something that is manageable on a retro system. It’s not dissimilar to the markup used by historical word processors (such as AppleWorks or WordStar), which are provably usable on old hardware.

Tables are indeed a significant problem if they are not associated with dimensions at declaration time (as many older technologies require). Lists were historically handled by providing substantial indentation for every item in the list, and then laying out the numerals in that generous space. (This is, for example, what many versions of runoff or roff did.)
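The fixed-indentation approach to lists can be sketched as a streaming generator in Python. The field width of 4 and the name `stream_numbered` are assumptions for illustration; this is not taken from any particular runoff or roff version.

```python
def stream_numbered(items, field=4):
    """Emit numbered list lines one at a time, without look-ahead.

    Assumption: every numeral is right-aligned in a fixed field that is
    wide enough for any list we expect (4 columns covers up to 9999 items).
    """
    for n, text in enumerate(items, 1):
        yield f"{n:>{field}}. {text}"
```

Because the field width is fixed up front, `items` can be any iterator and each line can be printed as soon as its item arrives.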

oldben · June 24, 2020, 6:28pm

I have no idea about modern web formats, but isn’t 99% of that just there to put an annoying floating ad in the middle of what you are reading? If one had more text-friendly web pages, I would see that as a valid idea.
I wonder if this site would even render under Lynx.

NoLand · June 24, 2020, 6:36pm

Regarding tables, these are only a problem if done on the client side (terminal), since you have to collect data about all rows before rendering, which requires a significant amount of memory.
If done in the preprocessor in a higher-level language, it should be fairly easy:

1. Start with an average weight for each column.
2. Loop over rows and collect the max character count per cell; take note of the max length per column.
3. Collapse empty columns and shorten columns that do not require the previously assigned width; distribute the spare width over the columns that still need extra width (by weight).
4. Loop over rows and break cells into lines, then output line by line. (Add empty lines resulting from vertical spans, where needed.)
5. Bonus for distributing weights for horizontal spans.

This should work most of the time (assuming vertical-align: top, which makes everything much easier).
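The steps above can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions: cells are plain strings, spans are not handled, all columns start with equal weight, and spare width is handed out one character at a time in turn. The name `layout_table` and its parameters are made up for this sketch.

```python
import textwrap


def layout_table(rows, total_width=40, gap=1):
    """Render rows (lists of cell strings) as text lines within total_width."""
    if not rows:
        return []
    ncols = max(len(r) for r in rows)
    rows = [list(r) + [""] * (ncols - len(r)) for r in rows]  # pad short rows
    # Pass 1: longest cell per column; columns with no content are collapsed.
    need = [max(len(r[c]) for r in rows) for c in range(ncols)]
    keep = [c for c in range(ncols) if need[c] > 0]
    if not keep:
        return []
    avail = total_width - gap * (len(keep) - 1)
    share = max(1, avail // len(keep))              # equal starting weight
    width = {c: min(need[c], share) for c in keep}  # shorten columns that need less
    # Hand spare characters, one at a time, to columns that still need more.
    spare = avail - sum(width.values())
    hungry = [c for c in keep if need[c] > width[c]]
    while spare > 0 and hungry:
        for c in list(hungry):
            if spare == 0:
                break
            width[c] += 1
            spare -= 1
            if width[c] == need[c]:
                hungry.remove(c)
    # Pass 2: wrap each cell to its column width and emit line by line (top-aligned).
    sep = " " * gap
    out = []
    for r in rows:
        cells = [textwrap.wrap(r[c], width[c]) or [""] for c in keep]
        for i in range(max(len(cell) for cell in cells)):
            parts = [(cell[i] if i < len(cell) else "").ljust(width[c])
                     for cell, c in zip(cells, keep)]
            out.append(sep.join(parts).rstrip())
    return out
```

The result is a list of fixed-width lines, so on the terminal side it can be treated exactly like a PRE section.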

whartung · June 24, 2020, 6:42pm

NoLand:

Forms are probably the most challenging elements, if there’s no built-in provision for them in the terminal, since the functionality has to be implemented on the client side.

Sure, form fields need to be implemented, but that’s a lot easier than the initial problem of formatting the text on a resource constrained system in the first place. Checkboxes are simply fields with an X in them or not, radio buttons aren’t difficult, and just click buttons (“Submit”, “OK”) are essentially just tab stops and spacebar to click.

But, again, if you’re only RENDERING (vs interacting with), then it’s no big deal.
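For the rendering side, a minimal Python sketch of how such fields might be drawn as plain text. The bracket conventions, the default field width, and the name `render_field` are assumptions for illustration; focus handling and key input are left out.

```python
def render_field(kind, label, value="", checked=False, width=12):
    """Return a one-line text representation of a form control."""
    if kind == "checkbox":
        return f"[{'X' if checked else ' '}] {label}"   # field with an X in it or not
    if kind == "radio":
        return f"({'*' if checked else ' '}) {label}"
    if kind == "submit":
        return f"< {label} >"                           # a tab stop; space bar "clicks" it
    if kind == "text":
        # Truncate or pad the current value to the visible field width.
        return f"{label}: [{value[:width].ljust(width, '_')}]"
    raise ValueError(f"unsupported field type: {kind}")
```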

NoLand:

Modern CSS layouts are probably the biggest obstacle to converting a web page to a simple stream, because of the various regions, their placement, which may be reordered depending on the client, etc. Starting with a screen reader as the input for the preprocessor may be a good idea, ARIA-tags and accessibility hints may also help with streamlining this to a simple text stream.

CSS isn’t an obstacle at all – it’s ignored.

CSS is a style sheet for rendering, but HTML, especially more modern HTML, has moved towards its original goal of a more “semantic” markup.

Does that mean pages will be “pretty”? Nope. But you’ll be able to read them.

Consider this forum. This forum is a “web app”, not a “web page”. It’s not a page of formatted HTML waiting to be rendered; rather, it’s a mess of JavaScript and CSS tricks to empower it. For example, on long threads, it dynamically loads messages.

However, it’s also well written and degrades nicely. It works fine in Lynx (a terminal-based browser that supports neither JavaScript nor CSS).

In Lynx, there’s a Next Page button instead of the dynamic loading and scrolling of messages.

See, while the modern web is all fancified and such with its themes, and CSS templates, animations, and JS libraries, HTML actually has an expectation to be rendered into plain text. Early HTML relied a lot on things like tables for display formatting and such.

Now that’s mostly relegated to CSS, and blocks of text are just blocks of text. Because in the end, that’s all HTML is. It’s either blocks or inline formatting. CSS handles all the rest (fonts, “bold”, positioning, etc.).

Now are all pages designed as such? No. But that’s an issue with the designer, separate from HTML.

NoLand · June 24, 2020, 6:50pm

One of the problems with modern CSS is that things like flex layout and grid allow elements to be reordered relative to the stream. So the stream doesn’t necessarily represent a sensible order of elements. The same may be true for various regions, which may be handled by going for an accessible view and providing indicators and/or navigation for regions and other ARIA markup (like current element/page for navigational elements).

NoLand · June 24, 2020, 7:01pm

Regarding form elements: here’s the code for a simple password element in Commodore BASIC – something similar may be used for a normal text input (however, we’d have to take cursor movements and horizontal scrolling into account, which are not an issue with passwords):

2998 REM PASSWORD INPUT TO PW$, ML: MAX LENGTH (1..255)
2999 REM CF (CURSOR FLAG) SET UP FOR PET, FOR C64 USE CF=204
3000 ML=20:CF=548:IF PEEK(50003) THEN CF=167:REM PET ROM DETECTION
3010 GET C$:IF C$<>"" THEN 3010
3020 PW$="":POKE CF,0
3030 PL=LEN(PW$)
3040 GET C$:IF C$="" THEN 3040
3050 A=ASC(C$):IF A=13 THEN POKE CF,1:PRINT " ":RETURN
3060 IF A=20 AND PL>0 THEN 3090:REM DEL/BACKSPACE
3070 IF (A AND 127) > 31 AND PL<ML THEN PW$=PW$+C$:PRINT "*";
3080 GOTO 3030
3090 PW$=MID$(PW$,1,PL-1):PRINT" ";CHR$(157);CHR$(157);" ";CHR$(157);:GOTO 3030

Using a pop-up dialog or a universal status line for these things (instead of inlining) may simplify things significantly. Textareas, however, are a completely different story.