Automatic saving of mentioned links to the Wayback Machine?

A random idea: wouldn’t it be useful if every link mentioned in the forum were pushed to the Wayback Machine via their API?

Yes! I can think of several ways to do it:

  • client-side as a browser plugin or as a bookmarklet
  • client-side but using a script to crawl the forum
  • client-side with the assistance of a plugin installed on the server
  • server-side but separately from serving pages to clients

From time to time I already ask the Internet Archive to save things: there’s a web front-end for the casual user, which includes a ‘save outlinks’ feature.
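For anyone who would rather script this than click through the front-end, here is a minimal Python sketch. It assumes a capture can be requested with a plain GET to `https://web.archive.org/save/` followed by the target URL; the function names and the User-Agent string are made up for illustration, and rate limits, retries and authentication are not handled.

```python
import urllib.request

# Assumed Save Page Now endpoint: the target URL is appended as-is.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_request_url(target: str) -> str:
    """Build the URL that asks the Wayback Machine to capture `target`."""
    return SAVE_PREFIX + target.strip()


def submit(target: str, timeout: float = 60.0) -> int:
    """Request a capture and return the HTTP status code (makes a network call)."""
    request = urllib.request.Request(
        save_request_url(target),
        headers={"User-Agent": "forum-outlink-archiver (sketch)"},  # hypothetical
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `submit("https://example.com/")` would ask for one capture; anything submitting many links should pause between requests to stay polite.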

My preference is to keep the forum software and installation customised to the minimum extent possible, to keep it low-maintenance and reliable. But if there were a Discourse plugin, or a theme, we might well install it. The place to find and discuss Discourse machinery is here.

Some existing threads which turned up on a quick search:

I do manually save links to the WM myself, but I am an unreliable robot, and my interests are not everyone’s. Something like a forum-side plugin would sound ideal.

I have to confess, I’m not volunteering to try to understand this!

If I were to tackle it, knowing only what I know now, it would probably involve running httrack [or grab-site] on my machine, scraping the result for URLs, and then (somehow) submitting them to the IA.
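The "scraping for URLs" step could look something like this, as a rough sketch using only the Python standard library. It assumes each downloaded page is available as an HTML string together with the URL it came from; `LinkCollector` and `outlinks` are names invented here, and "outlink" is taken to mean any http(s) link whose host differs from the forum's.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def outlinks(html: str, page_url: str) -> list:
    """Return the unique http(s) links on the page that leave the forum's host."""
    forum_host = urlparse(page_url).netloc
    collector = LinkCollector(page_url)
    collector.feed(html)
    seen, result = set(), []
    for link in collector.links:
        parts = urlparse(link)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar
        if parts.netloc == forum_host:
            continue  # internal link, not an outlink
        if link not in seen:
            seen.add(link)
            result.append(link)
    return result
```

The list it returns is what would then be handed to the IA, by whatever submission route turns out to work.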

Just for the record, there are presently 1109 forum URLs in the Wayback Machine, 359 of which look like threads, which is fewer than half the threads we have (997). Of course, this conversation isn’t so much about archiving the forum as about archiving its outlinks.
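Checking whether a given URL already has a snapshot can be scripted too. The sketch below assumes the Internet Archive's availability endpoint at `https://archive.org/wayback/available`, which takes a `url` parameter and answers with JSON containing an `archived_snapshots` object; the helper names are hypothetical, and fetching the response is left out so the parsing can be tried offline.

```python
import json
from urllib.parse import urlencode

# Assumed availability endpoint of the Wayback Machine.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the query URL asking whether `url` has a snapshot."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url})


def is_archived(response_text: str) -> bool:
    """Interpret the JSON reply: True if a closest snapshot is marked available."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    return bool(closest and closest.get("available"))
```

Running every outlink through a check like this first would avoid resubmitting pages that are already saved.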

I reckon the forum is still small enough for anyone to archive it relatively casually, to scrape for URLs. A full forum backup is 700 MByte, but the actual HTML for the 997 threads would be a great deal less. (Try not to download all images, or all the JS.) Threads have numbered URLs: this thread can be found at Counting up to 2000 is pretty straightforward. The front page’s HTML should readily yield the number of the newest thread.
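Counting up through the thread numbers might be sketched like this in Python. Two things are assumed rather than known: that a thread can be reached at `/t/<number>` on the forum's host, and that links to threads on the front page have the form `/t/<slug>/<number>`. `FORUM` is a placeholder, not the real address, and both function names are invented for the example.

```python
import re

FORUM = "https://forum.example"  # placeholder host, not the real forum

# Assumed shape of a thread link: /t/<slug>/<number>
TOPIC_HREF = re.compile(r"/t/[^/\"'\s]+/(\d+)")


def newest_thread_id(front_page_html: str) -> int:
    """Return the highest thread number linked from the front page, or 0 if none."""
    ids = [int(number) for number in TOPIC_HREF.findall(front_page_html)]
    return max(ids, default=0)


def thread_urls(newest_id: int, base: str = FORUM) -> list:
    """List candidate thread URLs from 1 up to and including newest_id."""
    return [f"{base}/t/{n}" for n in range(1, newest_id + 1)]
```

Some numbers will belong to deleted or private threads, so a crawler built on this should expect a fair few 404s and simply skip them.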

I’m not volunteering either; I’m just hoping to nerd-snipe someone into doing it 🙂

Yeah, I was definitely thinking of archiving the outlinks, not the discussions. While the discussions in this august forum are most enjoyable, the primary sources are far more important.
