Distributing a Large IMC Website P2P

It's always a concern that an IMC can be taken offline. As sites get larger, it becomes more of a challenge to keep them backed up and distributed. There's also the problem of software upgrades. While the software changes, the website must stay up, and there are only two ways to do this: port the site to the new software, or archive the site to HTML files.

I used to be an advocate of porting, but now I'm really starting to change my mind. Porting has a big problem, in that the data is changed as it's ported. Going from 1999 to 2012, I had to change the encoding from Latin-1 to UTF-8. It was mostly OK, but there were definitely glitches.
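To make the kind of glitch concrete, here is a minimal sketch of one common failure, offered as an assumption rather than a record of what actually went wrong in my port: text labelled Latin-1 that really contains Windows-1252 punctuation. The sample bytes and the function name port_to_utf8 are made up for illustration.

```python
# Bytes as an old page might store them: curly quotes in Windows-1252,
# even though the page is labelled Latin-1 (an assumed, but common, case).
raw = b"\x93Free speech\x94 caf\xe9"

# Latin-1 never fails to decode, but 0x93 and 0x94 become invisible control characters.
as_latin1 = raw.decode("latin-1")

# Windows-1252 maps the same bytes to the intended curly quotes.
as_cp1252 = raw.decode("cp1252")


def port_to_utf8(data: bytes) -> bytes:
    """Re-encode legacy bytes as UTF-8, trying cp1252 first, then Latin-1."""
    try:
        text = data.decode("cp1252")
    except UnicodeDecodeError:
        # cp1252 leaves a few bytes (such as 0x81) undefined; Latin-1 accepts every byte.
        text = data.decode("latin-1")
    return text.encode("utf-8")
```

Neither decoding is guaranteed to be right for every post, which is the point: a port has to guess, and each wrong guess is baked into the new database.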

Also, the layout and format change. Websites are no longer collections of pages. They are bits of text composed into pages. Increasingly, the composition is done via JavaScript. So it's not even possible to "go back in time" and see what a site looked like a few years ago.

So, we're left with archiving.

IMC also has a goal of being resilient. To me, that means we also figure out how to distribute the archives to as many people as possible, constantly. The simplest system out there is probably BitTorrent. It's fast, distributed, and easy to use.

I think that dividing the archive up by month and year can work. Each archive would usually be under one gigabyte, and would include all the posts and comments, hidden posts and comments, and linked images, audio, and video.
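As a rough sketch of the monthly split, the snippet below assigns each post to an archive based on its posting date. The "imc-YYYY-MM" naming scheme and both function names are assumptions for illustration, not an agreed convention.

```python
from collections import defaultdict
from datetime import date


def archive_name(posted: date) -> str:
    """Name of the monthly archive a post belongs to (hypothetical scheme)."""
    return f"imc-{posted.year:04d}-{posted.month:02d}"


def group_by_month(posts):
    """Group (post_id, posted_date) pairs by monthly archive name."""
    archives = defaultdict(list)
    for post_id, posted in posts:
        archives[archive_name(posted)].append(post_id)
    return dict(archives)
```

Comments and linked media would presumably follow the post they belong to, so that each monthly archive stays self-contained.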

The structure of an archive would be:
index.html - the complete index, regenerated daily

For the current month, there would be a daily addendum: a .tar.gz file with the latest updates. The dailies would be generated by rebuilding all the changed articles, then scanning the archive and gathering up all the changed files. To use a daily, the user would unpack the tar file into the root of their archive. The torrents for the dailies would be distributed via an RSS feed.
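A minimal sketch of how a daily could be built and applied, assuming that "changed" simply means "modified since the last run" as judged by file modification times. The function names build_daily and apply_daily are hypothetical, and the output file is assumed to live outside the archive directory.

```python
import os
import tarfile


def build_daily(archive_root: str, out_path: str, since: float) -> list:
    """Pack every file under archive_root modified after `since` into a .tar.gz.

    Paths are stored relative to archive_root, so unpacking in the root of
    an existing copy overwrites the changed files in place.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(archive_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.getmtime(full) > since:
                changed.append(os.path.relpath(full, archive_root))
    changed.sort()
    with tarfile.open(out_path, "w:gz") as tar:
        for rel in changed:
            tar.add(os.path.join(archive_root, rel), arcname=rel)
    return changed


def apply_daily(tar_path: str, archive_root: str) -> None:
    """Unpack a daily into the root of a local archive copy."""
    # Only unpack dailies that come from a source you trust.
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(archive_root)
```

Here `since` is a Unix timestamp, for example the time the previous daily was built. A real generator might track changes in the database instead of trusting modification times, but the unpack-into-root step would look the same.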

There would also be monthly and annual update packs.

Every few years, the entire archive should be rebuilt and re-seeded.


