TimeMachine for a website

Introduction

As already mentioned, I am developing kabanashvili.com from scratch, and its look changes rapidly. Because of that, it became important for me to keep track of its progress over time, so I implemented a simple utility called TimeMachine.

TimeMachine uses httrack to periodically make a full clone of https://kabanashvili.com. The result can be seen on the TimeMachine page: there you can choose a version of the website by date, and it opens in a separate tab where you can browse the website as it looked at that time. The implementation is simple, but it is already enough for the job. I will show you how to implement a similar tool in your Node.js app.

Install httrack

httrack is an offline browser utility with rich functionality and a long history. Developed by Xavier Roche, it is one of the best-known tools for cloning websites. The full documentation is available on the HTTrack website.

To install it on Debian- or Ubuntu-based distributions, run:

sudo apt-get install httrack

Add TimeMachine to a Node.js app

I placed the TimeMachine script, named timemachine.js, in the utils directory under the project root.

const fs = require('fs');
const cp = require('child_process');

const URL = 'https://kabanashvili.com/';
const ARCHIVE_PATH = 'archive/';

function archive() {
  const checkHTTrack = cp.spawn('httrack', ['-v']);

  checkHTTrack.on('error', () => {
    console.log("HTTrack isn't installed on your system!");
    process.exit(1);
  });

  checkHTTrack.on('exit', () => {
    console.log(`Archiving: ${URL}..`);
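    const date = new Date();
    // Zero-pad month and day so directory names are valid ISO dates (YYYY-MM-DD).
    const month = String(date.getMonth() + 1).padStart(2, '0');
    const day = String(date.getDate()).padStart(2, '0');
    const formattedDate = `${date.getFullYear()}-${month}-${day}`;
    console.log(`Date: ${formattedDate}`);

    const dir = ARCHIVE_PATH + formattedDate;
    if (!fs.existsSync(dir)) {
      // recursive: true also creates ARCHIVE_PATH itself on the first run.
      fs.mkdirSync(dir, { recursive: true });
      console.log('Created dir..');
    } else {
      console.log('Dir already exists..');
    }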

    console.log('Executing..');
    const child = cp.spawn('httrack', [URL, '--path', dir]);

    child.stdout.on('data', (data) => {
      console.log(`Archiving stdout: ${data}`);
    });

    child.on('error', (error) => {
      console.log(`${error}`);
    });

    child.on('exit', (code, signal) => {
      console.log(`Archiving finished with ${code} and signal ${signal}`);
    });
  });
}

archive();

Replace the website URL with your own, and change the archive directory if needed.

Add the run script in your package.json:

"timemachine": "node ./utils/timemachine.js"

Now when you run:

npm run timemachine

a new directory named after the current date is created in the archive directory, and httrack clones your website into it.

Displaying versions

I am using EJS templating and Express, but you can easily adapt this code to your stack.

First, let's prepare the route for TimeMachine.

const router = require('express').Router();
const fs = require('fs');

router.get('/', (req, res) => {
  fs.readdir('archive', (err, items) => {
    if (err) {
      // The archive directory is missing or unreadable.
      console.error(err);
      res.status(500).send('Could not read the archive directory.');
      return;
    }

    // Build one entry per archived date, then sort once, ascending by date.
    const versions = items
      .map((name) => ({
        name,
        url: `archive/${name}/kabanashvili.com/index.html`,
      }))
      .sort((a, b) => new Date(a.name) - new Date(b.name));

    res.render('timemachine/index.ejs', {
      navbar: -1,
      title: 'Timemachine | Giorgi Kabanashvili',
      description: 'Different versions of my website in time to visually track development.',
      image: 'https://kabanashvili.com/img/background.jpg',
      url: 'https://kabanashvili.com/timemachine',
      versions,
    });
  });
});

module.exports = router;

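We need to sort the versions explicitly, because the order in which fs.readdir returns the directory names is not guaranteed to be chronological. For example, as plain strings 2019-10-2 sorts before 2019-9-15.

Finally, we can display the versions with an EJS template.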
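To illustrate why the sort matters, here is a minimal sketch that orders directory names by the date they encode rather than alphabetically. The helpers parseVersionName and sortVersionNames are hypothetical names that are not part of the original route, and the sketch assumes every archive directory is named year-month-day. It splits the name into numbers itself, so it does not depend on how the Date constructor parses unpadded strings.

```javascript
// Hypothetical helper: turn a directory name like "2019-9-15" into a timestamp.
function parseVersionName(name) {
  const [year, month, day] = name.split('-').map(Number);
  // Date.UTC takes a zero-based month; it returns NaN if any part is not a number.
  return Date.UTC(year, month - 1, day);
}

// Return a new array of names in ascending date order, skipping unparsable ones.
function sortVersionNames(names) {
  return names
    .filter((name) => !Number.isNaN(parseVersionName(name)))
    .sort((a, b) => parseVersionName(a) - parseVersionName(b));
}

const names = ['2019-10-2', '2019-9-15', 'notes', '2019-1-5'];
console.log(sortVersionNames(names));
// prints [ '2019-1-5', '2019-9-15', '2019-10-2' ]
```

Filtering out names that do not parse also keeps stray files in the archive directory from showing up as versions.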

<%- include('../partials/header') %>

<div class="container">
  <div class="py-5 text-center">
    <h1>TimeMachine</h1>
  </div>
  <hr>
  <div class="row">
    <div class="col-md-12">
      <h2>Versions by date:</h2>
      <% for(var i = 0; i < versions.length; i++) { %>
      <p>
        <a href="<%= versions[i].url %>" target="_blank">
          <%= versions[i].name %>
        </a>
      </p>
      <% } %>
    </div>
  </div>
</div>

<%- include('../partials/footer') %>

I used target="_blank" for the links because there is currently no way to navigate from an archived version back to the live website other than editing the URL in the address bar.

Conclusion

That's all! This small addition lets me travel back in time without git. From time to time I can now visit TimeMachine and check my progress.

I have several ideas for improving this tool, but they won't be implemented any time soon. The first would be for timemachine.js to accept a version and a description as arguments and to create a README file inside the cloned directory, so that the route could read this data and display a more informative entry for each date. Another idea is to inject JavaScript into all cloned pages to display a button that returns to the original website, which would remove the need to open each version in a new tab. The last idea is to decouple TimeMachine into a separate npm module!
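As a rough illustration of the return-button idea, the sketch below adds a link back to the live website to the HTML of an archived page. It is not something the tool does today. It inserts a plain link rather than JavaScript, to keep the example small. The function name injectReturnLink, the link text, and the inline style are all assumptions made for this example.

```javascript
// Hypothetical helper: add a "back to the live site" link to an archived page.
function injectReturnLink(html, liveUrl) {
  const link = `<a href="${liveUrl}" style="position:fixed;bottom:1rem;right:1rem">Back to the current website</a>`;

  // Find the last closing body tag, ignoring case.
  const matches = [...html.matchAll(/<\/body\s*>/gi)];
  if (matches.length === 0) {
    // Pages without a closing body tag just get the link appended.
    return html + link;
  }

  const index = matches[matches.length - 1].index;
  return html.slice(0, index) + link + html.slice(index);
}

const page = '<html><body><h1>Old version</h1></body></html>';
console.log(injectReturnLink(page, 'https://kabanashvili.com/'));
```

A post-processing step in timemachine.js could run such a function over every cloned HTML file after httrack exits, but that wiring is left out here.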