Hako: Stupidly simple DIY web archiving tool
I can't code to save my life, but that doesn't stop me from trying. One of my latest creations is a case in point. Since stuff tends to disappear unceremoniously from the Web, I usually save local copies of interesting articles. Up until recently, I used the SingleFile Firefox add-on for that, but the process involved too many manual steps for my liking. After several failed attempts to make ArchiveBox work, I decided to roll out my own tool based on monolith. The latter is a simple command-line utility that saves complete web pages as single HTML files. It took me a few hours to cobble together a crude but usable tool that I named Hako (it means box in Japanese, and it sounds a bit like hacky, which I find somewhat appropriate).
Here's how Hako works. To archive the currently opened web page, select the title and click on the Hako bookmarklet. This sends the URL and the title of the page to the Hako PHP page, which passes the received values to monolith. The latter then saves the page using the title as its file name. The very same PHP page also shows a list of all archived pages, so it doubles as a no-frills read-it-later tool. It's also possible to archive pages manually using the dedicated Add form. That's all there is to it, really.
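To make the flow above a bit more concrete, here is a rough JavaScript sketch of the last step: turning a title into a file name and assembling a monolith call. This is not Hako's actual code (Hako does this in PHP). The titleToFileName and monolithArgs helpers are hypothetical names, and the sanitizing rules are my own assumption. The only parts taken from the real tools are the archive directory and monolith's -o option for the output file.

```javascript
// Hypothetical helper: turn a selected page title into a safe file name.
// The sanitizing rules are an assumption, not what Hako actually does.
function titleToFileName(title) {
  const cleaned = String(title)
    .trim()
    .replace(/[\/\\:*?"<>|]+/g, " ") // replace characters that are unsafe in file names
    .replace(/\s+/g, " ")            // collapse runs of whitespace
    .trim();
  return (cleaned || "untitled") + ".html";
}

// Hypothetical helper: build the argument list for a monolith call.
// monolith takes the page URL and writes the result to the file given with -o.
function monolithArgs(url, title, archiveDir = "archive") {
  return [url, "-o", `${archiveDir}/${titleToFileName(title)}`];
}

// Print the command instead of running it.
const args = monolithArgs("https://example.com/post", "  My: Post/Title ");
console.log(["monolith", ...args].join(" "));
```

Keeping the arguments in an array rather than a single string is deliberate: if you ever pass them to a process launcher, a title containing quotes or spaces can't break the command.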
To deploy Hako on a local machine, install PHP along with the php-xml and php-mbstring packages. To do this on Debian and Linux Mint, run:
sudo apt install php php-xml php-mbstring
Then clone the project's Git repository:
git clone https://github.com/dmpop/hako.git
Switch to the resulting hako directory, open the config.php file for editing, and replace the default value of the $password variable with the desired password. Save the changes and start the PHP server:
php -S 0.0.0.0:3000
Next, add the following bookmarklet to the Bookmarks toolbar of your browser (replace 127.0.0.1 with the actual IP address of the machine running Hako and secret with the string that matches the value of the $password variable in the hako/config.php file):
javascript:var title=window.getSelection();location.href='https://127.0.0.1/index.php?url='+encodeURIComponent(location.href)+'&title='+title+'&password=secret';
Now navigate to the page you want to archive, select the title, and click on the Hako bookmarklet. If the page has been archived successfully, you should see it in the list of saved pages.
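If the one-line bookmarklet is hard to read, the following sketch spells out how the request URL is put together. The buildHakoUrl function is a hypothetical helper of mine, not part of Hako, and the base address and password are the same placeholders as above. Unlike the bookmarklet, it also runs the title and password through encodeURIComponent, on the assumption that a title containing & or # would otherwise cut the query string short.

```javascript
// Hypothetical helper: assemble the URL that the bookmarklet navigates to.
// base is the address of the machine running Hako, without a trailing slash.
function buildHakoUrl(base, pageUrl, title, password) {
  const query = [
    ["url", pageUrl],
    ["title", String(title).trim()], // a Selection object turns into its text here
    ["password", password],
  ]
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `${base}/index.php?${query}`;
}

// Placeholder values; replace them with your own address and password.
console.log(buildHakoUrl("https://127.0.0.1", "https://example.com/a?b=1", "Tom & Jerry", "secret"));
```

In a browser, you would call it with location.href and window.getSelection() and assign the result to location.href, which is what the bookmarklet does in a single line.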
If everything works properly, you might want to create a system service to start Hako automatically. Open the unit file for editing with the sudo nano /etc/systemd/system/hako.service command and add the following definition (replace /path/to/hako with the actual path to the hako directory):
[Unit]
Description=Hako
Wants=syslog.service
[Service]
Restart=always
ExecStart=/usr/bin/php -S 0.0.0.0:3000 -t /path/to/hako
ExecStop=/usr/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target
Enable and start the service:
sudo systemctl enable hako.service
sudo systemctl start hako.service
If you have a remote machine running a web server like Apache and PHP, deploying Hako is even easier. Move the entire hako folder to the document root of the server, and make the hako/archive directory writable by the server:
chown www-data:www-data -R /path/to/hako/archive
Keep in mind that Hako is a very simple tool with its fair share of shortcomings. It doesn't provide any feedback, so the only indication that a page has been archived successfully is the newly created HTML file. The web UI lists the archived files with links to the original pages, and lets you delete items you no longer need. And since the archive itself is not password-protected, all the saved web pages are publicly accessible.