Introducing clustd, a new project for failover systems. The goal is to provide a fully automated cluster service across multiple systems in case a system goes down for either maintenance or an unexpected event.
When @steemdunk had an hour of downtime, I made this project a high priority to ensure downtime becomes a thing of the past. It will be used to keep the service alive with minimal impact if a server goes down.
This has been a work in progress alongside some other projects for quite a while, and it was time to publish the first part of this project. It has been rewritten a few times to get the design I wanted.
Connections are made through WebSockets. This simplifies connection management internally and avoids repeating the initial handshake for every client, which can get expensive on a server.
The system maintains a persistent connection to each of its remote machines, and every machine actively health checks its peers. Pings are sent every 1.5 seconds, allowing for a maximum of 3 seconds of downtime in a worst-case scenario.
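As a rough sketch of the timing above, the tracker below records when each peer last answered a ping and reports peers that have been silent for too long. The `PeerHealth` class, its method names, and the rule that two missed ping intervals mean a peer is down are assumptions made for illustration; they are not taken from the clustd source.

```typescript
// Hypothetical sketch, not clustd's actual code.
const PING_INTERVAL_MS = 1500;               // ping period from the article
const MAX_SILENCE_MS = 2 * PING_INTERVAL_MS; // assumption: two missed pings = down

class PeerHealth {
  // Maps a peer address to the time (in ms) of its last ping reply.
  private lastSeen: { [peer: string]: number } = {};

  // Record a ping reply from `peer` at time `now` (milliseconds).
  markAlive(peer: string, now: number): void {
    this.lastSeen[peer] = now;
  }

  // Return every peer whose last reply is older than the allowed silence.
  findDown(now: number): string[] {
    return Object.keys(this.lastSeen).filter(
      peer => now - this.lastSeen[peer] > MAX_SILENCE_MS
    );
  }
}
```

In a real service, `markAlive` would be called from the ping reply handler and `findDown` from a timer that fires once per ping interval.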
Each machine runs as both a server and a client, so it can handle inbound and outbound connections for full-duplex communication. For efficiency and simplicity, each pair of machines keeps one connection open between them rather than two.
When a master gets disconnected, another machine automatically becomes the master based on a naive consensus algorithm. An incorrect configuration can cause an error when determining the next master for the cluster.
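The article does not spell out the election rule, so the following is only one possible naive scheme, not the one clustd uses: every machine applies the same deterministic rule (lowest id among the live machines wins), and a misconfiguration such as a duplicate id raises an error. The `Machine` interface, the numeric `id` field, and `electMaster` are all hypothetical.

```typescript
// Hypothetical election rule, NOT clustd's actual algorithm.
interface Machine {
  id: number;     // assumed to be a unique number taken from the configuration
  alive: boolean; // result of the latest health check
}

function electMaster(machines: Machine[]): Machine {
  const seen: { [id: number]: boolean } = {};
  for (const m of machines) {
    // Duplicate ids make the outcome ambiguous, so treat them as a config error.
    if (seen[m.id]) throw new Error(`duplicate machine id ${m.id}`);
    seen[m.id] = true;
  }
  const candidates = machines.filter(m => m.alive);
  if (candidates.length === 0) throw new Error('no live machine to elect');
  // Lowest id wins, so all machines reach the same answer without voting.
  return candidates.reduce((best, m) => (m.id < best.id ? m : best));
}
```

Because the rule depends only on the shared configuration and the health check results, machines that see the same set of live peers pick the same master.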
This project does not use WSS/HTTPS; its approach is similar but not identical. The reason is that WSS is still open to attack if certificate checking is turned off, and since connections here are usually made directly to an IP address instead of a domain, certificate checking would fail in most cases.
The encryption scheme is similar to that of SSL, with some differences. First, a handshake occurs in which the server assigns the client a random number, called a "ticket", which gets concatenated with the secret key. Second, a message counter also gets concatenated with the secret key.
Together with the random IV, this provides replay protection and key rotation for each message. Attempting to replay a message results in a decryption error, and the machine's connection is deemed unreliable and closed.
The algorithm being used is AES-128-GCM. This takes care of the message MAC to ensure the message wasn't tampered with. A completely random IV is generated for each message on top of the additional key rotation protection.
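To make the scheme more concrete, here is a minimal sketch using Node's built-in `crypto` module. It is not clustd's real wire format. In particular, the way the secret, ticket, and counter are combined into a 16-byte key (a truncated SHA-256 hash) is an assumption, and the names `deriveKey`, `seal`, `unseal`, and `Sealed` are hypothetical.

```typescript
import * as crypto from 'crypto';

// Hypothetical sketch of the scheme described above, NOT clustd's actual code.
// Assumption: the per-message key is the first 16 bytes of the SHA-256 hash of
// the secret, ticket, and counter joined with ':'.
function deriveKey(secret: string, ticket: number, counter: number): Buffer {
  return crypto.createHash('sha256')
    .update(`${secret}:${ticket}:${counter}`)
    .digest()
    .subarray(0, 16); // AES-128 needs a 16-byte key
}

interface Sealed { iv: Buffer; tag: Buffer; data: Buffer; }

function seal(secret: string, ticket: number, counter: number, msg: string): Sealed {
  const iv = crypto.randomBytes(12); // fresh random IV for every message
  const cipher = crypto.createCipheriv('aes-128-gcm', deriveKey(secret, ticket, counter), iv);
  const data = Buffer.concat([cipher.update(msg, 'utf8'), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), data };
}

function unseal(secret: string, ticket: number, counter: number, s: Sealed): string {
  const decipher = crypto.createDecipheriv('aes-128-gcm', deriveKey(secret, ticket, counter), s.iv);
  decipher.setAuthTag(s.tag);
  // final() throws if the GCM tag does not verify, e.g. when the receiver's
  // counter has moved on (a replay) or the ciphertext was tampered with.
  return Buffer.concat([decipher.update(s.data), decipher.final()]).toString('utf8');
}
```

Both sides would increment their counter after each message, so a replayed message is decrypted with a different key than the one it was sealed with and fails authentication.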
Node v9 is recommended. Knowledge of Node and npm is helpful but not required.
- Clone the repository: https://github.com/steemdunk/clustd.git
- Build the project: npx gulp build
The configuration is documented in the project's README.
It is possible to run the cluster on the same machine using different ports for local testing. This has no use in a production environment, however. ;)
There are some additional configuration variables to be mentioned:
- export DEBUG='clustd:*'
  Enables debugging for the entire system.
- export CLUSTD_CONFIG=./my-config.yml
  Specifies a configuration file. By default the configuration path is ./config.yml; this variable allows changing the path if necessary.
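The fallback behaviour for the configuration path can be sketched in a few lines. The `resolveConfigPath` helper is hypothetical and only illustrates the documented default; clustd's own loading code may differ.

```typescript
// Hypothetical helper, not part of clustd: CLUSTD_CONFIG overrides the
// documented default path ./config.yml when it is set and non-empty.
function resolveConfigPath(env: { [key: string]: string | undefined }): string {
  return env['CLUSTD_CONFIG'] || './config.yml';
}
```

In a Node process this would be called as `resolveConfigPath(process.env)`.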
Once everything is configured appropriately, starting it up is easy.
Sample screenshot of a 3-machine cluster, with the 3rd machine down and the 2nd machine acting as the master. Full debug is enabled and the activity is clearly visible.
Drivers are next in line for implementation. They will control a system (e.g. starting and stopping a service) when a server becomes the master or a secondary.
While the cluster itself is ready, it's not fully ready to be useful yet. This project is still in the alpha stages and ongoing improvements will be made as progress continues.
Check out the project
Posted on Utopian.io - Rewarding Open Source Contributors