This is an article that I've been meaning to write for some time (I think going back to a few conversations with Chad Crowell at Leiden EECI 2010), but with one thing and another I never got around to it until now. I'll also use this opportunity to promote my services: get in touch if you would like this set up for one of your clients.
The purpose of this article is to show how to set up a load-balanced / distributed site running EE on multiple servers, running in different datacentres - hell, even different countries. This solution is particularly useful for sites which expect heavy visitor traffic and where performance is potentially in question, because it reduces the load on any one server/site.
Firstly, there are a few pre-requisites to implementing this solution:
1) You require dedicated servers at each of your mirror locations (or at least, the ability to install some command-line software)
2) You need to have fine-grained control of your domain's DNS
3) Balls of Steel
Ok - so, you don't need number 3, but it does help if you don't mind going against the curve once in a while.
We're going to start by structuring the master site in a way to allow us to easily set up the mirror.
Firstly, we need to ensure that all the site-specific folders live in one central location: an ./assets folder under your web root (./www/assets/ in this article). My suggested layout is as follows:
- templates (This is where your templates will be stored)
- css (This is your site CSS)
- img (This is master site images)
- uploads (This is your client uploads folder)
- scripts (for JS/PHP scripts that your site uses)
- sql (this is where the SQL dumps will go; name the folders inside it to suit your own setup)
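The layout above can be created in one go. This is only a minimal sketch: the function name is made up for illustration, and the ./assets parent folder is an assumption based on the ./www/assets/ path used later in this article.

```shell
#!/bin/sh
# Create the shared folder layout under a web root.
# Assumption: everything lives under an "assets" parent folder, matching
# the ./www/assets/ path used later in the article.
create_assets_layout() {
    web_root="$1"
    for dir in templates css img uploads scripts sql; do
        mkdir -p "$web_root/assets/$dir" || return 1
    done
}

# Example: create_assets_layout ./www
```

Because mkdir -p is used, running it again on an existing layout does no harm.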
Now what we need to do is set up our mirroring system. I looked at various techniques for this, such as rsync and FTP, but in the end I opted for Dropbox as a much more resilient solution.
I'm sure you all know what Dropbox is and how it works - which is why for this solution it's a perfect fit. If you don't, I recommend you sign up for it.
If you're running Windows servers, you can simply install the Windows version; if you're running Linux/Unix, there are instructions on installing it from the command line available here.
Whether the free 2GB version is going to suffice depends on the size of your site and of the ./assets folder.
Once you have installed Dropbox on your master server, update the ~/Dropbox folder that it creates so that it points to your ./www/assets/ folder, as per the instructions on the Dropbox wiki.
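One common way to point one folder at another on Linux/Unix is a symbolic link. The sketch below assumes that approach; whether your version of the Dropbox client follows a symlinked folder is not something this article confirms, so check the wiki instructions first and stop the Dropbox daemon before trying it. The function name and example paths are hypothetical.

```shell
#!/bin/sh
# Replace the folder the Dropbox installer created with a symlink to the
# assets folder. Both paths are passed in; the example values are hypothetical.
link_dropbox_to_assets() {
    dropbox_dir="$1"   # e.g. "$HOME/Dropbox"
    assets_dir="$2"    # e.g. /var/www/assets
    if [ ! -d "$assets_dir" ]; then
        echo "assets folder not found: $assets_dir" >&2
        return 1
    fi
    if [ -L "$dropbox_dir" ]; then
        rm "$dropbox_dir" || return 1                      # old link: replace it
    elif [ -e "$dropbox_dir" ]; then
        mv "$dropbox_dir" "$dropbox_dir.orig" || return 1  # keep the original
    fi
    ln -s "$assets_dir" "$dropbox_dir"
}

# Example: link_dropbox_to_assets "$HOME/Dropbox" /var/www/assets
```

The original folder is renamed rather than deleted, so nothing the installer put there is lost.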
That's stage #1 complete. You now have your assets folder set up and syncing into Dropbox.
Next thing we need to do is to set up SQL. I originally looked at creating triggers to do what we need (especially for user-generated content such as comments etc), but in the end for this solution I decided it was easier just to do regular dumps of the database using cron. How often you want the cron job to run depends on the size/scale of the site and the frequency of site updates. I set mine up to run every 30 minutes, which seemed to work well enough, but you could equally use a shorter or longer interval.
The command which creates a SQL dump of your database is as follows:
mysqldump --opt --user=db_username --password=db_password db_name > www/assets/sql/slave1/db_name_dump.sql
If you run something like cPanel/WHM, you can use the built-in Cron option to run this command at the interval of your choice. If not, I recommend reading up on crontab.
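If you go the crontab route, it may be worth wrapping the dump in a small script. The sketch below is one possible shape, not the script I ran: it writes the dump to a work folder first and only then moves it into the synced folder, so a half-written file is less likely to be picked up. DUMP_CMD, DB_USER, DB_PASS, the function name and the crontab path are all hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Write the dump to a work folder first, then move it into the synced folder.
# DUMP_CMD, DB_USER and DB_PASS are hypothetical variables for this sketch;
# DUMP_CMD defaults to mysqldump and exists so the command can be stubbed.
dump_database() {
    db_name="$1"
    work_dir="$2"   # outside the synced folder, ideally on the same filesystem
    out_dir="$3"    # e.g. www/assets/sql/slave1
    tmp_file="$work_dir/${db_name}_dump.sql.tmp"
    if ${DUMP_CMD:-mysqldump} --opt --user="$DB_USER" \
            --password="$DB_PASS" "$db_name" > "$tmp_file"; then
        mv "$tmp_file" "$out_dir/${db_name}_dump.sql"
    else
        rm -f "$tmp_file"   # leave the previous good dump in place
        return 1
    fi
}

# Hypothetical crontab entry calling a wrapper script every 30 minutes:
# */30 * * * * /home/youruser/bin/dump_db.sh
```

If the dump command fails, the previous dump stays where it is, so the secondary server never sees a truncated file from a failed run.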
That's stage #2 complete. You now have SQL dumps of your master data.
Now we need to set up the secondary server. The first thing you need to do is to manually replicate the install of EE with the same data structure as your master. It doesn't necessarily need to have the same database name since we can handle that, but the directory structure should be the same.
Once you're set up and have an empty install with the correct structure, repeat the Dropbox install on this secondary server and link your account to the ./www/assets/ folder. It should then start syncing the files from the master server. Once the sync has taken place, you might have to run 'Synchronise Templates' the first time to get the templates into the system.
The next thing you need to do is to have a cron job on this server to *import* the SQL dumps into your secondary server. I did this with a simple shell script which polls the ./www/assets/sql/slave1/ folder for the dump file, then runs a SQL import as follows:
if [ -f db_name_dump.sql ] ; then
    mysql -udb_user -pdb_pass db_name < db_name_dump.sql
fi
Simply set up your cron on the slave server to run this shell script every 5 or 10 minutes - it doesn't use a lot of processing power and will ensure that your secondary/slave machine gets updated as quickly as possible.
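Since the dump only changes every 30 minutes or so but the import runs more often, most runs would re-import a file that has not changed. A possible refinement, sketched here under my own assumptions rather than taken from the original script, is to remember a checksum of the last imported dump and skip the import when it matches. IMPORT_CMD, DB_USER, DB_PASS, the state file and the function name are hypothetical.

```shell
#!/bin/sh
# Import the dump only when its checksum differs from the last one imported.
# IMPORT_CMD, DB_USER and DB_PASS are hypothetical variables for this sketch;
# IMPORT_CMD defaults to mysql and exists so the command can be stubbed.
import_if_changed() {
    dump_file="$1"
    state_file="$2"   # stores the checksum of the last dump we imported
    db_name="$3"
    if [ ! -f "$dump_file" ]; then
        return 0      # no dump has arrived yet
    fi
    new_sum=$(cksum < "$dump_file")
    old_sum=$(cat "$state_file" 2>/dev/null || true)
    if [ "$new_sum" = "$old_sum" ]; then
        return 0      # this dump has already been imported
    fi
    if ${IMPORT_CMD:-mysql} -u"$DB_USER" -p"$DB_PASS" "$db_name" < "$dump_file"; then
        printf '%s\n' "$new_sum" > "$state_file"
    else
        return 1
    fi
}

# Example:
# import_if_changed www/assets/sql/slave1/db_name_dump.sql /var/tmp/last_import.sum db_name
```

The checksum is only recorded after a successful import, so a failed import is retried on the next cron run.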
Now we've got a secondary server with the exact same database and files as the master.
That's stage #3 complete. You have a secondary server with remote syncing.
We're now on the final run: distributing the DNS.
My solution was to implement 'DNS Round Robin'. There are some advantages and disadvantages to this solution, but it is seamless to the end-user, which is a big win, and it's also cost-effective to implement.
To implement a DNS Round Robin, you simply create one A record per server for the same domain name, each pointing at that server's IP address:
www.yourcompany.com. 60 IN A 192.0.2.10
www.yourcompany.com. 60 IN A 198.51.100.20
The 60 in the above example represents TTL (Time to Live) - this is an indicator to tell clients how long to cache the DNS entry before looking again. Setting this relatively low (e.g. 60 seconds) increases the effectiveness of the round robin technique, however it could easily be set up for 5 minute intervals. The rule of thumb here however is the higher the number, the less effective the solution.
Finally, we need to create one more entry in the DNS and this one is for the client. Since we're running a master/secondary system, we need to make sure that the client *always* edits the master server. This can be done either by using an alias such as 'master.yourcompany.com' or by manually adding an entry to the client's hosts file which points www.yourcompany.com to the master IP address.
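For the hosts file route on a Linux/Unix or Mac client, the entry is a single line of the form "IP hostname". Here is a rough sketch that adds it only if the hostname is not already listed; the function name is made up and the IP shown is a placeholder, so substitute your master server's address.

```shell
#!/bin/sh
# Add "IP name" to a hosts file unless the name is already listed there.
# The file path, IP and hostname are passed in; the example values are
# placeholders, not real addresses.
pin_host() {
    hosts_file="$1"; ip="$2"; name="$3"
    if awk -v n="$name" '
            $1 ~ /^#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file"; then
        return 0   # name already present, leave the file alone
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}

# Example (needs root on most systems):
# pin_host /etc/hosts 192.0.2.10 www.yourcompany.com
```

Note that it does not change an existing entry for the same hostname, so if the master's IP changes, that line has to be edited by hand.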
It should be noted at this point that this technique is a load sharing mechanism rather than a load balancing mechanism. It does not gauge the "load" on the server in any way, but it shares the load among multiple hosts. One or more of the hosts in the pool will tend to get more activity than the other servers. DNS Round Robin should be quite effective up to about 10 servers though. If you are implementing more than one secondary server, basically duplicate your SQL dumps to appropriate folders (slave1/slave2/slave3 etc).
DNS also has no way of detecting physical failure, e.g. if server 2 fails. As lookups come in for www.yourcompany.com, the DNS will continue to hand out the secondary server's address for roughly one out of every two of them, and the requests that follow will fail. Effectively, 50% of all requests to www.yourcompany.com are now connecting to a black hole. This is an improvement over having just one web server and losing all the requests due to a hardware failure, but only to a certain degree.
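To put rough numbers on that, here is a small sketch of an idealised rotation, where requests are handed to each server strictly in turn. Real traffic is lumpier than this because of DNS caching, so treat it as an illustration only; the function name is made up.

```shell
#!/bin/sh
# Idealised round robin: request k goes to server (k mod pool size).
# Prints how many of TOTAL requests each server receives.
share_requests() {
    total="$1"; shift
    pool_size=$#
    index=0
    for server in "$@"; do
        count=$(( total / pool_size ))
        if [ "$index" -lt $(( total % pool_size )) ]; then
            count=$(( count + 1 ))   # early servers absorb the remainder
        fi
        echo "$server $count"
        index=$(( index + 1 ))
    done
}

# share_requests 1000 master secondary
# prints:
#   master 500
#   secondary 500
```

With two servers and the secondary down, the 500 requests sent its way are the ones that are lost. With more servers in the pool, a single failure costs a smaller share of the traffic.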
I'm hoping this gives you a good idea of how, if you have a particularly busy site, you can implement an effective load balancing solution for you or your clients.