Need input on building a highly available setup in AWS

Hi all,

I want to build a highly available setup of SugarCRM in AWS (nothing too special).

There are a few small things, probably more specific to SugarCRM, that I need your input on!

There will be a load balancer (ELB) distributing load across 3 Apache servers.

There will be an Elasticsearch cluster (3 nodes) on the side for SugarCRM to talk to.

There will be 3 memcached servers for session storage (see the sketch below).

Finally, there will be an RDS MySQL instance in Multi-AZ.

Of course, all of the Apache/Elasticsearch/memcached servers will also be spread across multiple AZs. For those less familiar with AWS terminology: an AZ (Availability Zone) is AWS's term for a distinct data center; even though the AZs are separate data centers, they are connected by a fast, low-latency network.
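
For reference, here is a minimal sketch of the pieces that keep the web tier stateless under that layout. The hostnames and the RDS endpoint are placeholders, and it assumes the "memcached" PECL extension is installed on every Apache server:

```php
<?php
// Store PHP sessions in the memcached pool instead of on local disk, so any
// of the 3 Apache servers behind the ELB can serve any request. (These two
// settings would normally live in php.ini; shown as ini_set() for brevity.)
ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', 'memcache-a.internal:11211,memcache-b.internal:11211,memcache-c.internal:11211');

// The database connection simply points at the Multi-AZ RDS endpoint; RDS
// fails over behind that single DNS name, so the application config never
// changes. (config_override.php style; placeholder endpoint.)
$sugar_config['dbconfig']['db_host_name'] = 'sugarcrm.xxxxxxxxxx.us-east-1.rds.amazonaws.com';
```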

The objective is that if I lose a single machine or even a complete AZ, SugarCRM keeps running with no downtime.

Now to my questions:

I saw in quite a few posts that people have been using NFS to share the SugarCRM code between web servers. Can I do without it? It would be a single point of failure in my setup.

Considering that development is done on a single server, can I just tar the code (or better, git clone it) onto each web server, then do a build and repair on each one and be done with it?
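
In case it helps, that repair step can be scripted so it runs on each web server right after the checkout. Below is a minimal sketch following the commonly used community approach of calling Quick Repair and Rebuild from the CLI; the class name, bootstrap and method signature vary between SugarCRM versions, so please verify it against your release:

```php
<?php
// repair.php - version-dependent sketch: run from the SugarCRM root on each
// web server right after deploying the code, to clear caches and rebuild
// extensions/vardefs without going through the admin UI.
if (!defined('sugarEntry')) define('sugarEntry', true);
require_once 'include/entryPoint.php';
require_once 'modules/Administration/QuickRepairAndRebuild.php';

// Run as the admin user ('1' is the default admin id).
global $current_user;
$current_user = BeanFactory::getBean('Users', '1');

// The last two flags (execute SQL / show output) differ between releases,
// so double-check them against your version.
$repair = new RepairAndClearCache();
$repair->repairAndClearAll(array('clearAll'), array('All Modules'), true, false);
```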

Regarding cron.php: if each server has the code locally, I guess I need to run cron.php on every one of my web servers?

Thanks in advance for your time and help!

  • I use ObjectiveFS and it "works", but it's abysmally slow due to the Vardef caching (and who knows what other file I/O I might not be aware of) that could easily be offloaded to the cache layer.

    I expect NFS to be the same.

    The atomic rename happens on the network filesystem itself because the cache doesn't have a configurable local location (although it really should be configurable in many other ways, like using memcached/redis instead...).

  • Instead of NFS or GlusterFS, we've migrated to EFS, using Provisioned Throughput mode when rebuilding the cache or doing upgrades; we can drop down to Bursting mode for most workloads. It takes a little planning, as you can only change throughput modes once every 24 hours.

    To perform system upgrades, we scale down to a single application server, then sync the files from the shared directories to local disk. The upgrade is performed, the httpd service is started and the cache is rebuilt, and then the local contents are synced back to the EFS volume(s).

    Regarding the cron jobs, we have a single "worker" EC2 instance that's sized smaller than the application servers. It doesn't run the httpd service; it only runs the scheduled cron jobs. If we ran cron on multiple web servers at once, we would get duplicate inbound emails.
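
    If a dedicated worker instance isn't an option and cron ends up installed on every web server, a minimal alternative sketch is to wrap cron.php behind a short-lived lock in the memcached pool, so only one node actually runs the scheduler each minute (key name, TTL and hostnames below are arbitrary placeholders):

    ```php
    <?php
    // cron_wrapper.php - sketch: whichever node grabs the lock first this
    // minute runs cron.php; the others exit, so scheduler jobs (e.g. inbound
    // e-mail checks) are not duplicated. Key, TTL and hosts are placeholders.
    $mc = new Memcached();
    $mc->addServers(array(
        array('memcache-a.internal', 11211),
        array('memcache-b.internal', 11211),
        array('memcache-c.internal', 11211),
    ));

    // Memcached::add() only succeeds when the key does not exist yet, which
    // makes it usable as a simple distributed lock with a short expiry.
    if ($mc->add('sugarcrm:cron-lock', gethostname(), 55)) {
        chdir(__DIR__);      // assumes this wrapper sits in the SugarCRM root
        require 'cron.php';  // run the real scheduler on this node only
    }
    ```

    That said, the dedicated worker is simpler to run and monitor; the lock is only worth considering if adding another instance isn't possible.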