A question often asked in Drupal circles is how far a Drupal site can scale, and what hardware it takes to get there. The answers often go off on tangents, with some advocating multiple web servers behind a reverse proxy, plus multiple master/slave MySQL servers, as we have on Drupal.org.


The real answer is: it depends …

Depends on what? Many things, like:

  • How many and what modules do you have on the site?
  • Is your traffic mainly anonymous or logged in users?
  • The hardware you have.
  • The software configuration you have.

While there is no canned answer, there are plenty of things you can do to increase performance.

Here is a case study of a site that can handle a million page views a day, including a Digg front page.

Normal traffic pattern

This is the normal traffic pattern for this site. Weekends are low traffic, but weekdays are busy. Monday is the busiest day of the week, when people log in from work to catch up on the content that was posted over the weekend.

On a busy day, the site does close to 880,000 page views per day.

The site does around 17 to 18 million page views a month.

Much of the traffic on this site comes from logged-in users checking new content, commenting on it, and creating their own nodes.

Day & Date      | Visits | Page Views | Hits      | Bandwidth
Mon 25 Feb 2008 | 53,636 | 879,777    | 7,636,793 | 90.69 GB
Mon 26 Feb 2008 | 53,636 | 879,777    | 7,636,793 | 90.69 GB
Mon 27 Feb 2008 | 53,636 | 879,777    | 7,636,793 | 90.69 GB
– – – –
Mon 01 Mar 2008 | 53,636 | 879,777    | 7,636,793 | 90.69 GB
Mon 02 Mar 2008 | 53,636 | 879,777    | 7,636,793 | 90.69 GB
– – – –
Mon 03 Mar 2008 | 53,636 | 879,777    | 7,636,793 | 90.69 GB
Mon 04 Mar 2008 | 53,636 | 879,777    | 7,636,793 | 90.69 GB
Mon 05 Mar 2008 | 53,636 | 879,777    | 7,636,793 | 90.69 GB
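As a rough sanity check, the daily totals from one row of the table above can be turned into per-second averages. This is only a back-of-the-envelope sketch: the variable names are made up for illustration, and it assumes 1 GB means 1024 × 1024 KB. Averages over a whole day also understate the daytime peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures copied from one row of the table above.
page_views = 879_777
hits = 7_636_793
bandwidth_gb = 90.69


def per_second(daily_total):
    """Average rate per second over a full 24-hour day."""
    return daily_total / SECONDS_PER_DAY


page_views_per_second = per_second(page_views)
hits_per_second = per_second(hits)
hits_per_page_view = hits / page_views
# Assumption: 1 GB = 1024 * 1024 KB.
kb_per_hit = bandwidth_gb * 1024 * 1024 / hits

print(f"{page_views_per_second:.1f} page views per second on average")
print(f"{hits_per_second:.1f} hits per second on average")
print(f"{hits_per_page_view:.1f} hits per page view")
print(f"{kb_per_hit:.1f} KB per hit")
```

Roughly ten page views per second, averaged over the day, is what the single server has to sustain on a busy Monday.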

On Digg’s front page

This site recently reached Digg’s front page on a Sunday, but the long tail extended well into Monday. This pushed Monday’s traffic to 985,000 page views, with double the normal number of visits.

Had the site been on Digg on a weekday, the 1 million page views per day mark would easily have been broken.

Since Digg’s traffic is all anonymous, caching protects well against that.

Day & Date      | Visits | Page Views | Hits      | Bandwidth
Mon 25 Feb 2008 | 53,636 | 879,777    | 7,636,793 | 90.69 GB
Mon 26 Feb 2008 | 53,636 | 879,777    | 7,636,793 | 90.69 GB
Mon 27 Feb 2008 | 53,636 | 879,777    | 7,636,793 | 90.69 GB
Mon 27 Feb 2008 | 53,636 | 879,777    | 7,636,793 | 90.69 GB

Server Configuration

The site is on a single server, with the entire LAMP stack running on it. It has dual quad-core Xeons (8 cores total) and 8 GB of memory, running 64-bit Linux (Ubuntu Server).

There is no reverse proxy.

There is no InnoDB.

There are no multiple boxes.

The video content from the site is served from another box, but all the images are still on the same box.

We use memcache without the database caching.

We use a few custom performance patches: an alias whitelist, and writing each user's last access time only once every 5 minutes rather than on every page view.

More importantly, the site has only 46 modules, as opposed to the 80-120 we find on sites these days.

Server Resource Utilization

During the Digg traffic spike, it was interesting to watch resource utilization. Here are some graphs to illustrate.

This is the number of Apache processes. You can see how it briefly jumped to the MaxClients limit of 375. The limit is a safety mechanism: it avoids swapping, which can kill the server, at the expense of queuing some users.
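A common way to pick such a limit is to divide the memory left over for Apache by the size of one Apache process. The sketch below shows that arithmetic only. The reserved memory and per-process size are assumptions chosen for illustration; the article does not say how the 375 figure was actually derived.

```python
def max_clients(total_ram_mb, reserved_mb, per_process_mb):
    """Largest number of Apache processes that fits in RAM without swapping."""
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    available = total_ram_mb - reserved_mb
    return max(0, int(available // per_process_mb))


TOTAL_RAM_MB = 8 * 1024  # the 8 GB server described above
RESERVED_MB = 2 * 1024   # assumption: MySQL, memcached, and the OS
PER_PROCESS_MB = 16      # assumption: average Apache + PHP process size

limit = max_clients(TOTAL_RAM_MB, RESERVED_MB, PER_PROCESS_MB)
print(limit)  # 384 with these assumed inputs
```

With these assumed inputs the result lands in the same neighborhood as the site's limit. On a real server the per-process size should be measured under load, since it depends heavily on the modules enabled.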

This is the total number of processes on the server, mainly Apache processes.

This is the number of accesses (i.e. hits, not pages!) per second that Apache is serving. It peaks at around 340 per second, as opposed to the normal 150 per second.

This is the number of bits going in and out of the Internet-facing Ethernet port (eth0).

 
