
Monitoring with Ganglia by Daniel Pocock, Peter Phaal, Matt Massie, Frederiko Costa, Jeff Buchbinder, Brad Nicholes, Bernard Li, Vladimir Vuksan, Dave Josephsen, Robert Alexander, Alex Dean


Chapter 4. The Ganglia Web Interface

Vladimir Vuksan

Alex Dean

So far, this book has dealt with the collection of data. Now we will discuss visualizing it. Visualization of these data is the primary responsibility of a web-based application known as gweb. This chapter is an introduction to gweb and its features. Whether the job is understanding how a problem began in your cluster or convincing management that more hardware is required, a picture is worth a thousand data points.

Navigating the Ganglia Web Interface

gweb is organized into a number of top-level tabs: Main, Search, Views, Aggregated Graphs, Compare Hosts, Events, Automatic Rotation, Live Dashboard, and Mobile. These tabs allow you to easily jump right to the information you need.

The gweb Main Tab

Figure 4-1. gweb navigation overview

gweb’s navigation scheme is organized around Ganglia’s core concepts: grids, clusters, and nodes. As you click deeper into the hierarchy, breadcrumb-style navigation links allow you to return to higher-level views. Figure 4-1 shows how you can easily navigate to exactly the view of the data you want.

Grid View

The grid view (Figure 4-2) provides the highest-level view available. Grid graphs summarize data across all hosts known to a single gmetad process. Grid view is the jumping-off point for navigating into more detailed displays dealing with individual clusters and the hosts that compose those clusters:

  1. Clicking on any grid-level summary graph brings up the “all time periods” display. Clicking again enlarges the graph you’re interested in.

  2. Clicking on any cluster-level graph displays the cluster view.

Cluster View

A cluster is a collection of gmonds. They may be grouped by physical location, common workload, or any other criteria. The top of the cluster view (Figure 4-3) displays summary graphs for the entire cluster. A quick view of each individual host is further down the page.

Figure 4-2. Grid view
Figure 4-3. Cluster view

  1. Clicking on a cluster summary graph shows you that summary over a range of time periods.

  2. Clicking on an individual host takes you to the host display.

The background color of the host graphs is determined by their one-minute load average. The metric displayed for each host can be changed using the Metric select box near the top of the page.

The utilization heatmap provides an alternate display of the one-minute load averages. This is a very quick way to get a feeling for how evenly balanced the workload is in the cluster at the present time. The heatmap can be disabled by setting

$conf["heatmaps_enabled"] = 0;

in conf.php.

When working with a cluster with thousands of nodes, or when using gweb over a slow network connection, loading a graph for each node in the cluster can take a significant amount of time. To address this problem, define $conf["max_graphs"] in conf.php to set an upper limit on the number of host graphs displayed in cluster view.

Physical view

Cluster view also provides an alternative display known as physical view (Figure 4-4), which is also very useful for large clusters. Physical view is a compressed text-only display of all the nodes in a cluster. By omitting images, this view can render much more quickly than the main cluster view.

Figure 4-4. Physical view

Clicking on a hostname in physical view takes you to the node view for that host. Node view is another text-only view, and is covered in more detail in Host View.

Adjusting the time range

Grid, cluster, and host views allow you to specify the time span (Figure 4-5) you’d like to see. Monitoring an ongoing event usually involves watching the last few minutes of data, but questions like “what is normal?” and “when did this start?” are often best answered over longer time scales.

Figure 4-5. Choosing a time range

You are free to define your own time spans as well via your conf.php file. The defaults (defined in conf_default.php) look like this:

  #
  # Time ranges
  # Each value is the # of seconds in that range.
  #
  $conf['time_ranges'] = array(
     'hour' => 3600,
     '2hr'  => 7200,
     '4hr'  => 14400,
     'day'  => 86400,
     'week' => 604800,
     'month'=> 2419200,
     'year' => 31449600
  );

All of the built-in time ranges are relative to the current time, which makes it difficult to see (for example) five minutes of data from two days ago, which can be a very useful view to have when doing postmortem research on load spikes and other problems. The time range interface allows manual entry of begin and end times and also supports zooming via mouse gestures.
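Every built-in range is a number of seconds counted back from the current time, so the window a range selects is easy to compute. A postmortem window, such as five minutes from two days ago, is simply an explicit start and end. The following Python sketch illustrates that arithmetic. time_window and custom_window are hypothetical helpers, not gweb functions. The dictionary copies the defaults shown above; note that month is 28 days and year is 364 days.

```python
# Values copied from the $conf['time_ranges'] defaults shown above.
TIME_RANGES = {
    "hour": 3600,
    "2hr": 7200,
    "4hr": 14400,
    "day": 86400,
    "week": 604800,
    "month": 2419200,   # 28 days
    "year": 31449600,   # 364 days
}


def time_window(range_name, now):
    """Return (start, end) epoch seconds for a built-in range ending at `now`."""
    return now - TIME_RANGES[range_name], now


def custom_window(end, minutes):
    """Return (start, end) for an explicit window of `minutes` ending at `end`."""
    return end - minutes * 60, end


if __name__ == "__main__":
    now = 1308500000  # an arbitrary fixed timestamp for illustration
    print(time_window("day", now))          # (1308413600, 1308500000)
    # Five minutes of data ending exactly two days before `now`:
    two_days_ago = now - 2 * TIME_RANGES["day"]
    print(custom_window(two_days_ago, 5))   # (1308326900, 1308327200)
```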

In both cluster and host views, it is possible to click and drag on a graph to zoom in on a particular time frame (Figure 4-6). The interaction causes the entire page to reload, using the desired time period. Note that the resolution of the data displayed is limited by what is stored in the RRD database files. After zooming, the time frame in use is reflected in the custom time frame display at the top of the page. You can clear this by clicking clear and then go. Zoom support is enabled by default but may be disabled by setting $conf["zoom_support"] = 0 in conf.php.

Figure 4-6. Zooming in on an interesting time frame

Host View

Metrics from a single gmond process are displayed and summarized in the host view (Figure 4-7). Summary graphs are displayed at the top, and individual metrics are grouped together lower down.

Host Overview contains textual information about the host, including any string metrics being reported by the host, such as last boot time or operating system and kernel version.

Figure 4-7. Host view

Viewing individual metrics

The “inspect” option for individual metrics, which is also available in the “all time periods” display, allows you to view the graph data interactively:

  1. Raw graph data can be exported as CSV or JSON.

  2. Events can be turned off and on selectively on all graphs or specific graphs.

  3. Trend analysis can make predictions about future metric values based on past data.

  4. Graphs can be time-shifted to overlay the previous period’s data.

Node view

Node view (Figure 4-8) is an alternative text-only display of some very basic information about a host, similar to the physical view provided at the cluster level.

Figure 4-8. Node view

Graphing All Time Periods

Clicking on a summary graph at the top of the grid, cluster, or host views leads to an “all time periods” view of that graph. This display shows the same graph over a variety of time periods: typically the last hour, day, week, month, and year. This view is very useful when determining when a particular trend may have started or what normal is for a given metric.

Many of the options described for viewing individual metrics are also available for all time periods, including CSV and JSON export, interactive inspection, and event display.

The gweb Search Tab

Search allows you to find hosts and metrics quickly. It has multiple purposes:

  • Find a particular metric, which is especially useful if a metric is rare, such as outgoing_sms_queue.

  • Quickly find a host regardless of which cluster it belongs to.

Figure 4-9 shows how gweb search autocomplete allows you to find metrics across your entire deployment. To use this feature, click on the Search tab and start typing in the search field. Once you stop typing, a list of results will appear. Results will contain:

  • A list of matching hosts.

  • A list of matching metrics. If the search term matches metrics on multiple hosts, all hosts will be shown.
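The result structure described above can be modeled in a few lines. This minimal Python sketch makes two assumptions: matching is a case-insensitive substring test, and the inventory is an in-memory mapping from each host to its metric names. gweb's actual matching logic and data source may differ, and search() is a hypothetical helper.

```python
def search(term, inventory):
    """Return (matching_hosts, matching_metrics) for a search term.

    `inventory` maps hostname -> iterable of metric names. Matching is a
    case-insensitive substring test, an assumption made for illustration.
    """
    needle = term.lower()
    hosts = sorted(h for h in inventory if needle in h.lower())
    metrics = {}
    for host, names in inventory.items():
        for name in names:
            if needle in name.lower():
                # A metric reported by several hosts lists every one of them.
                metrics.setdefault(name, []).append(host)
    for name in metrics:
        metrics[name].sort()
    return hosts, metrics


if __name__ == "__main__":
    inventory = {
        "web01": ["load_one", "cpu_user"],
        "db01": ["load_one"],
        "sms01": ["outgoing_sms_queue"],
    }
    print(search("load_one", inventory))
    print(search("sms", inventory))
```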

Figure 4-9. Searching for load_one metrics

Click on any of the links and a new window will open that will take you directly to the result. You can keep clicking on the results; for each result, a new window will open.

The gweb Views Tab

Views are arbitrary collections of metrics, host report graphs, or aggregate graphs. They let users gather the things they care about into a single overview. For example, a user might want a view that contains aggregate load on all servers, aggregate throughput, load on the MySQL server, and so on. There are two ways to create or modify views: via the web GUI, or programmatically by defining views in JSON.

Creating views using the GUI

To create a view, click the Views tab, then click Create View. Type a name for the view, then click Create.

Adding metrics to views using the GUI

Click the plus sign above or below each metric or composite graph. A window will pop up in which you can select the view to which you want to add the metric. Optionally, you can specify warning and critical values, which will appear as horizontal lines on the graph. Repeat the process for additional metrics. Figure 4-10 shows the UI for adding a metric to a view.

Figure 4-10. Metric actions dialog

Defining views using JSON

Views are stored as JSON files in the conf_dir directory. The default for the conf_dir is /var/lib/ganglia/conf. You can change that by specifying an alternate directory in conf.php:

$conf['conf_dir'] = "/var/www/html/conf";

You can create new files or edit existing ones. The filename for a view must start with view_ and end with .json (for example, view_1.json or view_jira_servers.json), and it must be unique. Here is an example definition of a view with three graphs:

{
 "view_name":"jira",
 "items":[
  { "hostname":"web01.domain.com","graph":"cpu_report"},
  { "hostname":"web02.domain.com","graph":"load_report"},
  { "aggregate_graph":"true",
      "host_regex":[
        {"regex":"web[2-7]"},
        {"regex":"web50"}
      ],
      "metric_regex":[
        {"regex":"load_one"}
      ],
      "graph_type":"stack",
      "title":"Location Web Servers load"
  }
  ],
  "view_type":"standard"
 }
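The naming rule above is easy to get wrong when views are generated from a script. The following is a minimal sketch, assuming you want to create view files programmatically with Python. The hypothetical helper write_view builds a definition with the top-level keys used in the example and derives a view_*.json filename from the view name. It also refuses to overwrite an existing file. The directory argument stands in for conf_dir. The filename scheme is this sketch's own choice, since any unique name of the form view_*.json is accepted.

```python
import json
import os
import re
import tempfile


def view_filename(view_name):
    """Derive a file name of the form view_*.json from a view name."""
    # Replace anything other than letters, digits, '_' and '-' with '_'
    # (an illustrative choice, not a gweb rule).
    return "view_" + re.sub(r"[^A-Za-z0-9_-]", "_", view_name) + ".json"


def write_view(conf_dir, view_name, items, view_type="standard"):
    """Write a view definition into conf_dir and return the path written."""
    view = {"view_name": view_name, "items": items, "view_type": view_type}
    path = os.path.join(conf_dir, view_filename(view_name))
    if os.path.exists(path):
        # View files must be unique, so never silently overwrite one.
        raise FileExistsError(path)
    with open(path, "w") as f:
        json.dump(view, f, indent=2)
    return path


if __name__ == "__main__":
    items = [
        {"hostname": "web01.domain.com", "graph": "cpu_report"},
        {"hostname": "web02.domain.com", "graph": "load_report"},
    ]
    # A temporary directory stands in for conf_dir (e.g. /var/lib/ganglia/conf).
    with tempfile.TemporaryDirectory() as conf_dir:
        print(write_view(conf_dir, "jira servers", items))
```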

Table 4-1 lists the top-level attributes for the JSON view definition. Each item can have the attributes listed in Table 4-2.

Table 4-1. View items
Key        Value
view_name  Name of the view, which must be unique.
view_type  Standard or Regex. Regex view allows you to specify regex to match hosts.
items      An array of hashes describing which metrics should be part of the view.

Table 4-2. Items configuration
Key              Value
hostname         Hostname of the host whose metric or graph should be displayed.
metric           Name of the metric, such as load_one.
graph            Graph name, such as cpu_report or load_report. You can use the metric or graph key, but not both.
aggregate_graph  If this value exists and is set to true, the item defines an aggregate graph. This item needs a hash of regular expressions and a description.
warning          (Optional) Adds a horizontal yellow line as a visual cue for a warning state.
critical         (Optional) Adds a horizontal red line as a visual cue for a critical state.

Once you compose your views, it is often useful to validate the JSON, for example to check that you don’t have extra commas. To validate your JSON configuration, use Python’s json.tool:

$ python -m json.tool my_report.json

This command will report any issues.
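json.tool parses the file with Python's standard json module, so the same check can be run from a script. This minimal sketch reports where parsing failed. check_json is a hypothetical helper, and the exact wording of the error message varies between Python versions.

```python
import json


def check_json(text):
    """Return None if `text` is valid JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno point at the position where the parser gave up.
        return "line %d, column %d: %s" % (err.lineno, err.colno, err.msg)
    return None


if __name__ == "__main__":
    good = '{"view_name": "jira", "items": []}'
    bad = '{"view_name": "jira", "items": [],}'   # extra trailing comma
    print(check_json(good))   # None
    print(check_json(bad))
```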

The gweb Aggregated Graphs Tab

Aggregate graphs (Figure 4-11) allow you to create composite graphs combining different metrics. At a minimum, you must supply a host regular expression and metric regular expression. This is an extremely powerful feature, as it allows you to quickly and easily combine all sorts of metrics. Figure 4-12 includes two aggregate graphs showing all metrics matching host regex of loc and metric regex of load.

Figure 4-11. Aggregate line graph
Figure 4-12. Aggregate stacked graph

Decompose Graphs

Related to aggregate graphs are decompose graphs, which decompose aggregate graphs by taking each metric and putting it on a separate graph. This feature is useful when you have many different metrics on an aggregate graph and colors are blending together. You will find the Decompose button above the graph.
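The selection behind an aggregate graph can be sketched as a filter over (host, metric) pairs. Decomposing then amounts to giving each selected pair its own graph. The Python below assumes that host and metric patterns match anywhere in the name, and it uses the loc and load example from the text. aggregate_series and decompose are hypothetical helpers, not gweb code.

```python
import re


def aggregate_series(series, host_regexes, metric_regexes):
    """Select the (host, metric) pairs that an aggregate graph would plot.

    A pair is kept when the host matches any host regex and the metric
    matches any metric regex; patterns may match anywhere in the name.
    """
    host_rx = [re.compile(p) for p in host_regexes]
    metric_rx = [re.compile(p) for p in metric_regexes]
    return [
        (host, metric)
        for host, metric in series
        if any(r.search(host) for r in host_rx)
        and any(r.search(metric) for r in metric_rx)
    ]


def decompose(selected):
    """Split an aggregate into one single-series graph per selected pair."""
    return [[pair] for pair in selected]


if __name__ == "__main__":
    series = [("loc-web01", "load_one"), ("loc-web01", "cpu_user"),
              ("loc-web02", "load_five"), ("nyc-db01", "load_one")]
    chosen = aggregate_series(series, ["loc"], ["load"])
    print(chosen)                  # [('loc-web01', 'load_one'), ('loc-web02', 'load_five')]
    print(len(decompose(chosen)))  # 2
```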

The gweb Compare Hosts Tab

The compare hosts feature allows you to compare hosts across all their matching metrics. It essentially creates an aggregate graph for each metric. This feature is helpful when you want to investigate why a particular host (or hosts) is behaving differently from other hosts.

The gweb Events Tab

Events are user-specified “vertical markers” that are overlaid on top of graphs. They are useful in providing visual cues when certain events happen. For example, you might want to overlay software deploys or backup jobs so that you can quickly associate changes in behavior on certain graphs with an external event, as in Figure 4-13. In this example, we wanted to see how increased rrdcached write delay would affect our CPU wait IO percentage, so we added an event when we made the change.

Figure 4-13. Event line overlay

Alternatively, you can overlay a timeline to indicate the duration of a particular event. For example, Figure 4-14 shows the full timeline for a backup job.

Figure 4-14. Event timeline overlay

By default, Ganglia stores events as a JSON array of hashes in the events.json file. Here is an example:

[
 { "event_id":"1234",
     "start_time":1308496361,
     "end_time":1308496961,
     "summary":"DB Backup",
     "description":"Prod daily db backup",
     "grid":"*",
     "cluster":"*",
     "host_regex":"centos1"
 },
 { "event_id":"2345",
     "start_time":1308497211,
     "summary":"FS cleanup",
     "grid":"*",
     "cluster":"*",
     "host_regex":"centos1"
 }
]
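The following Python sketch shows how such a file maps onto graph overlays. It loads the example above and selects the events to draw on one host's graph for a given time window. It makes three assumptions: host_regex is applied as an unanchored regular expression; an event without end_time is a single vertical marker, while one with end_time is a timeline; and the grid and cluster fields (both "*" in the example) are ignored. events_for is a hypothetical helper, not part of gweb.

```python
import json
import re

# The example events from above, trimmed to the fields this sketch uses.
EVENTS_JSON = """[
 {"event_id": "1234", "start_time": 1308496361, "end_time": 1308496961,
  "summary": "DB Backup", "grid": "*", "cluster": "*", "host_regex": "centos1"},
 {"event_id": "2345", "start_time": 1308497211,
  "summary": "FS cleanup", "grid": "*", "cluster": "*", "host_regex": "centos1"}
]"""


def events_for(events, host, start, end):
    """Return summaries of events to overlay on `host`'s graph for [start, end]."""
    hits = []
    for ev in events:
        if not re.search(ev["host_regex"], host):
            continue
        ev_start = int(ev["start_time"])
        # No end_time: treat the event as a single point in time (a marker).
        ev_end = int(ev.get("end_time", ev_start))
        if ev_start <= end and ev_end >= start:   # interval overlap test
            hits.append(ev["summary"])
    return hits


if __name__ == "__main__":
    events = json.loads(EVENTS_JSON)
    print(events_for(events, "centos1", 1308496000, 1308497000))  # ['DB Backup']
```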

It is also possible to use a different backend for events, which can be useful if you need to scale up to hundreds or thousands of events without incurring the processing penalty associated with JSON parsing. You should have PHP support for MySQL installed on your gweb server before attempting to configure this support; the database schema can be imported from conf/sql/ganglia.mysql. The backend is configured with two options in your conf_default.php file:

# What is the provider used to provide events
# Examples: "json", "mdb2"
$conf['overlay_events_provider'] = "mdb2";
# If using MDB2, connection string:
$conf['overlay_events_dsn'] = "mysql://dbuser:dbpassword@localhost/ganglia";

Alternatively, you can add events through the web UI or the API.

Events API

An easy way to manipulate events is through the Ganglia Events API, which is available from your gweb interface at /ganglia/api/events.php. To use it, invoke the URL along with key/value pairs that define events. Key/value pairs can be supplied as either GET or POST arguments. The full list of key/value pairs is provided in Table 4-3.

Table 4-3. Events options
Key         Value
action      add to add a new event, edit to edit, remove or delete to remove an event.
start_time  Start time of an event. Allowed options are now (uses current system time), UNIX timestamp, or any other well-formed date, as supported by PHP’s strtotime function.
end_time    Optional. Same format as start_time.
summary     Summary of an event. It will be shown in the graph legend.
host_regex  Host regular expression, such as web-|app-.

Examples

To add an event from your cron job, execute a command such as:

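curl -G "http://mygangliahost.com/ganglia/api/events.php" \
  --data-urlencode "action=add" --data-urlencode "start_time=now" \
  --data-urlencode "summary=Prod DB Backup" --data-urlencode "host_regex=db02"

or:

curl --data-urlencode "action=add" --data-urlencode "start_time=now" \
  --data-urlencode "summary=Prod DB Backup" --data-urlencode "host_regex=db02" \
  http://mygangliahost.com/ganglia/api/events.php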

The API returns a JSON-encoded status message with either status = ok or status = error.

If you are adding an event, you will also get the event_id of the event that was just added in case you want to edit it later, such as to add an end_time.
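Scripts written in other languages can call the same endpoint. The Python sketch below builds the add-event URL from the keys in Table 4-3 and checks the reply. It lets urllib.parse.urlencode escape the spaces in the summary and characters such as | in a host regex. The host name is the placeholder from the curl examples. The reply field name event_id is assumed from the description above. add_event_url and parse_reply are hypothetical helpers.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder from the curl examples; replace with your own gweb server.
API_URL = "http://mygangliahost.com/ganglia/api/events.php"


def add_event_url(summary, host_regex, start_time="now", end_time=None):
    """Build a GET URL that adds an event; urlencode escapes spaces and '|'."""
    params = {"action": "add", "start_time": start_time,
              "summary": summary, "host_regex": host_regex}
    if end_time is not None:
        params["end_time"] = end_time
    return API_URL + "?" + urlencode(params)


def parse_reply(body):
    """Return the event_id (assumed key) from a reply; raise unless status is ok."""
    reply = json.loads(body)
    if reply.get("status") != "ok":
        raise RuntimeError("events API error: %r" % reply)
    return reply.get("event_id")


if __name__ == "__main__":
    url = add_event_url("Prod DB Backup", "db02")
    print(url)
    # Uncomment to actually call your gweb server:
    # with urlopen(url) as resp:
    #     print(parse_reply(resp.read().decode("utf-8")))
```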

The gweb Automatic Rotation Tab

Automatic rotation is a feature intended for people in data centers who need to continuously rotate metrics to help spot early signs of trouble. It is intended to work in conjunction with views. To activate it, click Automatic Rotation and then select the view you want rotated. Metrics will be rotated until the browser window is closed. You can change the view while it is being rotated; changes will be reflected within one full rotation. Graphs rotate every 30 seconds by default, and you can adjust the rotation delay in the GUI.

Another powerful aspect of automatic rotation is that if you have multiple monitors, you can invoke different views to be rotated on different monitors.

The gweb Mobile Tab

gweb mobile represents the Ganglia web interface optimized for mobile devices. This mobile view is found by visiting /ganglia/mobile.php on your gweb host. It is intended for any mobile browsers supported by the jQueryMobile toolkit. This display covers most WebKit implementations, including Android, iPhone iOS, HP webOS, Blackberry OS 6+, and Windows Phone 7. The mobile view contains only a subset of features, including views optimized for a small screen, host view, and search.

Custom Composite Graphs

Ganglia comes with a number of built-in composite graphs, such as a load report that shows current load, number of processes running, and number of CPUs; a CPU report that shows system CPU, user CPU, and wait IO CPU all on the same graph; and many others. You can define your own composite graphs in two ways: PHP or JSON.

Defining graphs via PHP is more complex but gives you complete control over every aspect of the graph. See the example PHP report for more details.

For typical use cases, JSON is definitely the easiest way to configure graphs. For example, consider the following JSON snippet, which will create a composite graph that shows all load indexes as lines on one graph:

{
   "report_name" : "load_all_report",
   "title" : "Load All Report",
   "vertical_label" : "load",
   "series" : [
      { "metric": "load_one", "color": "3333bb", "label": "Load 1",
        "line_width": "2", "type": "line" },
      { "metric": "load_five", "color": "ffea00", "label": "Load 5",
        "line_width": "2", "type": "line" },
      { "metric": "load_fifteen", "color": "dd0000", "label": "Load 15",
        "type": "line" }
   ]
}

To use this snippet, save it as a file in the graph.d subdirectory of your gweb installation. The filename must contain _report.json to be considered by the web UI, so you could save this file as graph.d/load_all_report.json.

There are two main sections to the JSON report. The first is a set of configurations for the overall report, and the second is a list of options for the specific data series that you wish to graph. The configuration options passed to the report are shown in Table 4-4.

Table 4-4. Graph configuration
Key             Value
report_name     Name of the report that web UI uses.
title           Title of the report to show on a graph.
vertical_label  Y-axis description (optional).
series          An array of metrics to use to compose a graph. More about how those are defined in Table 4-5.

Options for series array are listed in Table 4-5. Note that each series has its own instance of the different options.

Table 4-5. Series options
Key         Value
metric      Name of a metric, such as load_one and cpu_system. If the metric doesn’t exist, it will be skipped.
color       A six-digit hexadecimal color code, such as 000000 for black.
label       Metric label, such as Load 1.
type        Item type. It can be either line or stack.
line_width  If type is set to line, this value will be used as a line width. If this value is not specified, it defaults to 2. If type is stack, it’s ignored even if set.

Once you compose your graphs, it is often useful to validate JSON. One example would be to verify that there are no extra commas, etc. To validate your JSON configuration, use Python’s json.tool:

$ python -m json.tool my_report.json

This command will report any issues.
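json.tool only checks syntax. A report can be valid JSON and still not match Tables 4-4 and 4-5, for example with a color that is not six hex digits. The Python sketch below is a hypothetical checker for those fields. Which keys it treats as required is an assumption, since the tables mark only vertical_label as optional. effective_line_width mirrors the line_width rule in Table 4-5.

```python
import json
import re

HEX_COLOR = re.compile(r"[0-9A-Fa-f]{6}")


def check_report(report):
    """Return a list of problems in a *_report.json definition (empty if none)."""
    problems = []
    for key in ("report_name", "title", "series"):
        if key not in report:
            problems.append("missing key: " + key)
    for i, s in enumerate(report.get("series", [])):
        if "metric" not in s:
            problems.append("series %d: missing metric" % i)
        if "color" in s and not HEX_COLOR.fullmatch(s["color"]):
            problems.append("series %d: color must be six hex digits" % i)
        if "type" in s and s["type"] not in ("line", "stack"):
            problems.append("series %d: type must be line or stack" % i)
    return problems


def effective_line_width(series):
    """Line width per Table 4-5: default 2 for lines, ignored (None) for stacks."""
    if series.get("type") == "stack":
        return None
    return int(series.get("line_width", 2))


if __name__ == "__main__":
    report = json.loads('{"report_name": "load_all_report", "title": "Load All Report",'
                        ' "series": [{"metric": "load_fifteen", "color": "dd0000",'
                        ' "label": "Load 15", "type": "line"}]}')
    print(check_report(report))                        # []
    print(effective_line_width(report["series"][0]))   # 2
```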

Other Features

There are a number of features in gweb that are turned off by default or can be adjusted:

Metric groups initially collapsed

By default, when you click on a host view, all of the metric groups are expanded. You can change this view so that only metric graph titles are shown and you have to click on the metric group to expand the view. To make this collapsed view the default behavior, add the following setting to conf.php:

$conf['metric_groups_initially_collapsed'] = true;

Filter hosts in cluster view

If you’d like to display only certain hosts in the cluster view, it is possible to filter them out using the text box that is located next to the “Show Node” dropdown. The filter accepts regular expressions, so it is possible to show any host that has “web” in its name by entering web in the filter box; to show only webservers web10−web17, type web1[0-7]; or, to show web03 and web04 and all MySQL servers, type (web0[34]|mysql). Note that the aggregate graphs will still include data from all hosts, including those not displayed due to filters.
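The following Python sketch shows which hosts each example pattern keeps. It assumes the filter is an unanchored regular-expression search over host names, which is an assumption about gweb's behavior. One consequence of unanchored matching is that web1[0-7] would also keep a host named web100. Anchoring the pattern as ^web1[0-7]$ avoids that in this sketch.

```python
import re


def filter_hosts(hosts, pattern):
    """Return hosts whose names match `pattern` anywhere in the name."""
    rx = re.compile(pattern)
    return [h for h in hosts if rx.search(h)]


if __name__ == "__main__":
    hosts = ["web03", "web04", "web10", "web17", "web18", "mysql01", "cache01"]
    print(filter_hosts(hosts, "web"))               # every host containing "web"
    print(filter_hosts(hosts, "web1[0-7]"))         # ['web10', 'web17']
    print(filter_hosts(hosts, "(web0[34]|mysql)"))  # ['web03', 'web04', 'mysql01']
```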

Default refresh period

By default, the host and cluster views refresh every 5 minutes (300 seconds). To adjust this, set the following value in conf.php:

$conf['default_refresh'] = 300;

Strip domain name from hostname in graphs

By default, the gweb interface will display fully qualified domain names (FQDN) in graphs. If all your machines are on the same domain, you can strip the domain name by setting the strip_domainname option in conf.php:

$conf['strip_domainname'] = true;
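Stripping amounts to keeping the text before the first dot. That is also why the option suits deployments where all machines share one domain: web01.a.example.com and web01.b.example.com would both be shown as web01. The Python sketch below is a hypothetical illustration, not gweb's code. Leaving IPv4 addresses untouched is a choice made here so that 10.0.0.1 is not reduced to 10.

```python
import re

IPV4 = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def strip_domainname(hostname):
    """Return the short hostname (the text before the first dot).

    IPv4 addresses are returned unchanged, because cutting them at the
    first dot would leave only the first octet (a choice of this sketch).
    """
    if IPV4.fullmatch(hostname):
        return hostname
    return hostname.split(".", 1)[0]


if __name__ == "__main__":
    print(strip_domainname("web01.domain.com"))  # web01
    print(strip_domainname("10.0.0.1"))          # 10.0.0.1
```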
Set default time period

You can change the default time period by setting the following variable in conf.php:

$conf['default_time_range'] = 'hour';

Authentication and Authorization

Ganglia contains a simple authorization system to selectively allow or deny users access to certain parts of the gweb application. We rely on the web server to provide authentication, so any Apache authentication system (htpasswd, LDAP, etc.) is supported. Apache configuration is used for examples in this section, but the system works with any web server that can provide the required environment variables.

Configuration

The authorization system has three modes of operation:

$conf['auth_system'] = 'readonly';

Anyone is allowed to view any resource. No one can edit anything. This is the default setting.

$conf['auth_system'] = 'disabled';

Anyone is allowed to view or edit any resource.

$conf['auth_system'] = 'enabled';

Anyone may view public clusters without login. Authenticated users may gain elevated privileges.

If you wish to enable or disable authorization, add the change to your conf.php file.

When a user successfully authenticates, a hash is generated from the username and a secret key and is stored in a cookie and made available to the rest of gweb. If the secret key value becomes known, it is possible for an attacker to assume the identity of any user.

You can change this secret value at any time. Users who have already logged in will need to log in again.
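The following Python sketch illustrates why the secret matters. It derives a cookie value from the username using HMAC-SHA256 keyed by the secret. This scheme is an assumption chosen for illustration, and gweb's actual hashing scheme may differ. Anyone who knows the secret can compute a valid token for any username. Rotating the secret invalidates every token issued under the old one, which is why users must log in again.

```python
import hashlib
import hmac


def auth_token(username, secret):
    """Derive a cookie value from a username and the server-side secret."""
    return hmac.new(secret.encode(), username.encode(), hashlib.sha256).hexdigest()


def verify(username, token, secret):
    """Check a cookie value against the expected token in constant time."""
    return hmac.compare_digest(auth_token(username, secret), token)


if __name__ == "__main__":
    token = auth_token("alice", "yourSuperSecretValueGoesHere")
    print(verify("alice", token, "yourSuperSecretValueGoesHere"))  # True
    print(verify("alice", token, "aNewSecret"))                    # False
```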

Enabling Authentication

Enabling authentication requires two steps:

  1. Configure your web server to require authentication when accessing gweb/login.php, and to provide the $_SERVER['REMOTE_USER'] variable to gweb/login.php. (This variable is not needed on any other gweb page.)

  2. Configure your web server to provide $_SERVER['ganglia_secret']. This is a secret value used for hashing authenticated user names.

If login.php does not require authentication, the user will see an error message and no authorization will be allowed.

Sample Apache configuration

More information about configuring authentication in Apache can be found here. Note that Apache need only provide authentication; authorization is provided by gweb configuration. A sample Apache configuration is provided here:

SetEnv ganglia_secret yourSuperSecretValueGoesHere
<Files "login.php">
  AuthType Basic
  AuthName "Ganglia Access"
  AuthUserFile /var/lib/ganglia/htpasswd
  Require valid-user
</Files>

Other web servers

Sample configurations for other web servers such as Nginx and Lighttpd are available on the gweb wiki.

Access Controls

The default access control setup has the following properties:

  • Guests may view all public clusters.

  • Admins may view all public and private clusters and edit configuration (views) for them.

  • Guests may not view private clusters.

Additional rules may be configured as required. This configuration should go in conf.php. The GangliaAcl class is based on Zend_Acl. More documentation is available here.

Note that there is no built-in distinction between a user and a group in Zend_Acl. Both are implemented as roles. The system supports the configuration of hierarchical sets of ACL rules. We implement user/group semantics by making all user roles children of the GangliaAcl::GUEST role, and all clusters children of GangliaAcl::ALL_CLUSTERS:

Name                      Meaning
GangliaAcl::ALL_CLUSTERS  Every cluster should descend from this role. Guests have view access on GangliaAcl::ALL_CLUSTERS.
GangliaAcl::GUEST         Every user should descend from this role. (Users may also have other roles, but this one grants global view privileges to public clusters.)
GangliaAcl::ADMIN         Admins may access all private clusters and edit configuration for any cluster.
GangliaAcl::VIEW          This permission is granted to guests on all clusters, and then selectively denied for private clusters.
GangliaAcl::EDIT          This permission is used to determine whether a user may update views and perform any other configuration tasks.

Actions

Currently, we only support two actions, view and edit. These are applied on a per-cluster basis. So one user may have view access to all clusters, but edit access to only one.
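The default rules can be summarized as a small model:

  • Roles inherit from a parent role.

  • Guests may view any cluster not marked private.

  • Explicit allow rules grant view or edit on one cluster.

  • Admins may do everything.

The Python class below is a toy model of that behavior under those assumptions, written only to make the rules concrete. It is not Zend_Acl, and its method names merely echo the PHP calls used in the configuration examples.

```python
class ToyAcl:
    """A toy model of the default gweb access rules (not Zend_Acl)."""

    GUEST, ADMIN = "guest", "admin"
    VIEW, EDIT = "view", "edit"

    def __init__(self):
        self.parents = {}    # role -> parent role
        self.private = set()  # clusters hidden from plain guests
        self.grants = set()   # explicit (role, cluster, action) allow rules

    def add_role(self, user, parent):
        self.parents[user] = parent

    def add_private_cluster(self, cluster):
        self.private.add(cluster)

    def allow(self, user, cluster, action):
        self.grants.add((user, cluster, action))

    def _lineage(self, role):
        # Yield the role itself, then each ancestor up the hierarchy.
        while role is not None:
            yield role
            role = self.parents.get(role)

    def is_allowed(self, user, cluster, action):
        roles = list(self._lineage(user))
        if self.ADMIN in roles:
            return True  # admins may view and edit every cluster
        if any((r, cluster, action) in self.grants for r in roles):
            return True  # an explicit per-cluster allow rule applies
        # Guests may view public clusters; nothing else is granted by default.
        return (action == self.VIEW and self.GUEST in roles
                and cluster not in self.private)


if __name__ == "__main__":
    acl = ToyAcl()
    acl.add_role("bob", ToyAcl.GUEST)
    acl.add_private_cluster("secret")
    print(acl.is_allowed("bob", "secret", ToyAcl.VIEW))  # False
    acl.allow("bob", "secret", ToyAcl.VIEW)
    print(acl.is_allowed("bob", "secret", ToyAcl.VIEW))  # True
```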

Configuration Examples

These should go in your conf.php file. The usernames you use must be the ones provided by whatever authentication system you are using in Apache. If you want to explicitly allow/deny access to certain clusters, you need to spell that out here.

All later examples assume you have this code to start with:

$acl = GangliaAcl::getInstance();

Making a user an admin

$acl->addRole( 'username', GangliaAcl::ADMIN );

Defining a private cluster

$acl->addPrivateCluster( 'clustername' );

Granting certain users access to a private cluster

$acl->addPrivateCluster( 'clustername' );
$acl->addRole( 'username', GangliaAcl::GUEST );
$acl->allow( 'username', 'clustername', GangliaAcl::VIEW );

Granting users access to edit some clusters

$acl->addRole( 'username', GangliaAcl::GUEST );
$acl->add( new Zend_Acl_Resource( 'clustername' ), GangliaAcl::ALL_CLUSTERS );
$acl->allow( 'username', 'clustername', GangliaAcl::EDIT );
