layout | title | canonical |
---|---|---|
default | Monitoring Puppet Server metrics | /puppetserver/latest/puppet_server_metrics.html |
Puppet Server tracks several advanced performance and health metrics, all of which take advantage of the metrics API. You can track these metrics using:
- Customizable, networked Graphite and Grafana instances
- A built-in experimental developer dashboard
- HTTP client metrics
- Metrics API endpoints
Note: The `grafanadash` and `puppet-graphite` modules referenced in this document are not Puppet-supported modules. They are provided for testing and demonstration purposes only.
Graphite is a third-party monitoring application that stores real-time metrics and provides customizable ways to view them. Puppet Server can export many metrics to Graphite, and exports a set of metrics by default that is designed to be immediately useful to Puppet administrators.
Note: A Graphite setup is deeply customizable and can report many Puppet Server metrics on demand. However, it requires considerable configuration and additional server resources. For a web-based dashboard of Puppet Server metrics built into Puppet Server, see the developer dashboard. To retrieve metrics through HTTP requests, see the metrics API.
To start using Graphite with Puppet Server, you must:
Grafana provides a web-based customizable dashboard that's compatible with Graphite, and the `grafanadash` module installs and configures it by default.

The `grafanadash` Puppet module quickly installs and configures a basic test instance of Graphite with the Grafana extension. When installed on a dedicated Puppet agent, this module provides a quick demonstration of how Graphite and Grafana can consume and display Puppet Server metrics.
WARNING: The `grafanadash` module is not a Puppet-supported module. It is designed for testing and demonstration purposes only, and is tested against CentOS 6 only.

Also, install this module on a dedicated agent only. Do not install it on the node running Puppet Server, because the module makes security policy changes that are inappropriate for a Puppet master:

- SELinux can cause issues with Graphite and Grafana, so the module temporarily disables SELinux. If you reboot the machine after using the module to install Graphite, you must disable SELinux again and restart the Apache service to use Graphite and Grafana.
- The module disables the `iptables` firewall and enables cross-origin resource sharing on Apache, which are potential security risks.
Install the `grafanadash` Puppet module on a *nix agent. The module's `grafanadash::dev` class installs and configures a Graphite server, the Grafana extension, and a default dashboard.

- Install a *nix Puppet agent to serve as the Graphite server.
- As root on the Puppet agent node, run `puppet module install cprice404-grafanadash`.
- As root on the Puppet agent node, run `puppet apply -e 'include grafanadash::dev'`.
Grafana runs as a web dashboard, and the `grafanadash` module configures it to use port 10000 by default. To view Puppet metrics in Grafana, you must create a metrics dashboard, or edit and import a JSON-based dashboard that includes Puppet metrics, such as the sample Grafana dashboard that we provide.
- In a web browser on a computer that can reach the Puppet agent node running Grafana, navigate to `http://<AGENT'S HOSTNAME>:10000`.

  There, you'll see a test screen that indicates whether Grafana can successfully connect to your Graphite server.

  If Grafana is configured to use a hostname that the computer on which the browser is running cannot resolve, click view details and then the Requests tab to determine the hostname Grafana is trying to use. Next, add the IP address and hostname to the computer's `/etc/hosts` file on Linux or OS X, or `C:\Windows\system32\drivers\etc\hosts` file on Windows.
- Download and edit our sample Grafana dashboard, `sample_metrics_dashboard.json`.

  a. Open the `sample_metrics_dashboard.json` file in a text editor on the same computer you're using to access Grafana.

  b. Throughout the file, replace our sample hostname of `master.example.com` with your Puppet Server's hostname. (Note: This value must be used as the `metrics_server_id` setting, as configured below.)

  c. Save the file.
- In the Grafana UI, click search (the folder icon), then Import, then Browse.
- Navigate to and select the edited JSON file.

This loads a dashboard with nine graphs that display various metrics exported from the Puppet Server to the Graphite server. (For details, see Using the Grafana dashboard.) However, these graphs will remain empty until you enable Puppet Server's Graphite metrics.
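The hostname replacement in the dashboard-editing step above can also be scripted instead of done by hand in a text editor. The following is a minimal sketch, not part of the sample dashboard or of Puppet Server: the function name `retarget_dashboard` is hypothetical, and it assumes the sample hostname appears in the file only as `master.example.com`.

```python
import json

# Sample hostname used throughout sample_metrics_dashboard.json.
SAMPLE_HOSTNAME = "master.example.com"


def retarget_dashboard(dashboard_text: str, server_hostname: str) -> str:
    """Return the dashboard JSON text with the sample hostname replaced.

    Hypothetical helper: it does a plain text substitution, then parses the
    result so that a broken edit fails loudly instead of at import time.
    """
    updated = dashboard_text.replace(SAMPLE_HOSTNAME, server_hostname)
    json.loads(updated)  # raises ValueError if the result is not valid JSON
    return updated
```

To use it, read the downloaded file's contents, pass them through `retarget_dashboard` with your Puppet Server's hostname, write the result to a new file, and import that file in the Grafana UI.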
Configure Puppet Server's `metrics.conf` file to enable and use the Graphite server.

- Set the `enabled` parameter to true in `metrics.registries.puppetserver.reporters.graphite`:

      metrics: {
        server-id: localhost
        registries: {
          puppetserver: {
            ...
            reporters: {
              ...
              # enable or disable Graphite metrics reporter
              graphite: {
                enabled: true
              }
            }
          }
        }
      }

- Configure the Graphite host settings in `metrics.reporters.graphite`:
  - host: The Graphite host's IP address as a string.
  - port: The Graphite host's port number.
  - update-interval-seconds: How frequently Puppet Server should send metrics to Graphite.
- Verify that `metrics.registries.puppetserver.reporters.jmx.enabled` is not set to false. Its default setting is true.
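Before restarting Puppet Server with these settings, it can help to confirm that the configured host and port accept connections from the Puppet Server node. The sketch below is an illustration only: `graphite_reachable` is a hypothetical helper, and it checks nothing beyond whether a TCP connection opens. The port to test is whatever you configured above (2003 is a common default for Graphite's plaintext listener, but that is an assumption about your Graphite setup).

```python
import socket


def graphite_reachable(host: str, port: int, timeout_seconds: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it does not send any metrics, it only verifies
    that something is listening at the configured address.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A `False` result usually points to a firewall rule, a wrong port, or a Graphite service that is not running, all of which would leave the Grafana graphs empty.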
Tip: In the Grafana UI, choose an appropriate time window from the drop-down menu.
The sample Grafana dashboard provides what we think is an interesting starting point. You can click on the title of any graph, and then click edit to tweak the graphs as you see fit.
- Active requests: This graph serves as a "health check" for the Puppet Server. It shows a flat line that represents the number of CPUs you have in your system, a metric that indicates the total number of HTTP requests actively being processed by the server at any moment in time, and a rolling average of the number of active requests. If the number of requests being processed exceeds the number of CPUs for any significant length of time, your server might be receiving more requests than it can efficiently process.
- Request durations: This graph breaks down the average response times for different types of requests made by Puppet agents. This indicates how expensive catalog and report requests are compared to the other types of requests. It also provides a way to see changes in catalog compilation times when you modify your Puppet code. A sharp upward curve for all request types indicates an overloaded server, and the curves should trend downward after you reduce the load on the server.
- Request ratios: This graph shows how many requests of each type Puppet Server has handled. Under normal circumstances, you should see about the same number of catalog, node, and report requests, because these all happen once per agent run. The number of file and file metadata requests correlates to how many remote file resources are in the agents' catalogs.
- Communications with PuppetDB: This graph tracks the amount of time it takes Puppet Server to send data and requests for common operations to, and receive responses from, PuppetDB.
- JRubies: This graph tracks how many JRubies are in use, how many are free, the mean number of free JRubies, and the mean number of requested JRubies.

  If the number of free JRubies is often less than one, or the mean number of free JRubies is less than one, Puppet Server is requesting and consuming more JRubies than are available. This overload reduces Puppet Server's performance. While this might simply be a symptom of an under-resourced server, it can also be caused by poorly optimized Puppet code or bottlenecks in the server's communications with PuppetDB if it is in use.

  If catalog compilation times have increased but PuppetDB performance remains the same, examine your Puppet code for potentially unoptimized code. If PuppetDB communication times have increased, tune PuppetDB for better performance or allocate more resources to it.

  If neither catalog compilation nor PuppetDB communication times are degraded, the Puppet Server process might be under-resourced on your server. If you have available CPU time and memory, increase the number of JRuby instances to allow it to allocate more JRubies. Otherwise, consider adding additional compile masters to distribute the catalog compilation load.
- JRuby Timers: This graph tracks several JRuby pool metrics.
  - The borrow time represents the mean amount of time that Puppet Server uses ("borrows") each JRuby from the pool.
  - The wait time represents the total amount of time that Puppet Server waits for a free JRuby instance.
  - The lock held time represents the amount of time that Puppet Server holds a lock on the pool, during which JRubies cannot be borrowed.
  - The lock wait time represents the amount of time that Puppet Server waits to acquire a lock on the pool.

  These metrics help identify sources of potential JRuby allocation bottlenecks.
- Memory Usage: This graph tracks how much heap and non-heap memory Puppet Server uses.
- Compilation: This graph breaks catalog compilation down into various phases to show how expensive each phase is on the master.
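The troubleshooting guidance in the JRubies entry above amounts to a small decision procedure. The sketch below restates it in code purely as a reading aid. The function name, its parameters, and the returned strings are all hypothetical; you would judge the inputs by eye from the graphs rather than compute them.

```python
def jruby_triage(
    mean_free_jrubies: float,
    compile_times_increased: bool,
    puppetdb_times_increased: bool,
    spare_cpu_and_memory: bool,
) -> str:
    """Suggest a next step based on the JRubies graph guidance.

    Hypothetical helper that mirrors the prose: check PuppetDB first,
    then Puppet code, then server resources.
    """
    if mean_free_jrubies >= 1:
        return "JRuby pool is keeping up with requests"
    if puppetdb_times_increased:
        return "tune PuppetDB or allocate more resources to it"
    if compile_times_increased:
        return "examine Puppet code for unoptimized code"
    if spare_cpu_and_memory:
        return "increase the number of JRuby instances"
    return "add compile masters to distribute the load"
```

The order of the checks matters: PuppetDB slowdowns and slow catalog compilation are ruled out before concluding that the server itself is under-resourced.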
The following example shows only the `targets` parameter of a dashboard to demonstrate the full names of Puppet's exported Graphite metrics (assuming the Puppet Server instance has a domain of `master.example.com`) and a way to add targets directly to an exported Grafana dashboard's JSON content.

    "panels": [
      {
        "span": 4,
        "editable": true,
        "type": "graphite",
        ...
        "targets": [
          {
            "target": "alias(puppetlabs.master.example.com.num-cpus,'num cpus')"
          },
          {
            "target": "alias(puppetlabs.master.example.com.http.active-requests.count,'active requests')"
          },
          {
            "target": "alias(puppetlabs.master.example.com.http.active-histo.mean,'average')"
          }
        ],
        "aliasColors": {},
        "aliasYAxis": {},
        "title": "Active Requests"
      }
    ]

See the sample Grafana dashboard for a detailed example of how a Grafana dashboard accesses these exported Graphite metrics.
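If you maintain several dashboards, the `targets` entries shown in the example can be generated rather than typed. The following sketch rebuilds the three Active Requests targets for a given hostname. The helper names `alias_target` and `active_requests_targets` are hypothetical; only the metric names and the `alias(...)` form come from the example.

```python
import json


def alias_target(hostname: str, metric: str, label: str) -> dict:
    """Build one Grafana target that aliases a Puppet Server Graphite metric."""
    return {"target": f"alias(puppetlabs.{hostname}.{metric},'{label}')"}


def active_requests_targets(hostname: str) -> list:
    """Return the three targets used by the Active Requests panel example."""
    return [
        alias_target(hostname, "num-cpus", "num cpus"),
        alias_target(hostname, "http.active-requests.count", "active requests"),
        alias_target(hostname, "http.active-histo.mean", "average"),
    ]


# Render the list as JSON that can be pasted into a panel's "targets" key.
targets_json = json.dumps(active_requests_targets("master.example.com"), indent=2)
```

Generating the strings from one hostname value keeps the metric prefix consistent across every target in a panel.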
The following HTTP and Puppet profiler metrics are available from the Puppet Server and can be added to your metrics reporting. Each metric is prefixed with `puppetlabs.<MASTER-HOSTNAME>`; for instance, the Grafana dashboard file refers to the `num-cpus` metric as `puppetlabs.<MASTER-HOSTNAME>.num-cpus`.

Additionally, metrics might be suffixed by fields, such as `count` or `mean`, that return more specific data points. For instance, the `puppetlabs.<MASTER-HOSTNAME>.compiler.mean` metric returns only the mean length of time it takes Puppet Server to compile a catalog.
To aid with reference, metrics in the list below are segmented into three groups:
- Statistical metrics: Metrics that have all eight of these statistical analysis fields, in addition to the top-level metric:
  - `max`: Its maximum measured value.
  - `min`: Its minimum measured value.
  - `mean`: Its mean, or average, value.
  - `stddev`: Its standard deviation from the mean.
  - `count`: An incremental counter.
  - `p50`: The value of its 50th percentile, or median.
  - `p75`: The value of its 75th percentile.
  - `p95`: The value of its 95th percentile.
- Counters only: Metrics that only count a value, or only have a `count` field.
- Other: Metrics that have unique sets of available fields.
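To make the naming scheme for statistical metrics concrete, the sketch below expands one metric into the eight field-suffixed series names listed above. The function name `statistical_series` is hypothetical, and it returns only the suffixed names, not the top-level metric.

```python
# The eight statistical fields listed above, in the same order.
STATISTICAL_FIELDS = ("max", "min", "mean", "stddev", "count", "p50", "p75", "p95")


def statistical_series(hostname: str, metric: str) -> list:
    """Return the full Graphite names of a statistical metric's eight fields."""
    base = f"puppetlabs.{hostname}.{metric}"
    return [f"{base}.{field}" for field in STATISTICAL_FIELDS]
```

For example, calling `statistical_series("master.example.com", "compiler")` yields names such as `puppetlabs.master.example.com.compiler.mean`, which is the form a Grafana target expects.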
Note: Puppet Server can export so many metrics that enabling all of them at large installations can overwhelm Grafana servers. To avoid this, Puppet Server exports only a subset of its available metrics by default. This default set is designed to report the most relevant metrics for administrators monitoring performance and stability.
To add to the default list of exported metrics, see Modifying Puppet Server's exported metrics.
Puppet Server exports each metric in the lists below by default.
- `puppetlabs.<MASTER-HOSTNAME>.compiler`: The time spent compiling catalogs. This metric represents the sum of the `compiler.compile`, `static_compile`, `find_facts`, and `find_node` fields.
  - `puppetlabs.<MASTER-HOSTNAME>.compiler.compile`: The total time spent compiling dynamic (non-static) catalogs.

    To measure specific nodes and environments, see Modifying Puppet Server's exported metrics.
  - `puppetlabs.<MASTER-HOSTNAME>.compiler.find_facts`: The time spent parsing facts.
  - `puppetlabs.<MASTER-HOSTNAME>.compiler.find_node`: The time spent retrieving node data. If the Node Classifier (or another ENC) is configured, this includes the time spent communicating with it.
  - `puppetlabs.<MASTER-HOSTNAME>.compiler.static_compile`: The time spent compiling static catalogs.
  - `puppetlabs.<MASTER-HOSTNAME>.compiler.static_compile_inlining`: The time spent inlining metadata for static catalogs.
  - `puppetlabs.<MASTER-HOSTNAME>.compiler.static_compile_postprocessing`: The time spent post-processing static catalogs.
- `puppetlabs.<MASTER-HOSTNAME>.functions`: The amount of time during catalog compilation spent in function calls. The `functions` metric can also report any of the statistical metrics fields for a single function by specifying the function name as a field.

  For example, to report the mean time spent in a function call during catalog compilation, use `puppetlabs.<MASTER-HOSTNAME>.functions.<FUNCTION-NAME>.mean`.
- `puppetlabs.<MASTER-HOSTNAME>.http.active-histo`: A histogram of active HTTP requests over time.
- `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-catalog-/*/-requests`: The time Puppet Server has spent handling catalog requests, including time spent waiting for an available JRuby instance.
- `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-environment-/*/-requests`: The time Puppet Server has spent handling environment requests, including time spent waiting for an available JRuby instance.
- `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-environment_classes-/*/-requests`: The time spent handling requests to the `environment_classes` API endpoint, which the Node Classifier uses to refresh classes.
- `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-environments-requests`: The time spent handling requests to the `environments` API endpoint.
- The following metrics measure the time spent handling file-related API endpoints:
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-file_bucket_file-/*/-requests`
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-file_content-/*/-requests`
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-file_metadata-/*/-requests`
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-file_metadatas-/*/-requests`
- `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-node-/*/-requests`: The time spent handling node requests, which are sent to the Node Classifier. A bottleneck here might indicate an issue with the Node Classifier or PuppetDB.
- `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-report-/*/-requests`: The time spent handling report requests. A bottleneck here might indicate an issue with PuppetDB.
- `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-static_file_content-/*/-requests`: The time spent handling requests to the `static_file_content` API endpoint used by Direct Puppet with file sync.
Puppet Server uses an embedded JRuby interpreter to execute Ruby code. JRuby spawns parallel instances known as JRubies to execute Ruby code, which occurs during most Puppet Server activities.
See Tuning JRuby on Puppet Server for details on adjusting JRuby settings.
- `puppetlabs.<MASTER-HOSTNAME>.jruby.borrow-timer`: The time spent with a borrowed JRuby.
- `puppetlabs.<MASTER-HOSTNAME>.jruby.free-jrubies-histo`: A histogram of free JRubies over time. This metric's average value should be greater than 1; if it isn't, more JRubies or another compile master might be needed to keep up with requests.
- `puppetlabs.<MASTER-HOSTNAME>.jruby.lock-held-timer`: The time spent holding the JRuby lock.
- `puppetlabs.<MASTER-HOSTNAME>.jruby.lock-wait-timer`: The time spent waiting to acquire the JRuby lock.
- `puppetlabs.<MASTER-HOSTNAME>.jruby.requested-jrubies-histo`: A histogram of requested JRubies over time. This increases as the number of free JRubies, or the `free-jrubies-histo` metric, decreases, which can suggest that the server's capacity is being depleted.
- `puppetlabs.<MASTER-HOSTNAME>.jruby.wait-timer`: The time spent waiting to borrow a JRuby.
The following metrics measure the time that Puppet Server spends sending data to or receiving data from PuppetDB.

- `puppetlabs.<MASTER-HOSTNAME>.puppetdb.catalog.save`
- `puppetlabs.<MASTER-HOSTNAME>.puppetdb.command.submit`
- `puppetlabs.<MASTER-HOSTNAME>.puppetdb.facts.find`
- `puppetlabs.<MASTER-HOSTNAME>.puppetdb.facts.search`
- `puppetlabs.<MASTER-HOSTNAME>.puppetdb.report.process`
- `puppetlabs.<MASTER-HOSTNAME>.puppetdb.resource.search`

- `puppetlabs.<MASTER-HOSTNAME>.http.active-requests`: The number of active HTTP requests.
- The following counter metrics report the percentage of each HTTP API endpoint's share of total handled HTTP requests.
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-catalog-/*/-percentage`
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-environment-/*/-percentage`
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-environment_classes-/*/-percentage`
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-environments-percentage`
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-file_bucket_file-/*/-percentage`
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-file_content-/*/-percentage`
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-file_metadata-/*/-percentage`
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-file_metadatas-/*/-percentage`
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-node-/*/-percentage`
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-report-/*/-percentage`
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-resource_type-/*/-percentage`
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-resource_types-/*/-percentage`
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-static_file_content-/*/-percentage`
  - `puppetlabs.<MASTER-HOSTNAME>.http.puppet-v3-status-/*/-percentage`
- `puppetlabs.<MASTER-HOSTNAME>.http.total-requests`: The total requests handled by Puppet Server.
- `puppetlabs.<MASTER-HOSTNAME>.jruby.borrow-count`: The number of successfully borrowed JRubies.
- `puppetlabs.<MASTER-HOSTNAME>.jruby.borrow-retry-count`: The number of attempts to borrow a JRuby that must be retried.
- `puppetlabs.<MASTER-HOSTNAME>.jruby.borrow-timeout-count`: The number of attempts to borrow a JRuby that resulted in a timeout.
- `puppetlabs.<MASTER-HOSTNAME>.jruby.request-count`: The number of requested JRubies.
- `puppetlabs.<MASTER-HOSTNAME>.jruby.return-count`: The number of JRubies successfully returned to the pool.
- `puppetlabs.<MASTER-HOSTNAME>.jruby.num-free-jrubies`: The number of free JRuby instances. If this number is often 0, more requests are coming in than the server has available JRuby instances. To alleviate this, increase the number of JRuby instances on the Server or add additional compile masters.
- `puppetlabs.<MASTER-HOSTNAME>.jruby.num-jrubies`: The total number of JRuby instances on the server, governed by the `max-active-instances` setting. See Tuning JRuby on Puppet Server for details.
These metrics measure raw resource availability and capacity.
- `puppetlabs.<MASTER-HOSTNAME>.num-cpus`: The number of available CPUs on the server.
- `puppetlabs.<MASTER-HOSTNAME>.uptime`: The Puppet Server process's uptime.
- Total, heap, and non-heap memory that's committed (`committed`), initialized (`init`), and used (`used`), and the maximum amount of memory that can be used (`max`).
  - `puppetlabs.<MASTER-HOSTNAME>.memory.total.committed`
  - `puppetlabs.<MASTER-HOSTNAME>.memory.total.init`
  - `puppetlabs.<MASTER-HOSTNAME>.memory.total.used`
  - `puppetlabs.<MASTER-HOSTNAME>.memory.total.max`
  - `puppetlabs.<MASTER-HOSTNAME>.memory.heap.committed`
  - `puppetlabs.<MASTER-HOSTNAME>.memory.heap.init`
  - `puppetlabs.<MASTER-HOSTNAME>.memory.heap.used`
  - `puppetlabs.<MASTER-HOSTNAME>.memory.heap.max`
  - `puppetlabs.<MASTER-HOSTNAME>.memory.non-heap.committed`
  - `puppetlabs.<MASTER-HOSTNAME>.memory.non-heap.init`
  - `puppetlabs.<MASTER-HOSTNAME>.memory.non-heap.used`
  - `puppetlabs.<MASTER-HOSTNAME>.memory.non-heap.max`
For details about HTTP client metrics, which measure performance of Puppet Server's requests to other services, see their documentation.
In addition to the above default metrics, you can also export metrics measuring specific environments and nodes.
The `metrics.registries.puppetserver.metrics-allowed` parameter in `metrics.conf` takes an array of strings representing the metrics you want to enable.

Omit the `puppetlabs.<MASTER-HOSTNAME>` prefix and field suffixes (such as `.count` or `.mean`) from metrics when adding them to this list. Instead, suffix the environment or node name as a field to the metric.

For example, to track the compilation time for the `production` environment, add `compiler.compile.production` to the `metrics-allowed` list. To track only the `my.node.localdomain` node in the `production` environment, add `compiler.compile.production.my.node.localdomain` to the `metrics-allowed` list.
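The naming rule above (drop the prefix and any field suffix, then append the environment and optionally the node) can be sketched as a small conversion. This is an illustration, not a Puppet Server tool: the function name `to_allowed_entry` is hypothetical, and it assumes field suffixes are limited to the eight statistical fields.

```python
# Field suffixes that must be omitted from metrics-allowed entries.
STATISTICAL_FIELDS = {"max", "min", "mean", "stddev", "count", "p50", "p75", "p95"}


def to_allowed_entry(graphite_name, hostname, environment=None, node=None):
    """Convert a full Graphite metric name into a metrics-allowed entry.

    Hypothetical helper: strips the puppetlabs.<hostname> prefix and a
    trailing statistical field, then appends the environment and node.
    """
    prefix = f"puppetlabs.{hostname}."
    name = graphite_name
    if name.startswith(prefix):
        name = name[len(prefix):]
    head, _, last = name.rpartition(".")
    if head and last in STATISTICAL_FIELDS:
        name = head  # drop the field suffix such as .mean or .count
    if environment:
        name = f"{name}.{environment}"
        if node:
            name = f"{name}.{node}"
    return name
```

Passing `puppetlabs.master.example.com.compiler.compile.mean` with the `production` environment gives `compiler.compile.production`, matching the example in the text.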
Optional metrics include:
- `compiler.compile.<ENVIRONMENT>` and `compiler.compile.<ENVIRONMENT>.<NODE-NAME>`, and all statistical fields suffixed to these (such as `compiler.compile.<ENVIRONMENT>.mean`).
- `compiler.compile.evaluate_resources.<RESOURCE>`: Time spent evaluating a specific resource during catalog compilation.
While not as customizable as a Graphite dashboard, the developer dashboard is a simple, built-in way to view basic Puppet Server metrics.
The developer dashboard features metrics particularly relevant to developers of Puppet manifests and modules, which are drawn from the metrics v1 API.
The dashboard charts the current and mean number of free and requested JRuby interpreters, as well as the mean JRuby borrow and wait times in milliseconds. It also lists the top 10 aggregate API endpoint requests, function calls, and resource declarations by total, mean, and aggregate counts. For more information about these metrics, see the documentation for the metrics v1 endpoints.
Open a web browser and go to `https://<DNS NAME OF YOUR MASTER>:8140/puppet/experimental/dashboard.html`.