Nomad
check Block
Placement | job -> group -> service -> check job -> group -> task -> service -> check |
The check
block instructs Nomad to register a check associated with a service
into the Nomad or Consul service provider.
job "example" {
datacenters = ["dc1"]
group "cache" {
network {
port "db" { to = 6379 }
}
service {
provider = "nomad"
name = "redis"
port = "db"
check {
name = "redis_probe"
type = "tcp"
interval = "10s"
timeout = "1s"
}
}
task "redis" {
driver = "docker"
config {
image = "redis:7"
ports = ["db"]
}
}
}
}
check
Parameters
address_mode
(string: "host")
- Same asaddress_mode
onservice
. Unlike services, checks do not have anauto
address mode as there's no way for Nomad to know which is the best address to use for checks. Consul needs access to the address for any HTTP or TCP checks. See below for details. Unlikeport
, this setting is not inherited from theservice
. If the serviceaddress
is set and the checkaddress_mode
is not set, the serviceaddress
value will be used for the check address.args
(array<string>: [])
- Specifies additional arguments to thecommand
. This only applies to script-based health checks.check_restart
- Seecheck_restart
block.command
(string: <varies>)
- Specifies the command to run for performing the health check. The script must exit: 0 for passing, 1 for warning, or any other value for a failing health check. This is required for script-based health checks. Only supported in the Consul service provider.Caveat: The command must be the path to the command on disk, and no shell exists by default. That means operators like
||
or&&
are not available. Additionally, all arguments must be supplied via theargs
parameter. To achieve the behavior of shell operators, specify the command as a shell, like/bin/bash
and then useargs
to run the check.grpc_service
(string: <optional>)
- What service, if any, to specify in the gRPC health check. gRPC health checks require Consul 1.0.5 or later.grpc_use_tls
(bool: false)
- Use TLS to perform a gRPC health check. May be used withtls_skip_verify
to use TLS but skip certificate verification. May be used withtls_server_name
to specify the ServerName to use for SNI and validation of the certificate presented by the server being checked.initial_status
(string: <enum>)
- Specifies the starting status of the service. Valid options arepassing
,warning
, andcritical
. Omitting this field (or submitting an empty string) will result in the Consul default behavior, which iscritical
. Only supported in the Consul service provider. In the Nomad service provider, the initial status of a check ispending
until Nomad produces an initial check status result.success_before_passing
(int:0)
- The number of consecutive successful checks required before Consul will transition the service status topassing
. Only supported by the Consul service provider and not applicable for health checks of type "script".failures_before_critical
(int:0)
- The number of consecutive failing checks required before Consul will transition the service status tocritical
. Only supported by the Consul service provider and not applicable for health checks of type "script".failures_before_warning
(int:0)
- The number of consecutive failing checks required before Consul will transition the service status towarning
. Only supported by the Consul service provider and not applicable for health checks of type "script".interval
(string: <required>)
- Specifies the frequency of the health checks that Consul or Nomad service provider will perform. This is specified using a label suffix like "30s" or "1h". This must be greater than or equal to "1s".method
(string: "GET")
- Specifies the HTTP method to use for HTTP checks. Must be a valid HTTP method.body
(string: "")
- Specifies the HTTP body to use for HTTP checks.name
(string: "service: <name> check")
- Specifies the name of the health check. If the name is not specified Nomad generates one based on the service name.path
(string: <varies>)
- Specifies the path of the HTTP endpoint which will be queried to observe the health of a service. Nomad will automatically add the IP of the service and the port, so this is just the relative URL to the health check endpoint. This is required for HTTP-based health checks.expose
(bool: false)
- Specifies whether an Expose Path should be automatically generated for this check. Only compatible with Connect-enabled task-group services using the default Connect proxy. If set, checktype
must behttp
orgrpc
, and checkname
must be set. Only supported in the Consul service provider.port
(string: <varies>)
- Specifies the label of the port on which the check will be performed. Note this is the label of the port and not the port number unlessaddress_mode = driver
. The port label must match one defined in thenetwork
block. If a port value was declared on theservice
, this will inherit from that value if not supplied. If supplied, this value takes precedence over theservice.port
value. This is useful for services which operate on multiple ports.grpc
,http
, andtcp
checks require a port whilescript
checks do not. Checks will use the host IP and ports by default. Numeric ports may be used ifaddress_mode="driver"
is set on the check.protocol
(string: "http")
- Specifies the protocol for the HTTP-based health checks. Valid options arehttp
andhttps
.task
(string: "")
- Specifies the task associated with this check. Scripts are executed within the task's environment, andcheck_restart
blocks will apply to the specified task. Inherits theservice.task
value if not set. Must be unset or equivalent toservice.task
in task-level services.timeout
(string: <required>)
- Specifies how long to wait for a health check query to succeed. This is specified using a label suffix like "30s" or "1h". This must be greater than or equal to "1s"Caveat: Script checks use the task driver to execute in the task's environment. For task drivers with namespace isolation such as
docker
orexec
, setting up the context for the script check may take an unexpectedly long amount of time (a full second or two), especially on busy hosts. The timeout configuration must allow for both this setup and the execution of the script. Operators should use long timeouts (5 or more seconds) for script checks, and monitor telemetry forclient.allocrunner.taskrunner.tasklet_timeout
.type
(string: <required>)
- This indicates the check types supported by Nomad. For Consul service checks, valid options aregrpc
,http
,script
, andtcp
. For Nomad service checks, valid options arehttp
andtcp
.tls_server_name
(string: "")
- Indicates the ServerName to use for SNI and validation of the certificate presented by the server being checked, when performing TLS enabled checks (https
andgrpc
withgrpc_use_tls
). If left unspecified, the ServerName will be inferred from the address of the server being checked unless the address is an IP address. There are two common cases where this is beneficial:When the check address contains an IP,
tls_server_name
can be specified for SNI. Note: settingtls_server_name
will also override the hostname used to verify the certificate presented by the server being checked.When the check address contains a hostname which isn't be present in the SAN (Subject Alternative Name) field of the certificate presented by the server being checked. Note: setting
tls_server_name
will also override the hostname used for SNI.
This field is only supported in the Consul service provider.
tls_skip_verify
(bool: false)
- Skip verification of certificates forhttps
andgrpc
withgrpc_use_tls
checks.on_update
(string: "require_healthy")
- Specifies how checks should be evaluated when determining deployment health (including a job's initial deployment). This allows job submitters to define certain checks as readiness checks, progressing a deployment even if the Service's checks are not yet healthy. Checks inherit the Service's value by default. The check status is not altered in the service provider and is only used to determine the check's health during an update.require_healthy
- In order for Nomad to consider the check healthy during an update it must report as healthy.ignore_warnings
- If a Service Check reports as warning, Nomad will treat the check as healthy. The Check will still be in a warning state in Consul.ignore
- Any status will be treated as healthy.
Caveat:
on_update
is only compatible with certaincheck_restart
configurations.on_update = "ignore_warnings"
requires thatcheck_restart.ignore_warnings = true
.check_restart
can however specifyignore_warnings = true
withon_update = "require_healthy"
. Ifon_update
is set toignore
,check_restart
must be omitted entirely.
header
Block
HTTP checks may include a header
block to set HTTP headers. The header
block parameters have lists of strings as values. Multiple values will cause
the header to be set multiple times, once for each value.
service {
# ...
check {
type = "http"
port = "lb"
path = "/_healthz"
interval = "5s"
timeout = "2s"
header {
Authorization = ["Basic ZWxhc3RpYzpjaGFuZ2VtZQ=="]
}
}
}
HTTP Health Check
This example shows a service with an HTTP health check. This will query the
service on the IP and port registered with Nomad at /_healthz
every 5 seconds,
giving the service a maximum of 2 seconds to return a response, and include an
Authorization header. Any non-2xx code is considered a failure.
service {
check {
type = "http"
port = "lb"
path = "/_healthz"
interval = "5s"
timeout = "2s"
header {
Authorization = ["Basic ZWxhc3RpYzpjaGFuZ2VtZQ=="]
}
}
}
Multiple Health Checks
This example shows a service with multiple health checks defined. All health checks must be passing in order for the service to register as healthy.
service {
check {
name = "HTTP Check"
type = "http"
port = "lb"
path = "/_healthz"
interval = "5s"
timeout = "2s"
}
check {
name = "HTTPS Check"
type = "http"
protocol = "https"
port = "lb"
path = "/_healthz"
interval = "5s"
timeout = "2s"
method = "POST"
}
check {
name = "Postgres Check"
type = "script"
command = "/usr/local/bin/pg-tools"
args = ["verify", "database", "prod", "up"]
interval = "5s"
timeout = "2s"
on_update = "ignore_warnings"
}
}
gRPC Health Check
gRPC health checks use the same host and port behavior as http
and tcp
checks, but gRPC checks also have an optional gRPC service to health check. Not
all gRPC applications require a service to health check.
service {
check {
type = "grpc"
port = "rpc"
interval = "5s"
timeout = "2s"
grpc_service = "example.Service"
grpc_use_tls = true
tls_skip_verify = true
}
}
In this example Consul would health check the example.Service
service on the
rpc
port defined in the task's network resources block. See
Using Driver Address Mode for details on address
selection.
Script Checks with Shells
Note that script checks run inside the task. If your task is a Docker container, the script will run inside the Docker container. If your task is running in a chroot, it will run in the chroot. Please keep this in mind when authoring check scripts.
This example shows a service with a script check that is evaluated and interpolated in a shell; it
tests whether a file is present at ${HEALTH_CHECK_FILE}
environment variable:
service {
check {
type = "script"
command = "/bin/bash"
args = ["-c", "test -f ${HEALTH_CHECK_FILE}"]
}
}
Using /bin/bash
(or another shell) is required here to interpolate the ${HEALTH_CHECK_FILE}
value.
The following examples of command
fields will not work:
# invalid because command is not a path
check {
type = "script"
command = "test -f /tmp/file.txt"
}
# invalid because path will not be interpolated
check {
type = "script"
command = "/bin/test"
args = ["-f", "${HEALTH_CHECK_FILE}"]
}
Healthiness versus Readiness Checks
Multiple checks for a service can be composed to create healthiness and readiness
checks by configuring on_update
for the check.
service {
# This is a healthiness check that will be used to verify the service
# is responsive to tcp connections and behaving as expected.
check {
name = "connection_tcp"
type = "tcp"
port = 6379
interval = "10s"
timeout = "2s"
}
# This is a readiness check that is used to verify that, for example, the
# application has elected a leader by making a request to its /leader endpoint.
# Failures of this check are ignored during deployments.
check {
name = "leader_elected"
type = "http"
path = "/leader"
interval = "10s"
timeout = "2s"
on_update = "ignore"
}
}
For checks registered into the Nomad service provider, the status information will
indicate Mode = readiness
for readiness checks and Mode = healthiness
for health
checks.
Check status on CLI
For checks registered into the Nomad service provider, the status information of
checks can be viewed per-allocation. The alloc status
command now includes
summary information for Nomad service checks.
$ nomad alloc status <allocation-id>
Nomad Service Checks:
Service Task Name Mode Status
database task db_tcp_probe readiness success
web (group) healthz healthiness failure
web (group) index-page healthiness success
The alloc checks
command can be used for viewing complete check status information
for all checks in an allocation.
$ nomad alloc checks <allocation-id>
Status of 3 Nomad Service Checks
ID = d8651d93a50b9e28375a7beb9418c418
Name = db_tcp_probe
Group = example.group[0]
Task = task
Service = database
Status = success
Mode = readiness
Timestamp = 2022-08-22T10:41:23-05:00
Output = nomad: tcp ok
ID = 0413b61bda7014f02671675d7e146373
Name = index-page
Group = example.group[0]
Task = (group)
Service = web
Status = success
StatusCode = 200
Mode = healthiness
Timestamp = 2022-08-22T10:41:23-05:00
Output = nomad: http ok
ID = c3cce3f0c97975f84bbf39bdd50deaea
Name = healthz
Group = example.group[0]
Task = (group)
Service = web
Status = failure
Mode = healthiness
Timestamp = 2022-08-22T10:41:23-05:00
Output = nomad: Get "http://:9999/": dial tcp :9999: connect: connection refused
1 Script checks are not supported for the QEMU driver since the Nomad client does not have access to the file system of a task for that driver.