The Fail Fast mechanism is a set of defensive measures that Candlepin takes automatically when an external service (such as the Qpid Broker) is unavailable or misconfigured. Some of the main challenges that Candlepin faces when an external service fails include:
The Fail Fast mechanism currently defends against problems with the Qpid Broker. It uses a feature called Suspend Mode to temporarily stop Candlepin’s operations.
Candlepin can operate in two modes: NORMAL and SUSPEND. NORMAL mode corresponds to standard Candlepin operation. SUSPEND mode is a state in which Candlepin stops responding to most requests and returns HTTP status code 503 instead. All scheduled jobs are suspended as well. The only resource that remains available is the /status endpoint, so clients cannot use Candlepin while it is in SUSPEND mode. The current mode can be discovered through the /status endpoint, and polling this endpoint is the recommended way to track it.
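The polling recommendation can be sketched from the client side as follows. This is only an illustration under stated assumptions: it assumes the /status response body is a JSON object with a "mode" field, and the helper names (read_mode, fetch_status, wait_for_normal) and the default poll interval are hypothetical, not part of Candlepin.

```python
import json
import time
import urllib.request


def read_mode(status_body):
    """Extract the mode from a /status response body.

    Assumption: the body is a JSON object with a "mode" field.
    Returns "UNKNOWN" when the field is absent.
    """
    data = json.loads(status_body)
    if not isinstance(data, dict):
        return "UNKNOWN"
    return data.get("mode", "UNKNOWN")


def fetch_status(base_url):
    """GET <base_url>/status and return the raw body as text."""
    url = base_url.rstrip("/") + "/status"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def wait_for_normal(fetch, poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Poll until the reported mode is NORMAL.

    `fetch` is any zero-argument callable returning the /status body.
    Returns True once NORMAL is seen, False after max_polls attempts.
    """
    for _ in range(max_polls):
        try:
            if read_mode(fetch()) == "NORMAL":
                return True
        except (OSError, ValueError):
            # Network errors and unparsable bodies count as "not ready yet".
            pass
        sleep(poll_seconds)
    return False
```

A caller would pass its own deployment URL, for example `wait_for_normal(lambda: fetch_status(base_url))`, where `base_url` is whatever address the Candlepin instance is served from.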
The feature is enabled by default and can be controlled using config property
Candlepin periodically checks whether the external service is responsive and transitions between NORMAL and SUSPEND mode accordingly.
By default it does so every 10 seconds; the interval is controlled by the property candlepin.amqp.suspend.transitioner_initial_delay. Detailed logging of the automatic transitions can be enabled with the logging configuration log4j.logger.org.candlepin.controller.SuspendModeTransitioner=DEBUG. Once Candlepin enters SUSPEND mode, the delay between connectivity checks grows, so the checks become less frequent. The delay follows this formula:
DELAY = INITIAL_DELAY + (DELAY_GROWTH * FAILED_ATTEMPTS)
Where the defaults for the variables are:
candlepin.amqp.suspend.transitioner_delay_growth = 10
candlepin.amqp.suspend.transitioner_initial_delay = 10
candlepin.amqp.suspend.transitioner_max_delay = 300
FAILED_ATTEMPTS is the number of failed attempts to reconnect to the Qpid Broker. The idea is to avoid checking connectivity too often, which would produce a large volume of log statements.
The property candlepin.amqp.suspend.transitioner_max_delay puts an upper limit on the resulting delay (-1 means unbounded).
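As a worked example of the delay formula, the following sketch computes the delay for a given number of failed attempts using the default values listed above. The function name next_delay is hypothetical, and applying the upper limit as a simple cap on the computed value is an assumption about how max_delay takes effect.

```python
# Defaults quoted in the text (all values in seconds).
INITIAL_DELAY = 10
DELAY_GROWTH = 10
MAX_DELAY = 300  # -1 means unbounded


def next_delay(failed_attempts, initial_delay=INITIAL_DELAY,
               delay_growth=DELAY_GROWTH, max_delay=MAX_DELAY):
    """Return the delay in seconds before the next connectivity check.

    Implements DELAY = INITIAL_DELAY + (DELAY_GROWTH * FAILED_ATTEMPTS).
    """
    delay = initial_delay + delay_growth * failed_attempts
    # Assumption: the upper limit is applied as a plain cap on the result.
    if max_delay != -1:
        delay = min(delay, max_delay)
    return delay


if __name__ == "__main__":
    for attempts in (0, 1, 5, 29, 50):
        print(attempts, next_delay(attempts))
```

With the defaults, the computed delay reaches the 300 second limit at 29 failed attempts and stays there; with max_delay set to -1 it keeps growing by 10 seconds per failed attempt.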
The Qpid Broker can be in three states:
The event exchange is flow stopped (overloaded). When flow is stopped, it is not possible to send more messages to the exchange (the JMS client throws an exception).
There is a special spec test, qpid_spec.rb, that contains integration tests which expect a running Candlepin and Qpid.
During the tests it is usually necessary to manipulate Qpid (start, stop, create a queue). candlepin_scenarios.rb contains the class CandlepinQpid, which helps with that. Note that the spec tests need to run in our Docker images and must therefore be compatible with supervisord.
The spec tests assume the existence of a special queue. To create it, Candlepin must be deployed with the -q switch.
Qpid Management Framework (QMF) is a proprietary, message-based Qpid protocol. It is used to manage the Qpid Broker and to retrieve information about it. Candlepin uses QMF (implemented in QpidQmf.java) to determine whether an exchange is flow stopped.
At Candlepin startup, QMF is used to check connectivity to the broker. If the broker is not CONNECTED, Candlepin fails immediately and aborts the startup. This behavior can be controlled by property