Process Startup / Join

Process Startup

When an application instance (process) starts, the group-membership subsystem is started as a goroutine within the application.

Member Communication

The group-membership service communicates between processes via websocket connections. Each process accepts group-membership messages on the address:port specified by the ‘internal_address’ key in the application configuration file. This port is not secured, and the group-membership messages are not encrypted in any way, so the address:port used here should not be accessible from the outside world.

Because the failure detector uses a round-robin model in which every process pings every other process, the group-membership service can be rather busy.

Group Leadership Overview

Election Overview

In order to participate in a group, each starting process needs to know who the current group-leader is. Group-leadership is determined internally via the failure-detector and a bully-type leader-election. Each process knows who the current group-leader is at any given time. If a failure of the group-leader is detected, an election is held whereby the process with the highest process-id (PID) assumes leadership of the group.

Assertion of leadership is disseminated to all group-members by the new leader via the COORDINATOR message. This takes care of the current group-members, but processes wishing to join the group must have a way to determine who the current group-leader is.
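
As an illustration of the bully rule only, a minimal sketch follows; the function name and the alive-member map are hypothetical stand-ins, not Jiffy's internals.

// electBully sketches the bully selection rule: among the members still
// believed to be alive, the highest PID wins.  The alive map is a
// hypothetical stand-in for Jiffy's internal member bookkeeping.
func electBully(alive map[int]bool) (leaderPID int) {
  for pid, ok := range alive {
    if ok && pid > leaderPID {
      leaderPID = pid
    }
  }
  return leaderPID
}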

Joining Overview

Joining processes must access a well-known persistent store that holds the current group-leader information; each joining process must therefore know where the persistent store is located and how it can be queried. The persistent store exists outside of the group-membership service and is typically a key-value store such as redis or memcached.

To this end, the .xxx.config.json file offers the option of running the application in stand-alone mode (self==leader), or with redis, memcached or sluggo as KVS repositories for the group-leader information.
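
For orientation, a configuration sketch follows. Only the ‘internal_address’ and “custom_kvs_impl” keys are named in this document; the remaining key names and the nesting are illustrative assumptions, not Jiffy's exact schema.

{
  "internal_address": "192.168.1.40:4444",
  "group_leader_kvs": {
    "standalone": false,
    "redis": { "active": false, "address": "192.168.1.40:6379" },
    "memcached": { "active": false, "address": "192.168.1.40:11211" },
    "sluggo": { "active": true, "address": "192.168.1.40:7070" },
    "custom_kvs_impl": false
  }
}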

Jiffy provides a general interface to allow the implementing developer to add support for the persistent store of their choice. Enhancement points have not yet been added for the “custom_kvs_impl” key, but it should be quite easy to insert the required code. The rationale here is that many implementers will have a preference in terms of the key-value store they use, and may already have a suitable KVS in their productive landscape. The interface Jiffy provides for this purpose is gmcom.GMLeaderSetterGetter, and it must be implemented in the appobj/lead_set_get.go file.

Interface gmcom.GMLeaderSetterGetter

Interface gmcom.GMLeaderSetterGetter contains methods to facilitate read/write access to the one-and-only persisted leader record in any key-value store. Applications implementing the interface may choose to store the persisted current leader in a number of media: flat-file, db table record, redis, memcached etc.

Interface gmcom.GMLeaderSetterGetter
GetDBLeader() (*GMLeader, error)
GetDBLeader is provided in order to allow the implementer to retrieve the current group-leader information from the persistent store. The implementation of this method is intended to be self-contained.

For example, if the implementation calls for the current leader information to be persisted in redis, the implementer should code a self-contained method to connect to redis, retrieve the leader information and return it in the GMLeader pointer. Failure to read a current leader from the persistent store should result in the return of a nil in place of the *GMLeader pointer and a non-nil error value.

SetDBLeader(l GMLeader) error
SetDBLeader is provided in order to allow the implementer to persist a newly elected leader's information in the persistent store. The implementation of this method is intended to be self-contained.

For example, if the implementation calls for the current leader information to be persisted in redis, the implementer should code a self-contained method to connect to redis and store the leader information provided by input parameter *l*. Failure to persist the provided leader information should result in the implementer returning a non-nil value in the *error* return parameter.

Cleanup() error
Cleanup is provided in order to allow the implementer to clean up connection pools and open and/or busy connections before the hosting application shuts down in response to an os.Interrupt (SIGINT) or SIGTERM event. (SIGKILL cannot be intercepted, so no cleanup is possible in that case.) If the implementation does not require any cleanup, this method can simply return nil.
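
As a point of reference, here is a minimal redis-backed sketch of all three methods. It assumes the github.com/redis/go-redis/v9 client and JSON encoding of the GMLeader record; the "LEADER" key, the redis address and the client wiring are assumptions for illustration, not part of Jiffy. The delivered sluggo-based sample follows in the next section.

// Assumed imports; gmcom is Jiffy's own package (import path per your project).
import (
  "context"
  "encoding/json"

  "github.com/redis/go-redis/v9"
)

// RedisLeadSetGet is a sketch of a redis-backed GMLeaderSetterGetter.
type RedisLeadSetGet struct {
  gmcom.GMLeaderSetterGetter
  cl *redis.Client // e.g. redis.NewClient(&redis.Options{Addr: "192.168.1.40:6379"})
}

// GetDBLeader reads and decodes the persisted leader record.  A cache
// miss (redis.Nil) or a decode failure returns a nil pointer and the error.
func (sg *RedisLeadSetGet) GetDBLeader() (*gmcom.GMLeader, error) {
  b, err := sg.cl.Get(context.Background(), "LEADER").Bytes()
  if err != nil {
    return nil, err // redis.Nil when no leader has been persisted yet
  }
  l := &gmcom.GMLeader{}
  if err := json.Unmarshal(b, l); err != nil {
    return nil, err
  }
  return l, nil
}

// SetDBLeader encodes and persists a newly elected leader record.
func (sg *RedisLeadSetGet) SetDBLeader(l gmcom.GMLeader) error {
  b, err := json.Marshal(&l)
  if err != nil {
    return err
  }
  return sg.cl.Set(context.Background(), "LEADER", b, 0).Err()
}

// Cleanup closes the client's underlying connection pool.
func (sg *RedisLeadSetGet) Cleanup() error {
  return sg.cl.Close()
}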

Interface gmcom.GMLeaderSetterGetter Sample Implementation

// LeadSetGet provides a sample implementation of the GMLeaderSetterGetter
// interface in order to support persistence of the current group-leader
// information.  Re-implement these methods as you see fit to facilitate
// storage and retrieval of the leader information to and from the persistent
// store.  This example uses a quick and dirty web-socket-based cache to
// handle the persistence.  It works well enough for testing, but you should
// use something more robust like a database, redis etc.  The methods in the
// GMLeaderSetterGetter interface are called when a new process is attempting
// to join the group and also when a new leader is selected via the
// coordinator process.
//
// To test with the delivered interface implementation, install and run sluggo:
// go get -u github.com/1414C/sluggo
//
// Execute sluggo from the command-line as follows:
// go run main.go -a <ipaddress:port>
// For example:
// $ go run main.go -a 192.168.1.40:7070
//
type LeadSetGet struct {
  gmcom.GMLeaderSetterGetter
}

// GetDBLeader retrieves the current leader information from
// the persistence layer.
func (sg *LeadSetGet) GetDBLeader() (*gmcom.GMLeader, error) {

  // read the current leader from the sluggo ws-cache
  l := &gmcom.GMLeader{}
  wscl.GetCacheEntry("LEADER", l, "192.168.1.40:7070")
  return l, nil
}

// SetDBLeader stores the current leader information in the
// persistence layer.
func (sg *LeadSetGet) SetDBLeader(l gmcom.GMLeader) error {

  // store the newly elected leader in the sluggo ws-cache
  wscl.AddUpdCacheEntry("LEADER", &l, "192.168.1.40:7070")
  return nil
}

// Cleanup closes connections to the group-leadership KVS
// prior to application shutdown.
func (sg *LeadSetGet) Cleanup() error {
  
  // perform cleanup(s)
  return nil
}

Startup & Group Leader Identification

When the application starts in group-membership mode, the process must determine who the group-leader is, as described in the preceding sections. The process tries to join an existing group four times before giving up, and within each join attempt it tries to read the current leader PID from the persistent store three times.

Group Leader Identification

When the starting process is able to read the PID and IP Address:port of the current group-leader from the persistent store, it proceeds to the next step where an attempt to contact the leader is made.

If the starting process is unable to read the leader information from the persistent store, it waits for a randomized period of between 1 and 1000 ms. Once the wait is over, it tries to read the current leader from the persistent store again. This loop is repeated up to four times before the process gives up and attempts to assume the role of group-leader itself, as sketched below.
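
The sketch below illustrates that read loop under the stated retry rules; the function name is a hypothetical stand-in, and gmcom is Jiffy's own package.

// Assumed imports.
import (
  "math/rand"
  "time"
)

// readLeaderWithRetry sketches the leader-lookup loop: up to four
// attempts, each separated by a randomized 1-1000 ms wait.  A nil
// return means the caller should try to assume leadership itself.
func readLeaderWithRetry(sg gmcom.GMLeaderSetterGetter) *gmcom.GMLeader {
  for attempt := 0; attempt < 4; attempt++ {
    if l, err := sg.GetDBLeader(); err == nil && l != nil {
      return l
    }
    // the randomized wait offsets simultaneous retries in mass cold-starts
    time.Sleep(time.Duration(rand.Intn(1000)+1) * time.Millisecond)
  }
  return nil
}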

No distinction is currently made between being unable to contact the persistent store and contacting it successfully but finding no current leader record.

Group Leader is Found

Once a starting process has obtained a set of group-leader information from the persistent store that looks valid, it uses the information to send a Join message to the group-leader. There are a few possible outcomes when the Join message is sent (a control-flow sketch follows the list):

  • The group-leader accepts the Join message. When the group-leader accepts the Join message from the starting process, it checks for problems as outlined below. If all goes well, the group-leader will return an Ack message containing a PID for use by the starting process. At this point, the starting process will initialize its failure-detector loop and start participating in the group.

  • The group-leader could not be contacted. In this case, the starting process waits for a randomized period of time between 1 and 1000 ms, then loops back to read the group-leader information from the persistent store again in the hope that it has been updated. This can occur up to four times, after which the starting process will attempt to become the group-leader.

  • The group-leader information read from the persistent store was not correct. It is possible that the leader information in the persistent store is stale during a cold-startup. In this case, although a process was reachable at the group-leader’s ip-address:port, the so-called group-leader process was not in fact the leader. The starting process will wait for a randomized period of time between 1 and 1000 ms, then loop back to read the group-leader information from the persistent store again in the hope that it has been updated. This can occur up to four times, after which the starting process will report an error and gracefully exit.

  • A group-leader was found, but it has the starting PID’s ip-address. It is possible that the starting PID’s address matches that of the group-leader read from the persistent store. This can happen in scenarios where the group-leader is the last process standing and then terminates without warning.

    The persistent store retains the last group-leader information (Pi), resulting in a successful leader-ping if the first process to restart in the group happens to hold the same ip-address as the old group-leader. A successful self-ping (Pi->Pi) looks like a functioning group-leader.

    If the joining process sees its own ip-address in the group-leader slot within the persistent store, it is best to act as if there is no current group-leader and wipe the read (leader) information clean. This repeats four times as described above, after which the current PID will attempt to assert itself as group-leader. The randomized wait is used to offset potentially simultaneous group-leadership assertions in mass cold-start situations.

    It is important to note that a starting process does not start participating in the group failure-detector until it has either joined an existing group or become the leader, thereby starting a new group.

  • The leader reports that the ip-address:port of the starting process is already in use within the group.
    This can occur due to a misconfigured failure threshold value (too high), or when a process is killed and restarted immediately. In this case, the starting process waits 5 seconds and then loops back to read the group-leader information and make another attempt to join the group. This can occur up to four times, after which the starting process will report an error and gracefully exit.

  • An unknown error occurs. This is a catch-all for failed join attempts. In this case, the starting process waits 5 seconds and then loops back to read the group-leader information and make another attempt to join the group. This can occur up to four times, after which the starting process will report an error and gracefully exit.
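
Purely as a control-flow illustration, the sketch below mirrors the outcome handling above. The outcome constants and the injected helper functions are hypothetical; Jiffy's real message and error types differ, and after four failed attempts the real code either asserts leadership or exits gracefully, depending on the failure mode. Again, gmcom is Jiffy's own package.

// Assumed imports.
import (
  "errors"
  "math/rand"
  "time"
)

// joinOutcome enumerates the join results described above (illustrative only).
type joinOutcome int

const (
  joinAccepted      joinOutcome = iota
  leaderUnreachable             // the group-leader could not be contacted
  staleLeaderInfo               // the contacted process was not the leader
  leaderIsSelf                  // the leader record holds our own ip-address
  addressInUse                  // our ip-address:port is already in the group
  unknownError                  // catch-all for failed join attempts
)

// joinGroup sketches the retry behaviour: randomized 1-1000 ms waits for
// leader-lookup problems, 5 second waits for in-use and unknown errors,
// and four attempts overall.
func joinGroup(readLeader func() *gmcom.GMLeader, sendJoin func(*gmcom.GMLeader) joinOutcome) error {
  for attempt := 0; attempt < 4; attempt++ {
    switch sendJoin(readLeader()) {
    case joinAccepted:
      return nil // Ack received; the failure-detector loop can start
    case leaderUnreachable, staleLeaderInfo, leaderIsSelf:
      time.Sleep(time.Duration(rand.Intn(1000)+1) * time.Millisecond)
    case addressInUse, unknownError:
      time.Sleep(5 * time.Second)
    }
  }
  return errors.New("unable to join the group after four attempts")
}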

Group Leader is Not Found

If a valid group-leader has not been found after four attempts, the starting process attempts to become the group-leader. The starting process will:

  • Set its PID to 1.
  • Add itself to the local process member map.
  • Call gm.SendSetDBLeader to set itself as group-leader in the persistent store.
  • Start its failure-detector loop in a goroutine.