Troubleshooting

Overview

If a container crashes when executing the entrypoint, there are a few things that can be done to help figure out what is going on.

Start the Container in Interactive Mode

Docker allows containers to be started with flags that direct the console output to the starter’s session. For example, to start a containerized jiffy application in this manner use the following command/flags:

Aardvark:pgelibraryapp stevem$ docker start pgelibraryapp -ia

When Docker starts the container, the ENTRYPOINT specified in the image’s Dockerfile will be executed. Due to the removal of the -d flag and the addition of the -ia flags in the docker start command, anything written to the container’s stdout/stderr will be output in the starter’s terminal session. The -i flag allows the starter to supply input if user-input is required (probably unusual in the jiffy-application case).

console output:

2018/06/23 00:09:41 package sqac init is running
2018/06/23 00:09:41 successfully loaded the config file...
2018/06/23 00:09:41 config: {172.17.0.2:8080 172.17.0.2:4444 dev 1 5 secret-pepper-key {postgres 192.168.1.65 5432 godev gogogo123 glrestgen}           jwtkeys/ecdsa384/ec384.priv.pem jwtkeys/ecdsa384/ec384.pub.pem   ES384 120 [{Library true} {Book true}]}
2018/06/23 00:09:41 ECDSA256PrivKeyFile:
2018/06/23 00:09:41 ECDSA256PubKeyFile:
2018/06/23 00:09:41 ECDSA384PrivKeyFile: jwtkeys/ecdsa384/ec384.priv.pem
2018/06/23 00:09:41 ECDSA384PubKeyFile: jwtkeys/ecdsa384/ec384.pub.pem
2018/06/23 00:09:41 ECDSA521PrivKeyFile:
2018/06/23 00:09:41 ECDSA521PubKeyFile:
2018/06/23 00:09:41 RSA256PrivKeyFile:
2018/06/23 00:09:41 RSA256PubKeyFile:
2018/06/23 00:09:41 RSA384PrivKeyFile:
2018/06/23 00:09:41 RSA384PubKeyFile:
2018/06/23 00:09:41 RSA512PrivKeyFile:
2018/06/23 00:09:41 RSA512PubKeyFile:
2018/06/23 00:09:41 Login() signing jwt's with ES384
Development settings selected...
Starting http server on: 172.17.0.2:8080
2018/06/23 00:09:41 listening for ws traffic on 172.17.0.2:4444
2018/06/23 00:09:43 gm.count: 1
2018/06/23 00:09:43 gm.count: 2
2018/06/23 00:09:44 gm.count: 3
2018/06/23 00:09:45 gm.count: 4
2018/06/23 00:09:45 JOIN: no leader detected - setting myID = 1 as leader
2018/06/23 00:09:45 gm.count: 5
2018/06/23 00:09:45 db and local leader set to 1, 172.17.0.2:4444
2018/06/23 00:09:45 initialization complete...
2018/06/23 00:09:45 leader info is: {1 172.17.0.2:4444}
2018/06/23 00:09:46 PING CYCLE: starting ping cycle...
2018/06/23 00:09:46 gm.count: 6
2018/06/23 00:09:46 sendPing assembled index slice: [0]
2018/06/23 00:09:46 sending PING: {PING 1 172.17.0.2:4444 1  0 1 172.17.0.2:4444 1 /ping ALIVE 1 0  true 0xc4205d2f30 false false  []}
2018/06/23 00:09:46 EncodeGMMessage &m.MemberMap: 0xc42005ece0
2018/06/23 00:09:46 websocket.Dial: ws://172.17.0.2:4444/ping
2018/06/23 00:09:46 websocket.Dial: ws://172.17.0.2:4444/ping complete
2018/06/23 00:09:46 gm.count: 7
...
...



The Container Crashes During Startup

Containers can crash during startup if something prevents the ENTRYPOINT from executing successfully. In the context of a jiffy application a common cause of failure during startup is failure to connect to the database (for example). When a container crashes on startup, the first thing to do is attempt to restart the container using the -ia flags as described above. It is possible that a corrective action can be determined and implemented by reviewing the stdout/stderr during the container startup.

If you are unable to correct the startup issue after reviewing the stdout/stderr output a good next step is logging into the container to investigate the issue. How can we do this if the container is not running?

Without going overboard on repeating the Docker online docs, it is helpful in these circumstances to remember that Docker images are constructed of so-called ‘layers’. Thinking back to the output of the docker build command, we see some numbers that look suspiciously like docker image id’s in each step of the image build output:

...
...
Step 13/18 : RUN apk add file
 ---> Running in 51acae660d55
(1/2) Installing libmagic (5.32-r0)
(2/2) Installing file (5.32-r0)
Executing busybox-1.27.2-r7.trigger
OK: 9 MiB in 13 packages
Removing intermediate container 51acae660d55
 ---> 566a1ee53ff0                              <---- this is interesting
Step 14/18 : RUN apk add busybox-extras
...
...

The Docker online docs tell us the following about image construction:

Each instruction in a Dockerfile creates a layer in the image. When you change the Dockerfile and rebuild the image, only those layers which have changed are rebuilt. This is part of what makes images so lightweight, small, and fast, when compared to other virtualization technologies.

This is important, because it informs us that Docker sees images as accretions of layers corresponding to the build-steps in the Dockerfile used to define the image. The next interesting point is that each layer corresponds to an intermediate image that inherits all of the intermediate images related to the previous docker build steps in the original Dockerfile. We can get a list of the image-layers for our container by running the docker history command:

  docker history pgelibraryapp

Running the command for the pgelibraryapp image results in output similar to:

IMAGE               CREATED             CREATED BY                                      SIZE  
2f8b3df80aea        About an hour ago   /bin/sh -c #(nop)  ENTRYPOINT ["/main"]         0B
7c6e201e86d2        About an hour ago   /bin/sh -c #(nop)  CMD ["-dev"]                 0B
0ccad193e938        About an hour ago   /bin/sh -c apk add postgresql-client            7.31MB
9e0557830716        About an hour ago   /bin/sh -c apk add openssh-client               3.09MB
1adf3e8bbaf2        About an hour ago   /bin/sh -c apk add busybox-extras               119kB
566a1ee53ff0        About an hour ago   /bin/sh -c apk add file                         5.12MB
e365a84b6aec        About an hour ago   /bin/sh -c apk update                           1.22MB
2963aa3091b8        About an hour ago   /bin/sh -c #(nop)  EXPOSE 8080                  0B
8f359a48314a        About an hour ago   /bin/sh -c #(nop)  ENV PORT=8080                0B
f8563aec3763        About an hour ago   /bin/sh -c #(nop) COPY dir:93ede63eb2fbf590a…   7.92kB
17afeeae80b5        About an hour ago   /bin/sh -c mkdir jwtkeys                        0B
1e9f1d632409        About an hour ago   /bin/sh -c sh setaddress.sh                     1.21kB
2381bd9d01b2        About an hour ago   /bin/sh -c /bin/chmod 755 setaddress.sh         352B
9996d0cbf44c        About an hour ago   /bin/sh -c #(nop) ADD file:b4cc3b29b753aaa7a…   352B
d2764e99b3f5        About an hour ago   /bin/sh -c #(nop) COPY file:1331fb51eea27513…   1.22kB
deec414c95e4        About an hour ago   /bin/sh -c #(nop) ADD file:5604ab9e3793fff27…   14MB
7bd0bd5537d4        2 days ago          /bin/sh -c #(nop)  LABEL maintainer=<stevem@…   0B
3fd9065eaf02        5 months ago        /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B
<missing>           5 months ago        /bin/sh -c #(nop) ADD file:093f0723fa46f6cdb…   4.15MB

Starting from the bottom of the list and going up, we can see that each image builds on the images before it, all the way up to the image corresponding to the named pgelibraryapp image. This can be easily corroborated by running the docker image command with the pgelibraryapp image name as the command’s ls criteria:

docker image ls pgelib*

Resulting in output similar to:

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
pgelibraryapp       latest              2f8b3df80aea        About an hour ago   35MB

Compare the IMAGE ID of the pgelibraryapp image with the top-level image reported by the docker history pgelibraryapp command. They are the same. We can infer from this that if an image is an image is an image, we should be able to start any image in the history list.

Assuming that the container is crashing when attempting to execute the ENTRYPOINT, it may be useful to create and start a container using a source image that does everything but run the ENTRYPOINT step. We can do this by selecting the intermediate image that we would like to run as a container and running the following command:

docker run -it 0ccad193e938  /bin/sh

A check of the docker history output above tells us that intermediate image 0ccad193e938 contains everything that should be in our image with the exception of the ENTRYPOINT and CMD(flags) steps. Starting the intermediate image as shown tells Docker to create a container from the intermediate image, run it in the foreground starting an interactive session running /bin/sh:

  docker run -it 0ccad193e938 /bin/sh

Resulting in a root login to the container created from image 0ccad193e938:

/ # set
HISTFILE='/root/.ash_history'
HOME='/root'
HOSTNAME='1aac17789e87'
IFS='
'
OPTIND='1'
PATH='/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
PORT='8080'
PPID='0'
PS1='\w \$ '
PS2='> '
PS4='+ '
PWD='/'
SHLVL='1'
TERM='xterm'
/ #

From here we can try to figure out what the problem is by attempting to start the application directly, inspecting the Alpine Linux setup, adding and configuring missing packages, adjusting network setup (careful) etc. When the problem has been solved and the application can be started by running the docker-entrypoint.sh script in the container session the source Dockerfile/scripts can be updated to reflect the required changes.