Advanced Docker Tips: Pt.2 - Know When to Cache

By James MacGowan
on March 25, 2021

With Docker increasingly used in cloud environments, I decided to write down some of the experience and knowledge I have gained from work I have done and witnessed. Don’t rack your brain on your next production Docker project; I have already done that for you in this series of articles.

Know When to Cache

Even if your Dockerfile requests the latest OS updates, Docker caches that layer and reuses it on later builds instead of checking for updates again, because the instruction itself has not changed. To prevent this, you have to instruct Docker to not cache anything and rebuild from scratch.

docker build --pull --no-cache .

Doing this will cause your image builds to take the maximum amount of time, even if nothing has changed.

To get around this, you have two options.

1. Move the OS update layer to the end of the Dockerfile and use an ARG/ADD hack just before it to force the layer to be rebuilt every time.
2. Break your Docker image build process into two phases.

Option 1:

Move the OS updates layer to the end of the Dockerfile, right before any USER, CMD, or ENTRYPOINT lines. Add an ARG line right before the OS update layer. You will not be using this value, but its existence will trick Docker into rebuilding every layer below it. The ARG value needs to be dynamic: you must pass in a unique value every time you run the build.

ARG NOCACHE=0

Then pass in a unique value on the command line, such as the current Unix timestamp.

docker build --build-arg NOCACHE=$(date +%s) .

If you are actively developing your own software used in the image, this is not a good option. The majority of the image build time is spent installing updates. If you are trying to solve a problem by making a series of small changes, you will be stuck sitting through a long image build over and over again. This option only works well if you are sourcing a third-party pre-built image that gets updated infrequently.
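
Putting the pieces of Option 1 together, a build could look roughly like the sketch below. The Debian base image, the apt-get commands, the directory name, and the my_app tag are assumptions made for illustration, not part of the original setup. The sketch also references NOCACHE inside the RUN instruction, which is a cautious choice made here so the cache miss does not depend on how a particular builder treats unused build arguments.

```shell
#!/bin/sh
# Sketch of Option 1: keep the OS update layer last and bust its cache with ARG.
# Assumptions (not from the article): Debian base image, apt-get, the
# directory name option1_build, and the tag my_app:latest.
set -eu

build_dir=option1_build
mkdir -p "$build_dir"

# Write an example Dockerfile. The quoted 'EOF' keeps $NOCACHE literal here,
# so it is expanded inside the container at build time, not by this script.
cat > "$build_dir/Dockerfile" <<'EOF'
FROM debian:stable-slim

# ...application layers go here and stay cached between builds...

# A new NOCACHE value invalidates the cache for the RUN below.
ARG NOCACHE=0
RUN echo "cache bust: $NOCACHE" \
    && apt-get update \
    && apt-get upgrade -y \
    && rm -rf /var/lib/apt/lists/*

CMD ["bash"]
EOF

# Seconds since the epoch: not random, but different on every run.
nocache=$(date +%s)

# Print the command for review; remove "echo" to actually run the build.
echo docker build --build-arg "NOCACHE=$nocache" -t my_app:latest "$build_dir"
```

Everything above the ARG line still comes from the cache, so only the update layer and whatever follows it is rebuilt.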

Option 2:

If you have multiple docker images using the same base image, this is the best choice to go with.

In the first phase, your Dockerfile sources the upstream image and includes only the OS updates layer. Use the --pull and --no-cache command-line switches on this phase only. After building the image, you will want to flatten its layers. There is a --squash option you could use, but it is still marked experimental. The older method is to create a container from the image, export the container, and then import the export as a new image.

docker export my_container > /home/export.tar
cat /home/export.tar | docker import - my_base_image:latest

You will now push your smaller image to a registry for use later. You would schedule this build process to happen every night or every week, depending on your risk tolerance.
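
The first phase can be sketched as a small script suitable for a scheduled job. The Debian upstream image, the registry name registry.example.com, the my_base_image:layered tag, and the DRY_RUN switch are hypothetical choices for this example. By default the script only prints the Docker commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch of phase one: build an updated base image, flatten it, and push it.
# Assumptions (not from the article): Debian upstream image, the registry
# name registry.example.com, the intermediate tag, and the DRY_RUN switch.
set -eu

base_dir=phase1_base
mkdir -p "$base_dir"

# The phase-one Dockerfile contains only the OS updates layer.
cat > "$base_dir/Dockerfile" <<'EOF'
FROM debian:stable-slim
RUN apt-get update \
    && apt-get upgrade -y \
    && rm -rf /var/lib/apt/lists/*
EOF

# Print each command by default; set DRY_RUN=0 to execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

image=registry.example.com/my_base_image:latest

# Always pull the upstream image and ignore cached layers in this phase.
run docker build --pull --no-cache -t my_base_image:layered "$base_dir"

# Flatten: create a stopped container, export its filesystem, import it back.
run docker create --name my_container my_base_image:layered
run sh -c "docker export my_container | docker import - $image"
run docker rm my_container

# Publish the flattened image for the second phase to use.
run docker push "$image"
```

Keep in mind that docker import produces a filesystem-only image: metadata such as ENV, CMD, and ENTRYPOINT from the original image is not carried over. It can be set again with the --change option of docker import or in the Dockerfile of the second phase.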

In the second phase, you keep your original image build process with a few changes: remove the OS updates layer, and change the FROM line to source your own base image instead of the upstream one.
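
A minimal sketch of the second phase is shown here. The registry name, the image tags, the directory name, and the placeholder application step are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch of phase two: the application image builds on the pre-updated base.
# Assumptions (not from the article): the registry name, the image tags,
# the directory name phase2_app, and the placeholder application step.
set -eu

app_dir=phase2_app
mkdir -p "$app_dir"

cat > "$app_dir/Dockerfile" <<'EOF'
# Source the flattened, already-updated base image instead of upstream.
FROM registry.example.com/my_base_image:latest

# No OS update layer here; it lives in the base image.
# ...application layers go here...

# An imported base image carries no metadata, so set CMD explicitly.
CMD ["bash"]
EOF

# The normal layer cache stays enabled, so small changes rebuild quickly.
# Print the command for review; remove "echo" to actually run the build.
echo docker build -t my_app:latest "$app_dir"
```

If the build host may hold an older local copy of the base image, adding --pull to this build is one way to make sure the newest base from the registry is used.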

About James MacGowan

James started out as a web developer with an interest in hardware and open source software development. He made the switch to IT infrastructure and spent many years with server virtualization, networking, storage, and domain management.
After exhausting all challenges and learning opportunities provided by traditional IT infrastructure and a desire to fully utilize his developer background, he made the switch to cloud computing.
For the last 3 years he has been dedicated to automating and providing secure cloud solutions on AWS and Azure for our clients.