So… here’s dibs, which stands for Docker Image Build System. Put it
very, very bluntly… it is what I use instead of docker build
lately.
It already comes with documentation and many frills, but we will start
very slow and explore features in a series of posts. Buckle up!
Curious about the whole Dibs Saga? See a list of all posts on dibs.
Table of Contents
- Table of Contents
- Quick “why dibs”
- Example?
- Installing dibs
- Getting The Target Software
- The Configuration File
- Generate The Image
- What Happened?
- Make
build
More Efficient - Make
build
even more efficient - Enhancing
bundle
- Where Are We Now?
Quick “why dibs”
Why did I start working on dibs? Part is due to my ignorance: I honestly didn’t know about multistage Dockerfiles, and I quickly got bored of managing it myself.
This is not the whole story though. Another aspect of the Dockerfile
system that is a bit too… raw in my opinion is how to execute things.
As far as I know, you either use the RUN
directive providing a usually
growing list of commands all stitched together with &&
s, or you put
those commands in a script, COPY
it into the container, then RUN
it.
I’d say this is basically what’s needed, but not the best from the
usability point of view.
Last, I also grew a bit tired of the caching mechanism provided by Docker
when building images via Dockerfile. Don’t get me wrong, it’s really neat;
my only concern is that sometimes it led to situations that required some
debugging before figuring out what was going wrong, especially when
building images from a remote repository dynamically cloned via git
. I
felt that having direct and explicit control over the caching mechanism
would help in a lot of my use cases.
Example?
In this series of posts I’ll be showing examples of growing complexity. In this starting one there will be actually nothing (I guess) that cannot be done more or less directly with a Dockerfile, but it’s helpful to set the stage.
The main goal is to generate a Docker image for sample-mojo. This
project is somehow purely focused on providing a simple web endpoint,
without wasting too much time on how it will be deployed, apart providing
an Heroku-compatible Procfile
:
web: perl ./app.pl daemon --listen "http://*:$PORT"
and declaring the Perl’s dependencies in a cpanfile
:
requires 'Mojolicious', '7.08';
So, we will have to do the heavy lifting:
- install the modules based on the
cpanfile
- provide a wrapper that uses
Procfile
This is not all though, we also want:
- avoid bloating the target image, i.e. we don’t want any build tools inside it
- compile and run under an unprivileged user
ada
, to avoidroot
as much as possible`
Hence, the sequence of operations will be like this:
- build the modules in a container were we install all needed building tools
- copy the final complete application in a temporary cache
- create another container with only the tools needed at runtime, namely Perl
- copy the application artifacts from the temporary cache into their final position in this container
- put a wrapper script to interpret and execute the
Procfile
in the container - set the proper
ENTRYPOINT
,CMD
andUSER
on the container - save the container as image.
Installing dibs
dibs needs Docker to do anything interesting, so chances are that you
already have Docker. In this case, installing dibs is as simple as
creating the following dibs
script somewhere in the PATH
, like this:
$ cat >dibs <<'END'
#!/bin/sh
docker run --rm \
-v /var/run/docker.sock:/var/run/docker.sock \
-v "$PWD:/mnt" -w /mnt -e "DIBS_HOST_REMAP_DIR=/mnt:$PWD" \
-- polettix/dibs:0.5 "$@"
END
$ chmod +x dibs
Getting The Target Software
In this example, we will work in the so-called alien mode, i.e. a mode where the software to be packaged is not aware of dibs.
First of all, we will create a directory that will be the base of our
operations. We will call it alien01
:
~$ mkdir alien01
~$ cd alien01
Next, we will get the software, which dibs expects to find in the
src
sub-directory:
alien01$ git clone https://gitlab.com/polettix/sample-mojo.git src
The Configuration File
dibs works based on a configuration file. In our case, the configuration is the following YAML file:
---
name: dibs-example-sample-mojo
actions:
default: [build, bundle]
build:
- from: 'alpine:3.6'
- name: build
pack:
run: |
#!/bin/sh
set -e
exec >&2
apk --no-cache add build-base wget perl perl-dev
adduser -D -h /app ada
wget -O /bin/cpanm --no-check-certificate https://cpanmin.us/
chmod +x /bin/cpanm
cat >/tmp/as-ada.sh <<'END'
cd /app
cp -R /tmp/src/* .
cpanm -l local --notest --installdeps .
END
chmod +x /tmp/as-ada.sh
su - ada /tmp/as-ada.sh
cp -a /app /tmp/cache
bundle:
- from: 'alpine:3.6'
- name: install
pack:
run: |
#!/bin/sh
set -e
exec >&2
apk --no-cache add perl
adduser -D -h /app ada
cp -a /tmp/cache/app /
cat >/procfilerun <<'END'
#!/bin/sh
set -e
PROCFILE_TYPE="${1:-"web"}"
export PERL5LIB='/app/local/lib/perl5'
cd /app
while read -r type command ; do
[ -n "$type" ] || continue
[ "x${type%${type#?}}" != 'x#' ] || continue
[ "x$type" = "x$PROCFILE_TYPE:" ] || continue
exec /bin/sh -c "exec $command"
printf >&2 'could not execute command "%s"\n' "$command"
exit 1
done <Procfile
printf >&2 'invalid process type %s, not in Procfile\n' "$PROCFILE_TYPE"
exit 1
END
chmod +x /procfilerun
commit:
entrypoint: [/procfilerun]
cmd: [web]
user: ada
- name: save bundled image
tags: sample-mojo-alien01:latest
This file can also be downloaded here has to be saved
as dibs.yml
in the project directory alien01
.
Generate The Image
Up to now, the layout of our alien01
directory should be the following:
alien01$ ls -l
total 8
-rw-r--r-- 1 poletti poletti 2073 Feb 24 18:18 dibs.yml
drwxr-xr-x 3 poletti poletti 4096 Feb 24 17:18 src
alien01$ ls -l src
total 16
-rwxr-xr-x 1 poletti poletti 234 Feb 24 17:18 app.pl
-rw-r--r-- 1 poletti poletti 32 Feb 24 17:18 cpanfile
-rw-r--r-- 1 poletti poletti 52 Feb 24 17:18 Procfile
-rw-r--r-- 1 poletti poletti 326 Feb 24 17:18 README.md
If not… look again at the previous sections! Otherwise, let’s start dibs:
alien01$ dibs --alien
base configuration from: /mnt/dibs.yml
=====> sketch default
=====> sketch build
-----> prepare (from: alpine:3.6)
remove tag dibs-example-sample-mojo:20190224-192933-1 if exists
tag alpine:3.6 to dibs-example-sample-mojo:20190224-192933-1
tagging alpine:3.6 as dibs-example-sample-mojo:20190224-192933-1
-----> stroke build
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz
(1/23) Installing binutils-libs (2.30-r1)
[...]
(23/23) Installing wget (1.20.1-r0)
Executing busybox-1.26.2-r11.trigger
OK: 200 MiB in 36 packages
--2019-02-24 19:29:39-- https://cpanmin.us/
Resolving cpanmin.us... 151.101.130.217, 151.101.66.217, 151.101.2.217, ...
Connecting to cpanmin.us|151.101.130.217|:443... connected.
[...]
--> Working on .
Configuring /app ... OK
==> Found dependencies: Mojolicious
--> Working on Mojolicious
Fetching http://www.cpan.org/authors/id/S/SR/SRI/Mojolicious-8.12.tar.gz ... OK
Configuring Mojolicious-8.12 ... OK
Building Mojolicious-8.12 ... OK
Successfully installed Mojolicious-8.12
<== Installed dependencies for .. Finishing.
1 distribution installed
committing working container to dibs-example-sample-mojo:20190224-192933-1
sha256:f4cbc0ec9fa94937ab15ff1c4481edcf9bd5248bc5c687d3801dbadb44cacee7
removing working container
2dd042ebc612a352328316b624edeffaf008cce7cd495b01bb860ef6a62867a6
=====> sketch bundle
-----> prepare (from: alpine:3.6)
remove tag dibs-example-sample-mojo:20190224-192933-1 if exists
tag alpine:3.6 to dibs-example-sample-mojo:20190224-192933-1
tagging alpine:3.6 as dibs-example-sample-mojo:20190224-192933-1
-----> stroke install
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz
(1/2) Installing libbz2 (1.0.6-r5)
(2/2) Installing perl (5.24.4-r2)
Executing busybox-1.26.2-r11.trigger
OK: 40 MiB in 15 packages
committing working container to dibs-example-sample-mojo:20190224-192933-1
sha256:fb1aa5f8843953381d7e8608e4bbf956f06ce38d6f39c2fd4604d3244813e871
removing working container
e41fcfc70ab2b0d60c9a51624ccd51e0eff877cd397e0f35f6cbaa8504690009
+++++> frame save bundled image
tagging dibs-example-sample-mojo:20190224-192933-1 as sample-mojo-alien01:latest
removing tag dibs-example-sample-mojo:20190224-192933-1
Untagged: dibs-example-sample-mojo:20190224-192933-1
sample-mojo-alien01:latest
A bit too much to digest? Let’s take a look!
Actions
dibs executes actions, which can be of differen types. You can specify the type explicitly, but it’s normally not needed as it’s clear what an action is about.
At the top level, it always executes a sketch, that is a sequence of
other actions (including other sketches). In our case, we didn’t specify
any explicit action to be run on the command line, so the default
sketch
has been selected:
actions:
default: [build, bundle]
i.e. the actions build
and bundle
(which are other sketches
themselves, being lists) have to be executed. This accounts for the
“external” structure of the output:
alien01$ dibs --alien
base configuration from: /mnt/dibs.yml
=====> sketch default
=====> sketch build
[...]
=====> sketch bundle
[...]
The two sketches build
and bundle
have a similar structure: they
contain a list of actions (they’re sketches, after all!) that will result
in something done:
actions:
# ...
build:
- from: 'alpine:3.6'
- name: build
pack:
run: |
#!/bin/sh
# ...
bundle:
- from: 'alpine:3.6'
- name: install
pack:
run: |
#!/bin/sh
# ...
commit:
entrypoint: [/procfilerun]
cmd: [web]
user: ada
- name: save bundled image
tags: sample-mojo-alien01:latest
Actions that contain a from
should sound familiar to anyone used to
Dockerfiles: its goal is exactly the same, i.e. define the starting point
of a sequence of container layers. This action has type preparation.
After it, both have an action of type stroke, i.e. something that is
executed inside the container. This action is characterized by having
a pack
field inside, which in our case specifies an “immediate program”
that will be executed in the container. There are many other ways of
providing what has to be executed in the container, as we will see in the
future.
Last, the bundle
sketch also contains a closing action setting the tag
name for the image. This action is of type frame. Notice that build
does not have a corresponding one: we’re simply not interested in the
byproduct of that chain of action, or better we are not interested in
saving the resulting container as an image.
These action types should ring a bell about the metaphor that dibs adopts: as our goal is to generate an image, we assemble one or more sketches, in each of which we first prepare, then draw some strokes, then frame the result if we are happy with it.
What Happened?
TL;DR each of the sketches build
and bundle
started from the same
image alpine:3.6
and executed a sequence of commands inside a container.
The program for build
eventually saved the application with the compiled
modules in a cache staging area; the bundle
sketch eventually resulted
in saving a container image that contains only the strict necessary for
running the program, without building tools.
If you look carefully at all operations, you will see that we indeed stick
to all requirements: the final image is not bloated with unnecessary
tools, compilation is done under user ada
, as well as packing and
execution, the ENTRYPOINT
and CMD
are set right, …
As it is now, though, there’s little advantage over using a Dockerfile to
give to docker build
:
- it’s easier to provide the sequence of commands to be executed, because
you pass the text of a proper shell script instead of a single, long
escaped line to be fed to
/bin/sh -c
- the process is much heavier though, each run of the whole sequence starts from scratch and does not reuse anything already done.
Make build
More Efficient
dibs allows you to have direct control over caching, so there’s some more work to do but it allows us to always keep control of things.
The build process can be divided into a few phases:
- creation of user
ada
- installation of build tools
- installation of pre-requisites specific to the program (which amounts to nothing in our case)
- compilation of modules
Our strategy will be to divide the build
sketch into phases, and save
the places that we find interesting for reuse.
actions:
# ...
build-base: # new sketch for preparing a cached base image
- from: alpine 3.6
- name: base image preparation
pack:
run: |
#!/bin/sh
set -e
exec >&2
apk --no-cache add build-base wget perl perl-dev
adduser -D -h /app ada
wget -O /bin/cpanm --no-check-certificate https://cpanmin.us/
chmod +x /bin/cpanm
#
# WE STOP HERE, no compilation at this stage!
#
- name: save base image for build
tags: sample-mojo-alien01-buildbase:latest
build:
- from: sample-mojo-alien01-buildbase:latest
- name: build user: ada pack: run: | #!/bin/sh set -e exec >&2 cd
user: ada
pack:
run: |
#!/bin/sh
set -e
exec >&2
cd /app
cp -R /tmp/src/* .
cpanm -l local --notest --installdeps .
cp -R /app /tmp/cache
# ...
It’s easy to see what we’re doing here: the old build
has been split
into two parts:
build-base
, which does most of the heavylifting preparing everything for the later compilation phase, but without doing it.build
now only strictly executes the compilation of modules and saves stuff in the cache. Note that we don’t have to use thesu - ada
trick to execute the build part as userada
, becausedibs
can execute that part asuser: ada
.
Now, the typical invocation pattern would be as follows:
# build the base image, once at the beginning
$ dibs --alien build-base
# ...
#
# now, execution of the build phase is much faster
$ dibs --alien
# ...
# get updates from remote repository into src, then regenerate image
$ cd src
$ git pull
$ cd ..
$ dibs --alien
Make build
even more efficient
As it turns out, our program is not installing modules like crazy, so we
can use a bit of caching for it too. We already leveraged /tmp/cache
as
a mechanism to let different actions communicate with each other (e.g.
to pass the compiled application from build
to bundle
), but nothing
prevents us from using it also across different build
s:
actions:
# ...
build:
- from: sample-mojo-alien01-buildbase:latest
- name: build
user: ada
pack:
run: |
#!/bin/sh
set -e
exec >&2
cd /app
cp -R /tmp/src/* .
# REUSE past compilations if available
if [ -d '/tmp/cache/app/local' ] ; then
cp -R /tmp/cache/app/local /app
fi
cpanm -l local --notest --installdeps .
cp -R /app /tmp/cache
# ...
In this way, we’re going also to reuse compilations by cpanm
every time
the /tmp/as-ada.sh
script is executed during the build phase, saving
more time.
Enhancing bundle
The bundle
sketch can use some enhancements too, because it re-creates
the whole thing from scratch over and over, whereas use creation and
runtime installation might be factored out and cached, much like what
happened with build
.
At this point, it’s also meaningful to think that the user creation
process might be factored out between build
and bundle
, as they have
the same goal. To do this, there are a few strategies:
- build a base image for the runtime, then use it as the base for the build base image
- build a pre-base image with user creation only, then use that as base image for the bundle base image and the build base image
- factor the user creation process out and reuse it in different sketches.
We will look into the three alternatives in the following sub-sections,
but in all cases we end up with a new
sample-mojo-alien01-bundlebase:latest
image that we will use as starting
point for bundling, like this:
actions:
# ...
bundle-base:
- from: alpine 3.6
# ...
- name: save base image for bundle
tags: sample-mojo-alien01-bundlebase:latest
bundle:
- from: sample-mojo-alien01-bundlebase:latest
# ...
# ...
Let’s take a closer look at the three alternatives now.
Bundle As Global Base
If we look into the bundle image carefully, we see that its pre-requisites are also in the build image, so why not use it as a starting point? This is our first alternative approach:
actions:
# ...
bundle-base: # new sketch for preparing a cached base image
- from: alpine 3.6
- name: bundle base image preparation
pack:
run: |
#!/bin/sh
set -e
exec >&2
apk --no-cache add perl
adduser -D -h /app ada
- name: save base image for bundle
tags: sample-mojo-alien01-bundlebase:latest
build-base:
- from: sample-mojo-alien01-bundlebase:latest
- name: build base image preparation
pack:
run: |
#!/bin/sh
set -e
exec >&2
apk --no-cache add build-base wget perl-dev
wget -O /bin/cpanm --no-check-certificate https://cpanmin.us/
chmod +x /bin/cpanm
- name: save base image for build
tags: sample-mojo-alien01-buildbase:latest
# ...
The advantage of this approach is that is quite simple; the drawback is that the bundle image might evolve in a direction that includes tools that are not needed for building. This might be a problem or not depending on circumstances; anyway, the build image is usually more bloated anyway, so it should not be an issue in the average case.
Common Pre-Base Image
A more “normalized” way of doing things would be to factor common operations like user creation in a single, simpler base image, then use it as a pre-base for generating two separated build image and a bundle image, which can then evolve independently:
actions:
# ...
base: # new sketch for preparing a cached base image
- from: alpine 3.6
- name: base image preparation
pack:
run: |
#!/bin/sh
set -e
exec >&2
adduser -D -h /app ada
- name: save base image
tags: sample-mojo-alien01-base:latest
bundle-base: # new sketch for preparing a cached base image
- from: sample-mojo-alien01-base:latest
- name: bundle base image preparation
pack:
run: |
#!/bin/sh
set -e
exec >&2
apk --no-cache add perl
- name: save base image for bundle
tags: sample-mojo-alien01-bundlebase:latest
build-base:
- from: sample-mojo-alien01-base:latest
- name: build base image preparation
pack:
run: |
#!/bin/sh
set -e
exec >&2
apk --no-cache add build-base wget perl-dev perl
wget -O /bin/cpanm --no-check-certificate https://cpanmin.us/
chmod +x /bin/cpanm
- name: save base image for build
tags: sample-mojo-alien01-buildbase:latest
# ...
The advantage of this approach is that the base image can then easily be put into a separate process, e.g. managed by another team.
Factoring a Common User Creation Stroke
In this case, the user creation process is a stroke by itself, defined once and reused by the base images:
actions:
# ...
create-user:
pack:
run: |
#!/bin/sh
set -e
exec >&2
adduser -D -h /app ada
build-base:
- from: alpine 3.6
- create-user
- name: base image preparation
pack:
# NOTE: no user creation in the script below
run: |
#!/bin/sh
set -e
exec >&2
apk --no-cache add build-base wget perl perl-dev
wget -O /bin/cpanm --no-check-certificate https://cpanmin.us/
chmod +x /bin/cpanm
- name: save base image for build
tags: sample-mojo-alien01-buildbase:latest
bundle-base:
- from: alpine 3.6
- create-user
- name: base image preparation
pack:
# NOTE: no user creation in the script below
run: |
#!/bin/sh
set -e
exec >&2
apk --no-cache add perl
- name: save base image for bundle
tags: sample-mojo-alien01-bundlebase:latest
# ...
The advantage of this approach is that it’s more visually clear what’s going on in the preparation of each base image, because there’s less referencing around to other base images.
Where Are We Now?
At this point, I guess, we’re somewhere very near to what a multistage Dockerfile can do: we have a chain for building, another one for packaging a lean final image with our application, and there’s some caching to help us speed up things.
Now for the second reason why I wanted a tool like dibs: ease of
reuse. I already find dibs’s way of expressing actions inside
a container better than the lower-level RUN
provided by a Dockerfile,
but this is just the tip of the iceberg… as we will discover in our next
post!
The complete configuration file for this stage can be found here, considering the third alternative in the previous section.