
Rolling Your Own Remote Docker Executor on AWS Spot Instances

At Digital Turbine, we sometimes need to offload tasks to specialized hardware. This lets those tasks run more efficiently and reduces the load on our regular executors, such as those running our CI/CD operations, SDK Metrics ingestion, and our aggregations.

The considerations

Many different approaches and tools are available to address such needs; some are very basic to operate, while others have a steep learning curve.

We explored the various possibilities, but most proved oversized for such a simple use case. For example, we could have created or used a Kubernetes cluster with a simple CronJob and an Allocator for AWS. Still, that route would require hands-on Kubernetes knowledge and a deep understanding of maintaining such an operation - knowledge most mobile engineers do not possess.

In addition, we explored the idea of reusing an existing Airflow operation that our peer teams rely on - but that proved impossible due to the technicalities of mixing permissions across groups; the same issue applied to AWS Batch.

It's worth noting that both approaches above, as well as other third-party solutions, would have required a budget - and since this was a proof of concept, a simpler solution was required.

Another important consideration was the accessibility of the solution: it shouldn't introduce too much complexity or too many new technologies. This steered the answer in a slightly different direction - running the workloads as if they were regular Jenkins jobs, something most software engineers are already familiar with.

The final decision

After reviewing our options, we opted for a simpler solution that lets us use our existing technology stack and knowledge, while seamlessly running our operation from anywhere we want.

To suit the above requirements, we wrote a script that runs our SDK Metrics ingestion workloads on AWS compute instances without requiring a single machine to be active 24/7. Since there are definite "dead spots" in our ingestion process, we wanted to maximize our cost/compute efficiency and use spot instances.

Our programming language for this utility is Go, as it has mature SDKs for both AWS and Docker, and both are extremely fast to bootstrap.

In this blog post, we'll walk through the steps required to re-create the script and follow its inner workings.

Prerequisites and dependencies

For the script to run successfully, you will need a pre-existing AMI with a Docker daemon installed, running, and exposing the Docker API port (2375) to the relevant calling machine(s). A public (non-Digital Turbine) GitHub gist by styblope explains how to make the Docker daemon listen on TCP in addition to UNIX sockets; to sum up the requirements:

  1. Create a daemon.json file in /etc/docker containing
    {"hosts": ["tcp://0.0.0.0:2375", "unix:///var/run/docker.sock"]}
  2. Run the daemon via systemd in a way that instructs it to use the above config file, i.e., by editing docker.service or creating an override.conf
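
For example, on a systemd-based AMI, a drop-in override along these lines clears the unit's baked-in flags so the daemon picks up daemon.json instead (paths shown are the common defaults; verify against your distribution):

```ini
# /etc/systemd/system/docker.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd
```

After creating the file, run `sudo systemctl daemon-reload` and `sudo systemctl restart docker` for the change to take effect.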

Also, this post assumes the following dependencies have been added to your Go project:

github.com/aws/aws-sdk-go v1.32.4
github.com/docker/docker v1.13.1

What are we going to do?

Before diving into the code, set out below is a short overview of the process we are about to see in action:

  1. Request an Amazon EC2 instance using the "RunInstances" API
  2. Wait for the machine to be online
  3. Connect to the Docker service running on the newly provisioned machine
  4. Pull a Docker image
  5. Create a container based on the above image
  6. Start the container
  7. Pipe the container's logs to our process
  8. Adopt the container's exit code as our own process's exit code
  9. Terminate the EC2 machine using the "TerminateInstances" API
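
The steps above can be sketched as a single function. The helper names here are hypothetical stubs standing in for the AWS and Docker SDK calls covered in the rest of the post:

```go
package main

import "fmt"

// Hypothetical stubs for the SDK calls shown in the following sections.
func runSpotInstance() string { fmt.Println("1. RunInstances"); return "10.0.0.1" }
func waitForDocker(ip string) { fmt.Printf("2-3. connect to %s:2375\n", ip) }
func pullImage(image string)  { fmt.Println("4. pull", image) }
func createAndStart(image string) string {
	fmt.Println("5-6. create and start a container from", image)
	return "cid"
}
func pipeLogs(id string)     { fmt.Println("7. stream logs for", id) }
func exitCode(id string) int { fmt.Println("8. inspect", id); return 0 }
func terminateInstance()     { fmt.Println("9. TerminateInstances") }

func main() {
	ip := runSpotInstance()
	waitForDocker(ip)
	pullImage("my/ingestion:latest") // hypothetical image name
	id := createAndStart("my/ingestion:latest")
	pipeLogs(id)
	code := exitCode(id)
	terminateInstance()
	fmt.Println("container exited with", code)
}
```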

Show me the code!

To start, we need to create and run an EC2 instance, preferably a spot instance. Fortunately, that's a pretty straightforward task, and we only need a few details to invoke the API and receive the details of the newly created EC2 instance.

Here is how the request object looks in Go:

eri := ec2.RunInstancesInput{
	DisableApiTermination: aws.Bool(false),
	DryRun:                aws.Bool(dryRun), // Set to true if you just want to test things out
	IamInstanceProfile: &ec2.IamInstanceProfileSpecification{
		Name: aws.String("YOUR-IAM-PROFILE"),
	},
	ImageId: aws.String("YOUR-AMI-WITH-DOCKER-EXPOSED"),
	// The following sets the RunInstances request to use a spot instance
	InstanceMarketOptions: &ec2.InstanceMarketOptionsRequest{
		MarketType: aws.String("spot"),
		SpotOptions: &ec2.SpotMarketOptions{
			BlockDurationMinutes: aws.Int64(60), // Single hour
			//MaxPrice: nil, // This can be set if required
			SpotInstanceType: aws.String("one-time"),
		},
	},
	InstanceType: aws.String("MACHINE-TYPE"), // i.e. c5a.12xlarge
	KeyName:      aws.String("SSH-KEYNAME-OR-NIL-STRING"),
	MaxCount:     aws.Int64(1),
	MinCount:     aws.Int64(1),
	// The following describes the instance networking details
	NetworkInterfaces: []*ec2.InstanceNetworkInterfaceSpecification{
		{
			AssociatePublicIpAddress: aws.Bool(true),
			DeleteOnTermination:      aws.Bool(true),
			DeviceIndex:              aws.Int64(0),
			Groups:                   []*string{aws.String("YOUR-SG-FOR-THE-INSTANCE")},
			InterfaceType:            aws.String("interface"),
			SubnetId:                 aws.String("THE-SUBNET"),
		},
	},
	// The following is extremely helpful for reporting
	TagSpecifications: []*ec2.TagSpecification{{
		ResourceType: aws.String("instance"),
		Tags: []*ec2.Tag{{
			Key:   aws.String("environment"),
			Value: aws.String("dev"),
		}},
	}},
	// Optionally, if you need some boilerplate running on the instance
	// (note that EC2 expects the UserData value base64-encoded):
	//UserData: aws.String("sudo apt install vim")
}

Using this object to actually perform the instance request:

req, res := ec2Service.RunInstancesRequest(&eri)
if err := req.Send(); err != nil {
	log.Fatalf("couldn't run the instance! %v", err)
}

If the call is successful, the result is very useful, as it includes details about the newly created and running instance; these will help us connect to it:

ip := *res.Instances[0].PrivateIpAddress

Now that we have an instance running, we can attempt to connect to it; we can hijack the Docker SDK's "Ping" API to check whether the instance has the Docker daemon running and accepting connections:

start := time.Now()
timeoutFuture := start.Add(time.Minute * 5)
var newClient *client.Client
for time.Now().Unix() < timeoutFuture.Unix() {
	log.Printf("waiting for %s to be connectable....", ip)
	time.Sleep(5 * time.Second)
	var errClient error
	newClient, errClient = client.NewClient(fmt.Sprintf("tcp://%s:2375", ip), client.DefaultVersion, nil, nil)
	if errClient != nil {
		log.Printf("current loop's error - %v", errClient)
	} else if newClient != nil {
		ping, errPing := newClient.Ping(context.Background())
		if errPing == nil {
			log.Printf("Ping result - docker %v", ping.APIVersion)
			break
		}
		log.Printf("Couldn't ping remote docker machine - %v", errPing)
	}
}

After creating a new Docker client object, the code above attempts to connect to the Docker daemon on the instance for a maximum of 5 minutes, at 5-second intervals.

After a successful connection, the ping result will contain the API version used by the Docker daemon on the EC2 instance.

Now that we have the machine running and we have a newly created docker client ready to go, we can start the work of actually using the docker services on the EC2 instance we just created.

This starts with pulling a docker image to the instance itself.

But before we do that, a heads-up - if you would like to use ECR as your image repository, you must provide credentials to the Docker service running on that instance.

To do that, it's best to call the "GetAuthorizationToken" API.

tokenRequest, errToken := ecrService.GetAuthorizationToken(&ecr.GetAuthorizationTokenInput{})
if errToken != nil {
	log.Fatalf("couldn't retrieve ecr login details! %v", errToken)
}
token := *tokenRequest.AuthorizationData[0].AuthorizationToken

To use that token with the Docker API, you must manipulate the string a bit:

decoded, err := base64.StdEncoding.DecodeString(token)
if err != nil {
	log.Fatalf("couldn't decode the ecr token! %v", err)
}
// The decoded token has the form "username:password"
parts := strings.SplitN(string(decoded), ":", 2)
authConfig := types.AuthConfig{
	Username: parts[0],
	Password: parts[1],
}
encodedJSON, _ := json.Marshal(authConfig)
authStr := base64.URLEncoding.EncodeToString(encodedJSON)

With that in place, we can finally pull our image:

// "image" is the full repository path, e.g. an ECR repository URL with a tag
reader, err := newClient.ImagePull(ctx, image, types.ImagePullOptions{
	RegistryAuth: authStr,
})

If we want to pipe the output of the image pull operation:

scanner := bufio.NewScanner(reader)
for scanner.Scan() {
	log.Printf("docker | %s", scanner.Text())
}
_ = reader.Close()

Now we can move on to the container itself, beginning with creating it:

// "config" is a *container.Config describing the image, command, env, etc.
resp, err := newClient.ContainerCreate(ctx, config, &container.HostConfig{
	NetworkMode: "host", // might require a different setting; here we need the container exposed to the subnet
}, nil, "name")

If everything has gone smoothly so far (otherwise, check the errors returned from AWS/Docker), we can now start the container:

// Attempt to start the container; note the response ID from the container create API
if err := newClient.ContainerStart(ctx, resp.ID, types.ContainerStartOptions{}); err != nil {
	log.Fatalf("couldn't start the container! %v", err)
}

Now let's pipe the container's log output from Docker to our own process:

// Follow both stdout and stderr from the container
ops := types.ContainerLogsOptions{ShowStdout: true, ShowStderr: true, Follow: true}
logs, errLogs := newClient.ContainerLogs(ctx, resp.ID, ops)
if errLogs == nil {
	// Note: for non-TTY containers the stream is multiplexed; the docker
	// stdcopy package can demultiplex it if you need clean output
	scanner := bufio.NewScanner(logs)
	for scanner.Scan() {
		log.Printf("docker | %s", scanner.Text())
	}
	_ = logs.Close()
}

If we want to stop the container, we can use the stop container API:

timeout := 30 * time.Second
_ = newClient.ContainerStop(context.Background(), resp.ID, &timeout)

Finally, to retrieve the container's exit code, we can use the Inspect API:

inspect, errInspect := newClient.ContainerInspect(ctx, resp.ID)
if errInspect != nil {
	log.Fatalf("couldn't inspect the container! %v", errInspect)
}
exitCode := inspect.State.ExitCode
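
To make step 8 concrete - adopting the container's exit code as our own - the value from inspect.State.ExitCode can be handed to os.Exit once cleanup is done, so Jenkins sees the job fail exactly when the container failed. A stdlib-only sketch:

```go
package main

import "fmt"

// propagate maps a container exit code to a valid process exit code,
// capping it to the 0-255 range that process exit statuses allow.
func propagate(containerExit int) int {
	if containerExit > 255 {
		return 255
	}
	if containerExit < 0 {
		return 1
	}
	return containerExit
}

func main() {
	containerExit := 0 // stand-in for inspect.State.ExitCode
	code := propagate(containerExit)
	fmt.Println("exiting with", code)
	// In the real tool we would call os.Exit(code) here - but only after
	// terminating the EC2 instance, since os.Exit skips deferred cleanup.
}
```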

Now, to be good citizens, we have to terminate the EC2 instance we used; this might also save a few bucks in case we grabbed an expensive machine:

termReq, _ := ec2Service.TerminateInstancesRequest(&ec2.TerminateInstancesInput{
	DryRun:      aws.Bool(dryRun),
	InstanceIds: []*string{res.Instances[0].InstanceId},
})
if err := termReq.Send(); err != nil {
	log.Printf("couldn't terminate the instance! %v", err)
}

Summary

In the age of Kubernetes, this might look like a step in the other direction. However, considering that the teams managing the project are all mobile developers, the more straightforward, hands-on approach was the right fit.

This post has described how we can quickly leverage the AWS and Docker APIs and SDKs to create small remote execution units and run them as if they were running locally. We fine-tuned them to run on Jenkins, producing job output as if Jenkins had run the code itself.
