My Drone Docker Runner instance runs on a server where all containers are stopped and recreated nightly for updates and backups.
This appears to be causing issues for long-running builds that keep going while that happens, and the runner appears to not know the containers exist after that point.
The runners are not designed to be killed while pipelines are running; if you kill the runner while pipelines are running it will result in zombie builds (that will eventually be terminated with an error state by the server after 24 hours). The recommendation is therefore to use docker stop with the --timeout flag to allow enough time for graceful termination and for jobs to complete.
For now I check for running builds via the Drone API and abort the backup procedure if any are found, which appears to be working for the time being.
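For anyone wanting to do the same, a rough sketch of that pre-backup check follows. It is not taken from the thread: the endpoint path /api/builds/incomplete, the assumption that it returns a JSON array with one entry per unfinished build, and the function names are all assumptions to verify against your own Drone server version (the endpoint may also need an admin token).

```python
import json
import urllib.request

# Assumption: the server exposes a list of unfinished builds at this path.
# Check your Drone version; adjust the path if it differs.
INCOMPLETE_BUILDS_PATH = "/api/builds/incomplete"


def fetch_incomplete_builds(server_url, token, opener=urllib.request.urlopen):
    """Return the parsed JSON list of unfinished builds from the server.

    `opener` is injectable so the function can be exercised without a network.
    """
    request = urllib.request.Request(
        server_url.rstrip("/") + INCOMPLETE_BUILDS_PATH,
        headers={"Authorization": "Bearer " + token},
    )
    with opener(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def backup_may_proceed(incomplete_builds):
    """The backup is safe only when no build is pending or running."""
    return len(incomplete_builds) == 0


def main(server_url, token):
    """Return a shell-style exit code: 0 to continue, 1 to skip the backup."""
    builds = fetch_incomplete_builds(server_url, token)
    if not backup_may_proceed(builds):
        print(f"{len(builds)} build(s) still in progress; skipping backup")
        return 1
    return 0
```

A backup script could call main() with the server URL and a personal token, then exit with the returned code so the nightly job stops before any containers are touched.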
Your timeout method isn’t the best in my opinion, because I might have builds that take 12+ hours to do, and I wouldn’t want my backups to wait super long to run just because of that. If backups could silently be done and then the runner just picks back up where it left off, I think that’d be much more elegant, as it ensures my backups still run in a timely fashion.
I feel like the pickup mechanism could be done in a fairly straightforward way as well by using Docker labels on all pipeline containers, but I can’t implement it myself as I’m not really familiar with Go yet.
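To make the idea concrete, here is a minimal sketch of the discovery half of such a pickup mechanism, using container labels (Docker's mechanism for attaching metadata to containers). It only shows how a restarted process could find surviving pipeline containers and group them by build. The label names example.pipeline and example.pipeline.build are hypothetical, not what the runner actually sets, and this is Python rather than the runner's Go purely for illustration.

```python
import json
import subprocess
from collections import defaultdict

# Hypothetical label names, chosen for this sketch only.
PIPELINE_LABEL = "example.pipeline"
BUILD_LABEL = "example.pipeline.build"


def parse_labels(raw):
    """Turn docker's 'k1=v1,k2=v2' label string into a dict.

    Note: label values that themselves contain commas would be split
    incorrectly; this sketch assumes simple values.
    """
    labels = {}
    for pair in raw.split(","):
        if "=" in pair:
            key, value = pair.split("=", 1)
            labels[key] = value
    return labels


def group_by_build(ps_output):
    """Group container IDs by build label from `docker ps` JSON-lines output."""
    groups = defaultdict(list)
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        container = json.loads(line)
        labels = parse_labels(container.get("Labels", ""))
        build = labels.get(BUILD_LABEL)
        if build is not None:
            groups[build].append(container["ID"])
    return dict(groups)


def list_pipeline_containers():
    """Ask the local Docker daemon for all labeled pipeline containers."""
    result = subprocess.run(
        ["docker", "ps", "--all",
         "--filter", f"label={PIPELINE_LABEL}=true",
         "--format", "{{json .}}"],
        check=True, capture_output=True, text=True,
    )
    return group_by_build(result.stdout)
```

Finding the containers is the easy part; reattaching to their logs and exit codes and reporting results back to the server would still have to happen inside the runner itself.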