Builds are Stuck in Pending Status

This section helps you triage why builds are stuck in a pending state.

Whenever we have encountered this issue, it has been related to configuration. To triage the problem we therefore need to see your configuration details and logs. Please take the following actions and provide the following data:

  1. Provide your server configuration
  2. Provide your agent configuration
  3. Enable DRONE_LOGS_TRACE=true on the server
  4. Enable DRONE_LOGS_TRACE=true on the agent
  5. Provide the agent logs with trace enabled
  6. Provide the server logs with trace enabled
  7. Provide the yaml configuration file
  8. Provide the build details for your pending builds via this API endpoint.

Verify Runner Installation

Please make sure you have installed a runner that can process your pipelines. If you have only installed the server, and have not installed any runners to execute your pipelines, they will sit in a pending state.

See https://docs.drone.io/server/overview/

Did you use Helm?

If you used Helm to install Drone, please make sure you used our official charts at drone/charts. The charts in Helm stable are not official and will not produce a working installation.

Check Server Settings

If you set DRONE_AGENTS_DISABLED or DRONE_KUBERNETES_ENABLED, you should remove these settings. They are legacy settings that prevent the server from assigning workloads to the runner.

What does a successful connection look like?

Before we discuss troubleshooting connectivity issues, we should first examine what a successful connection looks like. When debug mode is enabled on the server, and when agents successfully connect, you will see an entry in your server logs that looks like this:

{
  "arch": "amd64",
  "kernel": "",
  "level": "debug",
  "msg": "manager: request queue item",
  "os": "linux",
  "time": "2019-04-28T16:00:47-07:00",
  "variant": ""
}

The "manager: request queue item" entry is proof that the agent is successfully connecting to the server. If you do not see these log entries, you can be certain that the agents are failing to connect to the server.
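To check quickly whether any runner is polling, the server log can be scanned for these entries. The following is a minimal sketch, not part of Drone. It assumes the server writes one JSON object per line (the sample above is pretty-printed for readability), and the function name `polling_runners` is purely illustrative.

```python
import json
from collections import Counter


def polling_runners(log_lines):
    """Count 'manager: request queue item' entries per (os, arch, kernel).

    Assumes one JSON object per log line; lines that are not JSON
    objects are skipped.
    """
    seen = Counter()
    for line in log_lines:
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # plain-text log line, ignore it
        if not isinstance(entry, dict):
            continue
        if entry.get("msg") == "manager: request queue item":
            key = (entry.get("os", ""), entry.get("arch", ""), entry.get("kernel", ""))
            seen[key] += 1
    return seen
```

An empty result for a log captured while the runner was up suggests that the runner is not reaching the server. A non-empty result also shows which os, arch and kernel values the connected runners are reporting.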

Networking Problems

The most common root cause is network connectivity issues. The best way to triage connectivity issues is to pass DRONE_LOGS_TRACE=true to the agent. This will provide detailed logs for the HTTP requests made to the server.

If the agent cannot establish a connection to the server, you will see agent logs like the ones below. Please note that this indicates a problem with either your agent configuration, your server configuration or your network configuration (DNS, etc). This does not indicate a bug with Drone.

2019/04/28 16:05:57 [ERR] POST http://localhost:8080/rpc/v1/request request failed: Post http://localhost:8080/rpc/v1/request: dial tcp [::1]:8080: connect: connection refused
2019/04/28 16:05:57 [DEBUG] POST http://localhost:8080/rpc/v1/request: retrying in 2s (29 left)
2019/04/28 16:05:57 [ERR] POST http://localhost:8080/rpc/v1/request request failed: Post http://localhost:8080/rpc/v1/request: dial tcp [::1]:8080: connect: connection refused
2019/04/28 16:05:57 [DEBUG] POST http://localhost:8080/rpc/v1/request: retrying in 2s (29 left)
2019/04/28 16:05:57 [ERR] POST http://localhost:8080/rpc/v1/request request failed: Post http://localhost:8080/rpc/v1/request: dial tcp [::1]:8080: connect: connection refused
2019/04/28 16:05:57 [DEBUG] POST http://localhost:8080/rpc/v1/request: retrying in 2s (29 left)
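A "connection refused" error like the one above can be reproduced outside of Drone with a plain TCP check, run from the same host or container as the agent. This is a minimal sketch; the helper name `can_reach` is hypothetical. It only tests whether the port accepts connections and says nothing about the RPC endpoint or the shared secret.

```python
import socket
from urllib.parse import urlsplit


def can_reach(server_url, timeout=3.0):
    """Return True if a TCP connection to the server's host and port succeeds."""
    parts = urlsplit(server_url)
    if not parts.hostname:
        raise ValueError("server_url must include a scheme, e.g. http://host:port")
    # Fall back to the default port for the scheme when none is given.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_reach("http://localhost:8080")` uses the address from the log above. If it returns False from the agent's environment, the problem lies in addressing, DNS or the network rather than in the pipeline.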

Invalid Endpoint, Proxy Problems

Another common root cause is specifying an invalid endpoint, or a reverse proxy that routes the request incorrectly. This will manifest as an error that includes html in the error message, for example:

2019/04/28 16:12:03 [DEBUG] POST https://drone.company.com/rpc/v1/request
{
  "arch": "amd64",
  "error": "\u003c!DOCTYPE html\u003e\n\u003chtml

You should also check to ensure you provide the correct server address, including the scheme (http vs https). If you are using the http address, and your reverse proxy automatically redirects to https, it can result in connectivity issues.
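Spotting this case in a long log can be automated. The sketch below assumes each log entry is a single-line JSON object with an "error" field, as in the truncated sample above. The function name `error_is_html` is illustrative and not part of Drone.

```python
import json


def error_is_html(log_line):
    """Return True if a JSON log entry's "error" field starts with HTML markup."""
    try:
        entry = json.loads(log_line)
    except ValueError:
        return False  # not a JSON log line
    if not isinstance(entry, dict):
        return False
    # json.loads already decodes escapes such as \u003c back to "<".
    error = str(entry.get("error", "")).lstrip().lower()
    return error.startswith("<!doctype html") or error.startswith("<html")
```

A True result means that something other than the Drone server (typically a proxy, a login page or a redirect target) answered the request.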

Protected Mode

Did you enable protected mode? If so you need to make sure you sign your yaml configuration file. Otherwise, your pipeline will sit in a pending state until it has been manually approved. If you change your yaml file, also be sure to re-generate the signature.

Incorrect Secret

Unfortunately, a shared secret mismatch between the agent and server is the most difficult error to debug because it does not produce a useful error message. Take care to pass the same secret to both the server and the agent, and make sure the characters match exactly. If you are using cat to read the secret from a file, be careful: this has caused problems (with newlines, etc) that can be difficult to troubleshoot.
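Because a mismatch produces no useful error, it can help to inspect the secret values yourself before passing them to the server and agent. The following is a minimal sketch with hypothetical helper names. It only looks for stray whitespace and line breaks, which is the kind of problem mentioned above.

```python
def secret_problems(raw):
    """List likely copy/paste problems in a shared secret given as a str."""
    problems = []
    if raw != raw.strip():
        problems.append("leading or trailing whitespace (for example a newline)")
    if any(ch in raw.strip() for ch in "\r\n"):
        problems.append("embedded line break")
    if not raw.strip():
        problems.append("empty secret")
    return problems


def secrets_match(server_secret, agent_secret):
    """Exact comparison with no trimming: the values must match character for character."""
    return server_secret == agent_secret
```

For instance, `secret_problems("abc123\n")` reports the trailing newline, and `secrets_match("abc123", "abc123\n")` returns False.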

Undefined Platform when using Arm or Arm64

Drone assumes all pipelines are amd64 unless otherwise specified. If you are using Drone with arm or arm64 agents please be sure to specify the architecture to ensure builds are routed to the correct agent.

kind: pipeline
name: default

+platform:
+  os: linux
+  arch: arm64

steps: ...

Possible architecture values are arm, arm64, and amd64.

Undefined Kernel when using Windows Docker Runner

If you are using the Docker runner on Windows, please be sure to specify the kernel version to ensure builds are routed to the correct agent. This section only applies to the Docker runner. Do not specify a kernel version when using other runner types (kubernetes, exec, etc).

kind: pipeline
name: default

+platform:
+  os: windows
+  version: 1903

steps: ...

If you are unsure which kernel version your runner is using, you can check your server logs with debug mode enabled. When the runner pings the server it includes the kernel version in the payload.

{
  "arch": "amd64",
  "kernel": "1903",
  "level": "debug",
  "msg": "manager: request queue item",
  "os": "windows",
  "time": "2019-04-28T16:00:47-07:00",
  "variant": ""
}

Invalid kind or type

Another common root cause is an invalid kind or type (due to a simple spelling mistake, etc). When you configure an invalid kind or type, the pipeline will sit in the queue waiting for a runner with a matching kind and type to come online (which of course will never happen).

-kind: pipline
+kind: pipeline
type: docker
name: default

Incorrect type

The kind and type determine which runner can execute your pipeline. For example, the docker runner can only execute pipelines where type is docker; the kubernetes runner can only execute pipelines where type is kubernetes, etc. Please ensure the type in your yaml matches the value expected by your installed runners.

kind: pipeline
type: docker
name: default

steps: ...
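Spelling mistakes in kind and type can be caught before a build is queued. The sketch below is an illustration only. The lists of known values contain just the values named in this thread and are therefore incomplete; extend them to match the runners you have installed. The function name is hypothetical.

```python
import difflib

# Only the values named in this thread; other runner types exist, so
# extend these tuples to match your installation.
KNOWN_KINDS = ("pipeline",)
KNOWN_TYPES = ("docker", "kubernetes", "exec")


def check_kind_and_type(kind, type_):
    """Return a list of warnings for unrecognized kind or type values."""
    warnings = []
    for label, value, known in (("kind", kind, KNOWN_KINDS), ("type", type_, KNOWN_TYPES)):
        if value in known:
            continue
        # Suggest the closest known spelling, if there is one.
        close = difflib.get_close_matches(value, known, n=1)
        hint = " (did you mean %r?)" % close[0] if close else ""
        warnings.append("unrecognized %s %r%s" % (label, value, hint))
    return warnings
```

Calling `check_kind_and_type("pipline", "docker")` flags the misspelled kind from the example above and suggests `pipeline`.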

Mismatched Labels and Nodes

When you set the runner labels the runner will only process pipelines where the node parameters are an exact match. All labels must match all node parameters and vice versa. If your pipeline’s node parameters are incorrect and do not exactly match any runners, the pipeline will sit in a pending state waiting for an available runner that is an exact match.

We recommend you double check your pipeline to ensure the node section is an exact match with the desired runner. Below is a sample yaml and a sample runner request to the server. You will note the node section is not an exact match to the runner labels and therefore would not be picked up by the runner.

{
  "arch": "amd64",
  "kernel": "",
  "level": "debug",
  "msg": "manager: request queue item",
  "os": "linux",
  "time": "2019-04-28T16:00:47-07:00",
  "labels": {
    "foo": "bar",
    "baz": "qux"
  }
}

node:
  foo: bar
  # missing baz: qux

Please note that labels and node selection is an advanced feature. Typical installations will not need to use this feature.
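The exact-match rule described above can be modelled as a comparison of two maps. This is a simplified sketch of that rule, not Drone's scheduler code, and the function name `label_mismatches` is hypothetical.

```python
def label_mismatches(runner_labels, node):
    """Describe every difference between the runner labels and the pipeline node section.

    An empty list means the two maps are identical, which is what the
    exact-match rule requires.
    """
    runner_labels = dict(runner_labels or {})
    node = dict(node or {})
    problems = []
    for key in sorted(set(runner_labels) | set(node)):
        if key not in node:
            problems.append("node is missing %s: %s" % (key, runner_labels[key]))
        elif key not in runner_labels:
            problems.append("runner has no label %s" % key)
        elif runner_labels[key] != node[key]:
            problems.append("%s differs: runner=%s node=%s" % (key, runner_labels[key], node[key]))
    return problems
```

With the sample above, `label_mismatches({"foo": "bar", "baz": "qux"}, {"foo": "bar"})` reports that the node section is missing `baz: qux`.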

Beware of False Positives

When you enable trace logs it is easy to misinterpret the results. The agent uses long polling to request builds from the server queue. The agent connects to the server for up to 30 seconds. If the agent does not receive a build from the queue after 30 seconds, it terminates the connection and then reconnects. The connection is terminated after 30 seconds to prevent timeouts (from reverse proxies, load balancers, etc). It is therefore completely normal to see 204 (no content) status codes and context deadline exceeded errors in the trace logs.

The following trace log entries are therefore completely normal:

{
  "arch": "amd64",
  "level": "debug",
  "machine": "bradleys-mbp.lan",
  "msg": "runner: polling queue",
  "os": "linux",
  "time": "2019-04-28T16:22:16-07:00"
}
2019/04/28 16:22:16 [DEBUG] http: no content returned: re-connect and re-try
2019/04/28 16:22:46 [DEBUG] http: no content returned: re-connect and re-try
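When reading a long trace log, it can help to filter out these expected entries first. The sketch below is a rough filter and makes assumptions: errors are recognized by the "[ERR]" prefix seen earlier in this thread or by a JSON "level" of "error" (the latter is an assumption about the log format), and the benign markers are the two phrases mentioned above.

```python
# Phrases that the long-polling behaviour produces during normal operation.
BENIGN_MARKERS = (
    "no content returned",
    "context deadline exceeded",
)


def suspicious_lines(trace_lines):
    """Return lines that look like errors and contain none of the benign markers."""
    result = []
    for line in trace_lines:
        lowered = line.lower()
        if any(marker in lowered for marker in BENIGN_MARKERS):
            continue  # expected long-polling noise
        # "[ERR]" appears in the samples above; the JSON level check is an assumption.
        if "[err]" in lowered or '"level":"error"' in lowered.replace(" ", ""):
            result.append(line)
    return result
```

Whatever remains after filtering, such as the "connection refused" lines shown earlier, is worth a closer look.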

Check the runner architecture

We have seen issues where Docker for Mac (for whatever reason) pulls the arm image instead of the amd64 image. As a result, the runner polls the server for arm pipelines. If this happens, you should use the architecture-specific docker image.

Still having issues?

We are happy to help you troubleshoot issues. However, before we can help, you will need to gather and provide the information below. We will not provide assistance until all requested information is provided.

  1. Provide your server configuration
  2. Provide your runner configuration
  3. Provide your full server logs with trace logging enabled
  4. Provide your full runner logs with trace logging enabled
    a. Enable runner variable DRONE_RPC_DUMP_HTTP=true
    b. Enable runner variable DRONE_RPC_DUMP_HTTP_BODY=true
  5. Provide your yaml configuration file
  6. Confirm you have checked all common issues described in this thread and quickly summarize how you ruled each item out.
  7. Provide the build details for your pending builds via this API endpoint.

We request this information up-front to streamline the support process. The alternative would be a prolonged back and forth of questions and answers, after which you end up providing us everything in the above list anyway.

We also request that you create a new Discourse thread when providing the above information. Please do not post this information to Gitter or our chatrooms.
