Build "Successful" even though most steps "skipped"

Using drone/drone-runner-docker:1.3

I should definitely update that to 1.5.

1 Like

@bradrydzewski Can you see which dependency is missing?? I’ve read through .drone.yml now and can’t see any missing dependencies at all. I mean, there might be one, but I can’t spot it (disclaimer: I’m the author of the Grafana Drone config).

@aknudsen I just pushed an update that provides a better error message:

linter: unknown step dependency detected: build-backend references test-backend

the directed graph library that we use would require that test-backend is defined before build-backend in the pipeline.

1 Like

@bradrydzewski Ah…, thanks on both accounts!

We have updated our runners to all use the latest docker runner (1.5). We have also triple-checked our dependency graphs and made sure they all pass lint.

We are continuing to see unexplained skipped steps, primarily on private repos. Restarting the build usually fixes it, but it is happening with some frequency.

The common symptom seems to be skipped steps, and at least one “successful” step that has no logs at all.

1 Like

I can confirm @captncraig’s account, we have experienced several builds of a private repo today to skip (most) steps and be marked as successful by Drone. I’ve run drone lint on the repo .drone.yml in question, and it finds no issues.

Could there please be a Drone configuration parameter to make it fail hard (with an explanation) when unable to resolve dependencies, instead of skipping and marking the build successful? I think this behaviour is really bad, as it’s never something I want to happen and I’m also super confused as to why it’s happening.

I want to happen and I’m also super confused as to why it’s happening

I published a patch yesterday that is available in drone/drone-runner-docker:latest, however, if you upgraded to version 1.5 you would not have the fix yet. I recommend using the latest image to see if this solves the issue. I am not aware of any other root cause for the behavior described, however, if the issue persists we would kindly ask that you enabled debug logging on the runner and provide the logs, as well as a yaml that we can use to try and reproduce (also if the repository is open source, a link to the build is also helpful).

edit: just noticed you mentioned this was for a private repository. once you upgrade, feel free to email me the requested info if you don’t want to share publicly

1 Like

Thanks for making that change @bradrydzewski! It’ll be a great help if Drone catches and reports these errors, instead of just skipping steps :slight_smile: I will see with @captncraig if we can try the latest image revision.

I’ll email you the Drone config in question (to the standard drone.io address).

I deployed runners from latest. The linter error does stop one edge case that looks a little like ours, and I much prefer the new behaviour, so thank you.

Time will tell if that gives any improvement or insight to the problematic builds.

@bradrydzewski This problem is happening again for us now (although randomly, as before) :frowning: See f.ex. this build. @captncraig We’re sure we’re on a version with the fix?

@bradrydzewski Got another aborted, but successful, build just now :confused: I thought you made a patch to stop this from happening?

we made numerous patches to both prevent this issue and provide additional trace logging so that we could help diagnose the issue if it happens again. In order to take advantage of all patches and improved logging you need version 1.5.2 of the runner or higher. If this happens again, enable trace logging on the runner and provide a log dump and we can analyze.

1 Like

Thanks, will check,.

I think we might have been on an older version of the runners after all, from what I hear we’ve rolled out v1.5.2 now. Fingers crossed it smooths out the bumps!

It happened again on 1.5.2 runners, sadly. But maybe there are some leads here:

Specifically the FIXME lines are a little concerning.

{"log":"time=\"2020-10-04T18:22:03Z\" level=debug msg=\"stage details fetched\" build.id=13512 build.number=623 repo.id=61 repo.name=grafana-enterprise repo.namespace=grafana stage.id=30779 stage.name=build-master-downstream stage.number=1 thread=1\n","stream":"stderr","time":"2020-10-04T18:22:03.459465436Z"}
{"log":"time=\"2020-10-04T18:22:03Z\" level=debug msg=\"updated stage to running\" build.id=13512 build.number=623 repo.id=61 repo.name=grafana-enterprise repo.namespace=grafana stage.id=30779 stage.name=build-master-downstream stage.number=1 thread=1\n","stream":"stderr","time":"2020-10-04T18:22:03.923214325Z"}
{"log":"time=\"2020-10-04T18:22:05Z\" level=debug msg=\"received exit code 0\" build.id=13512 build.number=623 repo.id=61 repo.name=grafana-enterprise repo.namespace=grafana stage.id=30779 stage.name=build-master-downstream stage.number=1 step.name=identify-runner thread=1\n","stream":"stderr","time":"2020-10-04T18:22:05.116809089Z"}
{"log":"time=\"2020-10-04T18:22:08Z\" level=debug msg=\"received exit code 0\" build.id=13512 build.number=623 repo.id=61 repo.name=grafana-enterprise repo.namespace=grafana stage.id=30779 stage.name=build-master-downstream stage.number=1 step.name=clone thread=1\n","stream":"stderr","time":"2020-10-04T18:22:08.587163022Z"}
{"log":"time=\"2020-10-04T18:23:43Z\" level=debug msg=\"received exit code 0\" build.id=13512 build.number=623 repo.id=61 repo.name=grafana-enterprise repo.namespace=grafana stage.id=30779 stage.name=build-master-downstream stage.number=1 step.name=initialize thread=1\n","stream":"stderr","time":"2020-10-04T18:23:43.515627302Z"}
{"log":"time=\"2020-10-04T18:23:46Z\" level=debug msg=\"received exit code 0\" build.id=13512 build.number=623 repo.id=61 repo.name=grafana-enterprise repo.namespace=grafana stage.id=30779 stage.name=build-master-downstream stage.number=1 step.name=shellcheck thread=1\n","stream":"stderr","time":"2020-10-04T18:23:46.374465501Z"}
{"log":"time=\"2020-10-04T18:23:46Z\" level=debug msg=\"received exit code 0\" build.id=13512 build.number=623 repo.id=61 repo.name=grafana-enterprise repo.namespace=grafana stage.id=30779 stage.name=build-master-downstream stage.number=1 step.name=codespell thread=1\n","stream":"stderr","time":"2020-10-04T18:23:46.52527782Z"}
{"log":"time=\"2020-10-04T18:23:58Z\" level=debug msg=\"received exit code 0\" build.id=13512 build.number=623 repo.id=61 repo.name=grafana-enterprise repo.namespace=grafana stage.id=30779 stage.name=build-master-downstream stage.number=1 step.name=build-docs-website thread=1\n","stream":"stderr","time":"2020-10-04T18:23:58.408606694Z"}
{"log":"time=\"2020-10-04T18:24:05Z\" level=debug msg=\"received exit code 0\" build.id=13512 build.number=623 repo.id=61 repo.name=grafana-enterprise repo.namespace=grafana stage.id=30779 stage.name=build-master-downstream stage.number=1 step.name=lint-backend thread=1\n","stream":"stderr","time":"2020-10-04T18:24:05.424525511Z"}
{"log":"time=\"2020-10-04T18:24:35Z\" level=debug msg=\"received exit code 0\" build.id=13512 build.number=623 repo.id=61 repo.name=grafana-enterprise repo.namespace=grafana stage.id=30779 stage.name=build-master-downstream stage.number=1 step.name=build-plugins thread=1\n","stream":"stderr","time":"2020-10-04T18:24:35.292970565Z"}
{"log":"time=\"2020-10-04T18:26:42Z\" level=debug msg=\"received exit code 0\" build.id=13512 build.number=623 repo.id=61 repo.name=grafana-enterprise repo.namespace=grafana stage.id=30779 stage.name=build-master-downstream stage.number=1 step.name=test-frontend thread=1\n","stream":"stderr","time":"2020-10-04T18:26:42.100059521Z"}
{"log":"time=\"2020-10-04T18:26:42Z\" level=debug msg=\"destroying the pipeline environment\" build.id=13512 build.number=623 repo.id=61 repo.name=grafana-enterprise repo.namespace=grafana stage.id=30779 stage.name=build-master-downstream stage.number=1 thread=1\n","stream":"stderr","time":"2020-10-04T18:26:42.209616062Z"}
{"log":"time=\"2020-10-04T18:26:42Z\" level=debug msg=\"FIXME: Got an status-code for which error does not match any expected type!!!: -1\" module=api status_code=-1\n","stream":"stderr","time":"2020-10-04T18:26:42.209628951Z"}
{"log":"time=\"2020-10-04T18:26:42Z\" level=debug msg=\"FIXME: Got an status-code for which error does not match any expected type!!!: -1\" module=api status_code=-1\n","stream":"stderr","time":"2020-10-04T18:26:42.209632549Z"}
{"log":"time=\"2020-10-04T18:26:42Z\" level=debug msg=\"FIXME: Got an status-code for which error does not match any expected type!!!: -1\" module=api status_code=-1\n","stream":"stderr","time":"2020-10-04T18:26:42.209650361Z"}
{"log":"time=\"2020-10-04T18:26:48Z\" level=debug msg=\"successfully destroyed the pipeline environment\" build.id=13512 build.number=623 repo.id=61 repo.name=grafana-enterprise repo.namespace=grafana stage.id=30779 stage.name=build-master-downstream stage.number=1 thread=1\n","stream":"stderr","time":"2020-10-04T18:26:48.603660405Z"}
{"log":"time=\"2020-10-04T18:26:48Z\" level=debug msg=\"stage failed\" build.id=13512 build.number=623 duration=279 error=\"1 error occurred:\\n\\t* \u003chtml\u003e\\r\\n\u003chead\u003e\u003ctitle\u003e400 Bad Request\u003c/title\u003e\u003c/head\u003e\\r\\n\u003cbody\u003e\\r\\n\u003ccenter\u003e\u003ch1\u003e400 Bad Request\u003c/h1\u003e\u003c/center\u003e\\r\\n\u003chr\u003e\u003ccenter\u003enginx/1.15.10\u003c/center\u003e\\r\\n\u003c/body\u003e\\r\\n\u003c/html\u003e\\r\\n\\n\\n\" repo.id=61 repo.name=grafana-enterprise repo.namespace=grafana stage.id=30779 stage.name=build-master-downstream stage.number=1 thread=1\n","stream":"stderr","time":"2020-10-04T18:26:48.603685414Z"}
{"log":"time=\"2020-10-04T18:26:48Z\" level=debug msg=\"poller: request stage from remote server\" thread=1\n","stream":"stderr","time":"2020-10-04T18:26:48.603691545Z"}
{"log":"time=\"2020-10-04T18:26:48Z\" level=debug msg=\"done listening for cancellations\" build.id=13512 build.number=623 repo.id=61 repo.name=grafana-enterprise repo.namespace=grafana stage.id=30779 stage.name=build-master-downstream stage.number=1 thread=1\n","stream":"stderr","time":"2020-10-04T18:26:48.60381219Z"}
1 Like

I am unable to access the link. Also can you enable trace logging on the runner? We added a bunch of new trace log entries to help debug this issue.

Specifically the FIXME lines are a little concerning.

the FIXME lines come from the Docker Client and should be ignored. The error is both annoying and misleading, but unfortunately nothing we can do about it until there is an upstream Docker fix.

Yeah, will enable trace. I thought I had, but I guess not yet. Sry, didn’t realize this was a private build.

1 Like

Ok, we found another one on a public build: https://drone.grafana.net/grafana/grafana/3949

Logs from around that time (including trace):

{"log":"time=\"2020-10-06T12:02:23Z\" level=debug msg=\"stage received\" stage.id=31998 stage.name=test-pr stage.number=1 thread=1\n","stream":"stderr","time":"2020-10-06T12:02:23.414526983Z"}
{"log":"time=\"2020-10-06T12:02:23Z\" level=debug msg=\"stage accepted\" stage.id=31998 stage.name=test-pr stage.number=1 thread=1\n","stream":"stderr","time":"2020-10-06T12:02:23.42286068Z"}
{"log":"time=\"2020-10-06T12:02:23Z\" level=debug msg=\"stage details fetched\" build.id=13964 build.number=3949 repo.id=48 repo.name=grafana repo.namespace=grafana stage.id=31998 stage.name=test-pr stage.number=1 thread=1\n","stream":"stderr","time":"2020-10-06T12:02:23.453335187Z"}
{"log":"time=\"2020-10-06T12:02:23Z\" level=debug msg=\"updated stage to running\" build.id=13964 build.number=3949 repo.id=48 repo.name=grafana repo.namespace=grafana stage.id=31998 stage.name=test-pr stage.number=1 thread=1\n","stream":"stderr","time":"2020-10-06T12:02:23.982418764Z"}
{"log":"time=\"2020-10-06T12:02:30Z\" level=trace msg=\"http: no content returned: re-connect and re-try\"\n","stream":"stderr","time":"2020-10-06T12:02:30.530893674Z"}
{"log":"time=\"2020-10-06T12:02:37Z\" level=debug msg=\"received exit code 0\" build.id=13964 build.number=3949 repo.id=48 repo.name=grafana repo.namespace=grafana stage.id=31998 stage.name=test-pr stage.number=1 step.name=clone thread=1\n","stream":"stderr","time":"2020-10-06T12:02:37.284585639Z"}
{"log":"time=\"2020-10-06T12:02:39Z\" level=debug msg=\"received exit code 0\" build.id=13964 build.number=3949 repo.id=48 repo.name=grafana repo.namespace=grafana stage.id=31998 stage.name=test-pr stage.number=1 step.name=identify-runner thread=1\n","stream":"stderr","time":"2020-10-06T12:02:39.623123504Z"}
{"log":"time=\"2020-10-06T12:02:53Z\" level=trace msg=\"http: no content returned: re-connect and re-try\"\n","stream":"stderr","time":"2020-10-06T12:02:53.455636347Z"}
{"log":"time=\"2020-10-06T12:03:50Z\" level=trace msg=\"http: no content returned: re-connect and re-try\"\n","stream":"stderr","time":"2020-10-06T12:03:50.536582503Z"}
{"log":"time=\"2020-10-06T12:04:03Z\" level=debug msg=\"received exit code 0\" build.id=13964 build.number=3949 repo.id=48 repo.name=grafana repo.namespace=grafana stage.id=31998 stage.name=test-pr stage.number=1 step.name=initialize thread=1\n","stream":"stderr","time":"2020-10-06T12:04:03.841237002Z"}
{"log":"time=\"2020-10-06T12:04:03Z\" level=debug msg=\"destroying the pipeline environment\" build.id=13964 build.number=3949 repo.id=48 repo.name=grafana repo.namespace=grafana stage.id=31998 stage.name=test-pr stage.number=1 thread=1\n","stream":"stderr","time":"2020-10-06T12:04:03.969474647Z"}
{"log":"time=\"2020-10-06T12:04:03Z\" level=debug msg=\"FIXME: Got an status-code for which error does not match any expected type!!!: -1\" module=api status_code=-1\n","stream":"stderr","time":"2020-10-06T12:04:03.969486996Z"}
{"log":"time=\"2020-10-06T12:04:08Z\" level=debug msg=\"successfully destroyed the pipeline environment\" build.id=13964 build.number=3949 repo.id=48 repo.name=grafana repo.namespace=grafana stage.id=31998 stage.name=test-pr stage.number=1 thread=1\n","stream":"stderr","time":"2020-10-06T12:04:08.52212417Z"}
{"log":"time=\"2020-10-06T12:04:08Z\" level=debug msg=\"stage failed\" build.id=13964 build.number=3949 duration=100 error=\"1 error occurred:\\n\\t* \u003chtml\u003e\\r\\n\u003chead\u003e\u003ctitle\u003e400 Bad Request\u003c/title\u003e\u003c/head\u003e\\r\\n\u003cbody\u003e\\r\\n\u003ccenter\u003e\u003ch1\u003e400 Bad Request\u003c/h1\u003e\u003c/center\u003e\\r\\n\u003chr\u003e\u003ccenter\u003enginx/1.15.10\u003c/center\u003e\\r\\n\u003c/body\u003e\\r\\n\u003c/html\u003e\\r\\n\\n\\n\" repo.id=48 repo.name=grafana repo.namespace=grafana stage.id=31998 stage.name=test-pr stage.number=1 thread=1\n","stream":"stderr","time":"2020-10-06T12:04:08.522144127Z"}
{"log":"time=\"2020-10-06T12:04:08Z\" level=debug msg=\"poller: request stage from remote server\" thread=1\n","stream":"stderr","time":"2020-10-06T12:04:08.522158546Z"}
{"log":"time=\"2020-10-06T12:04:08Z\" level=trace msg=\"http: context canceled\"\n","stream":"stderr","time":"2020-10-06T12:04:08.522287763Z"}
{"log":"time=\"2020-10-06T12:04:08Z\" level=debug msg=\"done listening for cancellations\" build.id=13964 build.number=3949 repo.id=48 repo.name=grafana repo.namespace=grafana stage.id=31998 stage.name=test-pr stage.number=1 thread=1\n","stream":"stderr","time":"2020-10-06T12:04:08.522301236Z"}
{"log":"time=\"2020-10-06T12:04:30Z\" level=trace msg=\"http: no content returned: re-connect and re-try\"\n","stream":"stderr","time":"2020-10-06T12:04:30.539099804Z"}

I am a bit confused since the “initialize” step returns a 0 exit code, and the next line is “destroying the pipeline”. Also it is not clear to me why the stage was failed after all of the success. Looks like there was an http 400 error somewhere. Perhaps a problem reporting step status back to the server?

One thing that jumps out is the log entry that indicates the stage failed:

time=\"2020-10-06T12:04:08Z\"
level=debug msg=\"stage failed\"
build.id=13964
build.number=3949
duration=100
error=\"1 error occurred:\\n\\t* \u003chtml\u003e\\r\\n\u003chead\u003e\u003ctitle\u003e400 Bad Request\u003c/title\u003e\u003c/head\u003e\\r\\n\u003cbody\u003e\\r\\n\u003ccenter\u003e\u003ch1\u003e400 Bad Request\u003c/h1\u003e\u003c/center\u003e\\r\\n\u003chr\u003e\u003ccenter\u003enginx/1.15.10\u003c/center\u003e\\r\\n\u003c/body\u003e\\r\\n\u003c/html\u003e\\r\\n\\n\\n\

This is interesting for a few reasons. First is that Drone does not return a 400 bad request from any runner API calls. Second, it is that the error returns an HTML page. It almost looks like some sort of reverse proxy intercepting the request and providing an unexpected response code and body.

Regardless and error should result in an error so let me see what else I can dig up.