I keep running into a 404 error on webhook delivery (note that the payload and request are sent successfully; the response is a 404, not a timeout). About 30%~40% of webhook deliveries get the error, and a redelivery usually returns 200 OK. I'm really not sure what's going on. Debug mode is turned on on the drone server, but I didn't see any info related to the failed deliveries.
The request header is: (token and X-GitHub-Delivery are replaced with XXXXX)
Request URL: https://ci.cdnjs.com/hook?access_token=XXXXXXXXXXXXXXXXXXXXX
Request method: POST
and the response header (with the 404 error) is:
Cache-Control: no-cache, no-store, max-age=0, must-revalidate, value
Content-Type: text/plain; charset=utf-8
Date: Wed, 23 Nov 2016 05:39:18 GMT
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Last-Modified: Wed, 23 Nov 2016 05:39:17 GMT
X-Xss-Protection: 1; mode=block
I tried running without nginx and docker to make the environment simpler, and kept pulling the latest drone version over these 5 days, but nothing helped.
Maybe there should be a feature to print the category of each log message, and another to print headers like X-GitHub-Delivery, to help identify which webhook a log line belongs to.
Please let me know if you have also hit this problem or have any suggestions about how to dig into it, thanks!
What do the drone server logs say?
The code that processes the webhooks prints something to the logs for every possible failure path. So if you are seeing 404s and they are coming from drone, you will see the reason in the drone server logs.
I didn’t see anything about the 404 error in the server logs, but maybe there are so many messages that I missed it … it’s hard to trace.
without the 404 and error message it won’t be possible to diagnose this further. If you don’t see a 404 in drone logs the assumption (on my end) is that drone is not receiving the http request, and it is a problem up the stack.
the logs shouldn’t be too difficult to search, unless you have DRONE_BROKER_DEBUG=true, in which case you can disable it. This setting is extremely verbose and not recommended unless you are explicitly trying to debug errors in the message broker.
there are only two possible reasons for a 404, which you can see in the hooks.go file.
The first 404 occurs when the repository name in the webhook signature (encoded as a url parameter) is not found in the database. This generally happens when:
- someone tries to manually modify the hook url in github or
- the repository name changes
log.Errorf("failure to find repo %s/%s from hook. %s", tmprepo.Owner, tmprepo.Name, err)
The second 404 occurs when drone cannot fetch the .drone.yml for the commit sha. This generally happens when someone either forgets to include a .drone.yml in the branch, or incorrectly assumes that drone only pulls the .drone.yml from master, which is not the case.
log.Errorf("failure to get build config for %s. %s", repo.FullName, err)
These two errors are the only code paths in drone that can return a 404 for a hook. If you do not see either of these errors in the logs it means something else is failing up your stack, which means the error is happening before the hook even gets to drone.
Thanks for the tips, I’ll use these two key strings to search from the log.
Since it’s logged as an “error”, can I set DRONE_BROKER_DEBUG=false and still see it? I suppose so, I’m just not sure.
yes, if you set DRONE_BROKER_DEBUG=false you will still see the error logs. I would probably recommend DRONE_DEBUG=true since the http server logs might still be useful. The DRONE_DEBUG parameter is much less verbose than DRONE_BROKER_DEBUG, which is why I separated them into two different options.
I finally caught the error message!
It looks like drone can’t get the build config from GitHub, but I can’t reproduce the 404 when grabbing the files from GitHub myself, and I don’t have the same problem on Travis CI or Circle CI. I’m not sure how to deal with this problem. Can we have some retries here? Thanks.
ERRO failure to get build config for cdnjs/cdnjs. GET https://api.github.com/repos/cdnjs/cdnjs/contents/.drone.yml?ref=e77e2e65537bbbe25b3d57b83d61b54985afa8ef: 404 Not Found 
ERRO Error #01: GET https://api.github.com/repos/cdnjs/cdnjs/contents/.drone.yml?ref=e77e2e65537bbbe25b3d57b83d61b54985afa8ef: 404 Not Found 
ip=220.127.116.11 latency=411.596188ms method=POST path=/hook status=404 time=2016-11-26T05:52:52Z user-agent=GitHub-Hookshot/d9ba1f0
ERRO failure to get build config for cdnjs/cdnjs. GET https://api.github.com/repos/cdnjs/cdnjs/contents/.drone.yml?ref=fb6b1e65b045dcf900f9443cd908541b156512ab: 404 Not Found 
ERRO Error #01: GET https://api.github.com/repos/cdnjs/cdnjs/contents/.drone.yml?ref=fb6b1e65b045dcf900f9443cd908541b156512ab: 404 Not Found 
ip=18.104.22.168 latency=408.993726ms method=POST path=/hook status=404 time=2016-11-26T06:24:06Z user-agent=GitHub-Hookshot/d9ba1f0
ERRO failure to get build config for cdnjs/cdnjs. GET https://api.github.com/repos/cdnjs/cdnjs/contents/.drone.yml?ref=dd3bc82a4e497e72d97764f8982900e438c34976: 404 Not Found 
ERRO Error #01: GET https://api.github.com/repos/cdnjs/cdnjs/contents/.drone.yml?ref=dd3bc82a4e497e72d97764f8982900e438c34976: 404 Not Found 
ip=22.214.171.124 latency=410.052014ms method=POST path=/hook status=404 time=2016-11-26T07:00:04Z user-agent=GitHub-Hookshot/d9ba1f0
So the question I have is: why is GitHub returning a 404? Perhaps an issue should be opened with GitHub, since a valid API request to fetch a file returns a 404 even though the file exists?
but I can’t reproduce the 404 on grabbing the files from GitHub
my guess is if you re-send the hook from GitHub (via the GitHub webhooks UI) drone will run just fine and will not get a 404. This would demonstrate that it has nothing to do with drone or the drone code, but is a GitHub issue where they are not properly able to serve their content.
I don’t have the same problem on Travis CI and Circle CI
maybe because travis and circle have long queue times, and don’t process the requests for many seconds after the hook is received? Maybe GitHub has an eventually-consistent database, and sometimes the API request reads from a node that does not yet have the record? Perhaps GitHub could shed some light here.
Yes, I also told GitHub about this problem
OK great, thanks! Let’s see what information we can get from GitHub. I doubt we will see an immediate fix from them (although maybe) but if they can help us understand the root cause we can at least craft an effective fix in drone.
Let me know what they say!
Sure thing, I’ll post their response here.
I wonder if there is a small delay before the API can retrieve the latest commit. Maybe drone is just too fast haha …
That is actually my guess. Maybe GitHub has an eventually consistent database and, given the large size of CDNJS, it takes a few seconds for the commit to propagate to all their storage servers. If the API request gets routed to a storage node without the content, it gets a 404.
This could also explain why Travis and Circle don’t have the same issue: your Drone server doesn’t have a large backlog in the queue and thus processes the commit hook immediately. Compare that to Travis, which probably has 50 items in the queue ahead of your build when the commit hook is received, resulting in delayed processing.
Just a guess though
I ran into this 404 webhook issue yesterday. I tried regenerating the OAuth secret but it didn’t fix it. When I removed and re-added the repo in the drone UI, the webhooks worked again. I then realized that another owner in our GitHub org had enabled these repos in the drone UI, and he left the org on Friday. Since then, all of the repos he added in drone had been getting 404 errors on the webhooks. Even though the repositories are owned by the organization, is there some cached credential tied to the user who enabled the repository?