#8320 Event hooks on src.f.o sometimes not called, is the event hook processor stuck ?
Closed: Fixed 6 months ago by fbo. Opened 9 months ago by fbo.

Hello,

This issue was opened on the pagure project tracker, but I think I should have opened it here instead (https://pagure.io/pagure/issue/4636)

This is not the first time I have seen event hooks not being called.

The last time it happened, I never received the hook calls that should have been generated by comments added to the PRs below on Friday 18 and Monday 21.

https://src.fedoraproject.org/rpms/python-zuul-sphinx/pull-request/2
https://src.fedoraproject.org/rpms/nodepool/pull-request/1

Yesterday it started to work again w/o any change on tooling/config. Today it is working too.

Could it be possible to check what is happening server side ?

Thanks a lot for your help.

Fabien


Hello,

Today it seems stuck again on src.f.o (while it works well on pagure.io).

The comments added there did not fire web hook calls on my side:
https://src.fedoraproject.org/rpms/python-gear/pull-request/8#comment-32727
https://src.fedoraproject.org/rpms/nodepool/pull-request/1#comment-32726

Last Tuesday and Wednesday it worked.

On pagure.io I have never had this issue. For example, here Zuul CI got called and then responded:
https://pagure.io/zuul-distro-jobs/pull-request/23

Could it be possible to check what is happening server side ?
Or let me know how I can help.

Thanks a lot for your help.

Odd. I see some kind of dns issue?

Oct 25 09:02:32 pkgs02.phx2.fedoraproject.org celery[4806]: 2019-10-25 09:02:32,091 [INFO] pagure.lib.tasks_services: An error occured while querying: https://softwarefactory-project.io/zuul/api/connection/src.fedoraproject.org/payload - Error: ('Connection aborted.', gaierror(-3, 'Temporary failure in name resolution'))
Oct 25 09:02:32 pkgs02.phx2.fedoraproject.org celery[4806]: 2019-10-25 09:02:32,091 [INFO] pagure.lib.tasks_services: An error occured while querying: https://softwarefactory-project.io/zuul/api/connection/src.fedoraproject.org/payload - Error: ('Connection aborted.', gaierror(-3, 'Temporary failure in name resolution'))

@pingou any ideas?

Metadata Update from @kevin:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: src.fp.o

9 months ago

Hi Kevin, thanks for looking into this !

Yesterday I saw it working ( https://src.fedoraproject.org/rpms/nodepool/pull-request/1#comment-32806 ), but today it seems to be stuck again.

Is the src.f.o host using a specific DNS resolver ?

Hi,

Today it does not seem to work. No call from src.fedoraproject.org reached my hook server. Around 10:15 UTC this comment https://src.fedoraproject.org/rpms/nodepool/pull-request/2#comment-33150 should have generated an event and then a hook call.

Thanks in advance for your help

Hi,

Any news on this issue ? Today src.fedoraproject.org does not call my web hooks.

Thanks in advance for your help

@kevin @pingou who can work with @fbo to debug this?
CI without a reliable trigger will not be useful, especially when running a PoC to gain confidence in it.

My last 7 "recheck" attempts over the last 8 days were caught by Zuul, meaning that the situation looks better now ! :) I'll continue to monitor for a few more days.

https://src.fedoraproject.org/rpms/python-gear/pull-request/8

Retried today 5:35 UTC December 23 and it seems I hit the issue again. Could someone check the logs to see if this is the DNS issue again ?

Retried today 9:31 UTC January 10 and it seems I hit the issue again.

I see no errors in the log and I see some calls to softwarefactory.

I've added a little debugging, could you try reproducing it again?

I've just commented "recheck" on the test PR to trigger the PR commented event (https://src.fedoraproject.org/rpms/python-gear/pull-request/8). Do you see something in the debug output ?

If I look at the correct pagure instance I get better results, I see now:

pagure.lib.tasks_services: An error occured while querying: https://softwarefactory-project.io/zuul/api/connection/src.fedoraproject.org/payload - Error: ('Connection aborted.', gaierror(-3, 'Temporary failure in name resolution'))

Confirmed by:

host softwarefactory-project.io
;; connection timed out; no servers could be reached

And:

curl http://softwarefactory-project.io -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0curl: (6) Could not resolve host: softwarefactory-project.io; Unknown error

I don't know how to debug this further :(
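For reference, the same check can be done from Python, which is what the celery worker in the log above runs. This is only a minimal sketch: the function name `classify_resolution` and its `resolver` parameter are made up for illustration and are not part of pagure. It relies on the fact that the "Temporary failure in name resolution" error is `socket.gaierror` with the `EAI_AGAIN` code (-3 on Linux with glibc), which usually points at an unreachable resolver rather than a non-existent name.

```python
import socket


def classify_resolution(hostname, port=443, resolver=socket.getaddrinfo):
    """Return ("ok", [ips]), ("temporary", message) or ("permanent", message).

    `resolver` is a hypothetical hook so the function can be exercised
    without network access; by default the system resolver is used.
    """
    try:
        infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as err:
        # EAI_AGAIN is the "Temporary failure in name resolution" case.
        kind = "temporary" if err.errno == socket.EAI_AGAIN else "permanent"
        return kind, err.strerror
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the resolved IP address.
    return "ok", sorted({info[4][0] for info in infos})
```

Calling `classify_resolution("softwarefactory-project.io")` on the affected host would be expected to report the "temporary" case while the problem is present, matching the `host` and `curl` results above.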

@pingou what is the host that you were running these commands on? Basically whatever host could not connect to ns.redhat.com to get the dns info. The next step I usually try is

dig +trace softwarefactory-project.io

.. deleted stuff
io.                     172800  IN      NS      b0.nic.io.
io.                     172800  IN      NS      a0.nic.io.
io.                     172800  IN      NS      a2.nic.io.
io.                     172800  IN      NS      c0.nic.io.
io.                     86400   IN      DS      57355 8 1 434E91E206134F5B3B0AC603B26F5E029346ABC9
io.                     86400   IN      DS      57355 8 2 95A57C3BAB7849DBCDDF7C72ADA71A88146B141110318CA5BE672057 E865C3E2
io.                     86400   IN      RRSIG   DS 8 1 86400 20200123050000 20200110040000 33853 . QzrcRwQEbB2q2sS4rIXD5xIRLEWj6cy0OIEqhyicBh0AZW2aPEKJZQqN VH/HslG9GMo8Fiq9DWCaAZ297sItlxnM+bXEz65CheL7GuMHWklvn0TU nvfJcOdakJMdZ4bkkREP6LVh0T4b3z1VAP3kk9hrCwbxWaCMBD2TPKgw ZR7KYapH37t1EzvSUCJbH3NvyxoNgweQLs5GRymBU1Ihrtc9Tqxd/hfm SbPXacZKKbD6klUl98fkwzkqcfP6H15aC4e48TIITd32t8l5nzfYQD0K 6pPMdPlg9TJzFOxFRe6sYQ0G2Rgtjw14RpqkV6m64iWre6QTBENsv7IW m3jCkA==
;; Received 674 bytes from 192.58.128.30#53(j.root-servers.net) in 13 ms

softwarefactory-project.io. 86400 IN    NS      ns2.redhat.com.
softwarefactory-project.io. 86400 IN    NS      ns1.redhat.com.
softwarefactory-project.io. 86400 IN    NS      ns3.redhat.com.
2iui5t1khct6c5o8i2i67rppatgvegqo.io. 900 IN NSEC3 1 1 1 D399EAAB 2IV073DVU92DLMV2H5L9G1PEODM9RDE6 NS SOA RRSIG DNSKEY NSEC3PARAM
2iui5t1khct6c5o8i2i67rppatgvegqo.io. 900 IN RRSIG NSEC3 8 2 900 20200131153741 20200110143741 24399 io. TYkxrnNEdhY8vTT5yVhP/4Xy9wPmXSLzOrBkursGw2KHVq4Mb7xZVquk cKlyPIYPMCokEm/gP8irYM1LXBhTTRmXquujFdOFL/3dZ7qwSTH/l96N tsKHFEycFlUkw4jZTes5tK1c+k8XYx5ttMMYNzsuZmqyAXKYI9j5bZ1B ppA=
4nt58f1lle4ru0kn1lh9tp3gjp4nce23.io. 900 IN NSEC3 1 1 1 D399EAAB 4O37P1QI41CL4UC3G6L955RN36FUEGT8 NS DS RRSIG
4nt58f1lle4ru0kn1lh9tp3gjp4nce23.io. 900 IN RRSIG NSEC3 8 2 900 20200130151733 20200109141733 24399 io. IWLxqXYEovklDZDJYKzBlweHaZYNKxNI1zPYAkHOwq3SS/MYXYtXYfls 3ei84y5wf1nl1zn4miDxTCy2BZpBSPWkxFRHdMAI/fPd17Is9eo6hpu4 vUvCr9MnXi2qb5Yn7rm/dX4ITdo7SfwNJ6bQcUXHFx7hAB4KqeS7sRMT 5z4=
;; Received 610 bytes from 65.22.163.17#53(a2.nic.io) in 8 ms

softwarefactory-project.io. 300 IN      A       38.145.34.47
softwarefactory-project.io. 300 IN      NS      ns2.redhat.com.
softwarefactory-project.io. 300 IN      NS      ns4.redhat.com.
softwarefactory-project.io. 300 IN      NS      ns3.redhat.com.
softwarefactory-project.io. 300 IN      NS      ns1.redhat.com.
;; Received 217 bytes from 209.132.186.218#53(ns1.redhat.com) in 90 ms

@smooge I'm running this on pkgs02

The dig +trace command returns something but I don't know where to cut to give the info. I do not see this line in its output though:
softwarefactory-project.io. 300 IN A 38.145.34.47

It's like it stops at the section before (with the 86400 IN NS...).

ok, I think I see where to cut to give the same info as you:

# dig +trace softwarefactory-project.io

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-9.P2.el7 <<>> +trace softwarefactory-project.io
;; global options: +cmd

... deleted stuff

io.         172800  IN  NS  a2.nic.io.
io.         172800  IN  NS  c0.nic.io.
io.         172800  IN  NS  a0.nic.io.
io.         172800  IN  NS  b0.nic.io.
io.         86400   IN  DS  57355 8 2 95A57C3BAB7849DBCDDF7C72ADA71A88146B141110318CA5BE672057 E865C3E2
io.         86400   IN  DS  57355 8 1 434E91E206134F5B3B0AC603B26F5E029346ABC9
io.         86400   IN  RRSIG   DS 8 1 86400 20200123050000 20200110040000 33853 . QzrcRwQEbB2q2sS4rIXD5xIRLEWj6cy0OIEqhyicBh0AZW2aPEKJZQqN VH/HslG9GMo8Fiq9DWCaAZ297sItlxnM+bXEz65CheL7GuMHWklvn0TU nvfJcOdakJMdZ4bkkREP6LVh0T4b3z1VAP3kk9hrCwbxWaCMBD2TPKgw ZR7KYapH37t1EzvSUCJbH3NvyxoNgweQLs5GRymBU1Ihrtc9Tqxd/hfm SbPXacZKKbD6klUl98fkwzkqcfP6H15aC4e48TIITd32t8l5nzfYQD0K 6pPMdPlg9TJzFOxFRe6sYQ0G2Rgtjw14RpqkV6m64iWre6QTBENsv7IW m3jCkA==
;; Received 674 bytes from 192.112.36.4#53(g.root-servers.net) in 48 ms

softwarefactory-project.io. 86400 IN    NS  ns1.redhat.com.
softwarefactory-project.io. 86400 IN    NS  ns2.redhat.com.
softwarefactory-project.io. 86400 IN    NS  ns3.redhat.com.
2iui5t1khct6c5o8i2i67rppatgvegqo.io. 900 IN NSEC3 1 1 1 D399EAAB 2IV073DVU92DLMV2H5L9G1PEODM9RDE6 NS SOA RRSIG DNSKEY NSEC3PARAM
2iui5t1khct6c5o8i2i67rppatgvegqo.io. 900 IN RRSIG NSEC3 8 2 900 20200131155741 20200110145741 24399 io. IskLlHTer7JCZa+SHv3YyEG92/3Xy4BtrR+bMGvLJzD1fRcKlc8VoY21 j26S3G1TSikjmfkBgwavwnEEAWa+P8NNfO0sH/Xq/Sb4aT4a1gUByCkB QHdy4ovtJKERGlKtTT1NhhHJAKgoHNYj7upav7IzMFpP4iwRt234kKjf 8+Q=
4nt58f1lle4ru0kn1lh9tp3gjp4nce23.io. 900 IN NSEC3 1 1 1 D399EAAB 4O37P1QI41CL4UC3G6L955RN36FUEGT8 NS DS RRSIG
4nt58f1lle4ru0kn1lh9tp3gjp4nce23.io. 900 IN RRSIG NSEC3 8 2 900 20200130151733 20200109141733 24399 io. IWLxqXYEovklDZDJYKzBlweHaZYNKxNI1zPYAkHOwq3SS/MYXYtXYfls 3ei84y5wf1nl1zn4miDxTCy2BZpBSPWkxFRHdMAI/fPd17Is9eo6hpu4 vUvCr9MnXi2qb5Yn7rm/dX4ITdo7SfwNJ6bQcUXHFx7hAB4KqeS7sRMT 5z4=
;; Received 610 bytes from 65.22.163.17#53(a2.nic.io) in 29 ms

;; connection timed out; no servers could be reached

OK the problem is that we are unable to contact the Red Hat dns name servers inside of PHX2 for this domain.
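To make the comparison between the two dig +trace outputs above easier to repeat, here is a small sketch that lists which servers answered at each delegation step and whether the trace ended in a timeout. The helper name `summarize_trace` is hypothetical, and the patterns are based only on the ";; Received ..." and ";; connection timed out ..." lines as they appear in this ticket; other dig versions may format them differently.

```python
import re

# Matches lines such as:
# ;; Received 610 bytes from 65.22.163.17#53(a2.nic.io) in 29 ms
RECEIVED = re.compile(
    r"^;; Received (\d+) bytes from ([0-9a-fA-F.:]+)#(\d+)\(([^)]+)\) in (\d+) ms"
)
TIMEOUT = ";; connection timed out; no servers could be reached"


def summarize_trace(output):
    """Return (names of servers that answered, in order; True if a timeout was seen)."""
    answered = []
    timed_out = False
    for line in output.splitlines():
        line = line.strip()
        match = RECEIVED.match(line)
        if match:
            answered.append(match.group(4))
        elif line == TIMEOUT:
            timed_out = True
    return answered, timed_out
```

On the pkgs02 output the last server to answer is a2.nic.io and the trace ends in a timeout, while the working trace ends with an answer from ns1.redhat.com. That is the difference that points at the redhat.com name servers being unreachable from that host.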

OK this is a datacenter problem with our PHX2 location. The server it tries to talk to is unreachable from there, and unless we define a specific exception in our dns servers there is no way around it. I have added the needed changes to the system.

I have put in changes to DNS similar to other redhat.com zones and I am able to get an ip address for the system now. Please test and let me know.

Metadata Update from @smooge:
- Issue assigned to smooge

6 months ago

Thanks @smooge and @pingou ! I'll run my usual test over the following days to check if the issue is gone.

I was logged in as zuul. ^

@fbo should we close this ticket then?

Sorry, I've misread this, I read I've instead of I'll, I'll blame Monday for this! :)

I've run my check every day since your fix. It seems to be good now :) !
So I'm closing this issue. Thanks for the help !

Metadata Update from @fbo:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

6 months ago
