#7714 intermittent DNS issues resolving pagure.io
Closed: Insufficient data 13 days ago by kevin. Opened a month ago by mikeb.

When cloning repos from pagure.io, a couple percent of the time the clone will fail with:

Could not resolve host: pagure.io; Unknown error

In testing, we don't see the same kind of errors when cloning from GitHub or other locations, so it seems to be an issue with the pagure.io DNS infra.


What tool is printing the message "Unknown error"? It would be useful to know what the error is. It might be useful to see if you can produce an error with dig:

$ dig pagure.io

Maybe run that a bunch of times on the host that is experiencing this problem and see if it ever fails and include failed output here.
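
A minimal sketch of such a loop, assuming dig is on the PATH of the affected container. The function names, attempt count, and delay are illustrative and not taken from this ticket. It runs dig repeatedly and keeps the full output of every run whose header status is not NOERROR (a timeout with no header at all is also kept).

```python
import re
import subprocess
import time
from datetime import datetime, timezone

# Matches the status field in dig's "->>HEADER<<-" line.
STATUS_RE = re.compile(r"status: ([A-Z]+)")


def dig_status(output):
    """Return the response status (e.g. NOERROR, SERVFAIL), or None if absent."""
    match = STATUS_RE.search(output)
    return match.group(1) if match else None


def probe(name="pagure.io", attempts=100, delay=5.0):
    """Run dig repeatedly; return (timestamp, status, output) for each failure."""
    failures = []
    for _ in range(attempts):
        result = subprocess.run(
            ["dig", name], capture_output=True, text=True, check=False
        )
        status = dig_status(result.stdout)
        if status != "NOERROR":
            # Record a UTC timestamp so failures can be lined up with other logs.
            stamp = datetime.now(timezone.utc).isoformat()
            failures.append((stamp, status, result.stdout))
        time.sleep(delay)
    return failures
```

Calling probe() from a cron job or a pipeline step and printing the returned list would give the failed output, with timestamps, to paste here.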

We're seeing the errors in the context of an OpenShift/Jenkins pipeline, so it's not one particular host; it's a dynamic container running on internal infrastructure.

We ran a bunch of cronjobs over the course of a couple of days to rule out infra issues. I'll see if I can get those logs.

We aren't seeing any other DNS reports, and all our nameservers seem to be working normally and pass checks.

So, not sure what's going on here...

Pagure.io is running on the same dns infrastructure as Fedoraproject.org's. If you are seeing problems with DNS with pagure.io you should see the same errors at the same time for any and all Fedoraproject.org resources.

We will need to get some sort of trace, timestamps or more detailed logs. DNS is a cached shared database so your query may be going through multiple intermediates which have their own policies on what they 'cache' for certain sites. Again we would need to see what your system is seeing for a working DNS entry versus a non-working one to see what to recommend.

@kevin Yep, running a cronjob, got a couple of instances of dig returning SERVFAIL immediately before a "git clone" failed. We're also investigating our internal infrastructure, but I'm including those logs here for reference. Any info you can provide about it would be helpful.

dig pagure.io returned:

; <<>> DiG 9.11.5-P4-RedHat-9.11.5-4.P4.fc29 <<>> pagure.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 33417
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1200
;; QUESTION SECTION:
;pagure.io.                     IN      A

;; Query time: 1 msec
;; SERVER: 10.0.19.59#53(10.0.19.59)
;; WHEN: Fri Apr 19 12:55:09 UTC 2019
;; MSG SIZE  rcvd: 38



['git', 'clone', '-n', 'https://pagure.io/fm-orchestrator.git'] exited with return code 128
stderr: Cloning into 'fm-orchestrator'...
fatal: unable to access 'https://pagure.io/fm-orchestrator.git/': Could not resolve host: pagure.io
dig pagure.io returned:

; <<>> DiG 9.11.5-P4-RedHat-9.11.5-4.P4.fc29 <<>> pagure.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 5648
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1200
;; QUESTION SECTION:
;pagure.io.                     IN      A

;; Query time: 822 msec
;; SERVER: 10.0.19.69#53(10.0.19.69)
;; WHEN: Tue Apr 23 12:10:18 UTC 2019
;; MSG SIZE  rcvd: 38



['git', 'clone', '-n', 'https://pagure.io/fm-orchestrator.git'] exited with return code 128
stderr: Cloning into 'fm-orchestrator'...
fatal: unable to access 'https://pagure.io/fm-orchestrator.git/': Could not resolve host: pagure.io

For reference, here is the dig output from a successful lookup from the same systems:

dig pagure.io returned:

; <<>> DiG 9.11.5-P4-RedHat-9.11.5-4.P4.fc29 <<>> pagure.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15191
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 6

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1200
;; QUESTION SECTION:
;pagure.io.                     IN      A

;; ANSWER SECTION:
pagure.io.              49      IN      A       152.19.134.147

;; AUTHORITY SECTION:
pagure.io.              64733   IN      NS      ns05.fedoraproject.org.
pagure.io.              64733   IN      NS      ns04.fedoraproject.org.
pagure.io.              64733   IN      NS      ns02.fedoraproject.org.

;; ADDITIONAL SECTION:
ns02.fedoraproject.org. 56      IN      A       152.19.134.139
ns02.fedoraproject.org. 56      IN      AAAA    2610:28:3090:3001:dead:beef:cafe:fed5
ns05.fedoraproject.org. 56      IN      A       85.236.55.10
ns05.fedoraproject.org. 56      IN      AAAA    2001:4178:2:1269:dead:beef:cafe:fed5
ns04.fedoraproject.org. 40188   IN      A       209.132.181.17

;; Query time: 1 msec
;; SERVER: 10.0.19.69#53(10.0.19.69)
;; WHEN: Tue Apr 23 13:00:12 UTC 2019
;; MSG SIZE  rcvd: 232

@mikeb is this the Red Hat internal DNS server? And if yes, office or non-office?
Also, if it's the Red Hat nameservers, ask the DNS people to look at INC0312060.

It looks like your server is cpt-1052.paas.prod.upshift.rdu2.redhat.com so you will need to have the cron run on that or something similar to see where it is getting its DNS from. [A +trace may do it or it may fail somewhere because it is turned off. ]

Metadata Update from @smooge:
- Issue assigned to smooge

a month ago

Metadata Update from @smooge:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

a month ago

Here's the output of dig pagure.io +trace from lookup failure:

dig pagure.io returned:

; <<>> DiG 9.11.5-P4-RedHat-9.11.5-4.P4.fc29 <<>> pagure.io +trace
;; global options: +cmd
.                       429247  IN      NS      k.root-servers.net.
.                       429247  IN      NS      e.root-servers.net.
.                       429247  IN      NS      h.root-servers.net.
.                       429247  IN      NS      l.root-servers.net.
.                       429247  IN      NS      b.root-servers.net.
.                       429247  IN      NS      j.root-servers.net.
.                       429247  IN      NS      c.root-servers.net.
.                       429247  IN      NS      m.root-servers.net.
.                       429247  IN      NS      f.root-servers.net.
.                       429247  IN      NS      d.root-servers.net.
.                       429247  IN      NS      g.root-servers.net.
.                       429247  IN      NS      a.root-servers.net.
.                       429247  IN      NS      i.root-servers.net.
.                       429717  IN      RRSIG   NS 8 0 518400 20190506170000 20190423160000 25266 . tRFeXF0ccHkCHTB11jEKDzXtoQtiSrCDX3GRzqyLvl2D5+ML6yqEkYTc e9Bs2sKYmXFk2pdldVbub3n0IQTXAW5MSuWDWqv/WtCA5v6FCCJTXCm+ mGDSKEbTdfLJDfzxYunWUKo1sYCs2d8im5LFs0RJMY/1EIngrJK1ujkj JrSXZjdmlaUv1cTBIXuV/Xn3CansYP3wOwIY3W4fOVYgfLAE1MEvnAUR 0xxjFj1eXNuv3wYE5mYGtumYL1fPHiU/XAIACZj3FWdWiG2loDz/u+ty zGPB6t+Ms7DKbaFp7EiWskWL60zWzxHcd3vxOUL0o0Ic+8csLqL6tO1h zJA3nA==
;; Received 525 bytes from 10.0.19.52#53(10.0.19.52) in 1 ms

io.                     172800  IN      NS      a0.nic.io.
io.                     172800  IN      NS      a2.nic.io.
io.                     172800  IN      NS      b0.nic.io.
io.                     172800  IN      NS      c0.nic.io.
io.                     172800  IN      NS      ns-a1.io.
io.                     172800  IN      NS      ns-a3.io.
io.                     86400   IN      DS      57355 8 1 434E91E206134F5B3B0AC603B26F5E029346ABC9
io.                     86400   IN      DS      57355 8 2 95A57C3BAB7849DBCDDF7C72ADA71A88146B141110318CA5BE672057 E865C3E2
io.                     86400   IN      DS      64744 8 2 2E7D661097A76EAC145858E4FF8F3DDAE5EAEDFD527725BC6F8A943E 4FE23A29
io.                     86400   IN      RRSIG   DS 8 1 86400 20190507170000 20190424160000 25266 . L8BGQDYVnxzIwVS3bc+LfT+W6s/PTHYe93sbSV2WpEaBHLnbqUkq4iH5 W7iH3dC4KVvv1TGvNdK2kTokTC+pQVeZ9vX2VQY+FLy57+oPrkeGPULZ KOgxKcpky+/3+1BmJn4iS5t16S0pJ4x23O7J2Z7pXJ2LcM8XD3O14VSc 3pFs1od4+4PVspkmNjXebrxaL+L5WG1ofHGAwp1FpFWBok/PVTOyT6fD GivOIUS0u2HVsUUNOUoU1CL6YRYG2rqmBbSZ7xQ9nVsB6L6sGIcYfIvV osj4rCRsktTaoq7WGg0Z12VNZPJW3TM82VgSv2LMtU8kH1AyJNO1QW29 w6x4fw==
;; Received 805 bytes from 198.97.190.53#53(h.root-servers.net) in 41 ms

pagure.io.              86400   IN      NS      ns05.fedoraproject.org.
pagure.io.              86400   IN      NS      ns02.fedoraproject.org.
pagure.io.              86400   IN      NS      ns04.fedoraproject.org.
2iui5t1khct6c5o8i2i67rppatgvegqo.io. 900 IN NSEC3 1 1 1 D399EAAB 2IV7T2DEE5N8V4AC5IHQK0MNI25BCHD7 NS SOA RRSIG DNSKEY NSEC3PARAM
2iui5t1khct6c5o8i2i67rppatgvegqo.io. 900 IN RRSIG NSEC3 8 2 900 20190516042408 20190425032408 14241 io. mLsaV1wY5vC/r4i0J2Ka4wxtBgaXCOZLq1WfymZIDp/H5FWYcJARdjBP ZCpCIRI6ZhQq/0woCsBMfAyJK9fea3MH9jkJPV2xn01anI56/d/aYL+B K6gGA9U7RjXI+4kwot57t1/0oI2PsrrGwCTTcAy61DYI7czC3lA9GCay tEQ=
7g6i2vn5edu8ouv98b0mu4teu1cde93o.io. 900 IN NSEC3 1 1 1 D399EAAB 7G700S60GTHQ39B5CU5UAGJ0EOTK5Q74 NS DS RRSIG
7g6i2vn5edu8ouv98b0mu4teu1cde93o.io. 900 IN RRSIG NSEC3 8 2 900 20190508151831 20190417141831 14241 io. 0lHcV+SeqQXjWXU6yET8KvYqspa1XBAAIPlLyYnmhRRJUU/2mg73EbJh ILfWGne7OPYECR/PUJnF/NsHioWUNz3i2yNbPwYrc4Ff3futIXrwjcSd rRGPsywul158D9mB9OQjx15y+vPs0rNJ4+Z++IuezL9FdvH81YOootld +F0=
couldn't get address for 'ns05.fedoraproject.org': not found
couldn't get address for 'ns02.fedoraproject.org': not found
couldn't get address for 'ns04.fedoraproject.org': not found


['git', 'clone', '-n', 'https://pagure.io/fm-orchestrator.git'] exited with return code 128
stderr: Cloning into 'fm-orchestrator'...
fatal: unable to access 'https://pagure.io/fm-orchestrator.git/': Could not resolve host: pagure.io

So I think something in the network between your systems and the outside world is fritzing out... because we have multiple tests and it looks like you can't look up the backbone network at that point (you got to the .io nameservers but could not get to the .org nameservers to get ns04's IP address, as it is stored there).

So I would try making the test go to fedoraproject.org instead of pagure.io and see if it gives a better answer.

Here's the output of a failing dig fedoraproject.org +trace:

dig fedoraproject.org +trace returned:
; <<>> DiG 9.11.5-P4-RedHat-9.11.5-4.P4.fc29 <<>> fedoraproject.org +trace
;; global options: +cmd
.           210898  IN  NS  j.root-servers.net.
.           210898  IN  NS  b.root-servers.net.
.           210898  IN  NS  e.root-servers.net.
.           210898  IN  NS  f.root-servers.net.
.           210898  IN  NS  i.root-servers.net.
.           210898  IN  NS  a.root-servers.net.
.           210898  IN  NS  c.root-servers.net.
.           210898  IN  NS  m.root-servers.net.
.           210898  IN  NS  g.root-servers.net.
.           210898  IN  NS  d.root-servers.net.
.           210898  IN  NS  k.root-servers.net.
.           210898  IN  NS  h.root-servers.net.
.           210898  IN  NS  l.root-servers.net.
.           470400  IN  RRSIG   NS 8 0 518400 20190509050000 20190426040000 25266 . eFpb+bFjhQ6eCBbLG7VqpTg4XVf0nUJeIKyAEwcA1CzX/SwZiSrQWwI6 +hRNtyxmjOMR5RB2DX6HB/rUMqlptaz6zCzHtwo5bBXfcnkOlSqrR68F nj9Dy97rtrVvu6jvxIuuwecRNkLcPF9CR5bgR3MDbQrH73cSd+2GD/6E EAsaiq2FvxOza9ic7Tbdc4ofAGfcNWd9mOEgWvQWlAjBqe+QoccbIcQV hrEmS/01ZJZWFT7txaDybwy+bjGqZlXkzoRxP9fWbSp6SeL1VwUK2vT9 VJO03p+Zxz/BAa15GGr9El+q8E98rJH23D3JPWyYB1hYxsJDwvPV+NkM N+yF9Q==
;; Received 1097 bytes from 10.0.19.52#53(10.0.19.52) in 1 ms
org.            172800  IN  NS  a0.org.afilias-nst.info.
org.            172800  IN  NS  a2.org.afilias-nst.info.
org.            172800  IN  NS  b0.org.afilias-nst.org.
org.            172800  IN  NS  b2.org.afilias-nst.org.
org.            172800  IN  NS  c0.org.afilias-nst.info.
org.            172800  IN  NS  d0.org.afilias-nst.org.
org.            86400   IN  DS  9795 7 1 364DFAB3DAF254CAB477B5675B10766DDAA24982
org.            86400   IN  DS  9795 7 2 3922B31B6F3A4EA92B19EB7B52120F031FD8E05FF0B03BAFCF9F891B FE7FF8E5
org.            86400   IN  RRSIG   DS 8 1 86400 20190509170000 20190426160000 25266 . U4vYJv+6buqxVXIVa86/ec+c/v7aE9daxfUtte9J13cKdi5JpMwc7VTx PpkiaKDgZg3SwEkZv5XNKq4KMBrQ8Qmr5MiDaRZVIgWWEYXuf9DERGLJ d2g9B4dxUY0szOxu/7W344lDWWKFnn9e8PhS/u1K9EfR0Cb3RvCmHMa5 kPXKK60m20KpRTOZ9tWNhjlY1v7HySdUxDN5rOgIlwAhoLdnBvfH+cYw GnYvyDfgDUvsiKpFqN+LWgg2DAKYMzLI3WA1KwDdj0NOn6Ka+FTrFxm9 xn/EF+oytGpO77VuqasBF1ncNM8HrYXs4//89FPwtlxB4JL76P5RnQDH UkpD1g==
;; Received 819 bytes from 192.58.128.30#53(j.root-servers.net) in 137 ms
fedoraproject.org.  86400   IN  NS  ns02.fedoraproject.org.
fedoraproject.org.  86400   IN  NS  ns04.fedoraproject.org.
fedoraproject.org.  86400   IN  NS  ns05.fedoraproject.org.
fedoraproject.org.  86400   IN  DS  16207 5 1 8DD099791A2A110851FDE5D14F6C62ADC3DD7C18
fedoraproject.org.  86400   IN  DS  16207 5 2 A7C9BF5AFE374C9650ED678F3D36931A7DE9256B86A7BC34D6DEED7D 4E492E5E
fedoraproject.org.  86400   IN  RRSIG   DS 7 2 86400 20190516152816 20190425142816 9062 org. mk8YWLTn8jFPhUnE5vQq6e/ohen1dxyGE5Hm4UqqpDBRr5BZqoLctGv8 fzvSWUgxmsjIvfj51eDE9KtAovZhqNwW5+RlNOtzZ8VXfkxIJ32oUmGr 6El48q2uowcmkaOhsU69+WxPSqbx6EwBNnquzL9prgJqvfQHuKdCmmKg pkE=
couldn't get address for 'ns02.fedoraproject.org': not found
couldn't get address for 'ns04.fedoraproject.org': not found
couldn't get address for 'ns05.fedoraproject.org': not found

dig pagure.io +trace and git clone https://pagure.io/fm-orchestrator.git both failed immediately after this.

So if the other dig fails after this.. then something in the network stack is not letting you talk to either your nearest DNS server or to the backbone. The DNS host you see 192.58.128.30 should have been able to give your query an ip address for ns

[smooge@smoogen-laptop ~]$ host ns04.fedoraproject.org. 192.58.128.30
Using domain server:
Name: 192.58.128.30
Address: 192.58.128.30#53
Aliases:

ns04.fedoraproject.org has address 209.132.181.17

Since this is happening constantly, something in your stack is interfering with DNS. Let's try and figure it out.. this could be UDP problems, DNSSEC problems, or something else.

For UDP, let's move to TCP:
dig -4 +trace +tcp +all +nofail pagure.io

For DNSSEC:
dig -4 +trace +nodnssec +all +nofail pagure.io

After that.. it is going to be something in how 10.0.19.52 and the other hardware is set up and caching SERVFAIL.
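
The two dig variants above could be run side by side from the same cron job so their failure rates can be compared. This is only a sketch, assuming dig is installed where the job runs. The dictionary keys and function names are made up for illustration; the flags are the ones given above, and the error check looks for the "couldn't get address for" line seen in the earlier traces.

```python
import subprocess

# Flag sets copied from the two commands above; the keys are illustrative labels.
VARIANTS = {
    "tcp": ["-4", "+trace", "+tcp", "+all", "+nofail"],
    "nodnssec": ["-4", "+trace", "+nodnssec", "+all", "+nofail"],
}


def build_command(variant, name):
    """Return the dig argument list for one variant."""
    return ["dig", *VARIANTS[variant], name]


def trace_has_errors(output):
    """True if the trace hit an unresolved name server or a SERVFAIL."""
    return "couldn't get address for" in output or "status: SERVFAIL" in output


def run_variants(name="pagure.io"):
    """Run each variant once; map variant label to True if errors were seen."""
    results = {}
    for variant in VARIANTS:
        proc = subprocess.run(
            build_command(variant, name),
            capture_output=True, text=True, check=False,
        )
        # Check both streams, since the warning may land on either one.
        results[variant] = trace_has_errors(proc.stdout + proc.stderr)
    return results
```

If one variant fails noticeably less often than the other over a day or two of runs, that narrows down whether transport or DNSSEC is involved.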

Updated the script to use the TCP command you provided above. I'll let you know when I get some results.

Failure for pagure.io.

dig -4 +trace +tcp +all +nofail pagure.io returned:

; <<>> DiG 9.11.5-P4-RedHat-9.11.5-4.P4.fc29 <<>> -4 +trace +tcp +all +nofail pagure.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10475
;; flags: qr ra; QUERY: 1, ANSWER: 14, AUTHORITY: 0, ADDITIONAL: 27

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1200
;; QUESTION SECTION:
;.              IN  NS

;; ANSWER SECTION:
.           151785  IN  NS  g.root-servers.net.
.           151785  IN  NS  j.root-servers.net.
.           151785  IN  NS  e.root-servers.net.
.           151785  IN  NS  m.root-servers.net.
.           151785  IN  NS  b.root-servers.net.
.           151785  IN  NS  f.root-servers.net.
.           151785  IN  NS  k.root-servers.net.
.           151785  IN  NS  l.root-servers.net.
.           151785  IN  NS  d.root-servers.net.
.           151785  IN  NS  i.root-servers.net.
.           151785  IN  NS  c.root-servers.net.
.           151785  IN  NS  a.root-servers.net.
.           151785  IN  NS  h.root-servers.net.
.           411287  IN  RRSIG   NS 8 0 518400 20190509050000 20190426040000 25266 . eFpb+bFjhQ6eCBbLG7VqpTg4XVf0nUJeIKyAEwcA1CzX/SwZiSrQWwI6 +hRNtyxmjOMR5RB2DX6HB/rUMqlptaz6zCzHtwo5bBXfcnkOlSqrR68F nj9Dy97rtrVvu6jvxIuuwecRNkLcPF9CR5bgR3MDbQrH73cSd+2GD/6E EAsaiq2FvxOza9ic7Tbdc4ofAGfcNWd9mOEgWvQWlAjBqe+QoccbIcQV hrEmS/01ZJZWFT7txaDybwy+bjGqZlXkzoRxP9fWbSp6SeL1VwUK2vT9 VJO03p+Zxz/BAa15GGr9El+q8E98rJH23D3JPWyYB1hYxsJDwvPV+NkM N+yF9Q==

;; ADDITIONAL SECTION:
a.root-servers.net. 391786  IN  A   198.41.0.4
a.root-servers.net. 391786  IN  AAAA    2001:503:ba3e::2:30
b.root-servers.net. 391786  IN  A   199.9.14.201
b.root-servers.net. 342769  IN  AAAA    2001:500:200::b
c.root-servers.net. 391786  IN  A   192.33.4.12
c.root-servers.net. 342769  IN  AAAA    2001:500:2::c
d.root-servers.net. 391786  IN  A   199.7.91.13
d.root-servers.net. 342769  IN  AAAA    2001:500:2d::d
e.root-servers.net. 391785  IN  A   192.203.230.10
e.root-servers.net. 342769  IN  AAAA    2001:500:a8::e
f.root-servers.net. 497687  IN  A   192.5.5.241
f.root-servers.net. 391786  IN  AAAA    2001:500:2f::f
g.root-servers.net. 391784  IN  A   192.112.36.4
g.root-servers.net. 342769  IN  AAAA    2001:500:12::d0d
h.root-servers.net. 391784  IN  A   198.97.190.53
h.root-servers.net. 391784  IN  AAAA    2001:500:1::53
i.root-servers.net. 391785  IN  A   192.36.148.17
i.root-servers.net. 391785  IN  AAAA    2001:7fe::53
j.root-servers.net. 391786  IN  A   192.58.128.30
j.root-servers.net. 391786  IN  AAAA    2001:503:c27::2:30
k.root-servers.net. 391785  IN  A   193.0.14.129
k.root-servers.net. 391785  IN  AAAA    2001:7fd::1
l.root-servers.net. 391784  IN  A   199.7.83.42
l.root-servers.net. 391784  IN  AAAA    2001:500:9f::42
m.root-servers.net. 391786  IN  A   202.12.27.33
m.root-servers.net. 391786  IN  AAAA    2001:dc3::35

;; Query time: 1 msec
;; SERVER: 10.0.19.72#53(10.0.19.72)
;; WHEN: Sat Apr 27 12:35:24 UTC 2019
;; MSG SIZE  rcvd: 1097

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14023
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 10, ADDITIONAL: 12

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;pagure.io.         IN  A

;; AUTHORITY SECTION:
io.         172800  IN  NS  c0.nic.io.
io.         172800  IN  NS  ns-a3.io.
io.         172800  IN  NS  ns-a1.io.
io.         172800  IN  NS  a2.nic.io.
io.         172800  IN  NS  a0.nic.io.
io.         172800  IN  NS  b0.nic.io.
io.         86400   IN  DS  64744 8 2 2E7D661097A76EAC145858E4FF8F3DDAE5EAEDFD527725BC6F8A943E 4FE23A29
io.         86400   IN  DS  57355 8 1 434E91E206134F5B3B0AC603B26F5E029346ABC9
io.         86400   IN  DS  57355 8 2 95A57C3BAB7849DBCDDF7C72ADA71A88146B141110318CA5BE672057 E865C3E2
io.         86400   IN  RRSIG   DS 8 1 86400 20190510050000 20190427040000 25266 . A91TS7yYaosGrb8Y/Vl5NGqeKfxwPTyFhR6SRhNunniC9wtz8Q72/k0k 3EROC4gixIswydnHd76cN54EYtan/8ptoPthK82WIfm70b/blR9GpTwO TFu62x3PTuY4oVKX4guOhZkRDGM1iKxoZlTEh0BfpF/y3dTDWDBpk+LP /9Mlz0J8TiquN45icI47sDHn9wbKrT3WZc47rd5P0cpHBHZzH72mzRly u1BZFx5LMVy/wEVW19m0M2/3axc1gPSo6uhjMrLY9o4J0ljM3AWBZDzL o/DG9611horvNjbmmNbI8zcxJHOQL39zJPJrfX9m6L0PaUJcfLXs0NuD o3PinQ==

;; ADDITIONAL SECTION:
a0.nic.io.      172800  IN  A   65.22.160.17
a2.nic.io.      172800  IN  A   65.22.163.17
b0.nic.io.      172800  IN  A   65.22.161.17
c0.nic.io.      172800  IN  A   65.22.162.17
ns-a1.io.       172800  IN  A   194.0.1.1
ns-a3.io.       172800  IN  A   74.116.178.1
a0.nic.io.      172800  IN  AAAA    2a01:8840:9e::17
a2.nic.io.      172800  IN  AAAA    2a01:8840:a1::17
b0.nic.io.      172800  IN  AAAA    2a01:8840:9f::17
c0.nic.io.      172800  IN  AAAA    2a01:8840:a0::17
ns-a1.io.       172800  IN  AAAA    2001:678:4::1

;; Query time: 87 msec
;; SERVER: 202.12.27.33#53(202.12.27.33)
;; WHEN: Sat Apr 27 12:35:24 UTC 2019
;; MSG SIZE  rcvd: 805

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35470
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 7, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;pagure.io.         IN  A

;; AUTHORITY SECTION:
pagure.io.      86400   IN  NS  ns02.fedoraproject.org.
pagure.io.      86400   IN  NS  ns04.fedoraproject.org.
pagure.io.      86400   IN  NS  ns05.fedoraproject.org.
2iui5t1khct6c5o8i2i67rppatgvegqo.io. 900 IN NSEC3 1 1 1 D399EAAB 2IV7T2DEE5N8V4AC5IHQK0MNI25BCHD7 NS SOA RRSIG DNSKEY NSEC3PARAM
2iui5t1khct6c5o8i2i67rppatgvegqo.io. 900 IN RRSIG NSEC3 8 2 900 20190518123102 20190427113102 14241 io. OJFjA4xMlVlzsuU0joo3ptvynJ/zgf1qkx3xPFOvsjH4JNK1EVJfqYm3 sKBHWM6bBlF3PC2LBp+oGP1OlMT9KjXZvcJKBQfPRTFyxzGaPyo2n0+o d8ruXKUg9xejk2yf1P4Z+m3MZIQS28wcDQjYjN84YYwO1U3I2muQtIs1 DGo=
7g6i2vn5edu8ouv98b0mu4teu1cde93o.io. 900 IN NSEC3 1 1 1 D399EAAB 7G700S60GTHQ39B5CU5UAGJ0EOTK5Q74 NS DS RRSIG
7g6i2vn5edu8ouv98b0mu4teu1cde93o.io. 900 IN RRSIG NSEC3 8 2 900 20190516151656 20190425141656 14241 io. mlxzp9lNKHJ39HZF3aARKGfwvo85pJpR3Kwg1kD5Pm/dUUcLV4GXN2sq Mrbq0IT7LWujEV9Vy7lQn4wtxaKhsNOPmcc8k1a08uaHe6G7rMnbUg5D BvVJpOX+kjtn57ElwNNN1oAECz8k2kKiEaQPx/nWFGHSZSrlQnPGoKIt 9IA=

couldn't get address for 'ns02.fedoraproject.org': not found
couldn't get address for 'ns04.fedoraproject.org': not found
couldn't get address for 'ns05.fedoraproject.org': not found


['git', 'clone', '-n', 'https://pagure.io/fm-orchestrator.git'] exited with return code 128
stderr: Cloning into 'fm-orchestrator'...
fatal: unable to access 'https://pagure.io/fm-orchestrator.git/': Could not resolve host: pagure.io

Got a failure for fedoraproject.org at the same time:

dig fedoraproject.org +trace returned:

; <<>> DiG 9.11.5-P4-RedHat-9.11.5-4.P4.fc29 <<>> fedoraproject.org +trace
;; global options: +cmd
.           151790  IN  NS  h.root-servers.net.
.           151790  IN  NS  j.root-servers.net.
.           151790  IN  NS  m.root-servers.net.
.           151790  IN  NS  b.root-servers.net.
.           151790  IN  NS  l.root-servers.net.
.           151790  IN  NS  a.root-servers.net.
.           151790  IN  NS  d.root-servers.net.
.           151790  IN  NS  k.root-servers.net.
.           151790  IN  NS  c.root-servers.net.
.           151790  IN  NS  e.root-servers.net.
.           151790  IN  NS  f.root-servers.net.
.           151790  IN  NS  g.root-servers.net.
.           151790  IN  NS  i.root-servers.net.
.           411292  IN  RRSIG   NS 8 0 518400 20190509050000 20190426040000 25266 . eFpb+bFjhQ6eCBbLG7VqpTg4XVf0nUJeIKyAEwcA1CzX/SwZiSrQWwI6 +hRNtyxmjOMR5RB2DX6HB/rUMqlptaz6zCzHtwo5bBXfcnkOlSqrR68F nj9Dy97rtrVvu6jvxIuuwecRNkLcPF9CR5bgR3MDbQrH73cSd+2GD/6E EAsaiq2FvxOza9ic7Tbdc4ofAGfcNWd9mOEgWvQWlAjBqe+QoccbIcQV hrEmS/01ZJZWFT7txaDybwy+bjGqZlXkzoRxP9fWbSp6SeL1VwUK2vT9 VJO03p+Zxz/BAa15GGr9El+q8E98rJH23D3JPWyYB1hYxsJDwvPV+NkM N+yF9Q==
;; Received 1097 bytes from 10.0.19.72#53(10.0.19.72) in 1 ms

org.            172800  IN  NS  b0.org.afilias-nst.org.
org.            172800  IN  NS  a2.org.afilias-nst.info.
org.            172800  IN  NS  d0.org.afilias-nst.org.
org.            172800  IN  NS  c0.org.afilias-nst.info.
org.            172800  IN  NS  a0.org.afilias-nst.info.
org.            172800  IN  NS  b2.org.afilias-nst.org.
org.            86400   IN  DS  9795 7 1 364DFAB3DAF254CAB477B5675B10766DDAA24982
org.            86400   IN  DS  9795 7 2 3922B31B6F3A4EA92B19EB7B52120F031FD8E05FF0B03BAFCF9F891B FE7FF8E5
org.            86400   IN  RRSIG   DS 8 1 86400 20190510050000 20190427040000 25266 . MIYbJzMshNdlDb8ncdYRdkEei/hTLQq8l/rKNuBfRval+cNIKv7xjEif vv04hxhl3Lauu6PrJbAnVnJihQNJMRZmy/Wf1gzUfhdkVo8dbomqojPP JV7aW18e7YJXiB6geVPXvgZUzqeaIMUT4+NTU3G+aq2Raxy+nXVLhE4N E3GJmbTfQ6kDSlpX8z/CQV1CLXE51SKTGCcaERpM08fOxhbctoB7gOsH uUGnQBWG1PYiNG1udqSPOvxplA5D3XKvnudIu6MXe87tHcayhrWnxL/+ 1tsn/SV2vZq3UxscphpZnKv87DAvauSxJo9H3OYqZHG9P+FPO8D4GmKz CTM/pA==
;; Received 847 bytes from 192.112.36.4#53(g.root-servers.net) in 56 ms

fedoraproject.org.  86400   IN  NS  ns02.fedoraproject.org.
fedoraproject.org.  86400   IN  NS  ns04.fedoraproject.org.
fedoraproject.org.  86400   IN  NS  ns05.fedoraproject.org.
fedoraproject.org.  86400   IN  DS  16207 5 1 8DD099791A2A110851FDE5D14F6C62ADC3DD7C18
fedoraproject.org.  86400   IN  DS  16207 5 2 A7C9BF5AFE374C9650ED678F3D36931A7DE9256B86A7BC34D6DEED7D 4E492E5E
fedoraproject.org.  86400   IN  RRSIG   DS 7 2 86400 20190516152816 20190425142816 9062 org. mk8YWLTn8jFPhUnE5vQq6e/ohen1dxyGE5Hm4UqqpDBRr5BZqoLctGv8 fzvSWUgxmsjIvfj51eDE9KtAovZhqNwW5+RlNOtzZ8VXfkxIJ32oUmGr 6El48q2uowcmkaOhsU69+WxPSqbx6EwBNnquzL9prgJqvfQHuKdCmmKg pkE=
couldn't get address for 'ns02.fedoraproject.org': not found
couldn't get address for 'ns04.fedoraproject.org': not found
couldn't get address for 'ns05.fedoraproject.org': not found

So I was wrong and +tcp is not working the way I was hoping

; EDNS: version: 0, flags: do; udp: 4096

If I am reading this correctly, you aren't getting TCP but UDP packets.

The only thing which stands out now is that your last query failures are all 4096 UDP packets which should contain all the data you need to get to any of the nsXX.fedoraproject.org (that is stored in the .org and .io servers and in the whois data for the domain.) Since you are failing to get that data before you get to our servers, this says that it is something to do with your network.
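
To compare failing and working captures, it may help to tabulate these sizes per response. The sketch below uses hypothetical names. It splits a dig +all capture into responses and pulls out the udp: value from the OPT pseudosection (the EDNS UDP payload size advertised in that response), the SERVER line, and the MSG SIZE rcvd value. Fields missing from a response, as in the truncated failures above, come back as None.

```python
import re

# Patterns for the three dig output lines of interest.
EDNS_RE = re.compile(r"; EDNS: version: \d+, flags:[^;]*; udp: (\d+)")
SERVER_RE = re.compile(r";; SERVER: ([^#\s]+)#")
SIZE_RE = re.compile(r";; MSG SIZE\s+rcvd: (\d+)")


def summarize_responses(capture):
    """Split a dig capture on ';; Got answer:' and summarize each response."""
    rows = []
    for chunk in capture.split(";; Got answer:")[1:]:
        edns = EDNS_RE.search(chunk)
        server = SERVER_RE.search(chunk)
        size = SIZE_RE.search(chunk)
        rows.append({
            "edns_udp": int(edns.group(1)) if edns else None,
            "server": server.group(1) if server else None,
            "msg_size": int(size.group(1)) if size else None,
        })
    return rows
```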

That second failure you are seeing is because some intermediate DNS service (possibly on your local system, e.g. dnsmasq or nscd or ...) is negatively caching failures to speed up delivery. So when the first 'I can't figure out fedoraproject.org' happens, you are out of luck until the negative cache result times out.
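
One way to check for this is to keep re-querying after the first failure and time how long the failure persists. A minimal sketch, assuming dig is available; failure_window and its query, sleep, and clock parameters are hypothetical names, and the callables are injectable only so the logic can be exercised without a network.

```python
import subprocess
import time


def dig_ok(name="fedoraproject.org"):
    """True if a plain dig lookup through the configured resolver returns NOERROR."""
    proc = subprocess.run(["dig", name], capture_output=True, text=True, check=False)
    return "status: NOERROR" in proc.stdout


def failure_window(query=dig_ok, interval=5.0, max_checks=120,
                   sleep=time.sleep, clock=time.monotonic):
    """After a failed lookup, poll until the lookup succeeds again.

    Returns the seconds elapsed until the first success, or None if the
    lookup was still failing after max_checks polls.
    """
    start = clock()
    for _ in range(max_checks):
        if query():
            return clock() - start
        sleep(interval)
    return None
```

If the measured window keeps coming out at roughly the same length, that would be consistent with a cache entry timing out rather than with random packet loss.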

At this point, I would go get tcpdumps of the transactions and go over with your network and security engineers to see what might be corrupting/dropping or killing your DNS packets from the backbone.

  1. You have a network packet issue in the stack which is breaking large DNS packets needed for signed dns.
  2. You have a caching nameserver which is overloaded and going nope.
  3. You are doing too many queries and an upstream is going 'nope' so you need a caching server
  4. You have a security device which is saying 'something wrong with this dns'.. nope.
  5. You have a network layer issue with large packets. Being that this is UDP then they could be corrupted, they could be fragmented, or they could be dropped.

The reason I am pretty sure it is nothing on our side is:
1. There are a lot of CI systems hitting our DNS and we aren't getting any other reports of this failure.
2. The area the traces are showing being broken isn't with any of our servers but with backbone systems. The way DNS works is:

Client talks to . (dot) servers.
dot servers tell client where to get .io or .org nameservers.
Client talks to .io and .org nameservers.
.io/.org nameservers tell client where to find the fedoraproject nameservers.
You are being told that it can't get the address for the fedoraproject nameservers, which is what the .io/.org nameservers would be telling you.
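
To see which of these steps a given failed trace stopped at, the hops can be pulled out of the plain +trace output. A sketch with hypothetical helper names: it lists the servers that answered (from the "Received ... bytes from" lines) and the name servers whose addresses could not be found.

```python
import re

# ";; Received 819 bytes from 192.58.128.30#53(j.root-servers.net) in 137 ms"
HOP_RE = re.compile(r";; Received \d+ bytes from [^#\s]+#\d+\(([^)\s]+)\) in \d+ ms")
# "couldn't get address for 'ns02.fedoraproject.org': not found"
MISSING_RE = re.compile(r"couldn't get address for '([^']+)': not found")


def trace_summary(output):
    """Return the servers that answered and the name servers left unresolved."""
    return {
        "answered_by": HOP_RE.findall(output),
        "unresolved": MISSING_RE.findall(output),
    }
```

For the failing fedoraproject.org trace earlier in this ticket, this would report answers from 10.0.19.52 and j.root-servers.net and list the three nsXX.fedoraproject.org names as unresolved. Output captured with +all has no "Received" lines, so only the unresolved list is filled in for those.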

Did you see any failures with the +nodnssec option? If you don't, then change all your tests to that and see if you get failures then. If you aren't, then I am going to say it could be switch/router/security hardware doing introspection, not liking the signatures, or dropping data.

Just saw another failure to get the addresses for nic.io servers:

dig -4 +trace +tcp +all +nofail pagure.io returned:

; <<>> DiG 9.11.5-P4-RedHat-9.11.5-4.P4.fc29 <<>> -4 +trace +tcp +all +nofail pagure.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45743
;; flags: qr ra; QUERY: 1, ANSWER: 14, AUTHORITY: 0, ADDITIONAL: 27

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1200
;; QUESTION SECTION:
;.              IN  NS

;; ANSWER SECTION:
.           483309  IN  NS  l.root-servers.net.
.           483309  IN  NS  f.root-servers.net.
.           483309  IN  NS  k.root-servers.net.
.           483309  IN  NS  d.root-servers.net.
.           483309  IN  NS  c.root-servers.net.
.           483309  IN  NS  i.root-servers.net.
.           483309  IN  NS  e.root-servers.net.
.           483309  IN  NS  a.root-servers.net.
.           483309  IN  NS  b.root-servers.net.
.           483309  IN  NS  j.root-servers.net.
.           483309  IN  NS  h.root-servers.net.
.           483309  IN  NS  m.root-servers.net.
.           483309  IN  NS  g.root-servers.net.
.           483309  IN  RRSIG   NS 8 0 518400 20190512050000 20190429040000 25266 . bQWAaqwMGyuKJ43sy8YDogYmQbm0CPjSlIxhdSa5QhQXjWArYKeHpS/F oaoDGBoDxxTkNKDqhFp5NWZikNXGfzDr6VdYnWoRzhscK7gMC0UFdiLf HelwaJ8agLehlq9Hp6mX2AVUdTd0UfZcRioI3OS6azSMGEocNI96T4+9 AJ633UU62cSMEzxE/t+5U6p2Vc/JDwg4Ji9n9mPNJSN3oeBlyB4MXfLz 0/GpNbEagyWJOhWzpRyo4/DOTFxG8tyrnZWYLe88f8Brkdxm0AFg7xAh E55hO+57oGciCR0xffYvtJMX/oPll1Qa6tlGBBIZXtKwSsiktKA115Mw w6mLWQ==

;; ADDITIONAL SECTION:
l.root-servers.net. 204894  IN  A   199.7.83.42
l.root-servers.net. 204894  IN  AAAA    2001:500:9f::42
c.root-servers.net. 204896  IN  A   192.33.4.12
c.root-servers.net. 155879  IN  AAAA    2001:500:2::c
d.root-servers.net. 204896  IN  A   199.7.91.13
d.root-servers.net. 155879  IN  AAAA    2001:500:2d::d
g.root-servers.net. 204894  IN  A   192.112.36.4
g.root-servers.net. 155879  IN  AAAA    2001:500:12::d0d
m.root-servers.net. 204896  IN  A   202.12.27.33
m.root-servers.net. 204896  IN  AAAA    2001:dc3::35
f.root-servers.net. 310797  IN  A   192.5.5.241
f.root-servers.net. 204896  IN  AAAA    2001:500:2f::f
e.root-servers.net. 204895  IN  A   192.203.230.10
e.root-servers.net. 155879  IN  AAAA    2001:500:a8::e
b.root-servers.net. 204896  IN  A   199.9.14.201
b.root-servers.net. 155879  IN  AAAA    2001:500:200::b
k.root-servers.net. 204895  IN  A   193.0.14.129
k.root-servers.net. 204895  IN  AAAA    2001:7fd::1
j.root-servers.net. 204896  IN  A   192.58.128.30
j.root-servers.net. 204896  IN  AAAA    2001:503:c27::2:30
h.root-servers.net. 204894  IN  A   198.97.190.53
h.root-servers.net. 204894  IN  AAAA    2001:500:1::53
i.root-servers.net. 204895  IN  A   192.36.148.17
i.root-servers.net. 204895  IN  AAAA    2001:7fe::53
a.root-servers.net. 204896  IN  A   198.41.0.4
a.root-servers.net. 204896  IN  AAAA    2001:503:ba3e::2:30

;; Query time: 1 msec
;; SERVER: 10.0.19.77#53(10.0.19.77)
;; WHEN: Mon Apr 29 16:30:14 UTC 2019
;; MSG SIZE  rcvd: 1097

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50165
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 10, ADDITIONAL: 12

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 65535
;; QUESTION SECTION:
;pagure.io.         IN  A

;; AUTHORITY SECTION:
io.         172800  IN  NS  a0.nic.io.
io.         172800  IN  NS  a2.nic.io.
io.         172800  IN  NS  b0.nic.io.
io.         172800  IN  NS  c0.nic.io.
io.         172800  IN  NS  ns-a1.io.
io.         172800  IN  NS  ns-a3.io.
io.         86400   IN  DS  57355 8 1 434E91E206134F5B3B0AC603B26F5E029346ABC9
io.         86400   IN  DS  57355 8 2 95A57C3BAB7849DBCDDF7C72ADA71A88146B141110318CA5BE672057 E865C3E2
io.         86400   IN  DS  64744 8 2 2E7D661097A76EAC145858E4FF8F3DDAE5EAEDFD527725BC6F8A943E 4FE23A29
io.         86400   IN  RRSIG   DS 8 1 86400 20190512050000 20190429040000 25266 . C4mN1OH0NGXRGF3OtauyB6BFec17jNraI5XaqhaL/w0yukLW8ZVIYgru S9PCWagRZdF/IdgGqiFjwU3Wy+yCl5gHIu4+FO0r1697QPibF0YyIkYD FrpO14oZqskxE/HMWbTFIDnrQdwc3P3FtQbjkkfjoysY0QDi+8p/8oUM giEsyefd+xqiZSp8cz7i1TeRXyhuExfkNwF0ZpB74ctATrDAU+nij16A reGoVTEXwydkHfUXGWye5bx1ETa9n9csRDmwqf3JuHggC6eFbZvK2oix okT5csCl8KXV40rjc/q3HUp16lpicfB9L1BZOHfuiMr5scXcsz6J84kd 0I+rUg==

;; ADDITIONAL SECTION:
a0.nic.io.      172800  IN  A   65.22.160.17
a0.nic.io.      172800  IN  AAAA    2a01:8840:9e::17
a2.nic.io.      172800  IN  A   65.22.163.17
a2.nic.io.      172800  IN  AAAA    2a01:8840:a1::17
b0.nic.io.      172800  IN  A   65.22.161.17
b0.nic.io.      172800  IN  AAAA    2a01:8840:9f::17
c0.nic.io.      172800  IN  A   65.22.162.17
c0.nic.io.      172800  IN  AAAA    2a01:8840:a0::17
ns-a1.io.       172800  IN  A   194.0.1.1
ns-a1.io.       172800  IN  AAAA    2001:678:4::1
ns-a3.io.       172800  IN  A   74.116.178.1

couldn't get address for 'a0.nic.io': not found
couldn't get address for 'a2.nic.io': not found
couldn't get address for 'b0.nic.io': not found
couldn't get address for 'c0.nic.io': not found
;; Query time: 22 msec
;; SERVER: 192.5.5.241#53(192.5.5.241)
;; WHEN: Mon Apr 29 16:30:14 UTC 2019
;; MSG SIZE  rcvd: 805

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54850
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 7, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;pagure.io.         IN  A

;; AUTHORITY SECTION:
pagure.io.      86400   IN  NS  ns02.fedoraproject.org.
pagure.io.      86400   IN  NS  ns04.fedoraproject.org.
pagure.io.      86400   IN  NS  ns05.fedoraproject.org.
2iui5t1khct6c5o8i2i67rppatgvegqo.io. 900 IN NSEC3 1 1 1 D399EAAB 2IV7T2DEE5N8V4AC5IHQK0MNI25BCHD7 NS SOA RRSIG DNSKEY NSEC3PARAM
2iui5t1khct6c5o8i2i67rppatgvegqo.io. 900 IN RRSIG NSEC3 8 2 900 20190520162725 20190429152725 14241 io. JIFqc/enUsLLFNLZ0+TAiLCk42Cr6MqDfL1rwIIlwOIJ6uJCEgREUxA4 yz11b0IE2GOptldDSMUcIxKIGfuqgWilhymtYmf6CoTzVRK8/tqUHPYr U/CHmRol8XyNPvzMkjqDIh4iXrztLN5cNIlaRe9puPIIq56+LZLW+iLd 8Ts=
7g6i2vn5edu8ouv98b0mu4teu1cde93o.io. 900 IN NSEC3 1 1 1 D399EAAB 7G700S60GTHQ39B5CU5UAGJ0EOTK5Q74 NS DS RRSIG
7g6i2vn5edu8ouv98b0mu4teu1cde93o.io. 900 IN RRSIG NSEC3 8 2 900 20190516151656 20190425141656 14241 io. mlxzp9lNKHJ39HZF3aARKGfwvo85pJpR3Kwg1kD5Pm/dUUcLV4GXN2sq Mrbq0IT7LWujEV9Vy7lQn4wtxaKhsNOPmcc8k1a08uaHe6G7rMnbUg5D BvVJpOX+kjtn57ElwNNN1oAECz8k2kKiEaQPx/nWFGHSZSrlQnPGoKIt 9IA=

;; Query time: 18 msec
;; SERVER: 74.116.178.1#53(74.116.178.1)
;; WHEN: Mon Apr 29 16:30:14 UTC 2019
;; MSG SIZE  rcvd: 641

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31624
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;pagure.io.         IN  A

;; ANSWER SECTION:
pagure.io.      60  IN  A   152.19.134.147

;; Query time: 59 msec
;; SERVER: 209.132.181.17#53(209.132.181.17)
;; WHEN: Mon Apr 29 16:30:14 UTC 2019
;; MSG SIZE  rcvd: 54



['git', 'clone', '-n', 'https://pagure.io/fm-orchestrator.git'] exited with return code 128
stderr: Cloning into 'fm-orchestrator'...
fatal: unable to access 'https://pagure.io/fm-orchestrator.git/': Could not resolve host: pagure.io

I guess this supports the idea that something in the internal network is blocking some small percentage of DNS traffic?

I believe so. In that case you weren't even able to talk to the backbone DNS servers.

Looking at the large packets which are failing... I am guessing there is some sort of fragmentation going on. But it could be a bunch of things. In the end you will need to set up tcpdump, capture the traffic, and see what is going on.

There are lots of different things which could be causing this.. and you may just need to set up a caching DNS server at different levels until you find out which part is rewriting or fragmenting your packets.

Since this does not look like us.. can we close this ticket?

I might have figured out what's going on here.

It seems that the redhat.com internal DNS queries are coming to our nameservers from 2 IPs in phx2, and the volume of them reached a point where it started hitting the rate limits we put in place long ago to avoid DoS attacks.

I've now whitelisted those 2 redhat.com IPs. Can you see if this clears up what you are seeing, or if it doesn't and this is something else?

If it does.. then I will want to know how we are affecting the top-level DNS servers such that he can't get to the .io nameservers either. Because that is going to be some major problem to fix :smile:

Ahh, interesting. I'll keep an eye out. Thanks!

@mikeb any further news here? Is it working since that change, or still having issues, but likely on your side?

Shall we keep this open, or close it out?

I am still seeing issues, but I haven't been able to determine if it's on our side or yours. If no one else is reporting issues, I suppose you can close this out and I'll reopen it if we find something more conclusive.

ok, please do let us know if there's any further info we can provide to help out, etc...

Metadata Update from @kevin:
- Issue close_status updated to: Insufficient data
- Issue status updated to: Closed (was: Open)

13 days ago
