#9060 retrace.fedoraproject.org is down
Closed: Fixed 3 years ago by kevin. Opened 3 years ago by msuchy.

Could not resolve host: retrace.fedoraproject.org

Additionally, the instruction at
./docs/sysadmin-guide/sops/dns.rst

does not work:

[msuchy@batcave01 ~][PROD-IAD2]$ git clone /git/dns
fatal: repository '/git/dns' does not exist

retrace.fedoraproject.org is meant to go to RDU-CC. It was meant to be brought back before the IAD2 move but has been delayed because of networking issues in RDU.
So it will only be back after the work in IAD2 has been completed (so most likely not before the end of July).

I'm going to be adding it in https://hackmd.io/hpYYJQRjQy-oHxUS7IonIA?edit so it's referenced there.

Just to explain the impact: users are not able to report bugs from a system using ABRT. It would be great to fix this ASAP, otherwise it could be a really bad user experience.

Ahh, I just tried reporting a graphics driver crash and in abrt I see this:

--- Running report_uReport ---
Failed to upload uReport to the server 'https://retrace.fedoraproject.org/faf' with curl: Could not resolve host: retrace.fedoraproject.org
Error: curl_easy_perform: Couldn't resolve host name
('report_uReport' exited with 1)

Well, I am not sure how we can do that. The hardware is racked but has no network.

@msuchy is it practical/possible to spin up a temp one in aws?

We have a staging environment in AWS, so theoretically yes. But we cannot have the previous DB, and merging these two DBs later would be a pain.

But an ETA of the end of July is pretty long :( I will check what we can do.

Metadata Update from @smooge:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: cloud, groomed, high-gain, high-trouble

If retrace isn't that important, how about setting up a dummy server which answers requests and throws them away, so that you can still report bugs to bugzilla?
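A minimal sketch of the "dummy server" idea, using only Python's standard library and assuming that any well-formed HTTP reply is enough for the client to move on. The thread does not establish what ABRT actually expects (a 404 from the real server later in this ticket still made `report_uReport` exit with 1), so the status code is a parameter; `make_sink_handler` and `start_sink` are hypothetical names.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_sink_handler(status=202):
    """Build a handler class that accepts requests and discards their bodies.

    `status` is a hypothetical knob: which code (if any) makes ABRT fall
    back to other reporters is not established here.
    """

    class SinkHandler(BaseHTTPRequestHandler):
        def _reply(self):
            self.send_response(status)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def do_POST(self):
            # Read exactly Content-Length bytes, then throw them away.
            length = int(self.headers.get("Content-Length", 0))
            self.rfile.read(length)
            self._reply()

        def do_GET(self):
            self._reply()

        def log_message(self, format, *args):
            # Silence the default per-request logging to stderr.
            pass

    return SinkHandler


def start_sink(host="127.0.0.1", port=0, status=202):
    """Serve the sink in a background thread; port=0 picks a free port."""
    server = HTTPServer((host, port), make_sink_handler(status))
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Call `server.shutdown()` and `server.server_close()` when done. A real stand-in would also need TLS and the retrace hostname in DNS, which this sketch does not cover.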

You still can... when you start the process abrt lets you select retrace or some other options, like debug locally, or just file a bug.

I can't. ABRT does not show any options after I click "Report". It first tries to report to the retrace server and then immediately fails.

What is the state of this issue? The host is still not available a month later...

It's now August and it's still down for me as well: https://retrace.fedoraproject.org/faf.
I wonder what they found?

Files and reports are vanishing from my workstation's ABRT folder, maybe because of low disk space. I'm still getting those errant ARP cache entries. I've installed fail2ban, and now my workstation logs me out while I'm doing things, while the Lightning Wire Labs DNS pointer is being deleted.

Note that the status page has a link at the top to https://fedoraproject.org/wiki/Infrastructure/2020-post-datacenter-move-known-issues, which definitely mentions that retrace is down.

The work to bring things up in that RDU2 datacenter has gone very slowly for a number of reasons: covid restrictions; finding out that the networking was very different from what we thought and needs redoing; the physical location of the servers, which makes them difficult to bring up without infrastructure changes (moving them, or adding power and networking); and finally, everyone who would move it forward has been trying to finish the IAD2 datacenter move, which is higher priority.

Rest assured this is not forgotten, it's just proving to be difficult. We are exploring ways to get something on line for retrace (and copr ppc64le builders and arm maintainer instances) sooner rather than later.

I'm very sorry it's taking so long... but we are doing the best we can.

Can you please update status.fedoraproject.org to match? It used to say retrace is down, but it currently shows it as up, which contradicts the known-issues page linked from the top of the status page that you mentioned:
"ABRT Server retrace
good Everything seems to be working."

Metadata Update from @kevin:
- Issue untagged with: groomed
- Issue assigned to kevin
- Issue tagged with: ops

retrace03.rdu-cc.fedoraproject.org is up now and has had some initial config done on it.

@msuchy is going to look at finishing the deployment on it.

Once that's done, we can change DNS to point retrace.fedoraproject.org to it.

@msuchy, ping? Having the retrace server up again might help some, or a lot, with our current crash-reporting woes that are blocking the Fedora 33 Beta.

Still down... :(
--- Running report_uReport ---
Failed to upload uReport to the server 'https://retrace.fedoraproject.org/faf' with curl: Could not resolve host: retrace.fedoraproject.org
Error: curl_easy_perform: Couldn't resolve host name
('report_uReport' exited with 1)

@geraldosimiao I know it's still down, I was attempting to get a status report on where we are wrt getting it back up again :)

Ok, thanks for the reply.
:)

Is it possible to get retrace.fedoraproject.org to resolve to something, anything, such that abrt's fallback will work?

@ekulik what's the minimum response needed for abrt to do that? Right now it just goes nowhere, maybe if it just 404'd it'd work?

$ traceroute retrace.fedoraproject.org
retrace.fedoraproject.org: Name or service not known

I just updated the DNS record, so it should now point to a new server. The change should propagate within a dozen minutes or so.
Tomorrow I plan to test with real reports whether everything is set up correctly (it should be). Any other testing is appreciated.
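Until propagation completes, clients may still get cached answers, including a cached negative answer left over from the earlier failures. A small polling sketch, assuming the system resolver is what matters for the client; `resolves` and `wait_for_dns` are hypothetical helpers and the 15-minute default timeout is arbitrary.

```python
import socket
import time


def resolves(host):
    """Return True if the system resolver can map `host` to an address."""
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror:
        # Resolution failed (e.g. no such name, or no reachable DNS server).
        return False
    return True


def wait_for_dns(host, timeout=15 * 60, interval=30, check=resolves):
    """Poll until `host` resolves or `timeout` seconds pass.

    Returns the number of attempts on success, or None on timeout.
    `check` is injectable so the loop can be tested without a network.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        if check(host):
            return attempts
        if time.monotonic() + interval > deadline:
            return None
        time.sleep(interval)
```

For example, `wait_for_dns("retrace.fedoraproject.org")` blocks until the name resolves locally or the timeout expires.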

Bravo

I confirm that I can now submit bug reports for crashes, which I've been unable to do for months while retrace was down (e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1881695 and https://bugzilla.redhat.com/show_bug.cgi?id=1881701). It now offers the choice of doing a local retrace by downloading debuginfo packages; see the result at https://bugzilla.redhat.com/attachment.cgi?id=1715829. Perhaps the bug-reporting code should be patched to handle retrace being down, should this happen in the future.

Note that some bugs seem to be no longer reportable, since their coredumps are gone.
coredumpctl list shows 33 coredumps as missing; after some filtering, the unique crashing apps are:

/usr/bin/Xwayland
/usr/bin/abrt-applet
/usr/bin/gnome-shell
/usr/bin/nvim
/usr/bin/remmina
/usr/bin/wl-copy

and /usr/lib/tmpfiles.d/systemd.conf has

d /var/lib/systemd/coredump 0755 root root 3d
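The last field of that line is the tmpfiles.d age: entries under the directory older than it become eligible for cleanup, which would explain coredumps disappearing after three days. A sketch of that rule, assuming only a subset of systemd time-span units and comparing a single mtime (the real systemd-tmpfiles considers several timestamps, and its boundary handling may differ); all function names are hypothetical.

```python
import re

# Multipliers for a subset of systemd time-span units (see systemd.time(7)).
_UNITS = {"s": 1, "min": 60, "m": 60, "h": 3600, "d": 86400, "w": 7 * 86400}
_TOKEN = re.compile(r"(\d+)\s*(min|s|m|h|d|w)")
_FULL = re.compile(r"(?:\d+\s*(?:min|s|m|h|d|w)\s*)+")
_FIELDS = ("type", "path", "mode", "user", "group", "age", "argument")


def parse_tmpfiles_line(line):
    """Split a tmpfiles.d line into named fields (quoting is ignored)."""
    return dict(zip(_FIELDS, line.split(None, 6)))


def parse_age(age):
    """Convert an age such as '3d' or '1h30min' to seconds."""
    spec = age.strip()
    if not _FULL.fullmatch(spec):
        raise ValueError(f"unsupported age: {age!r}")
    return sum(int(n) * _UNITS[u] for n, u in _TOKEN.findall(spec))


def is_expired(mtime, now, age):
    """True if a file last modified at `mtime` is older than `age` at `now`."""
    return now - mtime > parse_age(age)
```

With the line above, `parse_age(parse_tmpfiles_line(line)["age"])` gives 259200 seconds, i.e. three days.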

I'll report them the next time they crash...
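The "after some filtering" step can be sketched as below, assuming the plain-text `coredumpctl list` layout in which the COREFILE column (showing `missing` for deleted dumps) is immediately followed by the EXE column. The column layout varies between systemd versions, and `missing_exes` is a hypothetical helper.

```python
def missing_exes(listing):
    """Return the sorted unique executables whose core file is 'missing'.

    Assumes COREFILE is directly followed by EXE and that EXE paths
    contain no spaces.
    """
    exes = set()
    for line in listing.splitlines():
        tokens = line.split()
        # The header and rows with a present core file have no 'missing' token.
        if "missing" not in tokens:
            continue
        idx = tokens.index("missing")
        if idx + 1 < len(tokens):
            exes.add(tokens[idx + 1])
    return sorted(exes)
```

Feed it the captured output of, for example, `coredumpctl list --no-pager`.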

@edwintorok please report these upstream to the abrt team https://github.com/abrt. Fedora Infrastructure only gives power/network to the service, we do not write/fix this service.

Since the retrace server is back online now, I am closing this ticket.

Please let us know if there's anything further we can assist with.

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

For me the retrace server is still not working at https://retrace.fedoraproject.org

I can ping retrace.fedoraproject.org, but if I try to send a report I get the following error:
--- Running report_uReport ---
The URL 'retrace.fedoraproject.org' does not exist (got error 404 from server)
('report_uReport' exited with 1)
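This error is different from the earlier ones: the name now resolves and a server answers, but with HTTP 404. A sketch that tells the two cases apart with Python's standard library; `probe` is a hypothetical helper, and it sends a plain GET, whereas ABRT posts uReports to a `/faf` URL, so status codes may differ.

```python
import socket
import urllib.error
import urllib.request


def probe(url, timeout=10):
    """Classify a GET of `url`.

    Returns ("dns", None) if the name did not resolve, ("http", status)
    if a server answered, or ("other", reason) for other failures such
    as a refused connection.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return ("http", resp.status)
    except urllib.error.HTTPError as err:
        # HTTPError subclasses URLError, so it must be caught first:
        # the server answered, but with a 4xx/5xx status.
        return ("http", err.code)
    except urllib.error.URLError as err:
        # Name-resolution failures surface as socket.gaierror here.
        if isinstance(err.reason, socket.gaierror):
            return ("dns", None)
        return ("other", str(err.reason))
```

Against the situation earlier in this ticket the sketch would report `("dns", None)`; against this error it would report an HTTP status instead.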

The server has run out of disk space because podman overlays grew to 11 GB per overlay this week. We only provide network/power to this service, and I did everything I could to free up disk space without risking breaking the service completely. I have alerted the main retrace admin to see what could happen.
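A `du`-style sketch for finding which overlays are growing, assuming rootful podman's usual storage location of `/var/lib/containers/storage/overlay` (check `graphroot` in your storage configuration). `dir_size` and `largest_children` are hypothetical helpers, hard-linked files are counted once per link, and reading that directory typically requires root.

```python
import os
import stat


def dir_size(path):
    """Total size in bytes of regular files under `path` (symlinks not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def largest_children(parent, top=5):
    """Return (size, name) pairs for the `top` largest subdirectories of `parent`."""
    sizes = []
    with os.scandir(parent) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((dir_size(entry.path), entry.name))
    sizes.sort(reverse=True)
    return sizes[:top]
```

For example, `largest_children("/var/lib/containers/storage/overlay")` lists the biggest layer directories first.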

Thanks for the update :-)

The issue with disk has been resolved. We are still working on it, but it should not be user-visible.

Metadata
Boards 1
ops Status: Done