#7709 Failing ODCS compose causes Flatpak build failure
Closed: Fixed 4 months ago by kevin. Opened 4 months ago by feborges.

Some Flatpak builds are failing with very few error messages explaining the reason. ODCS doesn't expose the logs showing how the compose of the modules failed.

See https://koji.fedoraproject.org/koji/taskinfo?taskID=34112111 for Evince and https://koji.fedoraproject.org/koji/taskinfo?taskID=34126087 for GNOME Calculator.

Thoughts @jkaluza, @otaylor?


Both composes failed due to:

Traceback (most recent call last):
  File "/usr/bin/pungi-koji", line 489, in <module>
    main()
  File "/usr/bin/pungi-koji", line 249, in main
    compose_dir = Compose.get_compose_dir(opts.target_dir, conf, compose_type=compose_type, compose_label=opts.label)
  File "/usr/lib/python3.6/site-packages/pungi/compose.py", line 88, in get_compose_dir
    os.makedirs(compose_dir)
  File "/usr/lib64/python3.6/os.py", line 220, in makedirs
    mkdir(name, mode)
OSError: [Errno 5] Input/output error: '/srv/odcs/odcs-237-1-20190411.n.0'

Edit: wrong paste, fixed traceback

Metadata Update from @mizdebsk:
- Issue priority set to: Waiting on Assignee (was: Needs Review)

4 months ago

Simpler reproducer:

[root@odcs-backend01 ~][PROD]# touch /srv/odcs/foo
touch: cannot touch '/srv/odcs/foo': Input/output error

This gives more info: touch: cannot touch '/srv/odcs/foo': Transport endpoint is not connected
So it looks like instances of GlusterFS can't talk to each other.
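
For anyone hitting this again, the touch test above can be scripted so that the failure comes back as a symbolic errno name (EIO, ENOTCONN, EACCES) rather than only a message string. This is a minimal sketch, not anything ODCS ships: the function name probe_write and the probe file name are made up for illustration.

```python
import errno
import os
import uuid


def probe_write(directory):
    """Try to create and remove a small file in `directory`.

    Returns (True, None) on success, or (False, errno_name) on failure,
    where errno_name is a symbolic name such as "EIO" or "ENOTCONN".
    """
    # Hypothetical probe file name; the random suffix plus O_EXCL means
    # an existing file is never overwritten.
    path = os.path.join(directory, ".write-probe-" + uuid.uuid4().hex)
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except OSError as exc:
        # Map the numeric errno to its name; fall back to the number.
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    os.unlink(path)
    return True, None
```

Calling probe_write("/srv/odcs") on each frontend and backend host should show whether the mount is writable there and, if not, which error the kernel reports.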

The I/O error was fixed with: ansible -m shell -a "umount /srv/odcs && mount /srv/odcs" odcs-frontend:odcs-backend
But the compose is still failing. Investigating...

Metadata Update from @mizdebsk:
- Issue assigned to mizdebsk

4 months ago

Now the problem is that the fedmsg user can't write to /srv/odcs - it gets permission denied. File permissions look fine, facls look fine, and there are no SELinux issues. I don't know what else to check and I have no experience with GlusterFS. Can one of the other ODCS admins look into this issue?

Metadata Update from @mizdebsk:
- Assignee reset
- Issue tagged with: factory2, odcs

4 months ago

Nobody has been able to build a flatpak since last Thursday - any attention that can be put to this would be greatly appreciated.

Please try one now. I think I have it working?

Looks like @kalev got a Flatpak to build successfully: https://koji.fedoraproject.org/koji/taskinfo?taskID=34231444 - so ODCS should be working fine now. Thanks!

@kevin What was the fix? I'd like to know in case this happens again in future.

Oddly, while the /srv/odcs directory perms were 777 on odcs-backend01, they were 700 on odcs-frontend01. I did a 'chmod 777' on odcs-frontend01 and then the backend started working.

I have no idea why gluster behaved this way. I think longer term we should look at moving this to an NFS volume...
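
Since the root cause was the same directory showing different modes on different hosts, a quick per-host check of the mount point's permission bits might catch this sooner next time. The following is only a sketch; the helper names dir_mode and world_writable are invented for illustration and are not part of ODCS.

```python
import os
import stat


def dir_mode(path):
    """Return the permission bits of `path` as an octal string, e.g. "700"."""
    return format(stat.S_IMODE(os.stat(path).st_mode), "o")


def world_writable(path):
    """Return True if the 'other' write bit is set on `path`."""
    return bool(os.stat(path).st_mode & stat.S_IWOTH)
```

Run against /srv/odcs on each host, the state described above would have reported "777" on odcs-backend01 and "700" on odcs-frontend01, which makes the mismatch easy to spot.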

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 months ago
