#2354 make -j4 distcheck fails
Closed: Fixed None Opened 9 years ago by nkondras.


What's interesting is that this is never a problem on Debian, although it uses a different set of configure options:
http://sssd-ci.idm.lab.eng.brq.redhat.com:8080/job/new_private_master_debian_testing/30/console

I was not able to reproduce this problem on Fedora 20, but I found some interesting lines in the log files on that machine.

Out of memory: Kill process 21701 (memcheck-amd64-) score 148 or sacrifice child
Killed process 21701 (memcheck-amd64) total-vm:268836kB, anon-rss:83404kB, file-rss 308kB
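
For anyone who wants to check another build machine for the same symptom, here is a minimal sketch that searches a saved kernel log for OOM killer messages like the two above. The function name `find_oom_kills` and the log path are hypothetical, and the patterns are taken only from the lines quoted in this ticket, so other kernel versions may word the messages differently.

```shell
# Print OOM killer lines from a saved kernel log.
# $1: path to the log file (hypothetical; e.g. saved output of dmesg).
find_oom_kills() {
    # Patterns match only the wording quoted in this ticket.
    grep -E 'Out of memory: Kill process|Killed process [0-9]+' "$1"
}
```

Usage would be something like `find_oom_kills /path/to/saved-kernel.log`; no output means no matching messages were found in that file.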

cc: => lslebodn@redhat.com

OOM could be the cause of this issue and of the issue in ticket #2350. Could you try increasing the memory for the VM?

Make would have probably noticed a process dying. Still, how much more memory would you like me to add?

You wrote in the ticket description: Execution of "make -j4 distcheck" on RHEL7 or Fedora fails sometimes. I am not sure how to reproduce it. In the log files, I saw something like: file was truncated.

I just want to reduce the potential sources of problems. I expect that there is a Java process from Jenkins, and it can consume some memory. In my opinion, it is worth trying to increase the memory. If that does not help, we can continue investigating this problem. But an OOM message is not a good sign, and it would be better to get rid of it.

Sure. I'll increase the memory to 1GB then, as a start.

Failed on a Debian VM with 1GB of RAM. Freshly rebooted, went through a few builds without a problem, then failed, now working fine again. No OOM killer messages in the log.

CI output (with links to logs): http://sssd-ci.idm.lab.eng.brq.redhat.com:8080/job/new_private_master_debian_testing/40/console

"make -j4 distcheck" output: http://sssd-ci.idm.lab.eng.brq.redhat.com:8080/job/new_private_master_debian_testing/40/artifact/ci-build-debug/ci-make-distcheck.log

The make output will be attached in a moment.

In the log file debian-testing-ci-make-distcheck2.log.xz,
the problem is with re-linking the library libsss_ldap_common.la:

libtool: install: warning: relinking `libsss_ldap_common.la'
libtool: install: (cd /var/lib/jenkins/workspace/new_private_master_debian_testing/ ...
libtool: relink: gcc -shared  -fPIC -DPIC  src/providers/ldap/.libs/libsss_ldap_common_la-ldap_id.o ...
/usr/bin/ld: cannot find -lsss_idmap
libtool: install: (cd /var/lib/jenkins/workspace/new_private_master_debian_testing/ci-build-debug/sssd-1.11.92/_inst/lib && { ln -s -f libipa_hbac.so.0.0.1 libipa_hbac.so || { rm -f libipa_hbac.so && ln -s libipa_hbac.so.0.0.1 libipa_hbac.so; }; })
collect2: error: ld returned 1 exit status
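
The failure signature in the excerpt above can be searched for in other distcheck logs. This is only a sketch: the function name `find_relink_failures` is hypothetical, and the two patterns are copied from the log lines quoted here, so a different libtool or linker version may print different text.

```shell
# Print (with line numbers) the libtool relink warnings and the
# "cannot find -l..." linker errors from a make distcheck log.
# $1: path to an uncompressed log file (hypothetical).
find_relink_failures() {
    grep -n -E \
        -e 'libtool: install: warning: relinking' \
        -e 'ld: cannot find -l' \
        "$1"
}
```

Because the attached log is xz-compressed, it would need to be unpacked first (for example with `xz -dk`) before running `find_relink_failures` on it.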

Re-linking of libraries is necessary in distcheck because the files are installed into a different directory than the one they should be installed into (/usr/local/{lib,lib64}) according to the values from the configure script.

An install-time re-linking error can happen if libraries are listed in the wrong order in an automake variable, e.g. lib_LTLIBRARIES:

# Plugin Libraries #
####################

# libsss_krb5_common must be installed before libsss_ldap_common
# because libtool tries to relink libsss_ldap_common when installing
# libsss_ldap_common and therefore make distcheck fails
pkglib_LTLIBRARIES += libsss_krb5_common.la
pkglib_LTLIBRARIES += libsss_ldap_common.la

Between different variables (say, lib_LTLIBRARIES and pkglib_LTLIBRARIES)
currently there is no guaranteed installation ordering at all. Between
different Makefiles of a package the traversal order given by the
SUBDIRS variables of all Makefiles need to walk libraries in dependency.
http://gnu-automake.7480.n7.nabble.com/relinking-error-td6954.html

It looks like the only solution will be to run make distcheck without multiple jobs (-j1)
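
A minimal sketch of what that could look like in a CI script follows. The wrapper name `run_distcheck_serial` and the overridable `MAKE` variable are assumptions made for illustration; the only part taken from this ticket is running `make distcheck` with `-j1`.

```shell
# Run distcheck with a single job so that the install-time relinking
# steps cannot race with each other.
# MAKE may be overridden by the caller; extra arguments are passed through.
run_distcheck_serial() {
    "${MAKE:-make}" -j1 distcheck "$@"
}
```

The rest of the build can still use parallel jobs; only the distcheck step is serialized here.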

Aargh. That's a shame. Thank you for the research.

I'll consider making the distcheck single-process and moving it to extended CI
run, or removing it altogether.

I think we can close this as "cantfix".

make distcheck works fine on my machine with multiple jobs, and it fails only sometimes in CI. I agree: there is a race condition and we cannot fix it.
Fortunately, we have a workaround for the issue in automake.

owner: somebody => lslebodn

What workaround do we have?

There won't be a race condition if make distcheck is run without multiple jobs (-j1).

Ah, yes, but it's not a workaround for parallel (and faster) execution, unfortunately.

This is the only possible solution (workaround). Feel free to send a patch to automake :-)

Fields changed

resolution: => cantfix
status: new => closed

Metadata Update from @nkondras:
- Issue assigned to lslebodn
- Issue set to the milestone: NEEDS_TRIAGE

7 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/3396

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.
