#1393 large srpms (texlive) is blocking our dist-git import queue
Closed: Fixed 3 years ago by praiskup. Opened 3 years ago by praiskup.


No problem. I canceled a similar build earlier. I wasn't sure whether it was a temporary outage or a quota issue. The source RPM is 2.5G because many CTAN packages are included. I don't need all of them, but I couldn't figure out how to selectively build only the ones I need: they are all bundled in a single SRPM and depend on each other. So I just threw the SRPM link from the Fedora source repo onto Copr.

Before I built it, I noticed that some other people had succeeded in doing so:

https://copr.fedorainfracloud.org/coprs/ghjm/texlive/package/texlive/
https://copr.fedorainfracloud.org/coprs/rajeeshknambiar/texlive/package/texlive/

So why did my build fail? Is it because the newer version of texlive has grown larger? Or has the Copr infrastructure been modified? Those two successful builds have 1.7G and 2.2G SRPMs, respectively, not much smaller than the 2.5G SRPM here.

> So why did my build fail? Is it because the newer version of texlive has grown larger? Or has the Copr infrastructure been modified?

We haven't had time to analyze the problem yet. There are two issues: one is that
the process "deadlocks" somewhere, and the other is that the logic doesn't start
other concurrent workers, even though it should.

Well, there is one thing I diagnosed yesterday. I applied this patch:

--- /usr/share/copr/dist_git/helpers.py 2019-12-04 01:06:09.000000000 +0000
+++ /usr/share/copr/dist_git/helpers.py 2020-06-09 19:36:05.917170688 +0000
@@ -155,6 +155,7 @@

     if 200 <= r.status_code < 400:
         try:
+            log.info("reading the downloaded package")
             filename = os.path.basename(url)
             filepath = os.path.join(destination, filename)
             with open(filepath, 'wb') as f:

And since that log entry was reached for the texlive import, the code seems to be stuck in this loop:

            with open(filepath, 'wb') as f:
                for chunk in r.iter_content(1024):
                    f.write(chunk)

One CPU core is at ~100%. Maybe it just takes too long to process the file this way. This doesn't seem
to be the correct way of handling files anyway.

I don't know what r is exactly, but reading and writing 1024 bytes at a time often carries a performance penalty. For files whose size is in the GBs, I would try several MBs at a time. You can probably debug with the same texlive SRPM when you have time to do so.
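
For illustration, a minimal sketch of that suggestion, assuming r is a requests.Response opened with stream=True; the helper name and the 4 MB chunk size are placeholders, not the actual Copr code:

    # Sketch only: assumes `r` is a requests.Response with stream=True;
    # the function name and chunk size are hypothetical.
    import os
    import requests

    def download_srpm(url, destination, chunk_size=4 * 1024 * 1024):
        with requests.get(url, stream=True) as r:
            r.raise_for_status()
            filename = os.path.basename(url)
            filepath = os.path.join(destination, filename)
            with open(filepath, 'wb') as f:
                # Multi-megabyte chunks instead of 1024 bytes keep the
                # Python-level loop overhead low for multi-gigabyte files.
                for chunk in r.iter_content(chunk_size):
                    f.write(chunk)
        return filepath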

One import got stuck on:

Found remote branch origin/master
No local branch found, creating a new one
Popen(['git', 'checkout', '-b', 'epel8', '--track', 'origin/epel8'], cwd=/tmp/tmpkhigsvmw, universal_newlines=False, shell=None, istream=None)
Branch 'epel8' set up to track remote branch 'epel8' from 'origin'.
Popen(['git', 'diff', '--cached', '--abbrev=40', '--full-index', '--raw'], cwd=/tmp/tmpkhigsvmw, universal_newlines=False, shell=None, istream=None)
Popen(['git', 'diff', '--abbrev=40', '--full-index', '--raw'], cwd=/tmp/tmpkhigsvmw, universal_newlines=False, shell=None, istream=None)
Popen(['git', 'ls-files'], cwd=/tmp/tmpkhigsvmw, universal_newlines=False, shell=None, istream=None)

So a classic Popen && PIPE hang?
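
For context, a generic illustration of that failure mode (not the actual GitPython internals): if the child process writes more to its PIPE than the OS pipe buffer holds while the parent only calls wait(), both sides block forever; communicate() avoids that by draining the pipes while waiting.

    # Generic Popen/PIPE deadlock illustration; the repo path is hypothetical.
    import subprocess

    proc = subprocess.Popen(
        ['git', 'ls-files'],
        cwd='/some/huge/checkout',
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )

    # Deadlock-prone: wait() blocks while nobody reads stdout, and the child
    # blocks as soon as the pipe buffer fills up.
    # proc.wait()

    # Safe: communicate() reads stdout/stderr concurrently while waiting.
    out, err = proc.communicate()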

On the dev instance, with debug output enabled, the import works fine and it
doesn't block concurrent workers:

[07:57:48][DEBUG][git.cmd][cmd:722] Popen(['git', 'status', '--porcelain', '--untracked-files'], cwd=/tmp/tmp59ga59d9, universal_newlines=False, shell=None, istream=None)
[07:57:48][DEBUG][git.cmd][cmd:722] Popen(['git', 'status', '--porcelain', '--untracked-files'], cwd=/tmp/tmp59ga59d9, universal_newlines=False, shell=None, istream=None)
[07:57:49][DEBUG][dist_git.importer][importer:32] Get task data...
[07:57:49][DEBUG][dist_git.importer][importer:41] No new tasks to process.
[07:57:49][DEBUG][git.cmd][cmd:722] Popen(['git', 'status', '--porcelain', '--untracked-files'], cwd=/tmp/tmp59ga59d9, universal_newlines=False, shell=None, istream=None)
[07:57:50][DEBUG][git.cmd][cmd:722] Popen(['git', 'status', '--porcelain', '--untracked-files'], cwd=/tmp/tmp59ga59d9, universal_newlines=False, shell=None, istream=None)

The only problem is that each tarball takes ~1 second to process, and there are
6769 tarballs :-( Perhaps that's too much?
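
(Rough math: 6769 tarballs at ~1 second each is roughly 6770 seconds, i.e. close to two hours for a single import.)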

Metadata Update from @praiskup:
- Issue assigned to frostyx

3 years ago

Metadata Update from @praiskup:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago
