#2119 kojira saying it's reached max tasks when it hasn't
Closed: Fixed 3 years ago by tkopecek. Opened 4 years ago by kevin.

We are seeing kojira say:

Apr 01 17:15:20 koji02.phx2.fedoraproject.org kojira[14646]: 2020-04-01 17:15:20,631 [INFO] {14646} koji.repo.regen:750 Maximum
number of repo tasks reached

➜ ~ koji list-tasks --user kojira
(no tasks)

/etc/kojira/kojira.conf has:
max_repo_tasks=15
repo_tasks_limit=15

This seems to have started happening after there was a script issue and there were a lot of side tags (~90). Those have since mostly been deleted.

I thought this might be newrepos getting swamped by deletiions, so I tried setting
delete_batch_size=10
but that still didn't seem to help.

I also added debug=true, but it's not giving me much more info.
Might have debug cause it to print out the newrepo tasks it's seeing?
Or is the message misleading and thats newrepo + delete + prune?


This should be only number of really running newRepo tasks owned by kojira. If there was a queue of those sidetags, than it can be, that these tasks failed immediately, so you don't see them. On the other hand, log should contain tons of "Created newRepo task %s for tag %s" messages meanwhile.

Metadata Update from @tkopecek:
- Custom field Size adjusted to None

4 years ago

So, I see with debug:

Apr 03 18:38:42 koji02.phx2.fedoraproject.org kojira[3631]: 2020-04-03 18:38:42,950 [INFO] {3631} koji.repo.regen:792 Created newRepo task 43007601 for tag 20775 (f33-build-side-20775), expired for 83063.1 sec
Apr 03 18:38:42 koji02.phx2.fedoraproject.org kojira[3631]: 2020-04-03 18:38:42,973 [WARNING] {3631} koji:768 Needed tag refers to unknown task. f33-build-side-20853 -> 43007016
Apr 03 18:38:42 koji02.phx2.fedoraproject.org kojira[3631]: 2020-04-03 18:38:42,994 [INFO] {3631} koji.repo.regen:792 Created newRepo task 43007602 for tag 20853 (f33-build-side-20853), expired for 83065.5 sec
Apr 03 18:38:43 koji02.phx2.fedoraproject.org kojira[3631]: 2020-04-03 18:38:43,018 [WARNING] {3631} koji:768 Needed tag refers to unknown task. f33-build-side-20811 -> 43006681
Apr 03 18:38:43 koji02.phx2.fedoraproject.org kojira[3631]: 2020-04-03 18:38:43,045 [INFO] {3631} koji.repo.regen:792 Created newRepo task 43007603 for tag 20811 (f33-build-side-20811), expired for 83066.9 sec
Apr 03 18:38:43 koji02.phx2.fedoraproject.org kojira[3631]: 2020-04-03 18:38:43,071 [INFO] {3631} koji.repo.regen:750 Maximum number of repo tasks reached
Apr 03 18:38:48 koji02.phx2.fedoraproject.org kojira[3631]: 2020-04-03 18:38:48,361 [DEBUG] {3631} koji.repo.regen:342 Repo data: [{'dist': False, 'create_event': 29957736, 'state': 1, 'tag_id': 3306, 'tag_name': 'module-9721006f9645a474-build', 'create_ts': 1515789179.84302, 'id': 838428}, {'dist': False, 'create_event': 52460305, 'state': 2, 'tag_id': 19388, 'tag_name': 'f32-build-side-19388', 'create_ts': 1582879142.63321, 'id': 1387037}, <tons and tons of pages of repos>
Apr 03 18:40:13 koji02.phx2.fedoraproject.org kojira[3631]: 2020-04-03 18:40:13,146 [INFO] {3631} koji.repo.regen:750 Maximum number of repo tasks reached
Apr 03 18:40:20 koji02.phx2.fedoraproject.org kojira[3631]: 2020-04-03 18:40:20,904 [DEBUG] {3631} koji.repo.regen:391 Skipped  check for repos: set([1310756, ...very long list of repos... pages and pages... 
Apr 03 18:40:28 koji02.phx2.fedoraproject.org kojira[3631]: 2020-04-03 18:40:28,161 [INFO] {3631} koji.repo.regen:750 Maximum number of repo tasks reached
Apr 03 18:40:39 koji02.phx2.fedoraproject.org kojira[3631]: 2020-04-03 18:40:39,292 [DEBUG] {3631} koji.repo.regen:391 Skipped check for repos: set([1310756, very long list of repos... pages of them
...
Apr 03 18:51:08 koji02.phx2.fedoraproject.org kojira[3631]: 2020-04-03 18:51:08,607 [DEBUG] {3631} koji.repo:196 Repo 1391357 (
/mnt/koji/repos/f33-build-side-19659/1391357) age: 84180 sec
Apr 03 18:51:08 koji02.phx2.fedoraproject.org kojira[3631]: 2020-04-03 18:51:08,687 [DEBUG] {3631} koji.repo:196 Repo 1391358 (
/mnt/koji/repos/f33-build-side-19695/1391358) age: 84179 sec
...

➜ ~ koji list-tags | grep side | wc -l
81
➜ ~ koji list-targets | grep side | wc -l
39
➜ ~ koji list-tasks --user kojira
(no tasks)

Those repos it's saying are old... are really old... like a month old... f33-build-side-19659 for example is from March 2nd.

If it could help I could get you access to our instance, or you can grab our db and look: https://infrastructure.fedoraproject.org/infra/db-dumps/koji.dump.xz or I am happy to provide more info.

I added some debugging...

Apr 05 18:19:44 koji02.phx2.fedoraproject.org kojira[15006]: 2020-04-05 18:19:44,158 [DEBUG] {15006} koji.repo.regen:741 runnin
g_taasks += {'maven': False, 'id': 43044140, 'tag_id': 21043, 'taskinfo': {'weight': 1.0, 'parent': None, 'channel_id': 2, 'sta
rt_time': None, 'start_ts': None, 'state': 0, 'awaited': None, 'label': None, 'priority': 15, 'completion_time': None, 'waiting
': None, 'create_time': '2020-04-05 18:09:13.208623', 'id': 43044140, 'create_ts': 1586110153.20862, 'owner': 129, 'host_id': N
one, 'completion_ts': None, 'arch': 'noarch', 'method': 'newRepo'}}
Apr 05 18:19:44 koji02.phx2.fedoraproject.org kojira[15006]: 2020-04-05 18:19:44,158 [DEBUG] {15006} koji.repo.regen:741 runnin
g_taasks += {'maven': False, 'id': 43044141, 'tag_id': 21047, 'taskinfo': {'weight': 1.0, 'parent': None, 'channel_id': 2, 'sta
rt_time': None, 'start_ts': None, 'state': 0, 'awaited': None, 'label': None, 'priority': 15, 'completion_time': None, 'waiting
': None, 'create_time': '2020-04-05 18:09:13.257146', 'id': 43044141, 'create_ts': 1586110153.25715, 'owner': 129, 'host_id': N
one, 'completion_ts': None, 'arch': 'noarch', 'method': 'newRepo'}}
Apr 05 18:19:44 koji02.phx2.fedoraproject.org kojira[15006]: 2020-04-05 18:19:44,158 [INFO] {15006} koji.repo.regen:751 Maximum
 number of repo tasks reached

but if you look at task 43044141, https://koji.fedoraproject.org/koji/taskinfo?taskID=43044141 it finished 4minutes before it said it was running and counting against the maximum?

shouldn't that no longer count?

So somehow it's not realizing side tag newrepos are finishing ?

The first problem - old repo not being deleted yet. Age is determined from:

  • newest time of repo event creation
  • mtime of repo dir
  • first_seen - this could be a problem here - if kojira is restarted, it must run at least for deleted_repo_lifetime to hit this - we probably should redesign it
  • expire_ts - when kojira noticed, that repo must be regenerated

Untracked tasks - these are logged "Untracked newRepo task: xyz" and they are not counted into max_repo_tasks (only to repo_tasks_limit).

I've added few debug lines in #2137 which you can use to see, what is hanging in the tables.

4 minutes delay is okish, if it will be noticed then. There is a few competing threads, queries to hub can be also waiting a bit - I think that ~5 minutes could be a limit, where it starts to be weird.

ok. I added the debug...

yes, I think kojira depending on how long it has been running is really confusing and unexepected in this day and age. You may restart servers for any number of reasons, or if you are running in containers you may scale up and down all the time and wouldn't expect to need some long running pod for kojira. :(

So, I see:

Apr 06 15:37:28 koji02.phx2.fedoraproject.org kojira[20599]: 2020-04-06 15:37:28,309 [INFO] {20599} koji.repo.regen:796 Created
 newRepo task 43063299 for tag 21009 (f33-build-side-21009), expired for 425.5 sec
Apr 06 15:38:14 koji02.phx2.fedoraproject.org kojira[20599]: 2020-04-06 15:38:14,072 [DEBUG] {20599} koji.repo.regen:751 Tracke
d tasks: [43063299, 43063300, 43063366, 43063367, 43063368, 43063313, 43063314, 43063315, 43063316, 43063317, 43063350, 4306335
1, 43063352, 43063348, 43063349]
Apr 06 15:38:29 koji02.phx2.fedoraproject.org kojira[20599]: 2020-04-06 15:38:29,088 [DEBUG] {20599} koji.repo.regen:751 Tracke
d tasks: [43063299, 43063300, 43063366, 43063367, 43063368, 43063313, 43063314, 43063315, 43063316, 43063317, 43063350, 4306335
1, 43063352, 43063348, 43063349]
Apr 06 15:38:44 koji02.phx2.fedoraproject.org kojira[20599]: 2020-04-06 15:38:44,102 [DEBUG] {20599} koji.repo.regen:751 Tracke
d tasks: [43063299, 43063300, 43063366, 43063367, 43063368, 43063313, 43063314, 43063315, 43063316, 43063317, 43063350, 4306335
1, 43063352, 43063348, 43063349]
Apr 06 15:38:59 koji02.phx2.fedoraproject.org kojira[20599]: 2020-04-06 15:38:59,114 [DEBUG] {20599} koji.repo.regen:751 Tracke
d tasks: [43063299, 43063300, 43063366, 43063367, 43063368, 43063313, 43063314, 43063315, 43063316, 43063317, 43063350, 4306335
1, 43063352, 43063348, 43063349]
Apr 06 15:39:14 koji02.phx2.fedoraproject.org kojira[20599]: 2020-04-06 15:39:14,131 [DEBUG] {20599} koji.repo.regen:751 Tracke
d tasks: [43063299, 43063300, 43063366, 43063367, 43063368, 43063313, 43063314, 43063315, 43063316, 43063317, 43063350, 4306335
1, 43063352, 43063348, 43063349]
Apr 06 15:39:29 koji02.phx2.fedoraproject.org kojira[20599]: 2020-04-06 15:39:29,146 [DEBUG] {20599} koji.repo.regen:751 Tracke
d tasks: [43063299, 43063300, 43063366, 43063367, 43063368, 43063313, 43063314, 43063315, 43063316, 43063317, 43063350, 4306335
1, 43063352, 43063348, 43063349]
Apr 06 15:39:44 koji02.phx2.fedoraproject.org kojira[20599]: 2020-04-06 15:39:44,163 [DEBUG] {20599} koji.repo.regen:751 Tracke
d tasks: [43063299, 43063300, 43063366, 43063367, 43063368, 43063313, 43063314, 43063315, 43063316, 43063317, 43063350, 4306335
1, 43063352, 43063348, 43063349]
...

It did fire off 15 newrepos. They finished... but the "Tracked tasks" isn't changing any. Shouldn't it update those on each print?

Or when does it update? Could it be because the newRepos for side tags are so short (3-4min) it's not updating quickly enough?

Apr 06 15:37:28 koji02.phx2.fedoraproject.org kojira[20599]: 2020-04-06 15:37:28,309 [INFO] {20599} koji.repo.regen:796 Created
 newRepo task 43063299 for tag 21009 (f33-build-side-21009), expired for 425.5 sec
...
Apr 06 15:55:55 koji02.phx2.fedoraproject.org kojira[20599]: 2020-04-06 15:55:55,395 [INFO] {20599} koji.repo.regen:624 Finishe
d: newRepo task 43063299 for tag 21009

So about ~18min to note a newrepo task that finished in 4 min: https://koji.fedoraproject.org/koji/taskinfo?taskID=43063299
also, I see in that task:

"Parameters
Parameters are not correct for this method.
['f33-build-side-21009'] " ? could that be related?

Tasks are being rechecked in main loop. Probably only place, where it can wait for so long is deletion of old repos. I thinks, that checkTasks could run also in the beginning of regenRepos, so it would benefit immediately from freed slots. (I wonder, why we don't hit that lag internally).

Added #2140

It is a problem, which I've introduced with #1488 As tag is already deleted, it can't fetch the data for it. Previously there was 500 page, now there is this message with correct args printed below. So - not, this is not the problem.

I've applied 2140 and with it and delete_batch_size=1 things seem to be definitely better.

Do you still have problems with larger batch size? Now it should be more independent.

It seems much better now. I'm not sure it's fully ideal yet tho, as it's still 45min or so for some side tags... I know its going to be slow with lots of side tags, but we can try and optimize it as much as possible anyhow.

Metadata Update from @tkopecek:
- Issue set to the milestone: 1.22

3 years ago

So, I think the issue this ticket was filed for is fixed. kojira doesn't say it's hit max all the time and not be processing anything.

However, it's still not functioning well. I can continue here, or we could open a new ticket, whatever you prefer.

I see 1.21.0 has ability to log out to a logfile. That might help debug this once we are on that version.

Is there a chance we could write up a doc on how it scores/decides the order of things to do, and in general how it's supposed to work?

I can continue here, or we could open a new ticket

Let's keep issues on their topic and not let their scope creep please.

Perfectly reasonable. I'll file a new ticket when/if I have gathered enough info... yesterday I looked and it was fine, but we will see.

Thanks!

Commit bc378d1 relates to this ticket

Metadata Update from @jcupova:
- Issue tagged with: no_qe

3 years ago

Commit d44af4c relates to this ticket

Login to comment on this ticket.

Metadata
Related Pull Requests
  • #2140 Merged 3 years ago
  • #2137 Merged 3 years ago