#955 RFE: better hub handling for memory errors
Closed: Dropped 2 years ago by tkopecek. Opened 5 years ago by mikem.

For a sufficiently large Koji database, some queries will not fit into memory. Setting an appropriate rlimit in hub.conf minimizes the impact of such poorly chosen queries, but the resulting response can be less than ideal: it might be a strange fault, or not even a well-formed response.
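
As a point of reference, here is a minimal, Linux-only sketch (not hub code) of the mechanism those rlimit settings rely on; the 512 MiB cap and the fake "query result" are arbitrary:

```python
import resource

# Cap this process's address space at ~512 MiB (soft limit only).
_soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (512 * 1024 * 1024, hard))

try:
    # Stand-in for materializing a huge query result as one big list.
    rows = [b"x" * 1024 for _ in range(10 ** 6)]
except MemoryError:
    # Inside the hub, this error can surface deep in request handling and
    # come back to the client as a strange fault or a truncated response.
    print("result set did not fit under the rlimit")
```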

Because the data can vary greatly between deployments, it is not sufficient to simply police query parameters. On a small system, querying all builds might be reasonable; on another, even querying the builds for a single package name could be too large.


I can think of a couple of things we might do.

  1. make sure we are using QueryProcessor.iterate wherever it makes sense
  2. investigate the possibility of sending our RPC responses as we form them rather than generating the whole thing in memory. This could be very tricky.
  3. monitor our own memory usage and raise our own (friendlier) error before we hit the rlimit.

Have a different idea? Please share.

Metadata Update from @mikem:
- Issue priority set to: Low (was: Normal)
- Issue tagged with: discussion

5 years ago
  1. iterate is a good start, but we will probably fail at the next step - creating the result and wrapping it in XML.
  2. a streaming response could solve it, and it seems possible with the current code. On the other hand, if we do serve these requests, it still means abusing resources. Now they are simply killed, but if we handle them, the server can spend a long time producing results and the memory problems will just appear on the client side.
  3. makes some sense, if we don't have to change the API and/or do streaming.

  4. maybe for some susceptible queries (listBuilds, etc.) add offset/limit/order options to allow paging? The web UI could also benefit from this in some places.

> 1. iterate is a good start, but we will probably fail at the next step - creating the result and wrapping it in XML.

The ExtendedMarshaller in koji is generator-aware. If the iterator result is returned directly, then the full result will never be in memory. The XML response still will be, but at least this reduces the memory hit.

Granted, I think we already do use iterate in most of the large queries.
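
As a rough sketch of that pattern (the table, columns, and import path here are illustrative, not taken from a real hub call):

```python
from kojihub import QueryProcessor  # hub-side class; import path may differ

def list_widgets(state):
    # Hypothetical handler for a potentially huge result set.
    query = QueryProcessor(
        tables=['widget'],                      # illustrative schema
        columns=['widget.id', 'widget.name'],
        clauses=['widget.state = %(state)s'],
        values={'state': state},
    )
    # execute() would build the whole list of row dicts in memory;
    # iterate() yields rows incrementally, and returning the generator lets
    # the generator-aware marshaller consume it while encoding, so only the
    # XML text ever exists in full.
    return query.iterate()
```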

> 2. a streaming response could solve it, and it seems possible with the current code. On the other hand, if we do serve these requests, it still means abusing resources. Now they are simply killed, but if we handle them, the server can spend a long time producing results and the memory problems will just appear on the client side.

The client can choose to stop reading, or just avoid asking for something silly. Also, many clients will have much more free memory than we would reasonably set our rlimit to. I'm not sure how much we need to protect clients from themselves. Does bash stop you from running `foo=$(base64 /dev/zero)`?

My major concern with streaming the response is that we'll be in the position of having started to return data before we have actually finished querying it. That means if something goes wrong, we will not be able to return a sane Fault.
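
To make that concern concrete, here is a hypothetical WSGI-style handler that streams a marshalled array in chunks; query_rows and encode_row are made-up stand-ins, not hub code:

```python
def query_rows():
    # Stand-in for an iterated database query.
    for i in range(3):
        yield {'id': i}

def encode_row(row):
    # Minimal hand-rolled marshalling, just enough for the sketch.
    xml = (b"<value><struct><member><name>id</name>"
           b"<value><int>%d</int></value></member></struct></value>")
    return xml % row['id']

def streaming_app(environ, start_response):
    # The status line and the opening of the XML-RPC envelope go out with
    # the first chunk.  If query_rows() or encode_row() raises after that,
    # it is too late to send a proper <fault>; the client just sees a
    # truncated document.
    start_response('200 OK', [('Content-Type', 'text/xml')])
    def body():
        yield (b"<?xml version='1.0'?><methodResponse><params>"
               b"<param><value><array><data>")
        for row in query_rows():
            yield encode_row(row)
        yield b"</data></array></value></param></params></methodResponse>"
    return body()
```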

> 3. monitor our own memory usage

My concern with this one is how to implement it. Where do we perform this check? Maybe at some key points? Seems arbitrary and invasive.

If it weren't the hub, I'd say fire off a thread that monitors memory use, but I'm really hesitant to add threads to the hub code. It seems like a terrible bug waiting to happen.
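
If we did try the key-points approach, it might look something like this Linux-only sketch; the threshold, the error type, and the idea of calling it between chunks of a large query are all assumptions, not a design:

```python
import os
import koji

MEM_LIMIT_BYTES = 2 * 1024 ** 3  # hypothetical threshold, set below the rlimit

def check_memory():
    # Linux-only: read current RSS (in pages) from /proc.
    with open('/proc/self/statm') as f:
        rss_pages = int(f.read().split()[1])
    if rss_pages * os.sysconf('SC_PAGESIZE') > MEM_LIMIT_BYTES:
        # A friendlier, catchable error instead of hitting the rlimit.
        raise koji.GenericError('request is using too much hub memory')

# e.g. called at "key points", such as between chunks of query.iterate()
```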

> 4. maybe for some susceptible queries (listBuilds, etc.) add offset/limit/order options

I think this is a good idea. I've wanted more queries to have better options along these lines. Still, it doesn't keep the client from asking for everything. Also, for some queries paging is terribly inefficient. When we do chunked build queries, for example, it takes much, much longer to fetch all the builds than a single plain listBuilds call, because the db has to run and sort the same query N times instead of once.
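
For what it's worth, the client side of such paging might look like the sketch below; the hub URL is made up, and whether a given call accepts queryOpts with offset/limit/order depends on the hub version (that is exactly the kind of option being proposed):

```python
import koji

session = koji.ClientSession('https://koji.example.com/kojihub')  # hypothetical URL

def iter_builds(package_id, chunk=5000):
    # Page through builds in fixed-size chunks with a stable ordering.
    offset = 0
    while True:
        builds = session.listBuilds(
            packageID=package_id,
            queryOpts={'order': 'build_id', 'offset': offset, 'limit': chunk},
        )
        if not builds:
            break
        for build in builds:
            yield build
        offset += chunk
```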

Metadata Update from @tkopecek:
- Custom field Size adjusted to None
- Issue close_status updated to: Dropped
- Issue status updated to: Closed (was: Open)

2 years ago
