476af05 Ticket 48384 - Fix dblayer_is_cachesize_sane and dblayer_sys_pages for linux

Authored and Committed by William Brown 6 years ago
    Ticket 48384 - Fix dblayer_is_cachesize_sane and dblayer_sys_pages for linux
    
    Bug Description:
    The current check for whether the cache size is sane is:
    
    issane = (int)((*cachesize / pagesize) <= (pages - procpages));
    
    However, the values of pages and procpages are suspect:
    
    Consider:
    dblayer_is_cachesize_sane pages=3050679 / procpages=268505910
    
    This isn't a typo: procpages often exceeds pages. This is because procpages is
    derived from vmsize in /proc/pid/status, which is the maximum amount of ram a
    process *could* allocate.
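
    To see why this matters, the old expression can be replayed with the numbers
    above. This is only a sketch: the integer types and the function names are
    assumptions made for illustration, not taken from the patch. With signed
    counts the right-hand side goes negative and every request is rejected; with
    unsigned counts the subtraction wraps and every request is accepted.

```c
#include <stdint.h>

/* Old-style check with signed page counts (types are an assumption). */
static int old_issane_signed(int64_t cachesize, int64_t pagesize,
                             int64_t pages, int64_t procpages)
{
    /* pages - procpages is negative when procpages > pages */
    return (int)((cachesize / pagesize) <= (pages - procpages));
}

/* Same check with unsigned page counts (types are an assumption). */
static int old_issane_unsigned(uint64_t cachesize, uint64_t pagesize,
                               uint64_t pages, uint64_t procpages)
{
    /* pages - procpages wraps to a huge value when procpages > pages */
    return (int)((cachesize / pagesize) <= (pages - procpages));
}
```

    Either way the result says nothing about whether the memory is really there.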
    
    Additionally, the value of pages may exceed our vmsize, so we may be
    overeagerly allocating memory that we don't actually have access to. Vmsize
    also includes swap space, so we might be trying to allocate memory into swap too.
    
    dblayer_is_cachesize_sane also only takes into account pages (total on system)
    and the current process' allocation. It takes no account of:
    
    * How much ram is *actually* free on the system with respect to other processes
    * The value of getrlimit via availpages.
    
    The first omission is especially bad, because our system may be approaching an
    OOM condition, and we brazenly allocate a swathe of pages which triggers it.
    
    Fix Description:
    First, this fix corrects procpages to be based on vmrss, which
    is the actual working set size of the process, rather than the maximum possible
    allocation in vmsize.
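
    As a rough sketch of reading vmrss, the VmRSS line of /proc/self/status can
    be scanned as below. The helper names status_field_kb and proc_rss_pages are
    hypothetical and not the functions added by this patch; the "VmRSS: <n> kB"
    line format is the one documented in proc(5).

```c
#include <stdio.h>
#include <string.h>

/* Scan a /proc/<pid>/status style stream for "<key>: <n> kB".
 * Returns 0 and stores the value in *kb on success, -1 if the key is absent. */
static int status_field_kb(FILE *fp, const char *key, unsigned long *kb)
{
    char line[256];
    size_t keylen = strlen(key);

    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            if (sscanf(line + keylen + 1, "%lu", kb) == 1) {
                return 0;
            }
        }
    }
    return -1;
}

/* Resident set size of the calling process in pages, or 0 on any error. */
unsigned long proc_rss_pages(unsigned long pagesize_bytes)
{
    unsigned long kb = 0;
    FILE *fp = fopen("/proc/self/status", "r");

    if (fp == NULL || pagesize_bytes == 0) {
        if (fp != NULL) {
            fclose(fp);
        }
        return 0;
    }
    int rc = status_field_kb(fp, "VmRSS", &kb);
    fclose(fp);
    return (rc == 0) ? (kb * 1024UL) / pagesize_bytes : 0;
}
```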
    
    The value of pages is taken from the smaller of:
    
    * vmsize
    * systeminfo total ram (excluding swap)
    
    The value of availpages is derived from the smallest of:
    
    * pages
    * getrlimit
    * freepages (with consideration of all processes on system)
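
    The two selections above amount to taking minimums. A minimal sketch,
    assuming all inputs have already been converted to pages and that an
    unlimited rlimit is passed as UINT64_MAX (function names are hypothetical):

```c
#include <stdint.h>

static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/* pages: the smaller of vmsize and total system ram (excluding swap). */
uint64_t effective_pages(uint64_t vmsize_pages, uint64_t ram_pages)
{
    return min_u64(vmsize_pages, ram_pages);
}

/* availpages: the smallest of pages, the rlimit, and free pages. */
uint64_t effective_availpages(uint64_t pages, uint64_t rlimit_pages,
                              uint64_t free_pages)
{
    return min_u64(pages, min_u64(rlimit_pages, free_pages));
}
```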
    
    The check for issane now checks that the cachepage request is smaller than
    availpages, which guarantees:
    
    * Our system actually has the ram free to accommodate them without swapping or
      triggering an OOM condition.
    * We respect rlimits in the allocation.
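
    In sketch form the new check reduces to a single comparison against
    availpages. The function name and the use of <= (carried over from the old
    expression) are assumptions here, not the exact code in util.c:

```c
#include <stdint.h>

/* Returns 1 if the requested cache fits within availpages, 0 otherwise. */
int cachesize_fits(uint64_t cachesize_bytes, uint64_t pagesize_bytes,
                   uint64_t availpages)
{
    if (pagesize_bytes == 0) {
        return 0; /* cannot convert bytes to pages */
    }
    uint64_t cachepages = cachesize_bytes / pagesize_bytes;
    return cachepages <= availpages;
}
```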
    
    Next, this moves the cachesize_is_sane and sys_pages utilities to util.c. This
    way we can begin to reference these checks in other areas of the code.
    
    We also change the way that we calculate free and total memory. Linux does not
    appear to offer a complete API for sysinfo, so the only way to really get
    these values is to read /proc/meminfo.
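
    A minimal sketch of such a reader follows. It assumes the MemTotal and
    MemFree fields are the ones of interest (the patch may read others), and the
    struct and function names are hypothetical. It takes an open stream so it
    can be pointed at /proc/meminfo or at a test file.

```c
#include <stdio.h>

struct meminfo_kb {
    unsigned long total; /* MemTotal, in kB */
    unsigned long free;  /* MemFree, in kB */
};

/* Parse a /proc/meminfo style stream.
 * Returns 0 when both fields were found, -1 otherwise. */
int parse_meminfo(FILE *fp, struct meminfo_kb *out)
{
    char line[256];
    int have_total = 0;
    int have_free = 0;

    while (fgets(line, sizeof(line), fp) != NULL) {
        if (sscanf(line, "MemTotal: %lu kB", &out->total) == 1) {
            have_total = 1;
        } else if (sscanf(line, "MemFree: %lu kB", &out->free) == 1) {
            have_free = 1;
        }
    }
    return (have_total && have_free) ? 0 : -1;
}
```

    A caller would fopen("/proc/meminfo", "r"), pass the stream in, and convert
    the kB values to pages using the system page size.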
    
    This fixes the live calls from cn=config to only check memory allocation based
    on the difference, because the current allocation has already passed the check
    and may be consuming ram.
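
    The difference-based check could look roughly like this. The function name
    is hypothetical, and treating a shrink or no-op as always sane is an
    assumption of this sketch rather than something the patch states:

```c
#include <stdint.h>

/* Returns 1 if growing the cache from current to requested is acceptable. */
int resize_is_sane(uint64_t current_bytes, uint64_t requested_bytes,
                   uint64_t pagesize_bytes, uint64_t availpages)
{
    if (pagesize_bytes == 0) {
        return 0; /* cannot convert bytes to pages */
    }
    if (requested_bytes <= current_bytes) {
        return 1; /* no additional memory is needed */
    }
    /* Only the extra pages count: the current size is already allocated. */
    uint64_t delta_pages = (requested_bytes - current_bytes) / pagesize_bytes;
    return delta_pages <= availpages;
}
```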
    
    https://fedorahosted.org/389/ticket/48384
    
    Author: wibrown
    
    Review by: nhosoi, tbordaz, lkrispen (Thanks!)
    
        
file modified
+365 -0