Ticket 1 - pre-normalize filter and pre-compile substring regex - and other optimizations
When processing large search filters which are applied to every entry in
the search result set, the filter is normalized anew each time a new entry
is tested. For substring filters, a regular expression must be created,
compiled, and freed each time the substring filter is tested, in addition
to normalizing the values. For example, if the search filter contains
1000 substring sub-filters, for each entry tested with the filter, this
will require 1000 filter normalizations followed by 1000 regex creations,
compilations, and cleanups. If there are 1000 entries in the search result
set, this will require a million such operations.
The solution is to "pre-compile" the search filter - perform all necessary
normalizations and compiling of the regular expressions used in the
filter once we know the search will go through.
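As a rough illustration of the compile-once idea (not the code in this
commit), the sketch below uses POSIX <regex.h> as a stand-in for the
server's own regex routines. struct demo_subfilter and the demo_*
functions are hypothetical names invented for this example.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-in for a substring sub-filter; this is NOT the
 * server's struct subfilt. */
struct demo_subfilter {
    const char *pattern; /* regex text derived from the substring filter */
    regex_t compiled;    /* filled in once by demo_precompile() */
    int is_compiled;     /* nonzero once 'compiled' is valid */
};

/* Compile the pattern once, before any entry is tested.
 * Returns 0 on success, -1 if the pattern does not compile. */
int demo_precompile(struct demo_subfilter *f)
{
    if (f->is_compiled) {
        return 0; /* already done, nothing to repeat */
    }
    if (regcomp(&f->compiled, f->pattern,
                REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0) {
        return -1;
    }
    f->is_compiled = 1;
    return 0;
}

/* Test one value against the already compiled regex.
 * Returns 1 on a match, 0 otherwise. */
int demo_match(const struct demo_subfilter *f, const char *value)
{
    return f->is_compiled &&
           regexec(&f->compiled, value, 0, NULL, 0) == 0;
}

/* Release the compiled regex once the search is finished. */
void demo_free(struct demo_subfilter *f)
{
    if (f->is_compiled) {
        regfree(&f->compiled);
        f->is_compiled = 0;
    }
}
```

With this shape, demo_precompile() and demo_free() run once per search,
and only demo_match() runs once per entry.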
struct subfilt and struct ava have "private" members which weren't being
used for anything. For subfilt, the private field is used to store the
pre-compiled regex to pass to the syntax filter code. For ava, the
private field is used to store the flags to specify if the type and/or
value is normalized.
Try to avoid normalization wherever possible. slapi_value has a v_flags
field which is used to specify if the value is normalized. Check this
before we attempt to normalize a value. If we are creating a new
slapi_value, set the normalized flag if the new value is already
normalized. This means Slapi_Value structures must always be
initialized correctly.
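A minimal sketch of the check-the-flag-first pattern, assuming that
normalization is plain ASCII lowercasing (the real syntax plugins do
more). struct demo_value, DEMO_VALUE_NORMALIZED, and the demo_* functions
are hypothetical and are not the server's Slapi_Value API.

```c
#include <stddef.h>

#define DEMO_VALUE_NORMALIZED 0x1UL /* hypothetical flag bit */

/* Hypothetical stand-in for a value with a flags field. */
struct demo_value {
    char *data;
    unsigned long flags;
};

/* Always start from a known state so the flag is never read from
 * uninitialized memory. */
void demo_value_init(struct demo_value *v, char *data)
{
    v->data = data;
    v->flags = 0;
}

/* Normalize in place unless the value is already flagged.
 * Returns 1 if work was done, 0 if it was skipped. */
int demo_value_normalize(struct demo_value *v)
{
    char *p;

    if (v->flags & DEMO_VALUE_NORMALIZED) {
        return 0; /* already normalized, skip the expensive part */
    }
    for (p = v->data; p != NULL && *p != '\0'; p++) {
        if (*p >= 'A' && *p <= 'Z') {
            *p = (char)(*p - 'A' + 'a');
        }
    }
    v->flags |= DEMO_VALUE_NORMALIZED;
    return 1;
}
```

The second call on the same value returns immediately, which is the
saving described above when one value is tested many times.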
When examining the filter string, do not convert it to lower case first -
just use strcasestr - note that even though the string may be UTF-8,
strcasestr will still work, because we are searching for ASCII characters.
Use PL_strcasestr because the system strcasestr causes valgrind to
print uninitialized memory errors.
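To illustrate why a byte-wise, case-insensitive search for an ASCII
needle is safe on UTF-8 input: every byte of a multi-byte UTF-8 sequence
is 0x80 or higher, so it can never compare equal to an ASCII character.
The helper below is a hypothetical, portable stand-in written for this
example; it is not PL_strcasestr and folds only A-Z.

```c
#include <stddef.h>

/* Fold only ASCII A-Z; bytes >= 0x80 (UTF-8 multi-byte) pass through. */
static unsigned char demo_ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Return a pointer to the first case-insensitive occurrence of the
 * ASCII string 'needle' in 'haystack', or NULL if there is none. */
const char *demo_ascii_casestr(const char *haystack, const char *needle)
{
    if (*needle == '\0') {
        return haystack;
    }
    for (; *haystack != '\0'; haystack++) {
        const char *h = haystack;
        const char *n = needle;

        while (*h != '\0' && *n != '\0' &&
               demo_ascii_lower((unsigned char)*h) ==
                   demo_ascii_lower((unsigned char)*n)) {
            h++;
            n++;
        }
        if (*n == '\0') {
            return haystack; /* whole needle matched here */
        }
    }
    return NULL;
}
```

Because the filter string is never copied or lowercased, the original
bytes stay intact for later processing.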
Eliminate some uses of sprintf where a simple char assignment will suffice.
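A small sketch of the kind of replacement meant here; the function names
and the ',' separator are made up for illustration and are not taken from
the changed code.

```c
#include <stdio.h>

/* Before: goes through the format-string machinery to write one
 * character. 'buf' must hold at least 2 bytes. */
void demo_sep_sprintf(char *buf)
{
    sprintf(buf, "%c", ',');
}

/* After: the same result with two plain assignments. */
void demo_sep_assign(char *buf)
{
    buf[0] = ',';
    buf[1] = '\0';
}
```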
Reviewed by: nhosoi (Thanks!)