During an investigation it became apparent that DS clock handling was not thread safe. In an attempt to "speed up" clock access, the architecture used a time thread that would wake every second (or more) and update a global value in time.c. However, this value was not atomically updated, or atomically read. To update this to use atomics would likely cause a performance regression. Worse, these values were relied upon by csngen to generate csns for replication.
With hr log timestamps we showed that ns precision is virtually free (if not faster) on linux today. With this in mind, there is no reason not to enable HR etime by default now. As well, by using clock_gettime in calls to the clock, we take advantage of VDSO's in clock_gettime, that are going to be faster than atomics. Finally, we can use monotonic clock for timers, and utc for all other calls to remove performance and timezone issues.
Metadata Update from @firstyear: - Issue assigned to firstyear
Metadata Update from @firstyear: - Custom field type adjusted to defect - Issue priority set to: major - Issue set to the milestone: 1.3.7.0 - Issue tagged with: Complex
Additionally, since it's been 24 years since clock_gettime was released, I'm making it a hard requirement of DS from now.
Regarding replication, we have only one time thread that periodically (each second) calls time(¤ttime). If the same thread calls time(&t2) after time(&t1), is there a chance that t1>t2 ? If not then all DS components (including csn generation) that rely on the timethread are protected.
Now the others components that directly call time() are impacted and there are many of them. I thought that time(2) was thread safe, so it is not ?
time(2) is safe. the time thread is not. The write to the g_current_time was not atomic, and the read from current_time was also not atomic. As a result, it's possible for the update from the time thread to never be made visible to other threads (or more practically, it's delayed until the next barrier, and you have to hope it's a store one else our time thread write is blown away), and on the read side, we may never see the update to the time (or more practically, until by some accident we get a barrier elsewhere, assuming the write thread actually barriered as well). So as a result, this is not a safe interface.
Linux clock_gettime is implemented as a VDSO, so there is almost 0 latency to just calling clock_gettime, so instead my fix will wrap this.
Ohh and even better @tbordaz - most of our timeouts were only accurate to a second, so if you happen to start an op with a time out of 1 second, at the time point "0.9" second, and then the op runs to "1.000001" second, you'll be timed out - even though you only were in the op for 0.1 second, the "timeout" sees the second change and says "nope"
<img alt="0001-Ticket-49316-Fix-clock-unsafety-in-DS.patch" src="/389-ds-base/issue/raw/4e695b492aeab6dd59f673c986468bf62f6d33d8a5071541e05455c277289de2-0001-Ticket-49316-Fix-clock-unsafety-in-DS.patch" />
Metadata Update from @firstyear: - Custom field reviewstatus adjusted to review
This patch has been extensively tested with the full test suite twice now.
A few minor issues....
indentation issue:
498 diff --git a/ldap/servers/plugins/chainingdb/cb_search.c b/ldap/servers/plugins/chainingdb/cb_search.c 499 index dad2785..e2aa163 100644 500 --- a/ldap/servers/plugins/chainingdb/cb_search.c 501 +++ b/ldap/servers/plugins/chainingdb/cb_search.c 502 @@ -40,9 +40,8 @@ chainingdb_build_candidate_list ( Slapi_PBlock *pb ) 503 » const char *target = NULL; 504 » char *filter; 505 » char **attrs = NULL; 506 -» struct timeval timeout; 507 -» time_t optime; 508 -» time_t endbefore = 0; 509 +» struct timespec expire_time; 510 +» struct timeval timeout; 511 » time_t endtime = 0; 512 » char *matched_msg, *error_msg; 513 » char **referrals = NULL;
Logging issue:
1631 @@ -4422,17 +4460,45 @@ static int checkpoint_threadmain(void *param) 1632 if (!db || rc ) { 1633 continue; 1634 } 1635 - slapi_log_err(SLAPI_LOG_BACKLDBM, "checkpoint_threadmain", "Compacting DB start: %s\n", 1636 + slapi_log_err(SLAPI_LOG_NOTICE, "checkpoint_threadmain", "Compacting DB start: %s\n", 1637 inst->inst_name); 1638 + 1639 + /* 1640 + * It's possible for this to heap us after free because when we access db 1641 + * *just* as the server shut's down, we don't know it. So we should probably 1642 + * do something like wrapping access to the db var in a rwlock, and have "read" 1643 + * to access, and take writes to change the state. This would prevent the issue. 1644 + */ 1645 + DBTYPE type; 1646 + rc = db->get_type(db, &type); 1647 + if (rc) { 1648 + slapi_log_err(SLAPI_LOG_ERR, 1649 + "compactdb: failed to determine db type for %s: db error - %d %s\n", 1650 + inst->inst_name, rc, db_strerror(rc));
There is no subsystem specified for the logging call, it should be:
slapi_log_err(SLAPI_LOG_ERR, "checkpoint_threadmain", "compactdb: failed to determine db type for %s: db error - %d %s\n", inst->inst_name, rc, db_strerror(rc));
Indentation issue:
2677 + 2678 + /* 2679 + * If the duration set is larger than time_t max - current time, we probably have 2680 + * made it to the heat death of the universe. Congratulations on finding this bug. 2681 + */ 2682 + if ( errno == ERANGE || duration <= 0 || duration > (MAX_ALLOWED_TIME_IN_SECS_64 - slapi_current_utc_time()) ) { 2683 + slapi_create_errormsg(errorbuf, SLAPI_DSE_RETURNTEXT_SIZE, "password lockout duration \"%s\" is invalid. ", value); 2684 + retVal = LDAP_OPERATIONS_ERROR; 2685 + return retVal; 2686 }
The rest looks good, and no compiler warnings!! I'll ack this, but please fix the above issues.
Metadata Update from @mreynolds: - Custom field reviewstatus adjusted to ack (was: review)
I went through the patch 3 times to try and catch all these! Good spotting. I'll fix these any commit :)
Thanks so much,
I swear I don't set out just looking for these indentation issues, but they just really stand out to me and it makes me cringe :-) I'm just glad I caught the missing logging subsystem though. Since it's a macro there is no compiler warning/error, but I'm not trying to reignite that conversation ;-) Anyway, vacation time for me, talk to you next week!
Fixed all those issues, thanks again!
commit c70baf1d509f6dc144d839d71e874e3d59d72150 To ssh://git@pagure.io/389-ds-base.git 4940c5f..680a1a5 master -> master
Metadata Update from @firstyear: - Issue close_status updated to: fixed - Issue status updated to: Closed (was: Open)
@firstyear just for curiosity, time() and clock_gettime() are thread safe. What is the reason the time thread was not thread safe ? is it because it updated/returned a global variable ?. If this is the reason would a fix like this one being thread safe ?
diff --git a/ldap/servers/slapd/time.c b/ldap/servers/slapd/time.c index 16467af..8315dda 100644 --- a/ldap/servers/slapd/time.c +++ b/ldap/servers/slapd/time.c @@ -120,22 +120,13 @@ free_timestring(char *timestr) time_t poll_current_time() { - if ( !currenttime_set ) { - currenttime_set = 1; - } - - time( ¤ttime ); - return( currenttime ); + return 0; } time_t current_time( void ) { - if ( currenttime_set ) { - return( currenttime ); - } else { - return( time( (time_t *)0 )); - } + return( time( (time_t *)0 )); }
If you did not have current time set, this would be thread safe, but there is a caveat, and that is you and I the human's.
What is current_time? Thoughout the code I saw current time as:
People who used the api either had it change without knowing underthem, or they didn't know what they were "getting" and assumed. As well, we had a slapi_current_time wrapper that just wrapped the function in a (pointless) extra fn call.
So my solution was to explicitly name the replacement, and make it part of the slapi api so we just get exactly what we want. I'm hoping to avoid human error in the future.
When testing on top of current master I noticed that the event queue thread is consuming 100 % cpu.
Could you revisit your changes in the eventq code. There are loops calling functions ... and taking time, but you use curtime looked up before starting the loop, wheras the previous code was using current_time() inside the loop
I think there was an incorrect macro replacement applying the NOT only to the first and component, the following patch seems to fix it
<img alt="0001-fix-macro-replacement.patch" src="/389-ds-base/issue/raw/14ef4976dde4cf5724d8c886432aa800afb8ecff69f4fc3f365e769c4a8e290a-0001-fix-macro-replacement.patch" />
I think you're right. I'll apply this now (with my ack)
commit 1b95045 To ssh://git@pagure.io/389-ds-base.git b95f5f2..1b95045 master -> master
Metadata Update from @lkrispen: - Custom field component adjusted to None - Custom field origin adjusted to None - Custom field version adjusted to None
this change broke checkpointing, fix attached
<img alt="0001-make-checkpointing-work-again.patch" src="/389-ds-base/issue/raw/files/1451b783eb2ef87fa01a90475fe1941f29a1afe91195fe5c7a819f18cc708bb1-0001-make-checkpointing-work-again.patch" />
ack
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/2375
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: fixed)
Log in to comment on this ticket.