#49316 clock thread safety: was hr etime by default

Created 2 months ago by firstyear
Modified 2 months ago

Issue Description

During an investigation it became apparent that DS clock handling was not thread safe. In an attempt to "speed up" clock access, the architecture used a time thread that would wake every second (or more) and update a global value in time.c. However, this value was not atomically updated, or atomically read. To update this to use atomics would likely cause a performance regression. Worse, these values were relied upon by csngen to generate csns for replication.

With hr log timestamps we showed that ns precision is virtually free (if not faster) on linux today. With this in mind, there is no reason not to enable HR etime by default now. As well, by using clock_gettime in calls to the clock, we take advantage of VDSO's in clock_gettime, that are going to be faster than atomics. Finally, we can use monotonic clock for timers, and utc for all other calls to remove performance and timezone issues.

2 months ago

Metadata Update from @firstyear:
- Issue assigned to firstyear

2 months ago

Metadata Update from @firstyear:
- Custom field type adjusted to defect
- Issue priority set to: major
- Issue set to the milestone:
- Issue tagged with: Complex

Additionally, since it's been 24 years since clock_gettime was released, I'm making it a hard requirement of DS from now.

Regarding replication, we have only one time thread that periodically (each second) calls time(&currenttime). If the same thread calls time(&t2) after time(&t1), is there a chance that t1>t2 ?
If not then all DS components (including csn generation) that rely on the timethread are protected.

Now the others components that directly call time() are impacted and there are many of them.
I thought that time(2) was thread safe, so it is not ?

time(2) is safe. the time thread is not. The write to the g_current_time was not atomic, and the read from current_time was also not atomic. As a result, it's possible for the update from the time thread to never be made visible to other threads (or more practically, it's delayed until the next barrier, and you have to hope it's a store one else our time thread write is blown away), and on the read side, we may never see the update to the time (or more practically, until by some accident we get a barrier elsewhere, assuming the write thread actually barriered as well). So as a result, this is not a safe interface.

Linux clock_gettime is implemented as a VDSO, so there is almost 0 latency to just calling clock_gettime, so instead my fix will wrap this.

Ohh and even better @tbordaz - most of our timeouts were only accurate to a second, so if you happen to start an op with a time out of 1 second, at the time point "0.9" second, and then the op runs to "1.000001" second, you'll be timed out - even though you only were in the op for 0.1 second, the "timeout" sees the second change and says "nope"

2 months ago

Metadata Update from @firstyear:
- Custom field reviewstatus adjusted to review

This patch has been extensively tested with the full test suite twice now.

A few minor issues....

indentation issue:

 498 diff --git a/ldap/servers/plugins/chainingdb/cb_search.c b/ldap/servers/plugins/chainingdb/cb_search.c
 499 index dad2785..e2aa163 100644
 500 --- a/ldap/servers/plugins/chainingdb/cb_search.c
 501 +++ b/ldap/servers/plugins/chainingdb/cb_search.c
 502 @@ -40,9 +40,8 @@ chainingdb_build_candidate_list ( Slapi_PBlock *pb )
 503  »       const char          *target = NULL;
 504  »       char                *filter;
 505  »       char                **attrs = NULL;
 506 -»       struct timeval      timeout;
 507 -»       time_t              optime;
 508 -»       time_t              endbefore = 0;
 509 +»       struct timespec     expire_time;
 510 +»       struct timeval     timeout;
 511  »       time_t              endtime = 0;
 512  »       char                *matched_msg, *error_msg;
 513  »       char                **referrals = NULL;

Logging issue:

1631 @@ -4422,17 +4460,45 @@ static int checkpoint_threadmain(void *param)
1632                  if (!db || rc ) {
1633                      continue;
1634                  }
1635 -                slapi_log_err(SLAPI_LOG_BACKLDBM, "checkpoint_threadmain", "Compacting DB start: %s\n",
1636 +                slapi_log_err(SLAPI_LOG_NOTICE, "checkpoint_threadmain", "Compacting DB start: %s\n",
1637                                inst->inst_name);
1638 +
1639 +                /*
1640 +                 * It's possible for this to heap us after free because when we access db
1641 +                 * *just* as the server shut's down, we don't know it. So we should probably
1642 +                 * do something like wrapping access to the db var in a rwlock, and have "read"
1643 +                 * to access, and take writes to change the state. This would prevent the issue.
1644 +                 */
1645 +                DBTYPE type;
1646 +                rc = db->get_type(db, &type);
1647 +                if (rc) {
1648 +                    slapi_log_err(SLAPI_LOG_ERR,
1649 +                              "compactdb: failed to determine db type for %s: db error - %d %s\n",
1650 +                              inst->inst_name, rc, db_strerror(rc));

There is no subsystem specified for the logging call, it should be:

slapi_log_err(SLAPI_LOG_ERR, "checkpoint_threadmain",
                            "compactdb: failed to determine db type for %s: db error - %d %s\n",
                             inst->inst_name, rc, db_strerror(rc));

Indentation issue:

2677 +
2678 +  /*
2679 +   * If the duration set is larger than time_t max - current time, we probably have
2680 +   * made it to the heat death of the universe. Congratulations on finding this bug.
2681 +   */
2682 +  if ( errno == ERANGE || duration <= 0 || duration > (MAX_ALLOWED_TIME_IN_SECS_64 - slapi_current_utc_time()) ) {
2683 +    slapi_create_errormsg(errorbuf, SLAPI_DSE_RETURNTEXT_SIZE, "password lockout duration \"%s\" is invalid. ", value);
2684 +    retVal = LDAP_OPERATIONS_ERROR;
2685 +    return retVal;
2686    }

The rest looks good, and no compiler warnings!! I'll ack this, but please fix the above issues.

2 months ago

Metadata Update from @mreynolds:
- Custom field reviewstatus adjusted to ack (was: review)

I went through the patch 3 times to try and catch all these! Good spotting. I'll fix these any commit :)

Thanks so much,

I went through the patch 3 times to try and catch all these! Good spotting. I'll fix these any commit :)

I swear I don't set out just looking for these indentation issues, but they just really stand out to me and it makes me cringe :-) I'm just glad I caught the missing logging subsystem though. Since it's a macro there is no compiler warning/error, but I'm not trying to reignite that conversation ;-) Anyway, vacation time for me, talk to you next week!

Fixed all those issues, thanks again!

commit c70baf1d509f6dc144d839d71e874e3d59d72150
To ssh://git @pagure.io/389-ds-base.git
4940c5f..680a1a5 master -> master

2 months ago

Metadata Update from @firstyear:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

@firstyear just for curiosity, time() and clock_gettime() are thread safe. What is the reason the time thread was not thread safe ? is it because it updated/returned a global variable ?. If this is the reason would a fix like this one being thread safe ?

diff --git a/ldap/servers/slapd/time.c b/ldap/servers/slapd/time.c
index 16467af..8315dda 100644
--- a/ldap/servers/slapd/time.c
+++ b/ldap/servers/slapd/time.c
@@ -120,22 +120,13 @@ free_timestring(char *timestr)
-    if ( !currenttime_set ) {
-       currenttime_set = 1;
-    }
-    time( &currenttime );
-    return( currenttime );
+    return 0;

 current_time( void )
 -    if ( currenttime_set ) {
 -        return( currenttime );
 -    } else {
 -        return( time( (time_t *)0 ));
 -    }
+    return( time( (time_t *)0 ));

If you did not have current time set, this would be thread safe, but there is a caveat, and that is you and I the human's.

What is current_time? Thoughout the code I saw current time as:

  • utc
  • localtime
  • relative system time

People who used the api either had it change without knowing underthem, or they didn't know what they were "getting" and assumed. As well, we had a slapi_current_time wrapper that just wrapped the function in a (pointless) extra fn call.

So my solution was to explicitly name the replacement, and make it part of the slapi api so we just get exactly what we want. I'm hoping to avoid human error in the future.

When testing on top of current master I noticed that the event queue thread is consuming 100 % cpu.

Could you revisit your changes in the eventq code. There are loops calling functions ... and taking time, but you use curtime looked up before starting the loop, wheras the previous code was using current_time() inside the loop

I think there was an incorrect macro replacement applying the NOT only to the first and component, the following patch seems to fix it


I think you're right. I'll apply this now (with my ack)

commit 1b95045
To ssh://git @pagure.io/389-ds-base.git
b95f5f2..1b95045 master -> master

2 months ago

Metadata Update from @lkrispen:
- Custom field component adjusted to None
- Custom field origin adjusted to None
- Custom field version adjusted to None

this change broke checkpointing, fix attached

Login to comment on this ticket.