#50652 Allow installation of instances with completely different schema
Closed: wontfix 4 days ago by spichugi. Opened 11 months ago by abbra.

For Global Catalog support in FreeIPA we need to create an instance that uses LDAP schema from Active Directory. AD schema is clashing with several attributes and classes defined in a normal 389-ds schema. The clash is on multiple levels:
- some attributes use different syntax, sort and comparison rules
- some attributes use different OIDs while retaining the same names
- some attributes use different names while utilizing the same OIDs
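
To make the three kinds of clash concrete, here is a minimal sketch that compares two sets of attribute definitions and labels each conflict. The `AttrDef` type and `classify_clashes` function are hypothetical names for illustration only; they are not part of 389-ds-base or lib389, and real schema parsing would need more than three fields.

```python
from typing import NamedTuple


class AttrDef(NamedTuple):
    """Simplified attribute definition (hypothetical, for illustration)."""
    oid: str
    name: str
    syntax: str


def classify_clashes(base, other):
    """Report how definitions in `other` clash with those in `base`."""
    by_name = {a.name.lower(): a for a in base}
    by_oid = {a.oid: a for a in base}
    clashes = []
    for attr in other:
        same_name = by_name.get(attr.name.lower())
        same_oid = by_oid.get(attr.oid)
        if same_name and same_name.oid != attr.oid:
            clashes.append(("same name, different OID", attr.name))
        elif same_oid and same_oid.name.lower() != attr.name.lower():
            clashes.append(("same OID, different name", attr.name))
        elif same_name and same_name.syntax != attr.syntax:
            clashes.append(("same name and OID, different syntax", attr.name))
    return clashes
```

Feeding it the 389-ds definitions as `base` and the converted AD definitions as `other` would give a rough list of what needs renaming, filtering, or overriding.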

Ideally, we would like to have an instance where the schema is fully provided as a part of FreeIPA Global Catalog work. Such schema would include minimal core required for 389-ds to work itself (@mreynolds promised to find out what constitutes this minimal set).

From my investigation of 389-ds-base code, it looks like if we could add a config option that removes hardcoded use of SYSTEMSCHEMADIR in init_schema_dse_ext(), then we can re-use existing per-instance schema directory.

This would need to be complemented by another flag that would disable copying schema files in lib389.instance.setup.SetupDS._install_ds:

        _ds_shutil_copytree(os.path.join(slapd['sysconf_dir'], 'dirsrv/schema'), slapd['schema_dir'])

With these two we would be able to have a completely independent schema for this instance.
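
A rough sketch of what the second flag could look like on the installer side, assuming a hypothetical `copy_system_schema` option (no such option exists in lib389 today, and `install_schema` is an illustrative stand-in for the relevant part of `_install_ds`, using plain `shutil.copytree` instead of the internal helper):

```python
import os
import shutil


def install_schema(sysconf_dir, schema_dir, copy_system_schema=True):
    """Populate the per-instance schema directory.

    `copy_system_schema` is a hypothetical flag, not an existing lib389 option.
    """
    os.makedirs(schema_dir, exist_ok=True)
    if not copy_system_schema:
        # Leave the directory empty so the caller can supply its own schema.
        return []
    src = os.path.join(sysconf_dir, "dirsrv", "schema")
    # dirs_exist_ok requires Python 3.8 or later.
    shutil.copytree(src, schema_dir, dirs_exist_ok=True)
    return sorted(os.listdir(schema_dir))
```

The default keeps today's behaviour; only an installer that explicitly passes `copy_system_schema=False` would end up with an empty schema directory to fill.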

Metadata Update from @mreynolds:
- Issue assigned to mreynolds

11 months ago

Thanks for the initial code analysis @abbra !

Metadata Update from @mreynolds:
- Custom field origin adjusted to None
- Custom field reviewstatus adjusted to None
- Issue set to the milestone: 1.4.2

11 months ago

Metadata Update from @mreynolds:
- Custom field rhbz adjusted to https://bugzilla.redhat.com/show_bug.cgi?id=1762465

11 months ago

There is quite a bit of core schema for cn=config that we need, though, so we can't just ignore what's in the system schema dir. Importantly, ignoring the system schema dir also means your instances will miss updates to these core schemas when package updates occur. This matters for containers and other instances where we don't have rpm post-install triggers to run schema updates etc.

I think you really need to think about a scriptless/stateless upgrade scheme here and how you will account for this so that system core schemas continue to be updated in 389, without creating extra complexity or tooling that is needed.

It's likely your "flag" to ignore system schema dir is incorrect, and actually should be a flag that says to only include core schemas from the system dir instead.

I also think you need to think about how your "gc" will actually manage its own schema upgrades in a clean and sustainable manner, as that will probably shape how you implement this kind of feature here.

Can you please submit a more complete design document to the 389 wiki in this case detailing this?

As a first example of a better solution that is much more sustainable, this change should actually split schema to:

  • required
  • 389-supplementary
  • gc-ad

Then we have a config that is a schema-to-use flag, defaulting to 389-sup in libglobs.c

389 then at startup always reads the "required" directory. Based on the schema-to-use flag we then process either the 389-sup directory, gc-ad, or neither.
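
As a sketch of that selection logic (in Python for brevity; the real change would live in the C schema loading code), assuming the three subdirectory names above and a hypothetical `schema_to_use` setting whose accepted values are made up here:

```python
import os

# Hypothetical mapping from the schema-to-use setting to a subdirectory.
SCHEMA_SETS = {"389-sup": "389-supplementary", "gc-ad": "gc-ad", "none": None}


def schema_dirs_to_load(system_schema_root, schema_to_use="389-sup"):
    """Return the schema directories to process, in load order."""
    if schema_to_use not in SCHEMA_SETS:
        raise ValueError("unknown schema set: %s" % schema_to_use)
    # The "required" directory is always read first.
    dirs = [os.path.join(system_schema_root, "required")]
    extra = SCHEMA_SETS[schema_to_use]
    if extra is not None:
        dirs.append(os.path.join(system_schema_root, extra))
    return dirs
```

With the default value, existing deployments would load the same definitions they load today, just from two directories instead of one.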

This gives us:

  • a suitable forward migration path that "just works" for all current distributions
  • will not require admin intervention at any point also allowing seamless upgrades and downgrades as all schema is rpm/deb/other managed
  • Does not require significant changes to the current schema dir processing code.
  • Still allows us a core schema that is required
  • Will require that freeipa submits their gc schema to us to allow us to "test" it in various configurations through our lib389 tooling so that we can identify issues in matching and syntax rules that you likely plan/require to add.

This is how I would rather see this implemented.

@firstyear I was going to propose the same after reading your previous comment. Yes, it would be enough to have the ability to switch off the supplementary part, which is where most of the conflicts are.

There are conflicts in the required part too, but I'm going to work around them in the schema translator. We can't go too far without that, as things like 'cn' are marked as single-valued in the AD schema, but for global catalog support in FreeIPA we can certainly remove this requirement. Or we can handle the 'top' class required attributes differently via a supplemental class, as the global catalog is read-only and there will be only one well-defined way to translate from IPA content to GC content.

As for the schema, currently we produce schema at build time based on official AD schema files from Microsoft by using a converter script that doesn't depend on FreeIPA itself. Once the script is stable, we can provide it to 389-ds and you can have the same schema files included as well.

Note that in either case we are talking about the directory list modification in init_schema_dse_ext(), the amount of modifications is not that big there for both approaches.

I think it is a bit more involved in finding out what are the required schema elements for 389-ds itself.

I would expect the "required" parts shouldn't be tooooo bad because it's mostly in cn=config, and so long as top/cn work there, then we are probably okay.

We also don't have the same schema as AD remember - there is no concept of marking cn as single value unique, you'd do that with attr uniqueness at the backend level, so this should mean that those changes won't have a big impact on cn=config anyway.

Worth keeping in mind you will not be able to use replication then to get IPA -> AD/GC content here because schema replication would kick in and cause issues/conflicts. You'll need some other mechanism. Also worth considering you won't be able to replicate the GC, you would rely on IPA repl and then each GC would have to be updated from its related IPA db instead to ensure some level of consistency (IE you don't want the GC to have a conflict that's not in IPA and causes divergence).

Replication-wise, the plan was always to not create another topology and instead use local means to feed the GC instance off the primary instance on the same host. Since GC is read-only, this makes it possible for us to control all aspects of the transformation and the schema. We really only need to consider external clients accessing GC for reads, and their behavior with regard to expected LDAP controls etc., not what they need for writes, since by definition nobody writes to GC other than the 'Active Directory' itself.

The difference for 'top' in AD is that it has nTSecurityIdentifier as a mandatory attribute (and a few more unique ones). We can handle this via a supplemental class that is always added by the transformation code that feeds the data into GC.

Right now I'm dealing with the fact that as it is the whole AD schema is not loadable into 389-ds, so until this ticket is fixed, I need to find a subset of AD schema to translate successfully and start creating transformation routines to actually test the sync part.

That's fine, that won't affect us then using top if you add a nttop class for all your extra nt attrs.

We need to ensure that the solution not only works for read only replicas.

The idea to split the schema into required and supplemental sounds good, but right now in a replicated topology after any change all of it would get mixed up in 99user.ldif after "schema learning", so we should also work on this. There have been different suggestions to improve schema handling starting with #496 and #49069, #49418, #49420, and some other schema-related tickets.

I also like the idea of splitting the schema into core/required and supplemental. Now I wonder whether some of the issues could be addressed with 99user.ldif.

99user.ldif can overwrite an existing standard (/share/dirsrv/schema) definition.
I think it should address the problem of a new definition having a different syntax/matching_rules.
For definitions with the same OID but a different name, the new name could be included in the list of alias NAMEs.

I think what remains is the problem of a different name with the same OID, and of attributes being single-valued while data (config or DB) with multiple values already exists.

Here is the core schema the server needs, and the conflicts....

Core Schema Files

  • 00core.ldif --> Conflict
  • 02common.ldif --> Conflict
  • 10mep-plugin.ldif
  • 01core389.ldif
  • 30ns-common.ldif
  • 60pam-plugin.ldif
  • 60posix-winsync-plugin.ldif
  • 10automember-plugin.ldif
  • 10dna-plugin.ldif
  • 60acctpolicy.ldif

Attribute Conflicts

  • 00core.ldif
    • streetAddress
    • name

OID Conflicts

  • 02common.ldif
    • nsLicensedFor
    • changeLog
    • ref

For the attribute conflicts we can move them to a new file and put them in supplemental category as "streetAddress" and "name" are not used by the core server, but the OID conflicts are more of an issue. The OIDs we use are used by other LDAP vendors and are documented on multiple "non-redhat/389" sites. We could assign new OIDs for these attributes, but it could potentially break clients that for some reason look at the schema OIDs. "ref" is the one that scares me as I know that is more commonly used than the others.

It may not just be read-only replicas, but I can certainly see risks if admins were to have two replicas and configure different schema options on them. One would have to learn from the other, but it would at least keep learning on upgrades. Worst case you configure conflicting schemas on them.

Perhaps when we go to add this configuration option, we should put in something like "nsslapd-unsafe-use-alternate-schema".

IMO the "defaults" chosen here must work for us in 389-ds primarily and our deployments. It's only on configuration that the schema could be minimised or replaced to the AD style.

@mreynolds ,

wouldn't it be an option to add (overwrite) 'streetAddress' and 'name' in 99user.ldif?
For the OID conflicts, would it be possible to add them to 99user.ldif with a <named-oid> rather than a numeric OID? AFAIK the GC instance will not replicate, so it is only a local definition.

I think we've needed a way to over-load schema from 99user.ldif for a while, so that would help here I think.

IE instead of rejecting the duplicate OID, we take the "latest" version from the .ldif, meaning 99user.ldif always over-rides all else.
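
A minimal sketch of that "latest definition wins" idea, assuming schema files are processed in filename order so that 99user.ldif comes last. This describes the proposed behaviour, not what the server does today, and `merge_schema` with its tuple layout is purely illustrative:

```python
def merge_schema(files):
    """Merge definitions so that later files override earlier ones.

    `files` is an iterable of (filename, [(oid, name, definition), ...]).
    Returns a dict mapping oid -> (name, definition, filename).
    """
    merged = {}
    for filename, definitions in sorted(files, key=lambda f: f[0]):
        for oid, name, definition in definitions:
            # Drop any earlier entry that shares this OID or this name,
            # so no stale duplicate survives the override.
            stale = [k for k, v in merged.items()
                     if k == oid or v[0].lower() == name.lower()]
            for key in stale:
                del merged[key]
            merged[oid] = (name, definition, filename)
    return merged
```

Removing the stale entry explicitly matters: simply adding the new definition next to the old one would leave two entries competing for the same OID or name.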

Saying that this could have un-intended consequences .... but it would solve some of @abbra's needs, and it would solve the rfc2307/rfc2307bis issues we've had.

The attached patch should resolve the problem of new attribute (definition and name) using an already used OID, allowing 99user.ldif to overwrite the existing definition

Error in the previous patch

@tbordaz Would this allow any of the ldifs to overwrite a former content? For example something in 10example.ldif to override 00core.ldif? This is the behaviour I want to allow people to be able to over-ride with rfc2307bis for example.

The ordered list of files (both from /share and <instance_confdir>) is loaded into a single cn=schema entry, which is then parsed.
During parsing, the overwrite flag for attributetypes (SLAPI_ATTR_FLAG_OVERRIDE) should force the new attributetypes value, and this is what this "tentative" patch was doing. There are multiple corner cases, and in this specific case (reusing an OID for a different name) the flag was not enforced. There could be other corner cases.
I have not seen such a flag for objectclasses, so conflicting objectclasses are possibly not forced.

In short, if rfc2307bis overwrites attributetypes definitions, it should work. For objectclasses I have doubts.

I think objectclasses will possibly be a problem for @abbra too, so we should consider how to approach that in addition to this patch. Is it possible to have this patch as a PR?

I now have an installation that loads a modified AD schema into a 389-ds instance. 389-ds is patched with @tbordaz's patch (and a few other temporary patches to aid with logging around schema loading). It is available in my copr abbra/gc-wip.

I had to tune my schema converter quite a bit so that we no longer conflict at the object class level between the 389-ds and AD schemas. This means, for example, that a bunch of AD classes got renamed and assigned new OIDs from the FreeIPA space. So far, I have had to re-assign seven classes and filter 32 additional ones.

The idea is to ensure all original object classes are present in the objects that will be created in GC (GC is read-only for all consumers, so we have full control over what goes in), in addition to the real object classes that add required attributes from the original AD object classes. For example, 'top' in AD has way more attributes than in 389-ds, so the converter automatically renames it to 'ad-top', and the latter will be added to objects along with 'top'. As long as the data translator script handles the addition of 'ad-top' to the objects, we should be OK.
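
Roughly, the translator step for object classes could look like the sketch below. Only the 'top' to 'ad-top' pair comes from the converter described above; the `RENAMED_CLASSES` table and the function name are hypothetical, and the real translator will certainly do more.

```python
# Mapping of original class name -> renamed AD class added alongside it.
# Only the 'top' -> 'ad-top' entry is taken from this ticket.
RENAMED_CLASSES = {"top": "ad-top"}


def to_gc_object_classes(object_classes):
    """Return the objectClass values to store on a GC entry."""
    result = []
    for oc in object_classes:
        # Keep the original class name so clients still see it.
        if oc not in result:
            result.append(oc)
        # Add the renamed class that carries the extra AD attributes.
        renamed = RENAMED_CLASSES.get(oc.lower())
        if renamed and renamed not in result:
            result.append(renamed)
    return result
```

For an entry with `["top", "person"]` this yields `["top", "ad-top", "person"]`.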

@tbordaz, please, submit the patch as a pull request, I think it solves at least part of the problem we have and is worth adding it.

Look, freeipa isn't any of my business but the amount of OID and class changing does not seem like something that is good as a solution here. I think we need to allow objectClass over-rides, not just attribute ones to resolve this ....

And I already asked @tbordaz to make this a PR ....

While preparing a PR for https://pagure.io/389-ds-base/issue/50652#comment-606068, I found that the patch can create trouble (for example, the OID hashtable containing duplicates); moreover, @abbra removed (filtered) the problematic definitions. At the end of the day, the AD schema can be loaded on DS on the master branch, so there is no need for this patch anymore.

Is there any other pending patch or issues with this ticket ?

What's the PR in progress? Still #50652?

@firstyear, the patch allowed overriding conflicting attribute definitions, but it was more of a hack to accelerate GC testing. One of the concerns with the patch is that the OID hashtable would have duplicates, and there was no guarantee that during lookup the overriding definition would come first. It was also difficult to anticipate other side effects of relaxing the schema override.

In parallel @abbra removed the problematic definitions and GC does no longer need this patch.

So I will not create a PR for the patch (that is abandoned).
The question I was asking is if this ticket is still valid and expects a fix from 389-ds.

Okay, I think that's up to @abbra to answer then .... but don't we need to split up and clean our schema into core, ad and 389?

I think it would be good to split the schema into separate parts, indeed.

Metadata Update from @mreynolds:
- Issue priority set to: normal
- Issue set to the milestone: 1.4.3 (was: 1.4.2)

6 months ago

No, I don't think so - I think that this is about running a totally different core (AD vs 389 ds schema) rather than the rfc2307 issue (having to make concessions inside of the 389 schema only). So I think this remains valid and different.

Metadata Update from @mreynolds:
- Issue tagged with: Schema

5 months ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/3707

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

4 days ago