#225 Bugzilla 484855 - Mediawiki Fedora-only patch
Closed None Opened 14 years ago by ricky.

This ticket is to track the discussion surrounding a Fedora-only patch to the mediawiki package at https://bugzilla.redhat.com/show_bug.cgi?id=484855.

In addition to causing serious breakages (the files listed comment 17 are still broken in the current package), I think this is a case of unnecessarily going against our policy of sticking to upstream (an upstream developer has made their opinion clear on comments 38 and 40).

Just to be clear, I'd rather this discussion remain technical (and the issues with maintainer unresponsiveness and otherwise be discussed separately if need be).


Adding Aryeh Gregor (upstream mediawiki developer) to CC.

My analysis of the issue (based on my interpretation of our policies):

Carrying a local patch is generally the maintainer's call. Plus, in this case, the patch is REQUIRED to comply with the FHS and thus with our packaging guidelines. Upstream has their reasons for rejecting it, but this is just a case where upstream and distributions have and will continue to have conflicting goals (upstream wants users to be able to dump the tarball somewhere into their home directory on a shared host and have it work, distributions have the FHS to comply to), so carrying the patch is the right thing to do.

Notwithstanding the above, the issues caused by the patch need to be fixed. But that means the patch needs to be completed (the other affected files patched), not dropped.

Our policy to stick to upstream is just a recommendation, or policy to comply with the FHS is a mandatory packaging guideline. So the FHS trumps upstream.

Does this really have to come to a FESCo vote?

The patch is not at all necessary to comply with the FHS. In fact, the Fedora mediawiki instance is using a mediawiki package that is identical, except with the patch removed. Mediawiki is still installed into /usr/share/mediawiki, and we use the method described at http://meta.wikimedia.org/wiki/MediaWiki_FAQ#Is_it_possible_to_install_more_than_one_wiki_.28MediaWiki.29_on_a_server.3F_How.3F to handle multiple wiki instances.

So no, the patch does not have anything to do with FHS compliance. It has more to do with introducing unnecessary complex (and broken) non-upstream code when alternative solutions (mentioned in comments 22 and 35) are possible (just replace ../../mediawiki-1.15.0 with /usr/share/mediawiki - I only used a tarball for testing purposes).

If Axel's last response to the bug is causing confusion here, I want to reemphasize that installation to /usr/share/mediawiki is not the point of contention here - in fact, I very much disagree with his assessment of Aryeh's comment as "mostly against packaged content in general."

As Aryeh mentioned in comment 40,
{{{
However, we could support installation into /usr/share as an alternative, for users who want to do that (e.g., installing from packages).
}}}

and

{{{
Your current patch is unsalvageable. You're trying to manually hack every single reference to a path, which is unmaintainable.
}}}

These statements are followed by a discussion of various strategies to maintain FHS-compliance without the patch. Especially given the amount of breakage caused by the patch and the current packaging, I think his advice needs to be given some consideration.

As a (very) casual user of mediawiki I had no idea that doing a simple 'yum update' on a Fedora 10 box would render mediawiki useless without running some update stuff manually from the command line--not to mention the time it took to figure out why it had suddenly stopped working. I cannot ever remember an "update" of a package in a stable version of Fedora that required this much work to get the "update" working again.

Are there any packaging or maintainer guidelines around horrible end-user experiences like this? If not, shouldn't some be created?

Replying to [comment:5 poelstra]:

I cannot ever remember an "update" of a package in a stable version of Fedora that required this much work to get the "update" working again.

You're not a gallery2 user then?

Are there any packaging or maintainer guidelines around horrible end-user experiences like this? If not, shouldn't some be created?

I think that would be a good idea.

OK, this is not funny anymore. Let's collect some facts:

  • This patch exists since Aug 23rd 2006, e.g. almost 3 years now (Fedora CVS has a date of 2006/10/11, so let's then say 2 3/4 years for purists).
  • This patch adds higher security since other than the defined entry points (until recently just index.php), no code is visible to the outside. This is a big improvement towards typical mediawiki installs and some of the CVEs encountered in the last years would not affect Fedora due to this.
  • Other than security it also allows for multiple instances, since it is a shared-code-multiple-data patch like some of the upstream suggestions at http://www.mediawiki.org/wiki/Wiki_family
  • Bug #484855 was effectively a request to allow for a new entry point api.php. After a misunderstanding on my side of what the reported really wanted I admitted that the package needed to allow this entry point as well.
  • The reporter's request, e.g. a working api.php entry point has been dealt with, e.g. the bug is fixed (hopefully, there was no feedback anymore, since the bug was hijacked to discuss the patch).
  • There are more entry points, but not all of them make sense to automatically export. For instance StartProfiler.php needs manual intervention anyway. A packager/maintainer should weigh in the typical use cases and activate the most common ones. We have now index.php and api.php, some may follow. Actually I do have some headaches about unleashing api.php on every mediawiki instance, I'd prefer this to be a choice of the wiki admin, so maybe a future mechanism will push more choices to the wiki admin.
  • What is the patch really doing?
  • The patch starts to differentiate between pure code parts ($IP) and data/instance parts ($DIR, perhaps $DATADIR would had been less a misnomer)
  • There is already some work in the vanilla codebase for that with the scarcely documented MW_INSTALL_PATH variable, but it is not fleshed out yet.
  • Some implicit assumptions in the code of what the current dir is are circumvented that way.
  • /usr/share/mediawiki is hardcoded as the installation path which is the part that makes this patch not-yet-submittable upstream.
  • did this patch cause "serious breakages"? I disagree, the patch's bugs were undetected for almost three years, and the "serious breakages" are observed only when using corner case scenarios like api.php. Which doesn't mean it shouldn't be fixed, but 99.9% of Fedora's mediawiki users probably couldn't care less.
  • Ricky's alternative: After discussing/explaining the commoncode patch Ricky suggests a symlink setup that has similar functionality, but
  • is overly complicated in user serviceable parts like
  • the two-tier setup /w2 -> (Alias) /srv/something -> (symlink) to overcome security concerns
  • being error prone in like the broken config with typos suggested in comment 35
  • is more difficult to mass upgrade (what happens with new/removed symlinks, where on the filesystem do they live? The symlink solutions are opt-out, while the current one is opt-in)
  • On the upstream input by simetrical:
  • He started to argue from the viewpoint of users on a non-editable mediawiki tree, which equally applies to both the patch method and the symlink one. In general any packaged version will never allow direct editing by end (non-root) users, only copying/symlinking would be allowed. So much of his comments are about user tarball installs vs packaged installs.
  • His alternative suggestion of hardcoding $IP = '/usr/share/mediawiki'; or offering altered input.php and friends is actually part of what the patch does (although the patch modifies the files themselves to set $IP, while simetrical suggests wrapper and includes, but I'd consider this an implementation detail).
  • On the difficulty to setup/update mediawiki (or place any LAMP or mysql controlled application here):
  • Setup/migration/updates of mediawiki/moodle/joomla/bugzilla/drupal/... that require an authenticated database setup and will be difficult to manage.
  • The patch eases on the users' work by doing all the code related updating for all instances on the server - the user will still have to manually update the databases themselves.
  • Debian has a more progressive approach auto-updating the databases, but I feel too conservative to have a yum update starting to mess with the users' database, perhaps this will come too at some point in time.
  • At any rate the patch does not hinder updating in fact it facilitates this, since the users doesn't need to cleanup possibly dangling symlinks to nirvana.

Let's summarize:
* Is this patch worth upstream? It continues the separation of code vs data access which is useful for security and multi-instance (farms/families) setups, so if polished and reviewed by mediawiki developers has IMHO a chance of getting into upstream (even though simetrical currently considers it bad).
* Does the package work now? It works like it worked in the last 3 years - updates sometimes need a manual intervention to get the database schemes aligned to the current version. Automating this or not was not in the scope of the discussion in the mentioned bugzilla, but is worth to examine.
* How well is the setup documented? I admit not at all. Perhaps instead of writing replies to Ricky I should document this.
* How complicated is the /etc/httpd/conf.d/mywiki.conf setup?
{{{
Alias /skins /usr/share/mediawiki/skins
}}}
* Is this a package/patch which has/had a bug? God forbid, there are bugs? Let's not kill the bugs, but the patch. :)

I like to see people interested in the package and trying to improve it, but I think this is just Ricky's vs Axel's ways. Note that upstream has 6 different ways of implementing multiple instances, so one could say that there is no canonical approach to this.

Just for fun, 5 years ago I suggested to separate code and data in bugzilla

http://fossplanet.com/bug-tracking.bugzilla.devel/thread-172782-bugzilla/

which bugzilla 2.22 finally implemented. Maybe separating code and data is not a bad idea after all?

 ''Trust me. I know what I'm doing.'' -- Sledge Hammer

Replying to [comment:7 athimm]:

  • This patch exists since Aug 23rd 2006, e.g. almost 3 years now (Fedora CVS has a date of 2006/10/11, so let's then say 2 3/4 years for purists).
    That is how long api.php and the files listed have been broken.

  • This patch adds higher security since other than the defined entry points (until recently just index.php), no code is visible to the outside. This is a big improvement towards typical mediawiki installs and some of the CVEs encountered in the last years would not affect Fedora due to this.
    This is achievable by non-patching methods mentioned in the bug.

  • Other than security it also allows for multiple instances, since it is a shared-code-multiple-data patch like some of the upstream suggestions at http://www.mediawiki.org/wiki/Wiki_family
    The page does not suggest patching mediawiki code or modifying anything other than configuration files. Upstream has suggested alternatives that allow multiple instances and the security security of only exposing entry points.

  • The reporter's request, e.g. a working api.php entry point has been dealt with, e.g. the bug is fixed (hopefully, there was no feedback anymore, since the bug was hijacked to discuss the patch).
    The other issues listed in the bug are not fixed, see comment 17 for a list of breakages as well as comment 37 mentioning the issue with no skins for the install script. I offered to report a new bug specifically about the patch but received no response.

  • There are more entry points, but not all of them make sense to automatically export. For instance StartProfiler.php needs manual intervention anyway. A packager/maintainer should weigh in the typical use cases and activate the most common ones. We have now index.php and api.php, some may follow. Actually I do have some headaches about unleashing api.php on every mediawiki instance, I'd prefer this to be a choice of the wiki admin, so maybe a future mechanism will push more choices to the wiki admin.
    Those entry points are broken even if a user attempts to symlink them in because the patch breaks them (by modifying WebStart.php to use the normally nonexistent $DIR variable).

  • did this patch cause "serious breakages"? I disagree, the patch's bugs were undetected for almost three years, and the "serious breakages" are observed only when using corner case scenarios like api.php. Which doesn't mean it shouldn't be fixed, but 99.9% of Fedora's mediawiki users probably couldn't care less.
    Many believe it did. Bugs 494362 and 494880 were directly caused by the patch. Fedora Infrastructure has started running a custom package due to the breakages introduced by the patch. The list of files in the bug remain broken.

  • Ricky's alternative: After discussing/explaining the commoncode patch Ricky suggests a symlink setup that has similar functionality, but

  • is overly complicated in user serviceable parts like
  • the two-tier setup /w2 -> (Alias) /srv/something -> (symlink) to overcome security concerns
  • being error prone in like the broken config with typos suggested in comment 35
    In comment 35, I made some typos. The typos did not break 7 entry points and cause countless other errors. They are corrected in comment 37.

  • is more difficult to mass upgrade (what happens with new/removed symlinks, where on the filesystem do they live? The symlink solutions are opt-out, while the current one is opt-in)
    Mediawiki is impossible to "mass upgrade" between major version updates as it requires database updates as John mentioned. This issue would be solved with proper documentation (the same type of documentation that would warn users of database upgrades).

  • He started to argue from the viewpoint of users on a non-editable mediawiki tree, which equally applies to both the patch method and the symlink one. In general any packaged version will never allow direct editing by end (non-root) users, only copying/symlinking would be allowed. So much of his comments are about user tarball installs vs packaged installs.
    This is irrelevant. As per the quote in my last comment, he seems perfectly happy to support /usr/share/mediawiki as an alternative to the default "copy the whole source tree" approach.

  • His alternative suggestion of hardcoding $IP = '/usr/share/mediawiki'; or offering altered input.php and friends is actually part of what the patch does (although the patch modifies the files themselves to set $IP, while simetrical suggests wrapper and includes, but I'd consider this an implementation detail).
    The wrapper method would not have introduced the breakages that the patch did. It would also not have touched any of the upstream source in core places. We did some testing with this method, and it might cause issues with the installation script, so it is not currently clear how feasible this method is compared to symlinks yet.

  • On the difficulty to setup/update mediawiki (or place any LAMP or mysql controlled application here):

  • Setup/migration/updates of mediawiki/moodle/joomla/bugzilla/drupal/... that require an authenticated database setup and will be difficult to manage.
  • The patch eases on the users' work by doing all the code related updating for all instances on the server - the user will still have to manually update the databases themselves.
    Unless the list of files in comment 35 change upstream or new required files are added, the symlink method is unlikely to leave dangling symlinks (upstream can probably confirm how unlikely these changes would be). If symlinks are broken between package updates the issue should be documented properly along with database upgrades. I do not consider this a downside of a symlink approach. With the current layout, users must symlink entry points into the wiki directory if they need them. If the filenames were to change, however unlikely this is, they would need to update their symlinks as well.

  • Is this patch worth upstream? It continues the separation of code vs data access which is useful for security and multi-instance (farms/families) setups, so if polished and reviewed by mediawiki developers has IMHO a chance of getting into upstream (even though simetrical currently considers it bad).
    It was mentioned in comment 35 that everybody would be happy if the patch got accepted upstream, because it would receive more attention and testing, which would make such breakages far less likely. However, upstream has described the patch in its current form as "unsalvageable" and "unmaintainable" as it tries to change references to $IP almost everywhere, so this seems highly unlikely.

  • How well is the setup documented? I admit not at all. Perhaps instead of writing replies to Ricky I should document this.
    Even though this is beside the technical issues that I would like to discuss here, in bug 475841, the reporter has written a draft of a README.FEDORA and seen no response in over 7 months.

  • Is this a package/patch which has/had a bug? God forbid, there are bugs? Let's not kill the bugs, but the patch. :)
    The idea of switching half of the references to $IP to $DIR instead is "unmaintainable" and "unsalvageable," as upstream has said. This approach has led to a large number of breakages. I performed some basic testing and found that a number of entry points scripts are broken, as discussed in the bug. I would not be surprised if some maintenance scripts were broken as well.

I think Toshio summed up the issue this ticket is about pretty well:

< abadger1999> From the Fedora packager perspective -- the packager has created a Fedora-specific patch that he believes makes the package better. The patch breaks certain areas of mediawiki. Upstream and some users (fedora infra) think that the patch is unnecessary and (from what I can tell) isn't going into upstream's code base.
< abadger1999> So the upstream upstream upstream mantra is being broken here. Whether it's grounds for FESCo to overrule the maintainer is the question.

Which he later clarified: "unnecessary means that upstream is proponent of doing things in a different (currently supported) way."

While I am normally not crazy about the idea of FESCo stepping into packaging details like this, I currently see no other path to getting what upstream, Fedora Infrastructure, and I consider to be a sane and unbroken mediawiki package in Fedora.

Thanks for all the details provided. So I take it that the patch is not actually required to follow the FHS, which makes this a lot harder to decide whether it should be kept or not. FESCo has tasked me with acting as a mediator here, so I'll look carefully at the issues involved to be able to arbitrate.

I'm the upstream developer mentioned. A few comments:

Replying to [comment:7 athimm]:

  • This patch adds higher security since other than the defined entry points (until recently just index.php), no code is visible to the outside. This is a big improvement towards typical mediawiki installs and some of the CVEs encountered in the last years would not affect Fedora due to this.

Which CVEs, specifically? Every !MediaWiki source file that's not an entry point either only defines classes/functions, or dies if ( !defined( 'MEDIAWIKI' ) ). I certainly can recall no vulnerability that this would have averted. I can absolutely see the value in keeping things in line with the standard filesystem layout, for users who are able to do that, but I doubt it will prevent security issues.

Multiple instances are just as easy to set up in the default "all files together" way of doing things. That's how, e.g., Wikipedia works.

  • There are more entry points, but not all of them make sense to automatically export. For instance !StartProfiler.php needs manual intervention anyway. A packager/maintainer should weigh in the typical use cases and activate the most common ones. We have now index.php and api.php, some may follow. Actually I do have some headaches about unleashing api.php on every mediawiki instance, I'd prefer this to be a choice of the wiki admin, so maybe a future mechanism will push more choices to the wiki admin.

!StartProfiler.php is not an entry point. I gave a list of entry points I spotted offhand here, at (3) (although it's possible there are a few more):

https://bugzilla.redhat.com/show_bug.cgi?id=484855#c40

You're currently missing at least five of them. This in particular breaks !OpenSearch support, as far as I can tell without actually trying to test the package (I don't have a Fedora install handy), which would normally work out of the box with no configuration. !OpenSearch is a feature that's likely to be pretty widely used.

Admins have the option to disable the API by setting $wgEnableAPI = false; in !LocalSettings.php. However, the default is true for good reason. The API is intended to be an integrated part of !MediaWiki and is steadily becoming more essential over time.

  • What is the patch really doing?
  • The patch starts to differentiate between pure code parts ($IP) and data/instance parts ($DIR, perhaps $DATADIR would had been less a misnomer)

I don't understand why two variables are needed here. $IP is the path where all the wiki files are. It should be fine to just set it to /usr/share/mediawiki and leave it at that, if all files are present there -- does that break anything? $IP should never need to point to the web path.

  • did this patch cause "serious breakages"? I disagree, the patch's bugs were undetected for almost three years, and the "serious breakages" are observed only when using corner case scenarios like api.php. Which doesn't mean it shouldn't be fixed, but 99.9% of Fedora's mediawiki users probably couldn't care less.

AFAICT, styles don't work out of the box. If you install the package and try viewing a page, everything is apparently plain black on white text in the default font, or something like that, since you didn't move the skins/ directory. I noticed this while reading the patch and didn't test it, but Ricky confirms it's the case. That sounds like pretty serious breakage to me.

  • His alternative suggestion of hardcoding $IP = '/usr/share/mediawiki'; or offering altered input.php and friends is actually part of what the patch does (although the patch modifies the files themselves to set $IP, while simetrical suggests wrapper and includes, but I'd consider this an implementation detail).

The problem is you have to find and update all references to $IP. There are a lot, and even granted that you don't need to change all of them for your purposes, I'm pretty sure you missed some:

{{{
$ git grep '$IP' | wc -l
224
}}}

Or counting extensions in !MediaWiki SVN as well (there are many third-party extensions that I can't even count):

{{{
$ git grep '$IP' | wc -l
789
}}}

Changing the semantics of $IP is intrusive and will likely cause maintenance problems, so I'd recommend against it.

  • On the difficulty to setup/update mediawiki (or place any LAMP or mysql controlled application here):
  • Setup/migration/updates of mediawiki/moodle/joomla/bugzilla/drupal/... that require an authenticated database setup and will be difficult to manage.
  • The patch eases on the users' work by doing all the code related updating for all instances on the server - the user will still have to manually update the databases themselves.
  • Debian has a more progressive approach auto-updating the databases, but I feel too conservative to have a yum update starting to mess with the users' database, perhaps this will come too at some point in time.

This is orthogonal to the bulk of this discussion, but FWIW: if you don't auto-update the database, the wiki will stop working when you upgrade, and start throwing database errors on most or all pages. The user has no way to fix this other than upgrading the database manually, so I don't see a lot of value in not doing it automatically. Some very careful users may wish to back up their database first, but I'm pretty sure the overwhelming majority would prefer to have it Just Work. Careful users will be keeping backups anyway, one hopes.

  • Is this patch worth upstream? It continues the separation of code vs data access which is useful for security and multi-instance (farms/families) setups, so if polished and reviewed by mediawiki developers has IMHO a chance of getting into upstream (even though simetrical currently considers it bad).

I can state with some confidence that your approach (replacing some but not all instances of $IP) has no chance of making it upstream, for the reasons I've given. It's also unlikely to be easy for you to maintain correctly downstream. I doubt it's correct now, since you probably missed some places and broke some uses of the software. Perhaps not major ones, but is it okay to break things as long as they're not major? Ricky notes that he has identified things that have broken because of the change.

Replying to [comment:8 ricky]:

The wrapper method would not have introduced the breakages that the patch did. It would also not have touched any of the upstream source in core places. We did some testing with this method, and it might cause issues with the installation script, so it is not currently clear how feasible this method is compared to symlinks yet.

I'd be happy to review and commit any necessary patches to the install script, as long as they're suitably flexible (i.e., don't break standard tarball installation). Currently it generates a !LocalSettings.php that it tries to copy to $IP. This won't work if $IP isn't writable by the web server. A workaround would be to leave config/ in the web root (not moved to /usr/share/mediawiki) and symlink /usr/share/mediawiki/LocalSettings.php -> /var/www/wiki/LocalSettings.php or such. (The config script might need to be tweaked so it won't think the broken symlink means !LocalSettings.php already exists.) This isn't ideal, since !LocalSettings.php should really be in /etc/mediawiki, but it's an improvement, anyway.

Unless the list of files in comment 35 change upstream or new required files are added, the symlink method is unlikely to leave dangling symlinks (upstream can probably confirm how unlikely these changes would be).

We do sometimes add or remove entry points, as needs require. It's not common. Ideally I'd like to see a shell script or something committed upstream that will do all the setup here, splitting apart the entry points and web-accessible files (e.g. skins/ and images/) so that downstreams don't all have to separately work out how to do it. Then changes in entry points can update that script too; or if it gets forgotten, at least we can fix it once upstream and all distributors will get the changes, so it will be fixed faster.

Replying to [comment:10 simetrical]:

Replying to [comment:7 athimm]:

  • This patch adds higher security since other than the defined entry points (until recently just index.php), no code is visible to the outside. This is a big improvement towards typical mediawiki installs and some of the CVEs encountered in the last years would not affect Fedora due to this.

Which CVEs, specifically?

I know of at least CVE-2008-0460, but there are a couple api related more http://www.google.com/search?q=cve+mediawiki+api.php

Every !MediaWiki source file that's not an entry point either only defines classes/functions, or dies if ( !defined( 'MEDIAWIKI' ) ). I certainly can recall no vulnerability that this would have averted.

Certainly every developer trusts that there are none to very few security bugs in his code, and I'm not going to try to make a security review of mediawiki. The point is not to have to, since I do not expose the code. If I just exclude all php files that are included in the tarball that seem to check against the MEDIAWIKI guardian I still have 823 to check:

{{{
$ filestocheck="find mediawiki-1.15.1 -name \*.php\*|xargs grep -c "'MEDIAWIKI'" \ | grep :0 | sed -e's,:0$,,'"
$ echo There are echo "$filestocheck" | wc -l files to check; echo "$filestocheck" \
| sed -e's,/[^/]$,,'| sort | uniq -c|sort -nr | while read n dir; do (echo -n $n in $dir:\ ;\
echo "$filestocheck" | grep "$dir/[^/]
$" | xargs ls -S | sed -e"s,^$dir/,," | tr '\n' ' ' \
| cut -c-300 | tr '\n' ' '; echo "[...]") | fmt; done|head -n 50
There are 823 files to check
335 in mediawiki-1.15.1/languages/messages: MessagesSi.php MessagesEl.php
MessagesMl.php MessagesTe.php MessagesTa.php MessagesTh.php MessagesRu.php
MessagesMk.php MessagesKm.php MessagesBe_tarask.php MessagesUk.php
MessagesMr.php MessagesBg.php MessagesBn.php MessagesFa.php MessagesHi.php
MessagesSah.php MessagesSr_ec.php MessagesMdf.php Mes [...]
119 in mediawiki-1.15.1/includes: ZhConversion.php Article.php Title.php
User.php EditPage.php Linker.php Sanitizer.php ChangesList.php Exif.php
SearchEngine.php Import.php SpecialPage.php AutoLoader.php Pager.php
Revision.php LogEventsList.php memcached-client.php Export.php
IEContentAnalyzer.php MessageCache.php MimeMagic.php Bloc [...]
102 in mediawiki-1.15.1/maintenance: fuzz-tester.php
installExtension.php dumpTextPass.php generateSitemap.php
importUseModWiki.php namespaceDupes.php preprocessorFuzzTest.php
mwdocgen.php importImages.php orphans.php rebuildImages.php addwiki.php
fixSlaveDesync.php cleanupTitles.php cleanupImages.php importDump.php
findhooks.php reass [...]
83 in mediawiki-1.15.1/includes/specials: SpecialUpload.php
SpecialRevisiondelete.php SpecialSearch.php SpecialPreferences.php
SpecialUndelete.php SpecialUserlogin.php SpecialBlockip.php
SpecialRecentchanges.php SpecialUserrights.php SpecialContributions.php
SpecialIpblocklist.php SpecialMovepage.php SpecialWatchlist.php
SpecialAllpages.php [...]
57 in mediawiki-1.15.1/languages/classes: LanguageKk_cyrl.php
LanguageKk.php LanguageTyv.php LanguageKu.php LanguageZh.php
LanguageSr.php LanguageGan.php LanguageFi.php LanguageRu.php
LanguageUk.php LanguageLa.php LanguageEo.php LanguageWa.php LanguageHy.php
LanguageHe.php LanguageTg.php LanguageCu.php LanguageKaa.php
LanguageJa.php Languag [...]
18 in mediawiki-1.15.1/includes/filerepo: LocalFile.php File.php
FileRepo.php FSRepo.php ArchivedFile.php RepoGroup.php OldLocalFile.php
ForeignAPIRepo.php LocalRepo.php FileCache.php ForeignAPIFile.php
UnregisteredLocalFile.php Image.php ForeignDBRepo.php ForeignDBFile.php
ForeignDBViaLBRepo.php FileRepoStatus.php NullRepo.php [...]
17 in mediawiki-1.15.1: profileinfo.php thumb.php index.php api.php
img_auth.php opensearch_desc.php trackback.php StartProfiler.php
redirect.php php5.php5 opensearch_desc.php5 trackback.php5 img_auth.php5
redirect.php5 thumb.php5 index.php5 api.php5 [...]
14 in mediawiki-1.15.1/includes/parser: Parser.php Preprocessor_Hash.php
Preprocessor_DOM.php CoreParserFunctions.php LinkHolderArray.php
Parser_LinkHooks.php ParserOptions.php DateFormatter.php ParserOutput.php
Tidy.php Preprocessor.php ParserCache.php Parser_DiffTest.php
CoreLinkFunctions.php [...]
13 in mediawiki-1.15.1/maintenance/language: diffLanguage.php lang2po.php
transstat.php rebuildLanguage.php date-formats.php makeMessageDB.php
digit2html.php countMessages.php langmemusage.php checkExtensions.php
dumpMessages.php checkLanguage.php alltrans.php [...]
11 in mediawiki-1.15.1/includes/normal: Utf8Case.php UtfNormal.php
CleanUpTest.php UtfNormalTest.php UtfNormalGenerate.php Utf8Test.php
UtfNormalUtil.php RandomTest.php Utf8CaseGenerate.php UtfNormalBench.php
UtfNormalDefines.php [...]
}}}

I'm quite sure that most of these files are OK, but it requires a review now. And there are many developers to mediawiki, can one be sure that a language file developer may not accidentally do call some mediawiki function making his language php file suddenly exploitable?

Since these file need never be accessed directly let's hide them and only deal with direct exploits to the real entry points.

I can absolutely see the value in keeping things in line with the standard filesystem layout, for users who are able to do that, but I doubt it will prevent security issues.

These two goals are orthogonal, the security issues are about allowing external users to access parts of code they were not supposed to either by choice of upstream (like a language php file) or by choice of the site admin (like !StartProfiler.php).

  • There are more entry points, but not all of them make sense to automatically export. For instance !StartProfiler.php needs manual intervention anyway. [...]

!StartProfiler.php is not an entry point. I gave a list of entry points I spotted offhand here, [...]

I mentioned !StartProfiler.php because ricky suggested exposing it and according to him is even currently exposed for Fedora wikis (or maybe he now fixed it).

I don't oppose to opting in some more entry points like !OpenSearch and guided by a developer of mediawiki this is easier to do than blindly whitelisting some subset of the 17 php files found in the root. The nice thing about the method used in the package is that you decide what to opt in starting at nothing while the all-in-one method needs to start to opt out (many) things.

  • What is the patch really doing?
  • The patch starts to differentiate between pure code parts ($IP) and data/instance parts ($DIR, perhaps $DATADIR would had been less a misnomer)

I don't understand why two variables are needed here. $IP is the path where all the wiki files are. It should be fine to just set it to /usr/share/mediawiki and leave it at that, if all files are present there -- does that break anything? $IP should never need to point to the web path.

It doesn't, it always points to /usr/share/mediawiki. But a (very) few references through $IP do access the web data. You can see these in the patch:

{{{
- if( !file_exists( "$IP/LocalSettings.php" ) ) {
+ if( !file_exists( "$DIR/LocalSettings.php" ) ) {
}}}

!LocalSettings.php is per wiki instance and can't live under /usr/share/mediawiki. $DIR points to the wiki data.

  • did this patch cause "serious breakages"? I disagree, [...]

AFAICT, styles don't work out of the box. [...]

The site admin is supposed to use the Alias method, see http://www.google.com/search?q=site%3Amediawiki.org+skins+alias+method

I added a default apache config file for documenting this clearly in rawhide. See also my comment with the Alias config you replied to.

  • His alternative suggestion of hardcoding $IP = '/usr/share/mediawiki'; or offering altered input.php and friends is actually part of what the patch does (although the patch modifies the files themselves to set $IP, while simetrical suggests wrapper and includes, but I'd consider this an implementation detail).

The problem is you have to find and update all references to $IP. There are a lot, and even granted that you don't need to change all of them for your purposes, I'm pretty sure you missed some:
{{{
$ git grep '$IP' | wc -l
224
}}}

You can grep out 4/5 from that
{{{
$ find mediawiki-1.15.1 -name *.php* | xargs grep -r '\$IP' \
| grep -Ev '\$IP(| . .)/(includes|languages|maintenance|skins)' \
| grep -v 'global .\$IP'| grep -v '\$IP ='| wc -l
39
}}}
But you can even argue that you only need to check occurences of $IP referring to images, LocalSettings.php, config etc, so there is really little to touch. The patch itself is only a few lines:
{{{
api.php | 5 ++++-
config/index.php | 15 +++++++++------
includes/Setup.php | 2 +-
includes/WebStart.php | 4 ++--
includes/templates/NoLocalSettings.php | 2 +-
index.php | 4 +++-
maintenance/archives/upgradeWatchlist.php | 2 +-
maintenance/commandLine.inc | 6 ++++--
maintenance/update.php | 3 +++
9 files changed, 28 insertions(+), 15 deletions(-)
}}}

Changing the semantics of $IP is intrusive and will likely cause maintenance problems, so I'd recommend against it.

It doesn't really change the semantics of $IP, it still is mediawiki's install path, e.g. where mediawiki was installed. It only needs to be fixed where it references wiki instance data like !LocalSettings.php, uploaded images and similar. It is not that intrusive, if you check it out.

  • On the difficulty to setup/update mediawiki (or place any LAMP or mysql controlled application here): [...]
  • Setup/migration/updates of mediawiki/moodle/joomla/bugzilla/drupal/... that require an authenticated database setup and will be difficult to manage.
  • The patch eases on the users' work by doing all the code related updating for all instances on the server - the user will still have to manually update the databases themselves.
  • Debian has a more progressive approach auto-updating the databases, but I feel too conservative to have a yum update starting to mess with the users' database, perhaps this will come too at some point in time.

This is orthogonal to the bulk of this discussion,

I agree, I just didn't want this to go unanswered.

but FWIW: if you don't auto-update the database, the wiki will stop working when you upgrade, ![...]

We should revisit this rather soon. This hasn't anything to do with the axel/ricky suggestions for mediawiki, but is important to have accounted for. The main decision is automate vs let the user decide, and the latter would nuke both ricky's and axel's methods, since both are upgrading the code beneath the users' feet. Just to finish the thought: It needs to be fully automated, or it needs to be a copy operation.

  • Is this patch worth upstream? It continues the separation of code vs data access which is useful for security and multi-instance (farms/families) setups, so if polished and reviewed by mediawiki developers has IMHO a chance of getting into upstream (even though simetrical currently considers it bad).

I can state with some confidence that your approach (replacing some but not all instances of $IP) has no chance of making it upstream, for the reasons I've given.

I've tried to lighten up your view and perhaps resolve some misunderstands about the patch. After all the separation of code vs data is somewhat already there in the code either in the form of MW_INSTALL_PATH or by implicit assumptions ($DIR = cwd). This is why the patch is just changing a handful of lines.

Even if this patch is not what mediawiki would finally include, from the semantics' POV would you agree that an optional separation of mediawiki code vs user data is something mediawiki can benefit from? If yes, we can certainly rethink and discuss the use case. And most probably on wikitech-l.

Replying to [comment:8 ricky]:

Unless the list of files in comment 35 change upstream or new required files are added, the symlink method is unlikely to leave dangling symlinks (upstream can probably confirm how unlikely these changes would be).

We do sometimes add or remove entry points, as needs require. It's not common. Ideally I'd like to see a shell script or something committed upstream that will do all the setup here, splitting apart the entry points and web-accessible files (e.g. skins/ and images/) so that downstreams don't all have to separately work out how to do it. Then changes in entry points can update that script too; or if it gets forgotten, at least we can fix it once upstream and all distributors will get the changes, so it will be fixed faster.

Wouldn't any mechanism in the core involving symlinks be already banned due to mediawiki's support of filesystems that don't support symlinks? The symlink method pretends to mediawiki that everything is one piece, if mediawiki starts to implement different paths to code vs data you may end up with something resembling the patch in discussion.

The wrapper method is like applying the patch w/o touching the files. It's perhaps OK for downstream, but an upstream wrapper method would make little sense, or not?

Replying to [comment:11 athimm]:

I know of at least CVE-2008-0460, but there are a couple api related more http://www.google.com/search?q=cve+mediawiki+api.php

Ah, yes, certainly if you go and disable major features like the API you'll prevent some vulnerabilities. You could have done that by setting $wgEnableAPI = false; in !LocalSettings.php without causing any of this breakage, however. It's not a benefit of the file moving, it's a benefit of disabling the API. The file moving might hypothetically prevent some improbable vulnerabilities, but given the demonstrable level of breakage it's caused I don't think that's a very good justification for anything.

It doesn't, it always points to /usr/share/mediawiki. But a (very) few references through $IP do access the web data. You can see these in the patch:

{{{
- if( !file_exists( "$IP/LocalSettings.php" ) ) {
+ if( !file_exists( "$DIR/LocalSettings.php" ) ) {
}}}

!LocalSettings.php is per wiki instance and can't live under /usr/share/mediawiki. $DIR points to the wiki data.

!LocalSettings.php doesn't have to be per-instance. You can (and IMO should) have only one !LocalSettings.php for all your wiki instances.

The site admin is supposed to use the Alias method, see http://www.google.com/search?q=site%3Amediawiki.org+skins+alias+method

I added a default apache config file for documenting this clearly in rawhide. See also my comment with the Alias config you replied to.

Nevertheless, if users installed the mediawiki package without using this method -- for instance, if they used lighttpd instead of Apache -- they would get a broken page, right? Which is only caused because you moved the files around like this?

It doesn't really change the semantics of $IP, it still is mediawiki's install path, e.g. where mediawiki was installed. It only needs to be fixed where it references wiki instance data like !LocalSettings.php, uploaded images and similar. It is not that intrusive, if you check it out.

Nevertheless, apparently several things broke because you missed some mentions of this?

I've tried to lighten up your view and perhaps resolve some misunderstands about the patch. After all the separation of code vs data is somewhat already there in the code either in the form of MW_INSTALL_PATH or by implicit assumptions ($DIR = cwd). This is why the patch is just changing a handful of lines.

Even if this patch is not what mediawiki would finally include, from the semantics' POV would you agree that an optional separation of mediawiki code vs user data is something mediawiki can benefit from? If yes, we can certainly rethink and discuss the use case. And most probably on wikitech-l.

I don't think I saw a clear explanation before of what you were trying to do by changing only some of the uses of $IP. It makes somewhat more sense now, but I still don't think it's a good idea. !LocalSettings.php should be shared by all wiki installs -- there are of course other ways to do things, since it can contain any PHP code, but that's the simpler way here, and it's what I'd recommend anyway (and what, e.g., Wikipedia uses). As for images/, nothing should be using $IP to find that except for the default setting, so you can just make sure that $wgUploadDirectory is set in !LocalSettings.php to something else (which can be different per wiki if you have multiple wikis). So I think the current uses of $IP are correct and no new variable needs to be created.

Wouldn't any mechanism in the core involving symlinks be already banned due to mediawiki's support of filesystems that don't support symlinks? The symlink method pretends to mediawiki that everything is one piece, if mediawiki starts to implement different paths to code vs data you may end up with something resembling the patch in discussion.

The wrapper method is like applying the patch w/o touching the files. It's perhaps OK for downstream, but an upstream wrapper method would make little sense, or not?

My idea would be that we'd keep the default tarball configuration, but include a script in maintenance/ that you could use to move the files around, like:

maintenance/separateFiles --source=$ROOT/usr/src/mediawiki --web=$ROOT/var/www/wiki --upload=$ROOT/var/lib/mediawiki/upload --config=$ROOT/etc/mediawiki

or whatever you want. It would then create the wrappers and shuffle around the needed files, and make the needed adjustments to config/index.php (maybe by putting an extra file in a magic place that it would check for and include if present), possibly creating symlinks or whatnot. The result could then be packaged by distributors, with the logic used to generate it kept in one place upstream.

Replying to [comment:12 simetrical]:

!LocalSettings.php is per wiki instance and can't live under /usr/share/mediawiki. $DIR points to the wiki data.

!LocalSettings.php doesn't have to be per-instance. You can (and IMO should) have only one !LocalSettings.php for all your wiki instances.

Isn't this file considered to be the place to perform per instance configurations like the sitename, which extensions and features to enable etc? I'm thinking here along instances like /srv/ivtvdriver.org/wiki/ vs /srv/lm-sensors.org/wiki where the different groups may want different features.

I guess there are mechanisms to merge these into one file, so even if !LocalSettings.php can be eliminated, there are still per wiki data to take care of like image uploads. This per wiki specific data is what lies under $DIR.

(also even if !LocalSettings.php could be merged it would not be able to reside under /usr/share/mediawiki, so including $IP/LocalSettings.php needs to be diverted somehow).

I don't think I saw a clear explanation before of what you were trying to do by changing only some of the uses of $IP. It makes somewhat more sense now, but I still don't think it's a good idea. !LocalSettings.php should be shared by all wiki installs

OK, I think we're getting closer to understanding each others' ideas, which is a good thing. :)

I still don't understand how the shared !LocalSettings.php could work, I need to read some more on mediawiki.org, but I think I would still prefer a per wiki file. The reason is that with different files I can have different admins managing their wikis, they could even spawn off a wiki into their $HOME/public_html w/o the need for someone managing a global !LocalSettings.php.

My idea would be that we'd keep the default tarball configuration, but include a script in maintenance/ that you could use to move the files around, like:

maintenance/separateFiles --source=$ROOT/usr/src/mediawiki --web=$ROOT/var/www/wiki --upload=$ROOT/var/lib/mediawiki/upload --config=$ROOT/etc/mediawiki

I like this approach, it would make packaging across all distros much easier.

or whatever you want. It would then create the wrappers and shuffle around the needed files, and make the needed adjustments to config/index.php (maybe by putting an extra file in a magic place that it would check for and include if present), possibly creating symlinks or whatnot. The result could then be packaged by distributors, with the logic used to generate it kept in one place upstream.

I still think that the wrapper stub files are equivalent to patching and introducing a where-is-my-wiki variable like $DIR in the patch, but we can certainly find a middle path somewhere.

I need to check on !LocalSettings.php, thanks for the input!

Replying to [comment:13 athimm]:

I still think that the wrapper stub files are equivalent to patching and introducing a where-is-my-wiki variable like $DIR in the patch, but we can certainly find a middle path somewhere.

Actually to share some history on the patch to make things more understandable: When I first run into the need to have two different paths, one for mediawiki's code and one for the wiki data, the first approach I took reading the code was to have $IP as an absolute path to the code and the relative path pointing to the data. That would had been a minimal solution.

These semantics are present in most of mediawiki, but not everywhere, sometimes the cwd is assumed to start in config, other times in maintenance, other times again $IP is prepended to paths it shouldn't need to. There are also several chdirs in the code (17 in 1.15.1) making the assumption of the current working directory always pointing to the data false.

So after the minimal solution patch started to grow out of control I decided to reverse the semantics: The patch chdirs to $IP, but before doing so remembers where the data resides in $DIR. Now the mediawiki code only needs to address $DIR when it needs to access the per wiki data. And it does so less than a dozen times, which is why the patch is so small.

More explicitly the $DIR variable is only needed for access to /images, !LocalSettings.php and !AdminSettings.php, and there is just one /images reference (the default for $wgUploadDirectory), the other are indeed just *Settings.php.

I'm just giving the rationale here. The alternatives for packaging mediawiki were/are
* all-in-one (symlinks), so $IP=$DIR with the problems discussed,
* data-is-in-cwd-code-in-$IP which requires a larger effort to polish and
* two variables $DIR!=$IP (e.g. the patch being discussed).

(as mentioned a couple of times $DIR is a misnomer, $WIKIDATA, $DATADIR or anything else will be a better name)

Replying to [comment:12 simetrical]:

!LocalSettings.php doesn't have to be per-instance. You can (and IMO should) have only one !LocalSettings.php for all your wiki instances.
This seems to be the upstream-recommended method, and apparently what wikipedia does. In a setup like this, /usr/share/mediawiki/LocalSettings.php would be a symlink to /etc/mediawiki/LocalSettings.php, and users would be able to configure multiple instances as per the upstream documentation about this method.

Replying to [comment:13 athimm]:

I still think that the wrapper stub files are equivalent to patching and introducing a where-is-my-wiki variable like $DIR in the patch, but we can certainly find a middle path somewhere.
I don't think they are equivalent at all. It is a far less error-prone approach that wouldn't have created any of the bugs here. It would also be able to be supported and documented upstream, and not require non-upstreamable any patches to mediawiki code (we did some reasearch here and some install script patches might be needed).

Replying to [comment:14 athimm]:

I'm just giving the rationale here. The alternatives for packaging mediawiki were/are
* all-in-one (symlinks), so $IP=$DIR with the problems discussed,
I think we have given reasonable solutions to all of the issues brought up. The security issues are completely solved with proper web server configuration. The upgradability concern can be solved with upstreamable scripts to handle creating and upgrading the necessary symlinks in each instance.

Sample web server configuration as well as instructions for using the necessary upgrade scripts could be provided in package documentation.

The three most promising approaches to the multiple instances problem that I see now are basically the ones mentioned above:

  • Single !LocalSettings.php - the main upstream-supported, configuration files in /etc that requires other files with site-specific configuration items such as upload directories. 100% of the code lives in /usr/share/mediawiki. Downsides: root access would be required to start a new instance, users would need basic familiarity with PHP to do the multi-instance setup described in the upstream documentation.
  • Symlinks method - separate instances in separate directories symlinked from /usr/share/mediawiki. Almost all upstream documentation meant for instances being run straight out of a tarball applies directly to this setup. Downsides: Version upgrades can break symlinks, might need a simple upstreamable script to handle creating/upgrading these which needs to be documented as one additional step during version upgrades. Need to expose each entry point separately in the web server configuration for it to be exposed.
  • Require method - most similar to the current method, except with no code patching - could be a good compromise that everybody is happy with. May need patches to install script (though upstream would be happy to take them if they are sufficiently flexible). Downsides: Needs a separate (though trivial) wrapper file for each entry point (although upstream has good documentation on exactly which files are entry points). If entry points change, instance directories need to be updated (this applies to the current patch method as well as the symlinks method). Probably the least tested of these options now, although I plan to change that soon :-)

It looks like the wrapper method is turning out to need the same patching as currently done (save the parts that live in the wrapper). Actually that wouldn't have needed any testing as otherwise if the wrapper method could live w/o the patches and introduction of $DIR, then the unwrapped one wouldn't need them also.

Although technically not correct for the sake of our context "requires" is just a preprocessor includes, so the wrapped version could always be flattened into a pure-patch solution. E.g. wrapping&patching or just patching are indeed equivalent, and the solutions from one model can be mapped to the other.

In any case, in a code/data split scenario mediawiki code needs to know where what is. We know we cannot rely on cwd, so one needs two variables. One is $IP and the other is a-better-name-for-$DIR. For example if mediawiki code uses $IP/include/that.php and $IP/images/uploaded-this.png, then one $IP needs to be changed (the former would point into /usr/share/mediawiki and the latter to /srv/my.third.domain/wiki).

I think the a-better-name-for-$DIR variable can be introduced with a patch that defaults it to be the same as $IP for merged code/data setups (e.g. simple tarball extracts). The spawning script can then properly set the variables while creating the stub instances.

Replying to [comment:13 athimm]:

Isn't this file considered to be the place to perform per instance configurations like the sitename, which extensions and features to enable etc? I'm thinking here along instances like /srv/ivtvdriver.org/wiki/ vs /srv/lm-sensors.org/wiki where the different groups may want different features.

I guess there are mechanisms to merge these into one file, so even if !LocalSettings.php can be eliminated, there are still per wiki data to take care of like image uploads. This per wiki specific data is what lies under $DIR.

(also even if !LocalSettings.php could be merged it would not be able to reside under /usr/share/mediawiki, so including $IP/LocalSettings.php needs to be diverted somehow).

The way this is done is like

{{{
if ( $_SERVER['SERVER_NAME'] == 'ivtvdriver.org' ) {
$wgDBname = 'ivtvdriver';
} else { ...
}}}

This allows you to share common configuration settings (e.g., $wgUseTeX, common extensions) easily. It also allows you to flexibly apply certain settings to groups of wikis. It's a lot like using a single configuration file for many virtual hosts in a web server, say. You can of course require_once() a separate config file for each wiki, if you want to keep each site's config in its own file (e.g. so it can be edited by unprivileged users).

Most Wikipedia configuration is available at http://noc.wikimedia.org/conf/. The relevant bit seems to be in http://noc.wikimedia.org/conf/highlight.php?file=CommonSettings.php. You could have a symlink from /usr/share/mediawiki/LocalSettings.php -> /etc/mediawiki/LocalSettings.php if you liked.

It's true that this wouldn't allow unprivileged users to set up their own wiki from the common code. A root user would have to add a few lines to include their config file conditionally as necessary. I hadn't thought of that. It seems kind of marginal, to be honest: they'd have no control over upgrading the code, for instance, so if the main code gets updated their wiki would break until they upgraded their database. They'd probably be better off using their own copy.

If unprivileged creation of new wikis is a use-case we really want to support, it may be worth introducing special logic to figure out where !LocalSettings.php is. It's also worth considering that having the file accessed directly from /etc/mediawiki instead of via symlink is more elegant. I don't think there are any other things where we'd need to have a different path from $IP.

I still think that the wrapper stub files are equivalent to patching and introducing a where-is-my-wiki variable like $DIR in the patch, but we can certainly find a middle path somewhere.

It's possible something like $localSettingsDir would be useful to add, but I'm not sure yet. I think the name you chose was part of the problem here, I didn't understand what you were getting at.

Replying to [comment:14 athimm]:

More explicitly the $DIR variable is only needed for access to /images, !LocalSettings.php and !AdminSettings.php, and there is just one /images reference (the default for $wgUploadDirectory), the other are indeed just *Settings.php.

I don't think the /images access is important, since the user can just set $wgUploadDirectory in !LocalSettings.php to override the default. !AdminSettings.php looks like it will be [http://www.mediawiki.org/wiki/Special:Code/MediaWiki/53664 removed in 1.16].

Replying to [comment:16 athimm]:

Although technically not correct for the sake of our context "requires" is just a preprocessor includes, so the wrapped version could always be flattened into a pure-patch solution. E.g. wrapping&patching or just patching are indeed equivalent, and the solutions from one model can be mapped to the other.

The problem with just patching is that then the entry points contain some nontrivial code, and are in a different place from the rest of the code. So you could have a 1.15 index.php pointing to 1.16 code, or something. That might break. Changes to the entry points themselves are rare, but they do occur sometimes. So having a really trivial wrapper seems safer.

I've added a documentation file to the MediaWiki tree that should contain useful information for distributors in general:

http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/docs/distributors.txt?view=markup

Let me know if it's helpful.

Ricky will continue to work on this, and we'll revisit in a bit. Removing from agenda for now.

Reassigning to me as per today's fesco meeting. I won't be able to look at this until at least February, but at least it's on the radar.

Any news on this ticket?

Nope. Unfortunately, I haven't had the time to pursue this since school started.

Any further progress here from anyone? ;)

Login to comment on this ticket.

Metadata