5716177 ticket 1669 - improve i18n docstring extraction

Authored by jdennis 12 years ago, committed by rcritten 12 years ago.
31 files changed. 634 lines added. 914 lines removed.
    ticket 1669 - improve i18n docstring extraction
    
    This patch reverts the use of pygettext for i18n string extraction.
    pygettext was originally introduced because the help documentation for
    commands is in the class docstring and module docstring.
    
    Docstrings are a Python construct whereby any string which immediately
    follows a class declaration, function/method declaration or appears
    first in a module is taken to be the documentation for that
    object. Python automatically assigns that string to the __doc__
    variable associated with the object. Explicitly assigning to the
    __doc__ variable is equivalent and permitted.
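
    A minimal Python sketch of that equivalence follows. The class and
    function names are hypothetical and chosen only for illustration.

```python
class WithDocstring:
    """Takes out the garbage."""


class WithAssignment:
    # Explicit assignment; no bare string literal follows the class line.
    __doc__ = "Takes out the garbage."


def take_out():
    pass


# Functions accept an assignment to __doc__ after definition as well.
take_out.__doc__ = "Takes out the garbage."

# All three objects expose the same documentation string.
assert WithDocstring.__doc__ == WithAssignment.__doc__ == take_out.__doc__
```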
    
    We mark strings in the source for i18n translation by embedding them
    in _() or ngettext(). Specialized extraction tools (e.g. xgettext)
    scan the source code looking for strings with those markers and
    extract each string for inclusion in a translation catalog.
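
    As a rough illustration of what marking looks like in source, the
    sketch below uses the standard library gettext module with a
    NullTranslations object standing in for a real catalog. That is an
    assumption; how the project binds _() and ngettext() is not shown in
    this message. The function name and the messages are hypothetical.

```python
import gettext

# No catalog is loaded, so messages pass through untranslated.
translations = gettext.NullTranslations()
_ = translations.gettext
ngettext = translations.ngettext


def describe_matches(count):
    # ngettext() takes a singular and a plural form and picks one by count.
    template = ngettext(
        "%(count)d entry matched",
        "%(count)d entries matched",
        count,
    )
    return template % {"count": count}


# A plain marked string; an extractor would record the literal as a msgid.
greeting = _("Command completed.")
```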
    
    It was mistakenly assumed that Python docstrings could not be marked
    for translation. Since some docstrings are vital for our command help
    system, some method had to be devised to extract docstrings for the
    translation catalog. pygettext has the ability to locate and extract
    docstrings, and it was introduced to acquire the documentation for our
    commands located in module and class docstrings.
    
    However, pygettext was too large a hammer for this task: it lacked any
    fine-grained ability to extract only the docstrings we were
    interested in. In practice it extracted EVERY docstring in each file
    it was presented with. This caused a large number of strings to be
    extracted for translation which had no reason to be translated; a
    string might have been internal code documentation never meant to be
    seen by users. Often the superfluous docstrings were long, complex and
    likely difficult to translate. This placed an unnecessary burden on
    our volunteer translators.
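
    The difference between docstring-based and marker-based extraction
    can be sketched with the standard ast module. This is only an
    illustration and not how pygettext or xgettext are implemented; the
    sample source and the helper names are hypothetical.

```python
import ast

SAMPLE = '''
"""Module notes for developers only."""

class foo(Command):
    __doc__ = _("The foo command takes out the garbage.")

    def _sort_bins(self):
        """Internal: bins are sorted twice to keep ordering stable."""
'''


def docstrings_in(source):
    # Collect every bare docstring, whether or not users ever see it.
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Module, ast.ClassDef,
                             ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc is not None:
                found.append(doc)
    return found


def marked_strings_in(source):
    # Collect only string literals passed directly to _().
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "_"
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.append(node.args[0].value)
    return found
```

    On this sample, docstring-based collection returns the two
    developer-only notes, while marker-based collection returns only the
    user-facing help text.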
    
    Instead, what is needed is some method to extract only those strings
    intended for translation. We already have such a mechanism and it is
    already widely used, namely wrapping strings intended for translation
    in calls to _() or ngettext(), i.e. marking a string for i18n
    translation. Thus the solution to the docstring translation problem is
    to mark the docstrings exactly as we have been doing. It only requires
    that instead of a bare Python docstring we assign the marked
    string to the __doc__ variable. Using the hypothetical class foo as
    an example:
    
    class foo(Command):
        '''
        The foo command takes out the garbage.
        '''
    
    Would become:
    
    class foo(Command):
        __doc__ = _('The foo command takes out the garbage.')
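
    A runnable version of the converted class, as a sketch: Command here
    is an empty stand-in base class and _ is bound to a pass-through
    gettext function. Both are assumptions made so the snippet is
    self-contained.

```python
import gettext

# Pass-through marker; a real setup would bind _ to a loaded catalog.
_ = gettext.NullTranslations().gettext


class Command:
    # Hypothetical stand-in for the real command base class.
    pass


class foo(Command):
    __doc__ = _('The foo command takes out the garbage.')


# Help code can read the text exactly as it would read a docstring.
help_text = foo.__doc__
```

    Because the string is now an argument to _(), an extractor that looks
    for marked strings picks it up without any docstring-specific handling.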
    
    But which docstrings need to be marked for translation? The makeapi
    tool knows how to iterate over every command in our public API. It was
    extended to validate every command's documentation and report if any
    documentation is missing or not marked for translation. That
    information was then used to identify each docstring in the code which
    needed to be transformed.
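
    This message does not show how makeapi performs the check. One way
    such a validation could work, assuming the marker returns a
    distinguishable wrapper type, is sketched below. LazyText, the
    command classes and check_docs are all hypothetical.

```python
class LazyText:
    # Hypothetical wrapper type returned by the marker function.
    def __init__(self, msgid):
        self.msgid = msgid

    def __str__(self):
        return self.msgid


def _(msgid):
    # Hypothetical marker: wraps the text so it can be recognized later.
    return LazyText(msgid)


class Command:
    # Hypothetical stand-in for the real command base class.
    pass


class foo(Command):
    __doc__ = _('The foo command takes out the garbage.')


class bar(Command):
    '''The bar command is documented but not marked.'''


class baz(Command):
    pass


def check_docs(commands):
    # Report commands whose documentation is missing or unmarked.
    problems = []
    for cmd in commands:
        doc = vars(cmd).get('__doc__')
        if doc is None or not str(doc).strip():
            problems.append((cmd.__name__, 'missing documentation'))
        elif not isinstance(doc, LazyText):
            problems.append((cmd.__name__, 'not marked for translation'))
    return problems


problems = check_docs([foo, bar, baz])
```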
    
    In summary what this patch does is:
    
    * Remove the use of pygettext (modification to install/po/Makefile.in)
    
    * Replace every docstring with an explicit assignment to __doc__ where
      the right-hand side of the assignment is a call to an i18n marking
      function.
    
    * Single line docstrings appearing in multi-line string literals
      (e.g. ''' or """) were replaced with single line string literals
      because the multi-line literals were introducing unnecessary
      whitespace and newlines in the string extracted for translation. For
      example:
    
      '''
      The foo command takes out the garbage.
      '''
    
      Would appear in the translation catalog as:
    
      "\n
      The foo command takes out the garbage.\n
      "
    
      The superfluous whitespace and newlines are confusing to translators
      and require us to strip leading and trailing whitespace from the
      translation at run time.
    
    * Import statements were moved from below the docstring to above
      it. This was necessary because the i18n markers are imported
      functions and must be available before the docstring is
      evaluated. Technically only the import of the i18n markers had to
      appear before the docstring, but stylistically it's better to keep
      all the imports together.
    
    * It was observed during the docstring editing process that the
      command documentation was inconsistent with respect to the use of
      periods to terminate a sentence. Some docstrings had a trailing
      period, others didn't. Consistency was enforced by adding a period
      to the end of every docstring if one was missing.
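
    Three of the items above (whitespace in multi-line literals, import
    ordering, and trailing periods) can be sketched in Python as follows.
    The marker is a pass-through stand-in, and the class names, the sample
    module sources and the helper functions are hypothetical.

```python
import gettext

_ = gettext.NullTranslations().gettext  # pass-through stand-in marker


# Multi-line literal: the msgid carries the newlines and indentation.
class foo_multi:
    __doc__ = _('''
    The foo command takes out the garbage.
    ''')


# Single-line literal: the msgid is exactly the sentence.
class foo_single:
    __doc__ = _('The foo command takes out the garbage.')


def load_module_doc(source):
    # Run module source in a fresh namespace that can import but has no _.
    namespace = {"__builtins__": {"__import__": __import__}}
    exec(source, namespace)
    return namespace["__doc__"]


# The marker is imported before the __doc__ assignment runs.
IMPORT_FIRST = (
    "from gettext import gettext as _\n"
    "__doc__ = _('Garbage commands.')\n"
)
# The __doc__ assignment runs before _ exists, so it raises NameError.
DOC_FIRST = (
    "__doc__ = _('Garbage commands.')\n"
    "from gettext import gettext as _\n"
)


def ensure_period(doc):
    # Append a period only when the text does not already end with one.
    return doc if doc.endswith('.') else doc + '.'
```

    Loading DOC_FIRST with load_module_doc() fails with NameError, which
    is why the imports had to move above the module documentation.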
    
        
file modified
+1 -10
file modified
+32 -63
file modified
+28 -38
file modified
+12 -15
file modified
+17 -25
file modified
+39 -53
file modified
+43 -55
file modified
+19 -32
file modified
+31 -52
file modified
+17 -23
file modified
+18 -27
file modified
+11 -9
file modified
+36 -48
file modified
+14 -27
file modified
+14 -17
file modified
+19 -19
file modified
+6 -6
file modified
+24 -30
file modified
+10 -12
file modified
+16 -24
file modified
+7 -6
file modified
+10 -11
file modified
+17 -30
file modified
+21 -25
file modified
+19 -35
file modified
+17 -25
file modified
+32 -39
file modified
+17 -26
file modified
+15 -28
file modified
+48 -70
file modified
+24 -34