#12050 developer sync broken with move to rhel9
Closed: Fixed with Explanation 6 months ago by kevin. Opened 6 months ago by kevin.

From cron emails:

Subject: Cron <apache@sundries01> /usr/local/bin/lock-wrapper syncDeveloper /usr/local/bin/syncDeveloper

Traceback (most recent call last):
  File "/usr/local/bin/rss.py", line 12, in <module>
    feedparser._HTMLSanitizer.unacceptable_elements_with_end_tag.add('<div>')
AttributeError: module 'feedparser' has no attribute '_HTMLSanitizer'

Looks like the class has been moved to feedparser.sanitizer.HTMLSanitizer (class, attribute).

Interestingly enough, in the 6.0.z line it is still named _HTMLSanitizer, so the appropriate path should be feedparser.sanitizer._HTMLSanitizer.
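
For illustration only, a minimal sketch of how a script could look the sanitizer class up in more than one place rather than hard-coding a single path. The helper name find_sanitizer is hypothetical, the list of candidate locations is taken from the comments above, and the sketch assumes the submodule is already reachable as an attribute of the imported feedparser module. It is not the change that went into the PR.

```python
from types import SimpleNamespace


def find_sanitizer(feedparser_module):
    """Return the HTML sanitizer class from the first known location that exists.

    Hypothetical helper: the candidate locations below come from the
    discussion in this ticket and are not an exhaustive list.
    """
    candidates = (
        ("sanitizer", "_HTMLSanitizer"),  # 6.0.z layout
        ("sanitizer", "HTMLSanitizer"),   # name mentioned for newer code
        (None, "_HTMLSanitizer"),         # old top-level attribute
    )
    for submodule_name, class_name in candidates:
        owner = feedparser_module
        if submodule_name is not None:
            owner = getattr(feedparser_module, submodule_name, None)
        # getattr on None simply yields the default, so a missing
        # submodule falls through to the next candidate.
        found = getattr(owner, class_name, None)
        if found is not None:
            return found
    raise AttributeError("no known HTML sanitizer class found")


if __name__ == "__main__":
    # Stand-in for a 6.0.z style module; real code would pass `feedparser`.
    fake = SimpleNamespace(sanitizer=SimpleNamespace(_HTMLSanitizer=object))
    print(find_sanitizer(fake))
```

In rss.py the result would then take the place of feedparser._HTMLSanitizer on the line shown in the traceback.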

Making the PR now

Metadata Update from @pcreech17:
- Issue assigned to pcreech17

6 months ago

Metadata Update from @phsmoura:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: low-gain, low-trouble, ops

6 months ago

That fixed the traceback... now it seems to be outputting:

AttributeError. Going to next item
AttributeError. Going to next item

There seem to be some issues with the feed: some items do not contain a 'description'.

https://fedoraplanet.org/rss20.xml

The script errors out when it tries to access the description of such a feed item. This looks like it is by design in the original script: it prints this message and then continues with the next item in the list.

        try:
            article_desc = '\n'.join(item.description.split('\n')[1:])
            # remove html tags from description
            article_desc = re.sub('<[^<]+?>', '', article_desc)
            article_desc = re.sub('<', '&lt;', article_desc)
            article_desc = re.sub('>', '&gt;', article_desc)
            if len(article_desc) > 140:
                article_desc = ' '.join(article_desc.split()[0:25]) + '...'
            if not article_desc.startswith('<p>'):
                article_desc = '<p>%s</p>' % article_desc
        except AttributeError:
            print('AttributeError. Going to next item')
            continue

An example of a bad entry in the list:

<item>
<title>Huiren Woo: Failed to download metadata for repo ‘fedora-cisco-openh264’: GPG verification is enabled, but GPG signature is not available</title>
<guid isPermaLink="false">https://woohuiren.me/blog/?p=905</guid>
<link>https://woohuiren.me/blog/failed-to-download-metadata-for-repo-fedora-cisco-openh264-gpg-verification-is-enabled-but-gpg-signature-is-not-available/</link>
<pubDate>2024-07-11T18:21:51+00:00</pubDate>
</item>

You can see this has no item.description value, and thus is not considered a valid entry for the sake of the script.
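
To check a feed for entries like this, a small sketch using only the standard library is shown here. The function name items_missing_description and the sample feed are made up for illustration; the real script uses feedparser, not ElementTree.

```python
import xml.etree.ElementTree as ET


def items_missing_description(rss_xml):
    """Return the titles of all <item> elements that have no <description>."""
    root = ET.fromstring(rss_xml)
    return [
        item.findtext("title", default="")
        for item in root.iter("item")
        if item.find("description") is None
    ]


# Made-up sample feed: one complete item and one without a description.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Complete entry</title><description>Some text</description></item>
<item><title>Entry without description</title></item>
</channel></rss>"""

if __name__ == "__main__":
    print(items_missing_description(SAMPLE_FEED))
```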

I can modify the code to use the title instead of description in this case, or we can let it remain as-is.

Hum... where does planet fit in here? This should be making an RSS feed from developer.fedoraproject.org/index.html, right? Or am I confused?

The script, rss.py, pulls the feed from the fedoraplanet.org URL above (the URL is built from two lines in the script taken together) and updates the section of the developer.fedoraproject.org/index.html page between these two comment markers:

<!-- BLOG_HEADLINES_START -->
...
<!-- BLOG_HEADLINES_END -->

It fills that section with the recent entries it finds. It does this as part of the syncDeveloper.sh script.
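
As a rough sketch of that replacement step (not the code in rss.py; the function name replace_headlines is hypothetical), the content between the two markers can be swapped out like this:

```python
START = "<!-- BLOG_HEADLINES_START -->"
END = "<!-- BLOG_HEADLINES_END -->"


def replace_headlines(page, headlines):
    """Replace whatever sits between the two markers with `headlines`.

    The page is returned unchanged if either marker is missing.
    """
    before, start, rest = page.partition(START)
    _old, end, after = rest.partition(END)
    if not start or not end:
        return page
    return before + START + "\n" + headlines + "\n" + END + after


if __name__ == "__main__":
    page = "<h1>Blog</h1>\n" + START + "\nold entries\n" + END + "\n<footer></footer>"
    print(replace_headlines(page, "<p>new entry</p>"))
```

Keeping both markers in the output means the next run can find the same section again.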

ah! I missed that linkage there... ok.

I guess using title in this case makes sense?

It does to me, will get the PR to use title for description up soon enough!
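
A minimal sketch of what that fallback could look like, assuming items expose title and description as attributes the way the snippet above uses them. The function name build_description is hypothetical and this is not the actual PR; the cleanup steps mirror the quoted snippet.

```python
import re
from types import SimpleNamespace


def build_description(item):
    """Build the short HTML description, using the title if there is no description."""
    raw = getattr(item, "description", None)
    if raw:
        # Same as the quoted snippet: drop the first line of the description.
        text = "\n".join(raw.split("\n")[1:])
    else:
        # Fallback discussed in this ticket: use the title instead.
        text = item.title
    # Remove HTML tags, then escape any stray angle brackets.
    text = re.sub(r"<[^<]+?>", "", text)
    text = text.replace("<", "&lt;").replace(">", "&gt;")
    # Long texts are cut down to their first 25 words.
    if len(text) > 140:
        text = " ".join(text.split()[:25]) + "..."
    return "<p>%s</p>" % text


if __name__ == "__main__":
    # Stand-in for a feed item without a description.
    print(build_description(SimpleNamespace(title="Entry without description")))
```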

Seems like everything is working now!

Thanks!

Metadata Update from @kevin:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)

6 months ago

