#1045 start.fp.o parses the Magazine RSS feed incorrectly when an article has embedded video
Closed: Fixed 3 years ago by bcotton. Opened 3 years ago by bcotton.

The magfeed.py script blindly takes the first <enclosure> element to use as the thumbnail for start.fedoraproject.org. This is fine in most cases, but if the article has embedded video, we end up with a blank thumbnail. The RSS feed in that case contains multiple enclosure elements and the featured image is listed last.

<enclosure url="https://fedoramagazine.org/wp-content/uploads/2020/06/Screencast-from-25-06-20-100214.webm" length="410087" type="video/webm"/>
<enclosure url="https://fedoramagazine.org/wp-content/uploads/2020/06/Screencast-from-25-06-20-100811.webm" length="388173" type="video/webm"/>
<enclosure url="https://fedoramagazine.org/wp-content/uploads/2020/06/Screencast-from-24-06-20-234555.webm" length="398605" type="video/webm"/>
<enclosure url="https://fedoramagazine.org/wp-content/uploads/2020/06/Screencast-from-25-06-20-113943.webm" length="1964954" type="video/webm"/>
<enclosure url="https://fedoramagazine.org/wp-content/uploads/2020/07/networkansible-300x127.png" length="36634" type="image/jpg"/>

To prevent this, we want it to use only enclosures with a type of "image/jpg" (note that the RSS feed uses this type for both jpg and png images).
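As a rough illustration of the filtering described above, here is a minimal sketch that parses two of the enclosure elements quoted in this ticket and returns the first one typed "image/jpg". It uses the standard library's xml.etree.ElementTree rather than whatever magfeed.py actually does, and the names pick_thumbnail_url and ITEM_XML, as well as the wrapping <item> element, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two of the enclosures quoted above, wrapped in an <item> so they parse
# as one document. The wrapper is an assumption made for this sketch.
ITEM_XML = """<item>
<enclosure url="https://fedoramagazine.org/wp-content/uploads/2020/06/Screencast-from-25-06-20-100214.webm" length="410087" type="video/webm"/>
<enclosure url="https://fedoramagazine.org/wp-content/uploads/2020/07/networkansible-300x127.png" length="36634" type="image/jpg"/>
</item>"""


def pick_thumbnail_url(item_xml, wanted_type="image/jpg"):
    """Return the URL of the first enclosure whose type matches, else None."""
    item = ET.fromstring(item_xml)
    for enclosure in item.iter("enclosure"):
        # Element.get() returns None when the attribute is absent,
        # so an enclosure without a type is simply skipped.
        if enclosure.get("type") == wanted_type:
            return enclosure.get("url")
    return None


thumbnail = pick_thumbnail_url(ITEM_XML)
print(thumbnail)
```

With this input the video enclosure is skipped and the featured image is chosen, regardless of where it appears in the list.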


Hello, I am new to open source contribution, and I would like to work on solving this issue.

Metadata Update from @bcotton:
- Issue assigned to jakfrost

3 years ago

Beginning with this morning's article, the magfeed script is failing to parse the Fedora Magazine RSS feed.

start.fedoraproject.org build failed
====================================
python2.7 /srv/web/fedora-websites/start.fedoraproject.org/../build.d/build.py -o out -s static -b /
Caching Magazine Feed...
python2.7 /srv/web/fedora-websites/start.fedoraproject.org/build/magfeed.py -o out -s static -b /
Traceback (most recent call last):
  File "/srv/web/fedora-websites/start.fedoraproject.org/build/magfeed.py", line 25, in <module>
    if image_enclosure['type'] == 'image/jpg':
  File "/usr/lib/python2.7/site-packages/feedparser.py", line 375, in __getitem__
    return dict.__getitem__(self, key)
KeyError: 'type'
make: *** [rss-cache] Error 1

The RSS output contains an audio/ogg enclosure, which may be causing the problem. Or it may be a red herring.

<enclosure url="https://gathman.org/music/ogg/LUX%20MIX_1.ogg" length="2191823" type="audio/ogg"/>

The line in question is:

            if image_enclosure['type'] == 'image/jpg':
                break

Guessing we can probably replace it with:

            if image_enclosure.get('type') == 'image/jpg':
                break

Made the change suggested by @codeblock. I can't reproduce the failure locally, so we'll test it in production. 🤷

Looks like that fixed the glitch! Closing this issue

Metadata Update from @bcotton:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

3 years ago
