#425 Create a link checker script
Closed: Won't fix / Can't fix 3 years ago Opened 7 years ago by shaiton.

We need a way to check the links on our websites.
It could be in ~/tools/ and use http://search.cpan.org/dist/W3C-LinkChecker/

First of all, we should replace all download.fpo URLs with dl.fpo to avoid mirror issues.

That will help us check external links, and also internal ones that change with each new release (checksums, ISOs...).

I will have a look at this one.

Nice, ask if you have any questions.
My original idea was to check a local build, or even better the staging websites.

When I wrote "replace download.fpo with dl.fpo", I meant only during the check, of course, not on the website itself.
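A minimal sketch of that check-time substitution, assuming "download.fpo" and "dl.fpo" stand for download.fedoraproject.org and dl.fedoraproject.org and that both hosts serve the same paths. The function name `rewrite_for_check` is hypothetical; the site source is never modified, only the URL that gets requested.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the short forms in this ticket refer to these two hosts.
MIRROR_HOST = "download.fedoraproject.org"
MASTER_HOST = "dl.fedoraproject.org"


def rewrite_for_check(url: str) -> str:
    """Return the URL to request during a link check (hypothetical helper).

    Links to the mirror redirector are pointed at the master host so the
    result does not depend on which mirror happens to be chosen.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() == MIRROR_HOST:
        # Swap only the host; scheme, path, query and fragment are kept.
        parts = parts._replace(netloc=MASTER_HOST)
    return urlunsplit(parts)
```

For example, `rewrite_for_check("https://download.fedoraproject.org/pub/fedora/linux/releases/")` gives `https://dl.fedoraproject.org/pub/fedora/linux/releases/`, while links to other hosts pass through unchanged.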

I am sure I will have lots of questions, so it's nice to have your help and support.

I've been thinking about this script and lots of questions popped up. The way I see it, the script should go through every directory in the fedora-web repo. In each directory it should parse the HTML files and check whether their links are valid. Once it has checked every directory, it should write a report and/or email it.
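The directory walk described above could look roughly like this. It is a sketch, assuming it runs over a built output tree of plain `.html` files (the repo's template sources may not parse as HTML), and it only collects `href`/`src` values; checking and reporting would be separate steps. `collect_links` and `LinkCollector` are hypothetical names.

```python
import os
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href/src attribute values from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (e.g. <img .../>) are routed here too by default.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def collect_links(root):
    """Map each .html file under root to the list of links it contains."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(".html"):
                continue
            path = os.path.join(dirpath, filename)
            parser = LinkCollector()
            with open(path, encoding="utf-8", errors="replace") as handle:
                parser.feed(handle.read())
            parser.close()
            found[path] = parser.links
    return found
```

Keying the result by file path keeps the report able to say which page holds a broken link, not just which URL failed.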

Shaiton, is that what you had in mind?
Is it ok if I don't use linkchecker and just write a simple python script?

Could you explain what you meant by: "My original idea was to check a local build, or even better the staging websites." Basically, I would like to know where I can find a local build and where I can find the staging websites.

I talked to Robyduck, so things are clear now. He has answered all my questions for the moment.

Today Robert and I checked the links. Apart from third-party websites, we should be all good for the release.

This feature is still needed, of course. Just for the record, I created the 'tt' file (see http://ur1.ca/g7l2l) with a few grep, sed and vim commands.

Then I just ran:

for i in $(cat tt); do code=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "$i"); echo "$code"; [[ $code != '200' ]] && echo "$i"; echo ""; done
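For comparison, here is a rough Python equivalent of that loop using only the standard library. It is a sketch, not a drop-in replacement: `urlopen` follows redirects, so a 301/302 that curl (without `-L`) would print is reported here as the final status instead. The function names are hypothetical, and `get_status` is injectable so the filtering can be tried without network access.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None if the host is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions but still carry a status code.
        return err.code
    except OSError:
        # DNS failures, refused connections and timeouts (URLError is an OSError).
        return None


def failing_urls(urls, get_status=head_status):
    """Yield (status, url) for every URL that does not answer 200."""
    for url in urls:
        status = get_status(url)
        if status != 200:
            yield status, url
```

To read the same 'tt' file (one URL per line), something like `urls = [l.strip() for l in open("tt") if l.strip()]` followed by `for status, url in failing_urls(urls): print(status, url)` would do.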

I will probably use the above program to check links, so I hope you don't mind if the script isn't pure Python. I think this way the script will be easier to maintain and change. I will let you know what happens.

I managed to install the checklink program and ran it a few times, but I am wondering what exactly you would like to see in the script's output. Checklink produces quite a lot of output. And what do we do with the output: does it get logged, emailed, ...?

The Python utility 'linkchecker' works for this purpose and is packaged in Fedora. I'm currently working on an implementation using it:

I'm fixing Makefile.in so that the 'make linktest' rule runs the test. Currently the screen output is fairly verbose and includes warnings. The run also generates a file, linkchecker-out.csv.

I'm also putting together a 'linktest.py' script that parses the CSV and only outputs relevant values for the rule above. A tester can easily load the CSV into e.g. LibreOffice to see a lot more detail if needed, and there will be output to this effect too.
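A sketch of the kind of filtering such a script might do. It assumes linkchecker's CSV output is ';'-separated, starts with '#' comment lines followed by a header row, and contains columns named `urlname`, `parentname`, `result` and `valid` (with `True`/`False` values); check these against the actual file before relying on them. `broken_links` is a hypothetical name, not part of linkchecker.

```python
import csv


def _data_lines(lines):
    """Skip the leading '#' comment block; keep everything from the header on."""
    header_seen = False
    for line in lines:
        if not header_seen and line.startswith("#"):
            continue
        header_seen = True
        yield line


def broken_links(lines):
    """Return (parent page, link, result) for each row not marked valid."""
    report = []
    for row in csv.DictReader(_data_lines(lines), delimiter=";"):
        valid = row.get("valid")
        if valid is None:
            # Trailing comment/summary lines have too few fields; ignore them.
            continue
        if valid.strip().lower() not in ("true", "1"):
            report.append((row["parentname"], row["urlname"], row["result"]))
    return report
```

Only leading comment lines are dropped, because a link such as `#top` can legitimately appear at the start of a data row. Usage: `with open("linkchecker-out.csv", newline="", encoding="utf-8") as f: print(broken_links(f))`.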

Any progress here? I like the idea of running a make linkcheck, but we cannot check all links at all times. Links on dl.fp.o often differ slightly from the final download.fp.o URLs.
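One way to cope with that is to set release-dependent links aside instead of failing on them. The sketch below assumes those links can be recognised by host; `DEFERRED_HOSTS` and `split_checkable` are hypothetical, and the host list would need adjusting to whatever actually changes at release time.

```python
from urllib.parse import urlsplit

# Hypothetical: hosts whose URLs are only final once the release is out.
DEFERRED_HOSTS = {"download.fedoraproject.org", "dl.fedoraproject.org"}


def split_checkable(urls, deferred_hosts=DEFERRED_HOSTS):
    """Partition URLs into (check now, check after release)."""
    now, later = [], []
    for url in urls:
        # .hostname is already lower-case; relative links have no host.
        host = urlsplit(url).hostname or ""
        (later if host in deferred_hosts else now).append(url)
    return now, later
```

The deferred list can still be printed in the report so the links are not forgotten, just not counted as failures before release day.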

Metadata Update from @robyduck:
- Issue close_status updated to: None
- Issue set to the milestone: None (was: ASAP)

4 years ago

Metadata Update from @robyduck:
- Issue tagged with: tools

4 years ago

@pfrields is this something you still see value in?

@robyduck and I were discussing this on IRC, and we felt this tool was more important in the old fedoraproject.org, when there were a lot more links to Docs that could break.

I'm thinking I agree with @robyduck and @ryanlerch on this. It has been several years and there's really been no traction on this.

We could also run linkchecker manually against a local build.

This seemed to work for me:

$ make en test
$ linkchecker http://localhost:5000

Closing this unless someone feels strongly otherwise.

Metadata Update from @sijis:
- Issue close_status updated to: Won't fix / Can't fix
- Issue status updated to: Closed (was: Open)

3 years ago
