Finding broken links in a website
Websites must be tested for broken links. Doing this manually is not feasible for large websites, but luckily the task is easy to automate: we can find broken links using HTTP manipulation tools.
Getting ready
We can use lynx and curl to identify the links and find the broken ones. Lynx has the -traversal option, which recursively visits pages on the website and builds a list of all hyperlinks. cURL is then used to verify each of these links.
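As a minimal sketch of these two building blocks (assuming lynx and curl are installed, and using http://example.com as a placeholder URL), the crawl and a single link check look like this:

# Crawl the site; lynx -traversal writes its link lists
# (for example reject.dat) into the current directory.
lynx -traversal http://example.com > /dev/null

# Verify one link with a HEAD request; a healthy link
# returns an HTTP 200/OK status line.
curl -I -s http://example.com | head -n 1

Because -traversal drops its data files into the current directory, the recipe below runs lynx from a temporary directory under /tmp.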
How to do it...
This script uses lynx and curl to find the broken links on a website:
#!/bin/bash
#Filename: find_broken.sh
#Desc: Find broken links in a website

if [ $# -ne 1 ];
then
  echo -e "Usage: $0 URL\n"
  exit 1;
fi

echo Broken links:

# Work in a temporary directory, because lynx -traversal writes its
# data files (traverse.dat, reject.dat, ...) to the current directory.
mkdir /tmp/$$.lynx
cd /tmp/$$.lynx

lynx -traversal $1 > /dev/null
count=0;

# reject.dat contains the collected hyperlinks; drop duplicates.
sort -u reject.dat > links.txt

while read link;
do
  # Send a HEAD request and look for a successful status line.
  output=`curl -I $link -s \
  | grep -e "HTTP/.*OK" -e "HTTP/.*200"`
  if [[ -z $output ]];
  then
    # No 200/OK: capture the actual status line and report the link.
    output=`curl -I $link -s | grep "HTTP/"`
    echo "$link is broken:"
    echo "$output"
    let count++
  fi
done < links.txt

[ $count -eq 0 ] && echo No broken links found.
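A usage example (http://example.com is a placeholder; use any reachable URL):

$ bash find_broken.sh http://example.com
Broken links:

The script visits every page on the site, so it can take a while on large websites. Links that do not return an HTTP 200/OK status are listed along with their status line, and a "No broken links found." message is printed when everything checks out.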