Finding broken links in a website
A website needs to be checked for broken links. Doing this manually is only feasible for very small sites, but the task is easy to automate with command-line HTTP tools.
Getting ready
We can use lynx and curl to collect the links and identify the broken ones. lynx's -traversal option recursively visits the pages of the website and builds a list of all the hyperlinks it encounters. curl is then used to verify whether each link actually responds.
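To see what each tool contributes, the two commands can be tried by hand before running the script; the URLs below are only placeholders:

# Crawl the site; lynx writes traverse.dat, reject.dat, and a few
# other files into the current working directory.
lynx -traversal http://example.com > /dev/null

# Fetch only the headers of a single link; a status line without
# a 200/OK response suggests the link may be broken.
curl -I -s http://example.com/somepage | head -n 1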
How to do it...
This script uses lynx and curl to find the broken links on a website:
#!/bin/bash
#Filename: find_broken.sh
#Desc: Find broken links in a website
if [ $# -ne 1 ];
then
  echo -e "Usage: $0 URL\n"
  exit 1;
fi
echo Broken links:
mkdir /tmp/$$.lynx
cd /tmp/$$.lynx
lynx -traversal $1 > /dev/null
count=0;
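# reject.dat, written by the lynx traversal, holds the hyperlinks
# that were collected; sort -u removes the duplicates.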
sort -u reject.dat > links.txt
while read link;
do
  output=`curl -I $link -s \
  | grep -e "HTTP/.*OK" -e "HTTP/.*200"`
  if [[ -z $output ]];
  then
    # No 2xx response; a 301 means the page has moved, not vanished
    output=`curl -I $link -s | grep "HTTP/.*301"`
    if [[ -z $output ]];
    then
      echo $link
      let count++
    fi
  fi
done < links.txt

[ $count -eq 0 ] && echo No broken links found
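The script expects the URL of the site to check as its only argument; a typical run (the URL is again a placeholder) looks like this:

$ bash find_broken.sh http://example.com

The output starts with the Broken links: header, followed by one line per broken link; if none are found, the script prints No broken links found instead. lynx leaves its work files in /tmp/$$.lynx (where $$ is the PID of the shell running the script), and this directory can be removed once the results have been reviewed.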