Finding the frequency of words used in a given file
Finding the frequency of words used in a file is an interesting exercise for applying text-processing skills, and it can be solved in many different ways. Let's see how to do it.
Getting ready
We can use associative arrays, awk, sed, grep, and so on, to solve this problem in different ways. Here, words are sequences of alphabetic characters delimited by spaces or periods. First, we should parse all the words in the given file, and then count the occurrences of each word. Words can be parsed with a regex using any of these tools, such as sed, awk, or grep.
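As a quick sketch, the word-parsing step alone can be tried on the command line (the sample sentence is just an illustration; grep -oE is the modern equivalent of egrep -o):

echo "This is a line. This is another." | grep -oE "\b[[:alpha:]]+\b"
# Prints each word on its own line:
# This
# is
# a
# line
# This
# is
# another

The -o option prints only the matched portions, one per line, and \b anchors the match at word boundaries so punctuation is left out.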
How to do it...
Having outlined the logic behind the solution, let's create the shell script as follows:
#!/bin/bash
#Name: word_freq.sh
#Desc: Find out the frequency of words in a file

if [ $# -ne 1 ];
then
  echo "Usage: $0 filename";
  exit 1
fi

filename=$1

# Extract one word per line, then tally occurrences in an awk associative array
grep -oE "\b[[:alpha:]]+\b" "$filename" | \
awk '{ count[$0]++ }
END{ printf("%-14s%s\n","Word","Count") ;
for(ind in count)
{ printf("%-14s%d\n",ind,count[ind]); }
}'
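The same pipeline can be ranked by frequency by piping the counts through sort. A minimal end-to-end sketch (the sample file path and contents are hypothetical, chosen only for illustration):

# Create a small sample file (hypothetical content)
printf 'apple banana apple\ncherry banana apple\n' > /tmp/sample.txt

# Count word frequencies and list them in descending order of count
grep -oE "\b[[:alpha:]]+\b" /tmp/sample.txt |
awk '{ count[$0]++ }
     END{ for (w in count) printf("%d %s\n", count[w], w) }' |
sort -rn
# Prints:
# 3 apple
# 2 banana
# 1 cherry

Here the count is printed first so that sort -rn can order the lines numerically in reverse, putting the most frequent words at the top.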