shapeshed

Ruby & JavaScript Hacker

Nifty Unix Tools - Wc

Word Count

Text files have lots of statistics that can be really useful when you want to extract information from them. With the word count tool, wc, it is possible to access:

  • The number of bytes
  • The number of lines
  • The number of characters
  • The number of words

Running wc on a file with no options prints three of these counts at once.

wc /usr/share/dict/words

This gives us

234936  234936 2486813 /usr/share/dict/words

The output is [number of lines] [number of words] [number of bytes] [filename].
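Because the fields always come in that order, they are easy to pull apart in a script. Here is a minimal sketch that splits them into variables with read. The sample file and the variable names are made up for illustration, and it assumes mktemp is available on your system.

```shell
# Build a small sample file so the example does not depend on /usr/share/dict/words.
sample=$(mktemp)
printf 'alpha beta\ngamma\n' > "$sample"

# wc prints: lines words bytes filename. read splits on whitespace,
# so each count lands in its own variable.
read -r lines words bytes name <<EOF
$(wc "$sample")
EOF

echo "lines=$lines words=$words bytes=$bytes"

rm -f "$sample"
```

For this sample file the echo line should print lines=2 words=3 bytes=17.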

We can access each of these statistics individually by passing an option to the command.

# The number of bytes
wc -c /usr/share/dict/words
# The number of characters
wc -m /usr/share/dict/words
# The number of lines
wc -l /usr/share/dict/words
# The number of words
wc -w /usr/share/dict/words
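When wc is given a filename it prints the name after the count, which gets in the way if you only want the number. One way around this is to feed the file in on standard input instead. The sketch below uses a throwaway sample file (an assumption for illustration, created with mktemp) and arithmetic expansion to trim the padding that some wc implementations add.

```shell
# A three-line sample file, created just for this example.
sample=$(mktemp)
printf 'one\ntwo\nthree\n' > "$sample"

# With a filename argument, wc prints the name after the count.
wc -l "$sample"

# Reading from standard input leaves only the number. Some implementations
# pad it with spaces, so arithmetic expansion is used to normalise it.
count=$(( $(wc -l < "$sample") ))
echo "$count"

rm -f "$sample"
```

This makes the result safe to store in a variable or compare against another number.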

Piping is powerful

The wc command really starts to become useful when it is piped to other commands. Here’s an example: we have 5 CSV files that are full of data, and we want to find the total number of records across all five files. Since each record sits on its own line, we can do this easily by piping the output of the cat command to wc.

cat *.csv | wc -l
1866

Done - we have 1866 records across the 5 files.
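One thing to watch for: if your CSV files start with a header row, the plain line count includes those rows too. The following sketch shows one possible way to leave them out, using two small made-up files in a temporary directory. The file names and contents are hypothetical, and it assumes every file has exactly one header line.

```shell
# Two hypothetical CSV files, each starting with a header row.
dir=$(mktemp -d)
printf 'id,name\n1,ann\n2,bob\n' > "$dir/a.csv"
printf 'id,name\n3,cy\n' > "$dir/b.csv"

# Every line in every file, header rows included.
all=$(( $(cat "$dir"/*.csv | wc -l) ))

# Skip the first line of each file before counting.
records=$(( $(for f in "$dir"/*.csv; do tail -n +2 "$f"; done | wc -l) ))

echo "lines=$all records=$records"

rm -rf "$dir"
```

Here the first count is 5 and the second is 3, the difference being the two header rows.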

Another example might be counting the lines in a file that contain a word or pattern. We can combine grep with wc to achieve this.

grep "union" /usr/share/dict/words | wc -l
41
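Note that this counts matching lines, not individual matches, so a line containing the pattern twice is still counted once. The sketch below compares the two on a made-up sample file. It relies on grep -c, and on grep -o, which GNU and BSD grep support but which is not part of POSIX, so treat that flag as an assumption about your system.

```shell
# Sample data: the second line contains the pattern twice.
sample=$(mktemp)
printf 'union\nreunion union\nonion\n' > "$sample"

# Lines that contain the pattern, counted two equivalent ways.
piped=$(( $(grep "union" "$sample" | wc -l) ))
direct=$(grep -c "union" "$sample")

# Every individual match: -o prints each match on its own line.
matches=$(( $(grep -o "union" "$sample" | wc -l) ))

echo "lines=$piped (grep -c: $direct) matches=$matches"

rm -f "$sample"
```

For this sample, two lines match but the pattern appears three times.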

wc is a really useful tool for summing up results and gathering information on text files or streams.

Further reading