A few days ago I was trying to explain to a colleague how useful a handful of shell commands can be, and today I came across an example that illustrates the point. I had 245 log files, each about 70-80 MB in size and roughly 4 million lines long. Each line in the log files uses the following (squid) format:

What I wanted was to examine, or graph, the number of unique IP addresses seen in each log file per day, to get a rough idea of how many computers had been using the service each day. The aim was to check the effect of new computer deployments.

Getting the number of distinct IP addresses per day took only a simple shell script, which produces CSV values I can import into a spreadsheet and graph.
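What follows is a minimal sketch of that kind of script, not the original. It assumes squid's native access.log layout, where the third whitespace-separated field is the client IP address, and it assumes each log file covers a single day. The function names `count_unique_ips` and `print_csv` are hypothetical, and the log files are simply passed as arguments.

```shell
#!/bin/sh
# Sketch only: assumes squid's native access.log format (field 3 = client IP)
# and one log file per day. Function names are hypothetical.

# Print the number of distinct client IPs in one log file.
count_unique_ips() {
    # NF >= 3 skips blank or truncated lines; seen[] remembers each IP,
    # and n is incremented only the first time an IP appears.
    awk 'NF >= 3 && !seen[$3]++ { n++ } END { print n + 0 }' "$1"
}

# Emit one CSV row per file: file name, distinct IP count.
print_csv() {
    echo "file,unique_ips"
    for f in "$@"; do
        printf '%s,%s\n' "$f" "$(count_unique_ips "$f")"
    done
}

print_csv "$@"
```

For example, `sh count_ips.sh access.log.* > unique_ips.csv` (file names hypothetical) writes one row per log file, ready for a spreadsheet. Counting with an awk associative array makes a single pass over each file and avoids sorting roughly 4 million lines; the more familiar `awk '{print $3}' file | sort -u | wc -l` pipeline gives the same count, just more slowly.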

I reckon it would be hard to find a quicker or friendlier way to solve that problem.

