Awk Tutorial Part 1

 

If you are a system administrator or developer, you need to process log files to get a better grasp of what is happening on your systems. Many people use Perl or Python to help with this task. However, using one of the P languages is often overkill. Furthermore, every single day I am on a machine that I cannot make changes to, and thus cannot use my helper scripts. Awk, however, has the tools to solve most on-the-fly log processing problems directly from the command line. In addition, awk can provide a more concise and faster solution than the pipeline of cut, grep, sort, and other commands you are currently using.
 

In this article, this is the format of the file I am working with:

$ tail -n 1 access_log-2008-01
1.1.1.1 - - [10/Jan/2008:17:26:51 -0600] "GET / HTTP/1.1" 200 38856

Basically, what we have here is: IP address, date, request, response code, and response size (ignoring the dashes after the IP address).

How would you find the largest response sent by your HTTP server? My typical solution has always been:

$ awk '{print $NF}' access_log-2008-01 | egrep -v '\-'  | sort -n | tail -n 1
10678272

However, there is clearly a better solution.

By default, awk splits each input line on whitespace (spaces and tabs), and assigns the entire line to $0, each field to $n, and the number of fields to NF. See this example:

$ echo a b c d e f | awk '{print $0}'
a b c d e f
$ echo a b c d e f | awk '{print $1}'
a
$ echo a b c d e f | awk '{print $2}'
b
$ echo a b c d e f | awk '{print NF}'
6
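
One detail worth checking for yourself is how the default splitting handles uneven spacing. The sketch below uses a made-up input string (not from the log file) with three spaces and a tab between the letters. It assumes a POSIX-style awk and a printf that expands \t.

```shell
# Several spaces and a tab still separate only three fields, because the
# default splitting treats any run of blanks as a single separator.
count=$(printf 'a   b\tc\n' | awk '{print NF}')
echo "$count"
```

This matters for log files, where columns are sometimes padded with extra spaces.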

Note that because NF holds the number of fields, $NF refers to the last field:

$ echo a b c d e f | awk '{print $NF}'
f

Or print the second field from the end:

$ echo a b c d e f | awk '{print $(NF-1)}'
e
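
To tie this back to the log format, here is a small sketch that runs the same ideas against the sample line shown earlier. The variable names line and summary are only for illustration. Note that the date and the request each contain spaces, so awk spreads them over several fields. That is why counting from the end with NF is the easy way to reach the response code and size.

```shell
# Sample line copied from the article.
line='1.1.1.1 - - [10/Jan/2008:17:26:51 -0600] "GET / HTTP/1.1" 200 38856'

# NF is the field count, $1 the client IP, $(NF-1) the response code,
# and $NF the response size.
summary=$(echo "$line" | awk '{print NF, $1, $(NF-1), $NF}')
echo "$summary"
```

With default splitting this line has 10 fields, so the command prints "10 1.1.1.1 200 38856".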

Look at my example again:

$ awk '{print $NF}' access_log-2008-01 | egrep -v '\-'  | sort -n | tail -n 1
10678272

That solution starts four processes (awk, egrep, sort, and tail) and passes the data through each of them in turn. That is exceedingly inefficient! How about this:

$ awk '{if ($NF > max) { max = $NF;}} END {print max}' access_log-2008-01
10678272

This starts one process and reads the data only once. In English, that command says: for each line, if the last field is greater than max, store it in the variable “max”. Once all the lines have been processed, print max.
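
The one-liner can also be spelled out in a longer, commented form. What follows is a sketch, not a replacement for the command above: the three sample lines are made up, and it assumes the size column can contain "-" (which is what the egrep in the pipeline filters out). It skips those lines explicitly and adds 0 to force a numeric comparison.

```shell
# Hypothetical sample data: three log lines, one with "-" as the size.
sample='1.1.1.1 - - [10/Jan/2008:17:26:51 -0600] "GET / HTTP/1.1" 200 38856
1.1.1.2 - - [10/Jan/2008:17:26:52 -0600] "GET /a HTTP/1.1" 304 -
1.1.1.3 - - [10/Jan/2008:17:26:53 -0600] "GET /b HTTP/1.1" 200 9120'

largest=$(printf '%s\n' "$sample" | awk '
    # Skip lines whose last field is not a plain number, such as "-".
    $NF !~ /^[0-9]+$/ { next }
    # Adding 0 forces a numeric comparison rather than a string comparison.
    $NF + 0 > max { max = $NF + 0 }
    # After the last line, report the largest size seen.
    END { print max }
')
echo "$largest"
```

The pattern-action form (condition { action }) does the same job as the if statement inside braces; which one to use is mostly a matter of taste.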

Which command do you suppose is faster?

$ time awk '{print $NF}' access_log-2008-01 | egrep -v '\-'  | sort -n | tail -n 1
10678272
real    0m1.107s
user    0m1.070s
sys     0m0.037s
$ time awk '{if ($NF > max) { max = $NF;}} END {print max}' access_log-2008-01
10678272
real    0m0.207s
user    0m0.194s
sys     0m0.012s
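
If you want to try the comparison without a real access log, a sketch like the following builds a small synthetic file and checks that both commands agree. Everything about the file is an assumption for illustration: the 1000 lines, the arithmetic that produces the sizes, and the use of mktemp for a scratch file. Here grep -v -e '-' stands in for the egrep -v '\-' used above.

```shell
# Build a small synthetic log (hypothetical data, not the file from the article).
log=$(mktemp)
awk 'BEGIN {
    for (i = 1; i <= 1000; i++)
        printf "1.1.1.1 - - [10/Jan/2008:17:26:51 -0600] \"GET / HTTP/1.1\" 200 %d\n", (i * 7919) % 100003
}' > "$log"

# Pipeline version: extract the size, drop "-" entries, sort, take the last line.
pipeline_max=$(awk '{print $NF}' "$log" | grep -v -e '-' | sort -n | tail -n 1)

# Single-process version: track the maximum while reading the file once.
awk_max=$(awk '{if ($NF > max) { max = $NF;}} END {print max}' "$log")

echo "$pipeline_max $awk_max"
rm -f "$log"
```

Both variables should hold the same number. To compare speed, raise the line count and put time in front of each command. The numbers you get will depend on your machine and awk implementation.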


 