Awk Tutorial Part 2

Today I was able to meet Bryan of Guru Labs. During our conversation he posed the following question: “Find the 3rd field in a file consisting of space-separated fields, the first being an IP address, in the range 192.168.1-2.1-255. There may be lines in the file containing invalid IP addresses.”
I used grep to find the lines and then used awk to find the field. For example:

    $ egrep '^192\.168\.[1-2]\.([1-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]) ' access_log | \
      awk '{print $(NF-1)}'
    200
    304
    304
    ...

The trailing space after the last octet matters: without it, an invalid address such as 192.168.1.999 would still match on its first digit.

He pointed out that, while this works, there is no reason to invoke grep. He is certainly correct. Indeed, awk is all powerful! The general form of an awk program is:

    awk 'pattern { command }'

In its most common and simple usage, printing a field delimited by spaces, no pattern is given:

    awk '{print $3}'

An omitted pattern matches every line. To solve the problem posed by Bryan, simply specify the pattern and eliminate grep from the pipeline. Here is the equivalent awk command:

    $ awk '/^192\.168\.[1-2]\.([1-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5]) / {print $(NF-1)}' \
      access_log
    200
    304
    304
    ...

Awk has some extremely powerful selection operators. Here I am using the ~ operator to match the fourth field from the right, $(NF-3) (the resource), against ^/man, and printing the matched field:

    $ awk '$(NF-3) ~ /^\/man/ {print $(NF-3)}' access_log
    /man/cmd/info
    /man/cmd/Mail
    /man/s/Z
    /man/cmd/mv
    ...

This invocation uses the !~ operator to select lines where the resource does not match the pattern ^/man:

    $ awk '$(NF-3) !~ /^\/man/ {print $(NF-3)}' access_log
    /feed/
    /feed
    /robots.txt
    /10-linux-commands-youve-never-used.html
    ...

Here I am selecting lines where the response code $(NF-1) is greater than or equal to 200 but less than 400, and printing the resource and response code. I use awk’s boolean “and” operator && to combine the two comparisons:

    $ awk '$(NF-1) >= 200 && $(NF-1) <= 399 {print $(NF-3), $(NF-1)}' access_log
    /man/cmd/info 200
    /feed/ 304
    /feed 304
    ...
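The regex operators and the numeric comparisons can sit in the same pattern. The following is a minimal sketch, not taken from the original session: the three log lines are made up for illustration, and the variable names sample_log and man_ok are hypothetical. It keeps only resources under /man that also returned a 2xx or 3xx response code.

```shell
# Hypothetical sample lines, made up for illustration; they follow the field
# layout that the $(NF-1) and $(NF-3) indexing in this post assumes.
sample_log='192.168.1.2 - - [01/Jan/2008:00:00:31 -0600] "GET /man/cmd/info HTTP/1.1" 200 -
192.168.1.3 - - [01/Jan/2008:00:01:09 -0600] "GET /feed HTTP/1.1" 304 -
192.168.1.4 - - [01/Jan/2008:00:02:10 -0600] "GET /man/cmd/mv HTTP/1.1" 404 -'

# Select lines whose resource starts with /man AND whose response code
# is between 200 and 399, then print the resource and the code.
man_ok=$(printf '%s\n' "$sample_log" |
  awk '$(NF-3) ~ /^\/man/ && $(NF-1) >= 200 && $(NF-1) <= 399 {print $(NF-3), $(NF-1)}')

printf '%s\n' "$man_ok"
```

Only the first sample line passes both tests: the second fails the regex match and the third fails the range check.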
The following example uses the boolean “or” operator || to print lines where the resource matches ^/feed or ^/sitemap:

    $ awk '$(NF-3) ~ /^\/feed/ || $(NF-3) ~ /^\/sitemap/ {print $0}' access_log
    192.168.1.2 - - [01/Jan/2008:00:00:31 -0600] "GET /feed/ HTTP/1.1" 304 -
    192.168.1.3 - - [01/Jan/2008:00:01:09 -0600] "GET /feed HTTP/1.1" 304 -
    ...
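When both sides of an || test the same field, the two patterns can usually be folded into one regular expression with alternation. Here is a small sketch comparing the two forms. The sample lines (including the /sitemap.xml request) are invented for illustration, and sample_log, with_or and with_alt are hypothetical names.

```shell
# Hypothetical sample lines, made up for illustration.
sample_log='192.168.1.2 - - [01/Jan/2008:00:00:31 -0600] "GET /feed/ HTTP/1.1" 304 -
192.168.1.3 - - [01/Jan/2008:00:01:09 -0600] "GET /sitemap.xml HTTP/1.1" 200 -
192.168.1.4 - - [01/Jan/2008:00:02:10 -0600] "GET /robots.txt HTTP/1.1" 200 -'

# Two regex tests on the same field, joined with ||.
with_or=$(printf '%s\n' "$sample_log" |
  awk '$(NF-3) ~ /^\/feed/ || $(NF-3) ~ /^\/sitemap/ {print $(NF-3)}')

# A single regex test using alternation inside the pattern.
with_alt=$(printf '%s\n' "$sample_log" |
  awk '$(NF-3) ~ /^\/(feed|sitemap)/ {print $(NF-3)}')

printf '%s\n' "$with_or" "$with_alt"
```

Both commands select the same two resources here. The || form remains the better fit when the two tests look at different fields or mix a regex match with a numeric comparison.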