Wednesday, April 16, 2008

Beginners AWK programming with examples

AWK derives its name from its creators: Aho, Weinberger, and Kernighan. Awk has two faces: it is a utility for performing simple text-processing tasks, and it is a programming language for performing complex text-processing tasks. It is also an "interpreted" language; that is, an Awk program cannot run on its own and must be executed by the Awk utility itself.

Basic Structure

awk [options] 'pattern action ...' [filenames]

Examples :
awk '/root/' /etc/passwd # root is the pattern here delimited by / & /
awk '{print}' /etc/passwd # prints the whole file

AWK supports multiple pattern-action statements in one program (use the shell's ability to continue a quoted string across several lines).
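
A minimal sketch of a program with two pattern-action statements spread over several lines inside one pair of single quotes. The sample input is made up for illustration. Every rule is tested against every record, so a single line can trigger more than one action.

```shell
# Made-up sample input: three lines fed to awk on standard input.
multi_out=$(printf 'alpha\n42\nbeta7\n' | awk '
/[a-z]/ { print "letters: " $0 }   # runs for lines containing a lowercase letter
/[0-9]/ { print "digits: " $0 }    # runs for lines containing a digit
')
printf '%s\n' "$multi_out"
```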

Records and Fields
Each line is a record.

$0 is the entire record.
$1..$127 are the fields 1 .. 127 (the upper limit depends on the implementation; gawk, for example, has no fixed limit)

Examples :
awk -F: '/root/{print $1}' /etc/passwd # -F specifies the field separator.
# prints the first field of each matching line.

awk -F: '/root/{print $1,$7}' /etc/passwd # prints the 1st and 7th fields
# the comma inserts OFS, which defaults to a single space
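
To see the difference between a comma and plain juxtaposition in print, here is a small sketch. It uses two made-up passwd-style records (hypothetical users) rather than the real /etc/passwd, so the result is the same on any machine.

```shell
# Two made-up passwd-style records (hypothetical users, not read from /etc/passwd).
sample='root:x:0:0:root:/root:/bin/bash
alice:x:1001:1001:Alice:/home/alice:/bin/sh'

# The comma puts OFS (a space by default) between the fields;
# writing fields side by side concatenates them with nothing in between.
fields_out=$(printf '%s\n' "$sample" | awk -F: '{ print $1, $7; print $1 $7 }')
printf '%s\n' "$fields_out"
```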

ls -l | awk '{print $9"\t"$5}'

awk '/^$/ {print "This is a blank line"}
/[a-zA-Z]+/ {print "Alphabets"}
/[0-9]+/ { print "Numerals"}'

What would the statement below output?
awk -F: '/root/{print $ $7}' /etc/passwd

Arithmetic
Examples :
awk -F: '{print $3,$3+1}' /etc/passwd
awk -F: '{printf("%10s %15s\n",$1,$7)}' /etc/passwd

Note: print appends a newline, but printf doesn't.
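
A quick way to check the newline behaviour is to mix the two, as in this sketch with made-up numbers. printf keeps everything on one line until the final print ends it.

```shell
# print appends the output record separator (a newline by default);
# printf writes only what the format string asks for.
printf_out=$(printf '3\n4\n' | awk '{ printf("%d ", $1 + 1) } END { print "done" }')
printf '%s\n' "$printf_out"
```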

Relational Operators ( <, <=, >, >=, ==, != )
Examples :
awk -F: '$3>500' /etc/passwd
awk -F: '$3==500' /etc/passwd
awk -F: '$3>500 && $3<510' /etc/passwd
awk -F: '$1 == "root" || $1 == "halt"' /etc/passwd
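
The same kind of test can be tried without touching /etc/passwd. This sketch uses made-up name:uid pairs, so the field number ($2) differs from the examples above.

```shell
# Made-up name:uid pairs; the pattern combines two numeric tests with &&.
rel_out=$(printf 'root:0\nalice:501\nbob:505\ncarol:600\n' |
  awk -F: '$2 > 500 && $2 < 510 { print $1 }')
printf '%s\n' "$rel_out"
```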

Regular Expression Operators
Regular expressions can also be used in matching expressions. The two operators, `~' and `!~', perform regular expression comparisons. Expressions using these operators can be used as patterns or in if, while, for, and do statements.

Examples :
awk '$1 ~ /^root/' # prints lines whose first field starts with root
awk '$1 !~ /root$/' # prints lines whose first field does not end with root
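
Since the matching operators also work inside statements, here is a sketch that uses them in an if/else chain. The three input records are invented for the example.

```shell
# Made-up name:uid records; ~ and !~ are used inside an if statement.
regex_out=$(printf 'root:0\nrooster:12\nalice:501\n' |
  awk -F: '{
    if ($1 ~ /^root$/)      print $1 " is exactly root"
    else if ($1 !~ /^roo/)  print $1 " does not start with roo"
  }')
printf '%s\n' "$regex_out"
```

The record for rooster produces no output: it fails the exact match and it does start with roo.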

Built-In Variables
1. NR ( No. of records processed so far )
NR gives the current line's sequential number.

Examples :
awk '/root/ { print NR,$0}' /etc/passwd # if matches print line no. and line.
awk 'NR>40' /etc/passwd # print from the 41st line
awk 'NR==5 , NR==10 {print NR}' /etc/passwd # print line nos 5 to 10
awk 'NR>5 && NR<10 { print NR}' /etc/passwd # print line no. > 5 and < 10
awk 'NR%2 == 1 { print NR }' /etc/passwd # print odd line numbers
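
A range pattern with NR can be checked on a tiny made-up input, as in this sketch. Both end points are included.

```shell
# Five made-up lines; the range pattern selects records 2 through 4 inclusive.
nr_out=$(printf 'a\nb\nc\nd\ne\n' | awk 'NR==2, NR==4 { print NR ": " $0 }')
printf '%s\n' "$nr_out"
```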

2. FNR
NR counts the lines continuously from the very beginning until the end of the input. FNR restarts the counting at the beginning of each input file.

So, for the first file processed they will be equal but on the first line of the second and subsequent files FNR will start from 1 again.

Examples :
awk '{print FNR,$0}' out out1 out2
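
To watch NR and FNR diverge, the sketch below creates two throwaway files (the names first and second are arbitrary) and prints both counters next to each line.

```shell
# Two throwaway files created just for this demonstration.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/first"
printf 'c\nd\n' > "$tmpdir/second"

# NR keeps counting across files; FNR restarts at 1 for each file.
fnr_out=$(awk '{ print NR, FNR, $0 }' "$tmpdir/first" "$tmpdir/second")
printf '%s\n' "$fnr_out"
rm -r "$tmpdir"
```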

3. NF ( Contains the no. of fields in the current line/record )
Examples :
awk '{print NF}'
awk 'NF>4' # print lines having > 4 fields
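
One common use of NF, sketched here on made-up input, is skipping blank lines: an empty line has no fields, so NF is 0 for it.

```shell
# Three made-up lines, the middle one empty; NF > 0 filters out the blank line.
nf_out=$(printf 'one two three\n\nsolo\n' | awk 'NF > 0 { print NF ": " $0 }')
printf '%s\n' "$nf_out"
```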

What would the following line output?
awk '{print $NF}' /etc/passwd

Output Redirection

Examples :
awk '/root/ { print NR,$0 > "out" }' /etc/passwd # redirects o/p to file named out
ls -l | awk '{print $5 | "sort -rn > sorted" }'
The above pipes the output to the sort command, which writes its result to the file sorted. Inside an awk program, an external command (like a file name) must always be given as a quoted string.

ls -l | awk '{print $5 | "sort -nr | uniq "}'
ls -l | awk '{print $5 | "sort -nr | uniq > out"}'
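
Here is a self-contained sketch of both forms of redirection that does not depend on the output of ls. The sizes are made up, and the file names sizes and sorted inside a temporary directory are arbitrary choices for the example. The target file name is passed in with -v, which shows that the redirection target is just a string expression.

```shell
tmpdir=$(mktemp -d)

# "print > out" writes to the file named by the variable out;
# "print | command" feeds the same values to sort, whose output
# lands on awk's standard output (redirected to a file here).
printf '30\n5\n120\n' |
  awk -v out="$tmpdir/sizes" '{ print $1 > out; print $1 | "sort -rn" }' > "$tmpdir/sorted"

redir_file=$(cat "$tmpdir/sizes")
redir_sorted=$(cat "$tmpdir/sorted")
printf '%s\n' "$redir_file" "$redir_sorted"
rm -r "$tmpdir"
```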

BEGIN & END Blocks

BEGIN and END are special patterns. They are not used to match input records. Rather, they are used for supplying start-up or clean-up information to your awk script. A BEGIN rule is executed once, before the first input record has been read. An END rule is executed once, after all the input has been read. An awk program may have multiple BEGIN and/or END rules. They are executed in the order they appear, all the BEGIN rules at start-up and all the END rules at termination.

BEGIN {actions}

-- The body of the AWK script --

END {actions}

Examples :
awk 'BEGIN{FS=":"} { print $1}' /etc/passwd # begin initializes FS to :
awk 'BEGIN{FS=":" ; OFS="+"} {print $1,$7}' /etc/passwd
awk 'BEGIN{FS=":";OFS="+";print "List of users"}{print $1,$7}' /etc/passwd
awk 'BEGIN{print "Welcome"}'
ls -l | awk '{sum=sum+$5} END{print sum}' # awk variables such as sum are used by bare name, without a $.
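
The ordering of multiple BEGIN and END rules can be seen in this sketch, which sums three made-up numbers. The rule without a pattern in the middle runs once per input record.

```shell
# Made-up input of three numbers; two BEGIN rules and two END rules.
begin_out=$(printf '10\n20\n30\n' | awk '
BEGIN { print "first BEGIN" }
BEGIN { print "second BEGIN" }
      { sum += $1 }
END   { print "sum=" sum }
END   { print "records=" NR }
')
printf '%s\n' "$begin_out"
```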

Built-In AWK Functions

Examples :
awk '{print int($1)}'
awk '{print sqrt($1)}' # square root function
awk '{print length($1)}' # length function
awk '{print length}' # prints length of i/p line
awk 'length>60' /etc/passwd
awk 'length>60 { print length,$0}' /etc/passwd

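awk '{print substr($1,3,2)}' # From 3rd char , print 2 chars.
awk '{print (substr($1,3,2)+0 > 50)}' # prints 1 if the value is > 50, else 0
# parentheses keep > from being read as output redirection; +0 forces a numeric comparison
awk '{print (substr($1,3,2)+0 > 50 && substr($1,3,2)+0 < 60)}' # prints 1 if the value lies between 50 and 60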

awk '{print toupper($0)}'
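
The functions above can be tried together on one made-up input line, as in this sketch. The first field is a word and the second a decimal number.

```shell
# One made-up record: a word and a decimal number.
func_out=$(printf 'abcdef 16.9\n' | awk '{
  print length($1)        # number of characters in the first field
  print substr($1, 3, 2)  # two characters starting at the third
  print toupper($1)       # upper-case copy of the first field
  print int($2)           # truncates toward zero
  print sqrt(int($2))     # square root of the truncated value
}')
printf '%s\n' "$func_out"
```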
