Wednesday, April 16, 2008

Beginners AWK programming with examples

AWK derives its name from the initials of its creators, Aho, Kernighan and Weinberger. Awk has two faces: it is a utility for performing simple text-processing tasks, and it is a programming language for performing complex text-processing tasks. It is also an "interpreted" language: an Awk program cannot run on its own, it must be executed by the awk utility itself.

Basic Structure

awk [options] 'pattern { action } ...' [filenames]

Examples :
awk '/root/' /etc/passwd # root is the pattern, delimited by / and /; matching lines are printed
awk '{print}' /etc/passwd # prints the whole file
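As a self-contained sketch that does not depend on /etc/passwd, the snippet below feeds three made-up passwd-style lines (hypothetical sample data) to awk and captures the results in shell variables. It contrasts a pattern with no action (the default action prints the matching line), an action with no pattern (it runs for every line; NR is the line number, covered below), and a pattern with an action.

```shell
# Hypothetical sample data standing in for /etc/passwd
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
rootless:x:1001:1001::/home/rootless:/bin/sh'

# Pattern only: every line containing "root" is printed as-is
pattern_only=$(printf '%s\n' "$sample" | awk '/root/')

# Action only: runs for every line; here it prints the line number
action_only=$(printf '%s\n' "$sample" | awk '{ print NR }')

# Pattern and action: first field of the line that starts with "root:"
both=$(printf '%s\n' "$sample" | awk -F: '/^root:/ { print $1 }')

printf '%s\n---\n' "$pattern_only" "$action_only" "$both"
```

The pattern-only command matches two lines, because "rootless" also contains the string root.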

AWK supports multiple pattern-action statements in one program (use the shell's multiline quoting to write them on separate lines).

Records and Fields
Each line is a record.

$0 is the entire record.
$1, $2, ... are fields 1, 2, ... of the record (the last one is $NF).

Examples :
awk -F: '/root/{print $1}' /etc/passwd # -F specifies the field separator.
# prints the first field of each matching entry.

awk -F: '/root/{print $1,$7}' /etc/passwd # prints the 1st and 7th fields
# the comma inserts OFS, the output field separator (a space by default)
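To see what the comma does, the sketch below prints the same two fields with and without a comma, and then with OFS changed through awk's -v option. The input line is a hypothetical passwd entry.

```shell
line='root:x:0:0:root:/root:/bin/bash'

# Comma between arguments: print inserts OFS (a single space by default)
with_comma=$(printf '%s\n' "$line" | awk -F: '{ print $1, $7 }')

# No comma: the two strings are simply concatenated
no_comma=$(printf '%s\n' "$line" | awk -F: '{ print $1 $7 }')

# OFS can be changed, here from the command line with -v
custom_ofs=$(printf '%s\n' "$line" | awk -F: -v OFS=' -> ' '{ print $1, $7 }')

printf '%s\n' "$with_comma" "$no_comma" "$custom_ofs"
```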

ls -l | awk '{print $9"\t"$5}' # file name, a tab, then the size in bytes

awk '/^$/ {print "This is a blank line"}
/[a-zA-Z]+/ {print "Alphabets"}
/[0-9]+/ { print "Numerals"}'
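Every pattern-action rule is tried against every input line, so one line can trigger more than one action. A minimal sketch with four hypothetical input lines (blank, letters, digits, letters and digits):

```shell
# Input lines: "", "abc", "123", "abc123"
out=$(printf '\nabc\n123\nabc123\n' | awk '
/^$/        { print "This is a blank line" }
/[a-zA-Z]+/ { print "Alphabets" }
/[0-9]+/    { print "Numerals" }')
printf '%s\n' "$out"
```

The last line, abc123, matches both the letter rule and the digit rule, so it produces two lines of output.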

What would the output of the statement below be?
awk -F: '/root/{print $ $7}' /etc/passwd

Arithmetic
Examples :
awk -F: '{print $3,$3+1}' /etc/passwd
awk -F: '{printf("%10s %15s\n",$1,$7)}' /etc/passwd

Note: print ends its output with a newline, but printf doesn't; include \n in the format string yourself.
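A small sketch of the difference, using a hypothetical passwd line: printf needs an explicit \n, and consecutive printf calls without one run together on the same line. The BEGIN block in the second command runs before any input is read (it is described further down).

```shell
line='root:x:0:0:root:/root:/bin/bash'

# Left-justified name in 6 columns, uid+1 right-justified in 5 columns
formatted=$(printf '%s\n' "$line" | awk -F: '{ printf("%-6s|%5d|\n", $1, $3 + 1) }')

# Two printf calls, only the second ends with a newline
joined=$(awk 'BEGIN { printf "ab"; printf "cd\n" }')

printf '%s\n' "$formatted" "$joined"
```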

Relational Operators ( <, <=, >, >=, ==, != )
Examples :
awk -F: '$3>500' /etc/passwd
awk -F: '$3==500' /etc/passwd
awk -F: '$3>500 && $3<510' /etc/passwd
awk -F: '$1 == "root" || $1 == "halt"' /etc/passwd

Regular Expression Operators
Regular expressions can also be used in matching expressions. The two operators, `~' and `!~', perform regular expression comparisons. Expressions using these operators can be used as patterns or in if, while, for, and do statements.

Examples :
awk -F: '$1 ~ /^root/' /etc/passwd # lines whose first field starts with root are printed
awk -F: '$1 !~ /^root/' /etc/passwd # lines whose first field does not start with root
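The sketch below runs both operators over hypothetical passwd-style lines, plus one use of ~ inside an if statement.

```shell
sample='root:x:0:0
rootkit:x:9:9
daemon:x:1:1'

# ~ : the first field matches the regex (starts with "root")
starts=$(printf '%s\n' "$sample" | awk -F: '$1 ~ /^root/ { print $1 }')

# !~ : the first field does not match the regex
others=$(printf '%s\n' "$sample" | awk -F: '$1 !~ /^root/ { print $1 }')

# The same kind of test inside an if statement
flag=$(printf '%s\n' "$sample" | awk -F: '{ if ($1 ~ /kit$/) print "found", $1 }')

printf '%s\n---\n' "$starts" "$others" "$flag"
```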

Built-In Variables
1. NR ( No. of records processed so far )
NR gives the current line's sequential number.

Examples :
awk '/root/ { print NR,$0}' /etc/passwd # for matching lines, print the line number and the line.
awk 'NR>40' /etc/passwd # print from the 41st line
awk 'NR==5 , NR==10 {print NR}' /etc/passwd # print line nos 5 to 10
awk 'NR>5 && NR<10 { print NR}' /etc/passwd # print line nos. > 5 and < 10
awk 'NR%2 == 1 { print NR }' /etc/passwd # print odd line numbers
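Using six lines of hypothetical input (the letters a to f), this sketch shows the range pattern and the modulo test from the examples above.

```shell
lines='a
b
c
d
e
f'

# Range pattern: from the line where NR==2 through the line where NR==4
range=$(printf '%s\n' "$lines" | awk 'NR==2, NR==4 { print NR ": " $0 }')

# Modulo test: odd line numbers only
odd=$(printf '%s\n' "$lines" | awk 'NR % 2 == 1 { print NR }')

printf '%s\n---\n' "$range" "$odd"
```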

2. FNR
NR counts the lines continuously from the very beginning until the end of all input. FNR restarts the count at the beginning of each input file.

So, for the first file processed the two are equal, but on the first line of the second and each subsequent file FNR starts again from 1.

Examples :
awk '{print FNR,$0}' out out1 out2
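To watch NR and FNR diverge, the sketch below creates two small files in a temporary directory (the names first and second are made up for the example) and prints both counters for every line.

```shell
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/first"
printf 'c\nd\ne\n' > "$tmp/second"

# NR keeps counting across files; FNR restarts at 1 for each file
out=$(awk '{ print NR, FNR, $0 }' "$tmp/first" "$tmp/second")
printf '%s\n' "$out"

rm -rf "$tmp"
```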

3. NF ( Contains the no. of fields in the current line/record )
Examples :
awk '{print NF}'
awk 'NF>4' # print lines having > 4 fields

What would the following line output?
awk '{print $NF}' /etc/passwd

Output Redirection

Examples :
awk '/root/ { print NR,$0 > "out" }' /etc/passwd # redirects output to a file named out
ls -l | awk '{print $5 | "sort -rn > sorted" }'
The above pipes the sizes into the sort command, whose output goes to the file sorted. Any external command must be given as a quoted string.

ls -l | awk '{print $5 | "sort -nr | uniq "}'
ls -l | awk '{print $5 | "sort -nr | uniq > out"}'
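A runnable sketch of both kinds of redirection, writing into a temporary directory instead of the current one. The -v variables file and dest are names chosen for this example; the close() call in END makes awk wait for sort to finish before the script reads the sorted file.

```shell
tmp=$(mktemp -d)

# print ... > file: the file name is an awk expression (here a -v variable)
printf 'root:x:0\ndaemon:x:1\nroot2:x:5\n' |
  awk -v file="$tmp/out" '/root/ { print NR, $0 > file }'

# print ... | cmd: the command is a string handed to the shell
printf '5\n300\n42\n' |
  awk -v dest="$tmp/sorted" '
    { cmd = "sort -rn > " dest; print $1 | cmd }
    END { close(cmd) }'

matches=$(cat "$tmp/out")
sorted=$(cat "$tmp/sorted")
rm -rf "$tmp"
printf '%s\n---\n%s\n' "$matches" "$sorted"
```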

BEGIN & END Blocks

BEGIN and END are special patterns. They are not used to match input records. Rather, they are used for supplying start-up or clean-up information to your awk script. A BEGIN rule is executed once, before the first input record has been read. An END rule is executed once, after all the input has been read. An awk program may have multiple BEGIN and/or END rules. They are executed in the order they appear: all the BEGIN rules at start-up and all the END rules at termination.

BEGIN {actions}

-- The body of the AWK script --

END {actions}

Examples :
awk 'BEGIN{FS=":"} { print $1}' /etc/passwd # BEGIN sets FS to : before any input is read
awk 'BEGIN{FS=":" ; OFS="+"} {print $1,$7}' /etc/passwd
awk 'BEGIN{FS=":";OFS="+";print "List of users"}{print $1,$7}' /etc/passwd
awk 'BEGIN{print "Welcome"}'
ls -l | awk '{sum=sum+$5} END{print sum}' # variables are referenced by name, not with $.
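A minimal sketch combining both blocks on hypothetical numeric input: two BEGIN rules run in the order they appear, the main rule accumulates a total, and END prints the total and the average. Inside END, NR still holds the number of records read.

```shell
# Hypothetical input: one number per line
out=$(printf '10\n20\n12\n' | awk '
BEGIN { print "first BEGIN" }
BEGIN { print "second BEGIN" }
      { sum += $1 }
END   { print "total:", sum, "average:", sum / NR }')
printf '%s\n' "$out"
```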

Built-In AWK Functions

Examples :
awk '{print int($1)}'
awk '{print sqrt($1)}' # square root function
awk '{print length($1)}' # length function
awk '{print length}' # prints the length of the input line
awk 'length>60' /etc/passwd
awk 'length>60 { print length,$0}' /etc/passwd

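awk '{print substr($1,3,2)}' # From the 3rd char, print 2 chars.
awk '{print (substr($1,3,2)+0 > 50)}' # prints 1 if true, 0 if false
awk '{print (substr($1,3,2)+0 > 50 && substr($1,3,2)+0 < 60)}'
Note: the parentheses are required, because a bare > after print redirects the output to a file (here one named 50). Adding 0 makes the string returned by substr compare as a number.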

awk '{print toupper($0)}'
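A quick sketch exercising several of these functions on one hypothetical input line. The second command also applies the numeric form of the substr comparisons: in POSIX awk, substr returns a string, so adding 0 turns it into a number before it is compared with 50 and 60.

```shell
# length, substr, int and toupper on one line of three fields
basics=$(echo 'hello world 3.9' |
  awk '{ print length($1), substr($1, 2, 3), int($3), toupper($2) }')

# Is the number starting at character 3 between 50 and 60? (1 = yes, 0 = no)
in_range=$(printf 'AB55\nAB70\nAB7\n' |
  awk '{ n = substr($1, 3, 2) + 0; print $1, (n > 50 && n < 60) }')

printf '%s\n---\n%s\n' "$basics" "$in_range"
```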

Tuesday, April 15, 2008

Perl arrays

An array in Perl is an ordered collection of scalar items. While scalar data (single pieces of data) use the $ sign, arrays use the @ symbol in Perl. Array indices are whole numbers and the first index is 0.

There are 3 distinct characteristics of arrays in Perl:

1. Perl arrays are single-dimensional (multidimensional structures are built from references).
2. Array size is not fixed; arrays grow and shrink as needed.
3. An array can hold data items of any type.

Examples :
@strn=("abc",34,56.7,"hello"); # Declares and initialises an array
print @strn; # Print all the elements

What does the code fragment below do?
$x=("abc",34,56.7,"hello");

The right-hand side is evaluated in scalar context, where the comma operator returns its last operand, i.e. "hello".

PS: For the difference between arrays and lists, see here.

The syntax that is used to access arrays is closer to arrays in C. In fact, one can often treat Perl's arrays as if they were simply C arrays, but they are actually much more powerful than that.

$, is a global variable called the output field separator. By default it is not set to anything, which is why the print @strn statement above prints all the elements without any spaces. We can set this variable to any separator we like.

Examples :
$,=" ";
print @strn;

What do you think the code fragment below should output?
$,=":";
print "value of $x is ",$x,"\n" ;

Some more special global variables:
$#array Gives the last index of the array (its size minus one).
$" Separator placed between elements when an array is interpolated in a double-quoted string. Default is a space.
$\ Output record separator. Default is nothing.
$/ Input record separator. Default is \n.

The examples below show how the size of an array is referenced:
$s=@strn; # Assigning the array to a scalar gives the no. of elements of the array.
print @strn>5 ; # In scalar context an array gives its size, so this compares the size with 5.
print scalar @strn; # Explicitly request the size of an array.
print $#strn; # Returns the last index no.

Note that we can have a scalar and an array with the same name.
Examples :
$strn=44;
print $strn[0]; # The square bracket differentiates it to be an array.
$strn[100]="rrr"; # Now the array size is 101. All the elements in between are undef.
# Reading an element beyond the end of the array also gives undef.
$#strn=5; # Truncates the array to 6 elements (indices 0 to 5).
print @strn[0,4,2]; # slice: prints the 0th, 4th and 2nd elements.

.. is the range operator. A range must count upward: (5..1) is an empty list.
Examples :
@strn[11..15]=(45,6,7,8,9); # truncates any additional values given.
print $strn[-1]; # -1 is the last index no.
print $strn[-2]; # -2 is second last index and so on.

Built-In Array Functions:

1. Push ( push array,list of elements )
Appends one or more elements to the end of an array. push returns the new size of the array.

Examples :
@n=qw(a b c d e f); # qw stands for quote words.
push @n,"56",33,"aa";
print push @n,"ui","ll"; # prints the size of the new array.
print push @n; # returns the size of array.
print @n;

@n=("hello","world");
is the same as
@n=qw(hello world);

2. Pop ( pop arrayname )
Removes the last element of an array and decreases the size of the array.
Returns the element removed.

Examples :
$\="\n";
print @ARGV;
pop; # with no argument (outside a subroutine), pop works on @ARGV and removes its last element.
print @ARGV;

3. Unshift ( unshift arrayname,list of elements )
Adds the elements at the beginning of the array ( the opposite of push )

Examples :
unshift @n,"first","second";
print @n;

4. Shift ( shift arrayname )
Same as pop, but removes and returns the first element.

Examples :
my @numbers = (1 .. 10);
while(scalar(@numbers) > 0)
{
my $i = shift(@numbers);
print $i, "\n";
}

5. Splice ( splice arr,startindex,no. of elem to be removed,list of elem to add )
Removes and/or inserts elements anywhere in an array.

Examples :
@cities=("bang","hyd","mum","chn");
splice @cities,2,1,"mys";
print "@cities";

splice @cities,0,0,"mum","sri","bhu"; # inserts at the beginning.
print "@cities";

splice @cities,1,2; # removes 2 elements beginning at index 1. Indices start at 0.
splice @cities,3; # removes all the elements from index 3 onward.
splice @cities; # deletes all the elements.


6. Sort ( sort arrayname )
Sorts the array elements in ascending ASCII order by default, comparing them with the string comparison operator (cmp). This doesn't modify the array; it returns a new sorted list.

Examples :
$,=" ";
@cities=("bang","hyd","mum");
print sort @cities; # prints an ASCII-sorted list with a space in between.
print @cities;

@cities=sort @cities; # overwrites the array with the sorted array
print @cities;

The examples below show how to do numeric comparisons.
Examples :
@nn=(45,67,1,11,20,30);
print sort @nn; # output 1 11 20 30 45 67 ( ASCII sort ).
print sort{$a <=> $b} @nn; # ascending order . Remember this construct.
print @nn;
print sort{$b <=> $a} @nn; # descending order. Remember this construct.
print sort{$b cmp $a} @cities; # string (ASCII) comparison in descending order.

7. Reverse ( reverse arrayname )
Reverses the order of the array elements. Doesn't modify the array.

Examples :
print reverse @cities; # prints reverse.
print reverse sort @cities; # descending order.

8. split ( split /pattern/,string )
Splits a string on a regex pattern and returns the list of pieces.

Examples :
$s="Hello:world::perl";
@arr2=split(m/:+/,$s); # m stands for match.
# The contents between / / is the regex pattern
# $s is the string to be searched.

9. Join ( join separator,list )
It's the opposite of split: joins the list elements with the separator and returns a string.

Examples :
$st=join "-",@cities;
print $st;
print join "\n",@cities;

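10. Delete ( delete $array[index] )
Deletes an element of an array, leaving it undef. Deleting an element other than the last one doesn't change the size of the array; deleting the last element shrinks the array down to the highest remaining element that still exists. (perldoc strongly discourages delete on array elements; splice is usually the better choice.)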

Examples :
delete $cities[1];
print "@cities";
print scalar @cities;
delete $cities[$#cities];
print scalar @cities;
print "@cities";