Techno Overload

Thursday, July 9, 2009

Determining if your kernel and hardware is 32bit or 64bit on Unix environments

HP UNIX

This technote explains how to establish if an HP-UX® 11.x kernel is 32-bit or 64-bit capable.

Run getconf KERNEL_BITS on the system in question. The output, either "32" or "64", corresponds to 32-bit or 64-bit kernels, respectively.

# getconf KERNEL_BITS
64

Check the vmunix file for the following entries:

# file /stand/vmunix
/stand/vmunix: PA-RISC1.1 executable ---> 32-bit
/stand/vmunix: ELF-64 executable object file ---> 64-bit

getconf KERNEL_BITS tells you whether the currently running kernel is 32-bit or 64-bit: it returns the number of bits the kernel uses for pointer and long data types.

# getconf KERNEL_BITS
64

HW_32_64_CAPABLE reports which kernels the hardware supports:

# getconf HW_32_64_CAPABLE
1

HW_CPU_SUPP_BITS shows whether the CPUs are capable of running 32-bit, 64-bit, or both (32/64) kernels:

# getconf HW_CPU_SUPP_BITS
64

SOLARIS

The easiest way to determine which version is running on your system is to use the isainfo command. This new command prints information about the application environments supported on the system.

The following is an example of the isainfo command executed on an UltraSPARC™ system running the 64-bit operating system:

% isainfo -v
64-bit sparcv9 applications
32-bit sparc applications

One useful option of the isainfo(1) command is the -n option, which prints the native instruction set of the running platform:

% isainfo -n
sparcv9

The -b option prints the number of bits in the address space (the CPU's bit-size capability) of the corresponding native application environment:

% isainfo -b
64

% echo "Welcome to "`isainfo -b`"-bit Solaris"
Welcome to 64-bit Solaris

A related command, isalist(1), that is more suited for use in shell scripts, can be used to print the complete list of supported instruction sets on the platform. Some of the instruction set architectures listed by isalist are highly platform specific, while isainfo(1) describes only the attributes of the most portable application environments on the system. Both commands are built on the SI_ISALIST suboption of the sysinfo(2) system call. See isalist(5) for further details.

The following is an example of the isalist command executed on an UltraSPARC system running the 64-bit operating system:

% isalist
sparcv9+vis sparcv9 sparcv8plus+vis sparcv8plus sparcv8
sparcv8-fsmuld sparcv7 sparc

AIX

For AIX, we will use the bootinfo command. The commands below show whether the hardware is 32-bit or 64-bit capable.

# bootinfo -y
64

# getconf HARDWARE_BITMODE
64

# prtconf -c
CPU Type: 64-bit

The commands below show the bit size of the running kernel:

# bootinfo -K
64

# prtconf -k
Kernel Type: 64-bit

# getconf KERNEL_BITMODE
64

LINUX

For Linux, we will look at the cpuinfo file under /proc. Here, we are mainly interested in the "flags" line for the CPUs:

# cat /proc/cpuinfo | grep -i flags
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm

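Only one flag in this output indicates the bit-size capability of the CPU:

64 Bit = lm (Long Mode)

A CPU that lists lm can run a 64-bit kernel; a CPU without it is 32-bit only. Note that tm stands for Thermal Monitor, not a 32-bit "transparent mode", and there is no rm flag for 16-bit real mode.

The lm flag describes the CPU only. It doesn't necessarily mean the motherboard is capable of 64 bit.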
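The lm check can be scripted. The sketch below is only an illustration for x86 Linux: the function name cpu_bits_from_flags is our own invention, and on other architectures /proc/cpuinfo has no "flags" line, so the answer would not be meaningful there.

```shell
#!/bin/sh
# Sketch: report CPU capability from a /proc/cpuinfo-style "flags" string.
# cpu_bits_from_flags is a hypothetical helper name, not a standard tool.
cpu_bits_from_flags() {
    # Pad with spaces so "lm" only matches as a whole word
    # (and not, for example, inside "lahf_lm").
    case " $1 " in
        *" lm "*) echo 64 ;;
        *)        echo 32 ;;
    esac
}

# On Linux, feed it the first flags line; elsewhere the file does not exist.
if [ -r /proc/cpuinfo ]; then
    flags=$(grep -i '^flags' /proc/cpuinfo | head -n 1 | cut -d: -f2)
    echo "CPU is $(cpu_bits_from_flags "$flags")-bit capable"
fi
```

Matching on a padded string avoids the false positive that a plain grep for "lm" would give on flags such as lahf_lm.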

To determine the bit size of your running kernel, you can also use getconf, as on HP-UX:

# getconf LONG_BIT
64

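This shows that my system is running 64 bit. Strictly speaking, LONG_BIT is the size of a long in the user environment, which matches the kernel on a typical installation; uname -m (for example x86_64) names the kernel's architecture directly.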
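Since each platform uses a different command, a script that has to run everywhere can pick the command from the output of uname -s. This is a minimal sketch: the function name kernel_bits_command is hypothetical, and it only prints the command quoted in this note for each platform rather than running it.

```shell
#!/bin/sh
# Sketch: map an OS name (as printed by uname -s) to the command
# this note uses for the kernel bit size on that platform.
kernel_bits_command() {
    case "$1" in
        HP-UX) echo "getconf KERNEL_BITS" ;;
        SunOS) echo "isainfo -b" ;;
        AIX)   echo "getconf KERNEL_BITMODE" ;;
        Linux) echo "getconf LONG_BIT" ;;
        *)     return 1 ;;   # platform not covered here
    esac
}

os=$(uname -s)
if cmd=$(kernel_bits_command "$os"); then
    echo "$os: run '$cmd'"
else
    echo "$os: not covered by this note"
fi
```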

Tuesday, April 14, 2009

Create shared library on Unix

We will see how we can build dynamic libraries on different Unix flavours.

Apple Mac OS X

$ gcc -arch x86_64 -fno-common -c source.c
$ gcc -arch x86_64 -fno-common -c code.c
$ gcc -dynamiclib -flat_namespace -undefined suppress -install_name /usr/local/lib/libfoo.2.dylib \
    -o libfoo.2.4.5.dylib source.o code.o

-dynamiclib
When passed this option, GCC will produce a dynamic library instead of an executable when linking,
using the Darwin libtool command.

-arch arch
Compile for the specified target architecture arch. The allowable values are i386, x86_64, ppc and ppc64.

-flat_namespace
Use a single-level (flat) namespace for symbol resolution, as is done on other Unixes.

-undefined suppress
Suppress errors about undefined symbols. They are resolved later from dependent libraries.

GNU Linux

gcc -m64 -fPIC -g -c -Wall a.c
gcc -m64 -fPIC -g -c -Wall b.c
gcc -m64 -shared -Wl,-soname,libmystuff.so.1 -o libmystuff.so.1.0.1 a.o b.o -lc

-fpic/-fPIC
Generate position-independent code ( PIC ) suitable for use in a shared library.

-shared
Produce a shared object which can then be linked with other objects to form an executable.

-m32/-m64
Generate code for 32-bit or 64-bit environments
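The file name and the -soname value in the link step above follow the usual Linux naming convention: a real name with the full version, a soname with only the major version, and an unversioned name used at link time. The helper below is a sketch; so_names is a hypothetical name, and the library name and version are just the ones from the example.

```shell
#!/bin/sh
# Sketch: print the three conventional names of a Linux shared library.
# $1 = base name (without "lib"), $2 = version in major.minor.patch form.
so_names() {
    name=$1
    version=$2
    major=${version%%.*}          # keep only the part before the first dot
    echo "real name: lib$name.so.$version"
    echo "soname: lib$name.so.$major"
    echo "linker name: lib$name.so"
}

so_names mystuff 1.0.1
```

After the build, the soname and the linker name are normally created as symbolic links, for example ln -s libmystuff.so.1.0.1 libmystuff.so.1 and ln -s libmystuff.so.1 libmystuff.so.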

HP HP-UX

cc +DD64 -Aa -c +Z length.c volume.c mass.c ( 64-bit )
ld -b -o libunits.sl length.o volume.o mass.o

-Amode
Specify the compilation standard to be used by the compiler.
a
Compile under ANSI mode

+z,+Z
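Both of these options cause the compiler to generate position-independent code (PIC) for use in shared libraries. +z is normally sufficient; +Z is required when the library exceeds certain size limits, in which case the linker reports an error asking for it.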

+DD64
Recommended option for compiling in 64-bit mode on either Itanium-based or PA-RISC 2.0 architecture. The macros __LP64__ and (on PA platforms) _PA_RISC2_0 are #defined.

+DD32
Compiles in 32-bit mode and on PA systems creates code compatible with PA-RISC 1.1 architectures. (Same as +DA1.1 and +DAportable.)

+DA2.0W
Compiles in 64-bit mode for the PA-RISC 2.0 architecture. The macros __LP64__ and _PA_RISC2_0 are #defined.

+DA2.0N
Compiles in 32-bit mode (narrow mode) for the PA-RISC 2.0 architecture. The macro _PA_RISC2_0 is #defined. +DA options are not supported on Itanium-based platforms.

SUN SOLARIS

cc -xarch=v9 -Kpic -c a.c
cc -xarch=v9 -Kpic -c b.c
ld -G -o outputfile.so a.o b.o

-Kpic/-KPIC
Generate position-independent code for use in shared libs.

-G
Produce a shared object rather than a dynamically linked executable.

-xarch=v9
Specifies compiling for a 64-bit Solaris OS on SPARC platform.

-xarch=amd64
Specifies compilation for the 64-bit AMD instruction set. The C compiler from Sun Studio 10 onwards predefines __amd64 and __x86_64 when you specify -xarch=amd64.

Links:
Using static and shared libraries across platforms

Shared Libraries (HP-UX)

Thursday, January 15, 2009

FD Passing with Unix Domain Sockets

Unix domain sockets are a two-way local inter-process communication mechanism accessed through the socket interface. The protocol family is AF_UNIX (also spelled AF_LOCAL, PF_UNIX or PF_LOCAL). It supports both the SOCK_STREAM and SOCK_DGRAM modes of communication.

SOCK_STREAM Unix domain sockets can also be used to pass ancillary (control) information, including open file descriptors, from one process to another. Any valid descriptor can be passed. File descriptors are transferred between separate processes across a Unix domain socket using the sendmsg() and recvmsg() functions. Both of these system calls take a struct msghdr to minimize the number of directly supplied arguments.

The structure has the form below :
struct msghdr {
void *msg_name; /* optional address */
socklen_t msg_namelen; /* size of address */
struct iovec *msg_iov; /* scatter/gather array */
int msg_iovlen; /* # elements in msg_iov */
void *msg_control; /* ancillary data, see below */
socklen_t msg_controllen; /* ancillary data buffer len */
int msg_flags; /* flags on received message */
};

msg_name -> destination address ( specified for un-connected sockets )
msg_namelen -> length of the address specified in msg_name

msg_iov -> scatter/gather buffer address
msg_iovlen -> number of scatter/gather ( struct iovec ) elements specified

msg_control -> pointer to ancillary/control header & data
msg_controllen -> total length of the control headers & data

msg_flags -> flags on received message

The control message header is declared as below :
struct cmsghdr {
u_int cmsg_len; /* data byte count, including hdr */
int cmsg_level; /* originating protocol */
int cmsg_type; /* protocol-specific type */
/* followed by u_char cmsg_data[]; */
};

cmsg_len -> No. of bytes ( header + data )
cmsg_level -> Originating protocol
cmsg_type -> Protocol specific type

As shown in this definition, normally there is no member with the name cmsg_data[]. Instead, the data portion is accessed using the CMSG_xxx() macros, as described shortly. Nevertheless, it is common to refer to the cmsg_data[] member.

When ancillary data is sent or received, any number of ancillary data objects can be specified by the msg_control and msg_controllen members of the msghdr structure, because each object is preceded by a cmsghdr structure defining the object's length (the cmsg_len member).

CMSG_LEN
unsigned int CMSG_LEN(unsigned int length);

Given the length of an ancillary data object, CMSG_LEN() returns the value to store in the cmsg_len member of the cmsghdr structure, taking into account any padding
needed to satisfy alignment requirements.

One possible implementation could be:
#define CMSG_LEN(length) ( ALIGN(sizeof(struct cmsghdr)) + length )

CMSG_SPACE
unsigned int CMSG_SPACE(unsigned int length);

Given the length of an ancillary data object, CMSG_SPACE() returns the space required by the object and its cmsghdr structure, including any padding needed to satisfy alignment requirements. This macro can be used, for example, to allocate space dynamically for the ancillary data. This macro should not be used to initialize the cmsg_len member of a cmsghdr structure; use the CMSG_LEN() macro instead.

One possible implementation could be:
#define CMSG_SPACE(length) ( ALIGN(sizeof(struct cmsghdr)) + \
ALIGN(length) )

Note the difference between CMSG_SPACE() and CMSG_LEN(): the former accounts for any required padding at the end of the ancillary data object and the latter is the actual length to store in the cmsg_len member of the ancillary data object.
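To make the difference concrete, here is the arithmetic as a small shell sketch. The 16-byte header size, the 8-byte alignment and the 4-byte int are assumptions (typical of 64-bit Linux, other systems differ), and the helper names align, cmsg_len and cmsg_space are ours, not part of any API.

```shell
#!/bin/sh
# Assumed values for illustration only.
HDR=16          # assumed sizeof(struct cmsghdr)
ALIGNMENT=8     # assumed alignment boundary

# Round $1 up to the next multiple of ALIGNMENT.
align() {
    echo $(( ($1 + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT ))
}

# Mirrors CMSG_LEN: aligned header plus the unpadded data length.
cmsg_len() {
    echo $(( $(align "$HDR") + $1 ))
}

# Mirrors CMSG_SPACE: aligned header plus the padded data length.
cmsg_space() {
    echo $(( $(align "$HDR") + $(align "$1") ))
}

# One file descriptor is an int, assumed to be 4 bytes here.
echo "CMSG_LEN(4)   = $(cmsg_len 4)"      # 20
echo "CMSG_SPACE(4) = $(cmsg_space 4)"    # 24
```

Under these assumptions the control buffer must be 24 bytes long, while the cmsg_len field stored in the header is 20.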

CMSG_FIRSTHDR
struct cmsghdr *CMSG_FIRSTHDR(const struct msghdr *mhdr);

CMSG_FIRSTHDR() returns a pointer to the first cmsghdr structure in the msghdr structure pointed to by mhdr. The macro returns NULL if there is no ancillary data pointed to by the msghdr structure (that is, if either msg_control is NULL or msg_controllen is less than the size of a cmsghdr structure).

Below are server and client source examples that show how descriptor passing works.

server.c

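#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/uio.h>

#define UDS "domain_socket"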

int send_connection(int fd,int sockfd)
{
struct msghdr msg; /* message header */
struct iovec iov; /* scatter/gather buffer */
char b='b';
int rc;
/* Control Message header */
union
{
struct cmsghdr cm; /* For alignment */
char control[CMSG_SPACE(sizeof(int))];
} control_un;
struct cmsghdr *cmptr;

msg.msg_control = control_un.control;
msg.msg_controllen = sizeof(control_un.control);

/* Populate the control info */
cmptr = CMSG_FIRSTHDR(&msg);
cmptr->cmsg_len = CMSG_LEN(sizeof (int));
cmptr->cmsg_type = SCM_RIGHTS;
cmptr->cmsg_level = SOL_SOCKET;
*((int *) CMSG_DATA(cmptr)) = fd; /* fd being passed here */

msg.msg_name = (caddr_t) NULL;
msg.msg_namelen = 0;

iov.iov_base = &b;
iov.iov_len = 1;
msg.msg_iov = &iov;
msg.msg_iovlen = 1;

msg.msg_flags = 0;

rc = sendmsg(sockfd,&msg,0);
if(rc == -1 ){
perror("sendmsg");
exit(-5);
}
close(sockfd);
return 0; /* the function is declared int, so return a status */
}

int listener(char *path)
{
struct sockaddr_un unsock = {0};
struct sockaddr_un remote = {0};
int sockfd;
socklen_t len;

sockfd = socket(AF_UNIX,SOCK_STREAM,0); /* AF_UNIX for local domain sockets */
if(sockfd == -1){
perror("socket");
exit(-1);
}

unlink(UDS);
bzero(&unsock,sizeof(unsock));
unsock.sun_family = AF_UNIX;
strncpy(unsock.sun_path,UDS,sizeof(unsock.sun_path) - 1);
/* sun_len exists only on BSD-derived systems; bind() takes the length explicitly */

/* Binding to a pathname creates the reference file in the file system */
if(bind(sockfd ,(struct sockaddr *)&unsock,SUN_LEN(&unsock)) == -1){
perror("bind");
exit(-1);
}

if (listen(sockfd, 5) == -1) {
perror("listen");
exit(1);
}
len = SUN_LEN(&unsock);
getsockname(sockfd,(struct sockaddr *)&unsock,&len);
printf("bound name = %s, returned len = %d\n", unsock.sun_path, (int)len);

for(;;){
socklen_t len = sizeof(struct sockaddr_un);
int fd,sendfd;

fd = accept(sockfd ,(struct sockaddr *)&remote,&len);
if(fd == -1 ){
perror("accept");
exit(-2);
}
printf("Accepted a connection\n");

/* Open the file . This returned fd of the file is passed to the client */
sendfd = open(path,O_RDONLY|O_CREAT,0755);
if(sendfd == -1 ){
perror("open");
exit(-3);
}
send_connection(sendfd,fd);
close(sendfd);
}
}

int main()
{
listener("./test.txt");
return 0;
}

client.c

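#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/uio.h>

#define UDS "domain_socket"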

int receive_fd(int fd)
{
struct msghdr msg;
struct iovec iov;
char buf[1];
int rv;

union
{
struct cmsghdr cm;
char control[CMSG_SPACE(sizeof(int))];
} control_un;
struct cmsghdr *cmptr;

iov.iov_base=buf;
iov.iov_len=1;

msg.msg_name=NULL;
msg.msg_namelen=0;
msg.msg_iov=&iov;
msg.msg_iovlen=1;

msg.msg_control=control_un.control;
msg.msg_controllen=sizeof(control_un.control);

rv = recvmsg(fd,&msg,0);
if(rv == -1){
perror("recvmsg");
exit(-1);
}
else if(rv > 0){
cmptr = CMSG_FIRSTHDR(&msg);
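/* no control data at all, or not a descriptor-passing message */
if(cmptr == NULL || cmptr->cmsg_level != SOL_SOCKET || cmptr->cmsg_type != SCM_RIGHTS){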
printf("Unknown control info\n");
exit(-3);
}
return *((int *)CMSG_DATA(cmptr));
}
else
return -1;
}

int sock_dgram()
{
int s;
socklen_t len;
struct sockaddr_un remote;

if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) {
perror("socket");
exit(1);
}

printf("Trying to connect...\n");

memset(&remote, 0, sizeof(remote));
remote.sun_family = AF_UNIX;
strncpy(remote.sun_path, UDS, sizeof(remote.sun_path) - 1);
len = SUN_LEN(&remote); /* sun_len exists only on BSD-derived systems, so it is not set */
if (connect(s, (struct sockaddr *)&remote, len) == -1) {
perror("connect");
exit(1);
}
printf("Connected ..\n");
return s;
}

void reader(int fd)
{
char ch;
while(read(fd,&ch,1) > 0) /* stop on end of file (0) and on error (-1) */
write(1,&ch,1);
}
int main()
{
int fd,passfd;

fd = sock_dgram();
passfd = receive_fd(fd);
if(passfd != -1)
reader(passfd);

return 0;
}

Friday, October 31, 2008

Dtrace Basics

DTrace is a comprehensive dynamic tracing facility that can be used by administrators and developers to examine the behavior of both user programs and of the operating system itself. With DTrace we can explore our system to understand how it works, track down performance problems across many layers of software, or locate the cause of aberrant behavior. It is safe to use on production systems and does not require restarting/recompiling either the system or applications.

We write D scripts, which consist of a probe description, a predicate and the actions to be taken :
probe description
/predicate/
{
actions
}

When we run the D script, we get results based on the probe descriptions (the actions are executed only when the predicate filter is true). Think of probes as events: a probe fires when the event happens. Let's take a simple D script example, example.d :

syscall::write:entry
/execname == "bash"/
{
printf("bash with pid %d called write system call\n",pid);
}

Here the probe description is syscall::write:entry, which describes entry into the write system call. The predicate is execname == "bash"; execname is a built-in variable that contains the executable name, so we proceed with the actions only when the string matches. The action statement calls the built-in function printf.

Providers/Probes

To list all of the available probes on your system, type the command:
# sudo dtrace -l

It might take some time to display all of the output. To count up all your probes, you can type the command:

# sudo dtrace -l | wc -l
22567

If you look at the output from dtrace -l in your terminal window, each probe has two names: an integer ID and a human-readable name. The human-readable name is composed of four parts. When writing out the full human-readable name of a probe, we write all four parts of the name separated by colons like this:

provider:module:function:name

You might note that some fields are left blank. A blank field is a wildcard and matches all of the probes that have matching values in the parts of the name that you do specify.

Now let's look a little deeper. The probe is described using four fields, the provider, module, function, and name.

* provider—Specifies the instrumentation method to be used. For example, the syscall provider is used to monitor system calls while the io provider is used to monitor the disk io.
* module and function—Describes the module and function you want to observe
* name—Typically represents the location in the function. For example, use entry for name to instrument when you enter the function.

Note that wildcards like * and ? can be used. Blank fields are interpreted as wildcards. The table below shows a few examples :

Probe Description       Explanation
syscall::open:entry     entry into the open system call
syscall::open*:entry    entry into any system call that starts with open (open and open64)
syscall:::entry         entry into any system call
syscall:::              all probes published by the system call provider
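The four-part naming can be sketched with a tiny shell helper that joins the parts with colons and leaves blank parts as wildcards. probe_desc is a hypothetical name used only for this illustration; it builds the string and does not call dtrace.

```shell
#!/bin/sh
# Sketch: build a probe description from provider, module, function, name.
# An empty argument stays empty, which DTrace treats as a wildcard.
probe_desc() {
    printf '%s:%s:%s:%s\n' "$1" "$2" "$3" "$4"
}

probe_desc syscall "" write entry      # syscall::write:entry
probe_desc syscall "" "" entry         # syscall:::entry
probe_desc pid1234 libc malloc entry   # pid1234:libc:malloc:entry
```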

A predicate can be any D expression. The action is executed only when the predicate evaluates to true. The table below shows some examples :
Predicate                Explanation
cpu == 0                 true if the probe executes on cpu0
pid == 1029              true if the pid of the process that caused the probe to fire is 1029
execname != "sched"      true if the process is not the scheduler (sched)
ppid !=0 && arg0 == 0    true if the parent process id is not 0 and first argument is 0

The action section can contain a series of action commands separated by semi-colons (;). The table below provides some examples :
Action      Explanation
printf()    print something using a C-style printf() command
ustack()    print the user level stack
trace()     print the given variable

Note that predicates and action statements are optional. If the predicate is missing, then the action is always executed. If the action is missing, then the name of the probe which fired is printed.
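The clause layout, including the optional predicate, can be sketched as a shell helper that writes out a D clause. d_clause is a hypothetical name for this illustration; it only prints text that you could save to a file and pass to dtrace -s.

```shell
#!/bin/sh
# Sketch: print one D clause.
# $1 = probe description, $2 = predicate (may be empty), $3 = action statement.
d_clause() {
    echo "$1"
    if [ -n "$2" ]; then
        echo "/$2/"        # the predicate line is left out when empty
    fi
    echo "{"
    echo "    $3"
    echo "}"
}

d_clause 'syscall::write:entry' 'execname == "bash"' 'trace(pid);'
d_clause 'syscall:::entry' '' 'trace(execname);'
```

The second call produces a clause without a predicate line, so its action runs every time the probe fires.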

Below links provide references for different parts of a probe.
List of providers
List of functions
List of aggregating functions
List of variables
List of built-in variables

Examples

pid provider
------------
Example                      Explanation
pid2439:libc:malloc:entry    entry into malloc() in libc for process id 2439
pid1234:a.out:main:return    return from main for process id 1234
pid1234:a.out::entry         entry into any function of the main executable of process id 1234
pid1234:::entry              entry into any function in any library for process id 1234

You can limit the number of probes enabled by modifying the probe description.
Probe Description          Explanation
pid$1:libc::entry          Limit to only a given library
pid$1:a.out::entry         Limit probes to non-library functions
pid$1:libc:printf:entry    Limit probes to just one function

Here is the command you can run to print all the functions that process id 1234 calls:
# dtrace -n pid1234:::entry

Modify the script to take the process id as a parameter. Your script will now look like:

#!/usr/sbin/dtrace -s
pid$1:::entry
{}

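The following script finds the stack traces from which a program makes the write system call. Note that you need to run this with the -c option, which starts the given command and sets $target to its process ID.

#!/usr/sbin/dtrace -s
syscall::write:entry
/pid == $target/
{
@[ustack()]=count();
}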

The syscall Provider
--------------------
This is probably the most important provider to learn and use because system calls are the main communication channel between user level applications and the kernel.

To print each firing of the probe at entry into the close(2) system call, use the following command:

# dtrace -n syscall::close:entry

To start to identify the process which sent a kill(2) signal to a particular process, use the following script:

#!/usr/sbin/dtrace -s
syscall::kill:entry
{
trace(pid);
trace(execname);
}

The proc Provider
-----------------
Trace all the signals sent to all the processes currently running on the system:

#!/usr/sbin/dtrace -wqs
proc:::signal-send
{
printf("%d was sent to %s by ", args[2], args[1]->pr_fname);
system("getent passwd %d | cut -d: -f5", uid);
}

Add the predicate (/args[2] == SIGKILL/) to the script, then send SIGKILL signals to different processes from different users.

#!/usr/sbin/dtrace -wqs
proc:::signal-send
/args[2] == SIGKILL/
{
printf("SIGKILL was sent to %s by ", args[1]->pr_fname);
system("getent passwd %d | cut -d: -f5", uid);
}

Here you can see the introduction of pr_fname, which is part of the structure of psinfo_t of the receiving process.

References :

Dtrace @ OpenSolaris
Dtrace inventor blogs
Big Admin Page
Dtrace Guide

Thursday, July 3, 2008

gdb equivalent commands on dbx

The dbx debugger is found on the Solaris and AIX platforms. Since dbx and gdb, the other most popular debugger, use different commands, this note is for people who know the gdb commands and want their dbx equivalents.

dbx doesn't support command completion and abbreviation the way gdb does, but there are other ways to make it work a bit like gdb. dbx does have a gdb mode ( gdb on ), but it lacks some of the gdb commands. Below I try to give the most commonly used commands for the two debuggers. For all the commands, the dbx command is on the left of the ":" and the equivalent gdb command is on the right.

Reading Core files

dbx - core : gdb -c core # Reading the core file.
dbx - pid : gdb -p pid # dbx can find the program automatically.

Logging

dbxenv session_log_file_name file : set logging # logging o/p to a file
dbxenv session_log_file_name : show logging

Debugging Information Support

stabs (SUN), dwarf2, -g -O : stabs (GNU), dwarf2, -g -O
Macro support (-g3) : Macro support (-g3) # Macro debugging support

Sun Studio compilers don't generate debug info for macros, though.

Debugging Programs with Multiple Processes

dbxenv follow_fork_mode parent : set follow-fork-mode parent
dbxenv follow_fork_mode child : set follow-fork-mode child
dbxenv follow_fork_mode ask : -

Breakpoints

stop in function : break function
stop at [filename:]linenum : break [filename:]linenum
stopi at address : break *address # Stop at an instruction address
status [n] : info breakpoints [n] # Show all breakpoints
delete [breakpoints] : delete [breakpoints] [range ...]# delete breakpoint
delete all : delete # delete all breakpoints

Examining the Stack

where [n] : backtrace [n] # Shows the stack backtrace
frame [n] : frame [args] # goto a particular frame
dump : info locals # dump info about local variables

Examining Data

print -f expr : print /f expr
Array slicing (p array[2..5]) : Artificial arrays (p *array@len)
display : display
x addr [/nf] : x/nfu addr
regs : info registers
regs -f | -F : info all-registers
print $regname : info registers regname ...

Memory access checking

check -access : set mem inaccessible-by-default [on|off]
check -memuse : - # no direct gdb equivalent
check -leaks : - # no direct gdb equivalent

Examining the Symbol Table

whereis -a addr : info symbol addr
whatis [-e] arg : whatis arg
whatis [-e] arg : ptype arg
whatis -t [typename] : info types [regexp]
modules -v / files : info sources

Also, it's better to set up aliases that map commonly used dbx commands to their gdb equivalents. I am using the ~/.dbxrc file below :
--
dalias alias=dalias

alias b="stop in" # set breakpoint in a function
alias sa="stop at" # set breakpoint at a line number
alias st=status # show breakpoints, numbered
alias del=delete # delete a breakpoint

alias cka="check -access" # check for invalid memory access
alias ckl="check -leaks" # check for memory leaks

alias r="run " # start the program running at its beginning
alias q=quit

alias w=where # show frames in call stack
alias bt=where # show frames in call stack
alias u=up
alias d=down
alias f=frame

alias l=list # list some source lines
alias lw="list -w" # from 5 before current line to 5 after
alias p=print # print value of variable or expression
alias ptype="whatis -t" # find declaration of a type
alias wi=whatis # find declaration of variable or function

alias ni=nexti
alias si=stepi
alias n=next # cont to next stmt in same function
alias s=step # step INTO the function about to be called
alias su="step up" # cont to next stmt in parent function
alias c=cont # continue running

alias h=history

Wednesday, April 16, 2008

Beginners AWK programming with examples

AWK derives its name from its creators Aho, Weinberger and Kernighan. Awk has two faces: it is a utility for performing simple text-processing tasks, and it is a programming language for performing complex text-processing tasks. It is also an "interpreted" language -- that is, an Awk program cannot run on its own, it must be executed by the Awk utility itself.

Basic Structure

awk [options] 'pattern {action} ...' [filenames]

Examples :
awk '/root/' /etc/passwd # root is the pattern here delimited by / & /
awk '{print}' /etc/passwd # prints the whole file

AWK supports multiple pattern action statements ( use shell's multiline capability )

Records and Fields
Each Line is a record.

$0 is the entire record.
$1, $2, ... $NF are the individual fields (the maximum number of fields depends on the awk implementation).

Examples :
awk -F: '/root/{print $1}' /etc/passwd # -F specifies the field separator.
# prints the first field of each matching line.

awk -F: '/root/{print $1,$7}' /etc/passwd # prints the 1st and 7th fields
# the comma inserts OFS, which is a space by default

ls -l | awk '{print $9"\t"$5}'

awk '/^$/ {print "This is a blank line"}
/[a-zA-Z]+/ {print "Alphabets"}
/[0-9]+/ { print "Numerals"}'

What would be the output of the statement below ?
awk -F: '/root/{print $ $7}' /etc/passwd

Arithmetic
Examples :
awk -F: '{print $3,$3+1}' /etc/passwd
awk -F: '{printf("%10s %15s\n",$1,$7)}' /etc/passwd

Note that print appends a newline, but printf doesn't.

Relational Operators ( <, <=, >, >=, ==, != )
Examples :
awk -F: '$3>500' /etc/passwd
awk -F: '$3==500' /etc/passwd
awk -F: '$3>500 && $3<510' /etc/passwd
awk -F: '$1 == "root" || $1 == "halt"' /etc/passwd

Regular Expression Operators
Regular expressions can also be used in matching expressions. The two operators, `~' and `!~', perform regular expression comparisons. Expressions using these operators can be used as patterns or in if, while, for, and do statements.

Examples :
awk '$1 ~ /^root/' # lines starting with root are printed
awk '$1 !~ /^root/' # lines not starting with root are printed
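Here is a small self-contained run of both operators. The input lines and the function name classify_users are made up for the illustration; the program itself uses only the ~ and !~ operators described above.

```shell
#!/bin/sh
# Sketch: apply ~ and !~ to the first colon-separated field.
classify_users() {
    awk -F: '
        $1 ~ /^root$/ { print "exact:", $1 }   # field is exactly "root"
        $1 !~ /^root/ { print "other:", $1 }   # field does not start with "root"
    '
}

printf 'root:x:0\nrooter:x:5\nhalt:x:7\n' | classify_users
# exact: root
# other: halt
```

The line for "rooter" matches neither rule: it is not exactly root, but it does start with root.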

Built-In Variables
1. NR ( No. of records processed so far )
NR gives the current line's sequential number.

Examples :
awk '/root/ { print NR,$0}' /etc/passwd # if matches print line no. and line.
awk 'NR>40' /etc/passwd # print from the 41st line
awk 'NR==5 , NR==10 {print NR}' /etc/passwd # print line nos 5 to 10
awk 'NR>5 && NR<10 { print NR}' /etc/passwd # print line no. > 5 and < 10
awk 'NR%2 == 1 { print NR }' /etc/passwd # print odd line numbers

2. FNR
NR counts the lines from the very beginning continuously until the end. FNR restarts the counting at the beginning of each input file.

So, for the first file processed they will be equal but on the first line of the second and subsequent files FNR will start from 1 again.

Examples :
awk '{print FNR,$0}' out out1 out2
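To see NR and FNR side by side, the sketch below creates two small temporary files and prints both counters for every line. The file contents and the function name nr_fnr are made up for this illustration.

```shell
#!/bin/sh
# Sketch: print NR, FNR and the line itself for every input file given.
nr_fnr() {
    awk '{ print NR, FNR, $0 }' "$@"
}

tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one"
printf 'c\n'    > "$tmp/two"

nr_fnr "$tmp/one" "$tmp/two"
# 1 1 a
# 2 2 b
# 3 1 c

rm -r "$tmp"
```

On the first line of the second file NR has reached 3 while FNR is back at 1.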

3.NF ( Contains the no. of fields in the current line/record )
Examples :
awk '{print NF}'
awk 'NF>4' # print lines having > 4 fields

What would the following line output ?
awk '{print $NF}' /etc/passwd
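Since NF holds the number of fields, $NF refers to the last field of the record. The sketch below shows this on two made-up colon-separated lines; the function name last_fields is hypothetical.

```shell
#!/bin/sh
# Sketch: for each record print its number, its field count and its last field.
last_fields() {
    awk -F: '{ print NR, NF, $NF }'
}

printf 'root:x:0:0:root:/root:/bin/sh\nnobody:x:65534\n' | last_fields
# 1 7 /bin/sh
# 2 3 65534
```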

Output Redirection

Examples :
awk '/root/ { print NR,$0 > "out" }' /etc/passwd # redirects o/p to file named out
ls -l | awk '{print $5 | "sort -rn > sorted" }'
The above calls the sort command and redirects its output to the file sorted. Any external command should always be given in quotes.

ls -l | awk '{print $5 | "sort -nr | uniq "}'
ls -l | awk '{print $5 | "sort -nr | uniq > out"}'

BEGIN & END Blocks

BEGIN and END are special patterns. They are not used to match input records. Rather, they are used for supplying start-up or clean-up information to your awk script. A BEGIN rule is executed, once, before the first input record has been read. An END rule is executed, once, after all the input has been read. An awk program may have multiple BEGIN and/or END rules. They are executed in the order they appear, all the BEGIN rules at start-up and all the END rules at termination.

BEGIN {actions}

-- The body of the AWK script --

END {actions}

Examples :
awk 'BEGIN{FS=":"} { print $1}' /etc/passwd # begin initializes FS to :
awk 'BEGIN{FS=":" ; OFS="+"} {print $1,$7}' /etc/passwd
awk 'BEGIN{FS=":";OFS="+";print "List of users"}{print $1,$7}' /etc/passwd
awk 'BEGIN{print "Welcome"}'
ls -l | awk '{sum=sum+$5} END{print sum}' # sum is accessed as such , not with $.
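The three parts can be seen working together in one short program. This is a sketch with made-up input (a name and a size per line); sum_sizes is a hypothetical function name.

```shell
#!/bin/sh
# Sketch: BEGIN runs once before any input, the middle rule runs per record,
# END runs once after the last record.
sum_sizes() {
    awk 'BEGIN { print "Summing column 2" }
               { sum += $2 }
         END   { print "total:", sum }'
}

printf 'a.txt 100\nb.txt 250\n' | sum_sizes
# Summing column 2
# total: 350
```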

Built-In AWK Functions

Examples :
awk '{print int($1)}'
awk '{print sqrt($1)}' # square root function
awk '{print length($1)}' # length function
awk '{print length}' # prints length of i/p line
awk 'length>60' /etc/passwd
awk 'length>60 { print length,$0}' /etc/passwd

awk '{print substr($1,3,2)}' # From 3rd char , print 2 chars.
awk 'substr($1,3,2)+0 > 50' # print lines where those 2 chars, as a number, are > 50
awk 'substr($1,3,2)+0 > 50 && substr($1,3,2)+0 < 60' # same, but between 50 and 60

awk '{print toupper($0)}'

Tuesday, April 15, 2008

Perl arrays

An array in Perl is an ordered collection of scalar items. While scalar data (single pieces of data) use the $ sign, arrays use the @ symbol. Array indices are whole numbers and the first index is 0.

There are 3 distinct characteristics of arrays in Perl :

1. Perl supports only single-dimensional arrays.
2. Arrays have no fixed size; they grow and shrink as needed.
3. An array can hold data items of any type.

Examples :
@strn=("abc",34,56.7,"hello"); # Declares and initialises an array
print @strn; # Print all the elements

What does the below code fragment do ?
$x=("abc",34,56.7,"hello");

Since the right-hand side is evaluated in scalar context, the comma operator returns its last value, so $x becomes "hello".

PS : For the difference between arrays and lists , see here.

The syntax that is used to access arrays is closer to arrays in C. In fact, one can often treat Perl's arrays as if they were simply C arrays, but they are actually much more powerful than that.

$, is a global variable called the output field separator. By default it is not set to anything, so the print @strn statement above prints all the elements without any spaces between them. We can set this variable to a separator of our own.

Examples :
$,=" ";
print @strn;

What do you think the below code fragment should output ?
$,=":";
print "value of $x is ",$x,"\n" ;

Some more special global variables :
$#array Gives the last index of @array ( size - 1 ).
$" Special variable used when printing an array inside double quotes. Default is a space.
$\ Output record separator. Default is nothing.
$/ Input record separator. Default is \n.

Below examples show how the size of an array is referenced
$s=@strn; # Assigning the array to a scalar gives the no. of elements of the array.
print @strn>5 ; # In a scalar context the array gives its size, which is compared with 5.
print scalar @strn; # Explicitly request the size of an array.
print $#strn; # Returns the last index no.

Note we can have a scalar and an array with the same name.
Examples :
$strn=44;
print $strn[0]; # The square bracket differentiates it to be an array.
$strn[100]="rrr"; # Now the array size is 101. The intervening uninitialized elements are undef.
# Array elements beyond the array size are also undef.
$#strn=5; # Shrinks the array to 6 elements ( last index 5 ).
print @strn[0,4,2]; # prints 0th , 4th and 2nd element.

.. is the range operator. The range only counts upwards: if the left value is greater than the right one, the list is empty.
Examples :
@strn[11..15]=(45,6,7,8,9); # truncates any additional values given.
print $strn[-1]; # -1 is the last index no.
print $strn[-2]; # -2 is second last index and so on.

Built-In Array functions :

1. Push ( push array,list of elements )
Pushes one or more elements onto the end of the array. push returns the new size of the array.

Examples :
@n=qw(a b c d e f); # qw stands for quote words.
push @n,"56",33,"aa";
print push @n,"ui","ll"; # prints the size of the new array.
print push @n; # returns the size of array.
print @n;

@n=("hello","world");
is the same as
@n=qw(hello world);

2. Pop ( pop arrayname )
Removes the last element of an array, decreasing the size of the array by one.
Returns the element removed.

Examples :
$\="\n";
print @ARGV;
pop; # with no argument, pop works on @ARGV in the main program ( on @_ inside a subroutine ).
print @ARGV;

3. Unshift ( unshift arrayname,list of elements )
Adds the elements at the beginning of the array ( the counterpart of push at the front )

Examples :
unshift @n,"first","second";
print @n;

4. Shift ( shift arrayname )
Same as pop , but removes the first element.

Examples :
my @numbers = (1 .. 10);
while(scalar(@numbers) > 0)
{
my $i = shift(@numbers);
print $i, "\n";
}

5. Splice ( splice arr,startindex,no. of elem to be removed,list of elem to add )
Overwrite/Append anywhere in an array.

Examples :
@cities=("bang","hyd","mum","chn");
splice @cities,2,1,"mys";
print "@cities";

splice @cities,0,0,"mum","sri","bhu"; # inserts at the beginning.
print "@cities";

splice @cities,1,2; # removes 2 elements beginning at index 1. Index starts at 0.
splice @cities,3; # removes all the elements from index 3 onwards.
splice @cities; # deletes all the elements.

6. Sort ( sort arrayname )
Sorts the array elements in ascending ASCII order ( the default ). This doesn't modify the array; it returns a new sorted list. By default the elements are compared with the string comparison operator.

Examples :
$,=" ";
@cities=("bang","hyd","mum");
print sort @cities; # prints an ascii sorted list with a space in between.
print @cities;

@cities=sort @cities; # overwrites the array with the sorted array
print @cities;

Below examples show how to do numeric comparisons.
Examples :
@nn=(45,67,1,11,20,30);
print sort @nn; # o/p 1 11 20 30 45 67 ( ascii sort; it happens to match numeric order here, but a 5 would come after 45 ).
print sort{$a <=> $b} @nn; # numeric ascending order . Remember this construct.
print @nn;
print sort{$b <=> $a} @nn; # numeric descending order. Remember this construct.
print sort{$b cmp $a} @cities; # string (ascii) comparison in descending order.

7. Reverse ( reverse arrayname )
Reverses the order of the array elements. Doesn't modify the array.

Examples :
print reverse @cities; # prints reverse.
print reverse sort @cities; # descending order.

8. Split ( split /pattern/,string )
Returns a list obtained by splitting the string on a character, string or pattern.

Examples :
$s="Hello:world::perl";
@arr2=split(m/:+/,$s); # m stands for match.
# The contents between / / is the regex pattern
# $s is the string to be searched.

9. Join ( join separator,list )
It's the opposite of split. Returns a string.

Examples :
$st=join "-",@cities;
print $st;
print join "\n",@cities;

10. Delete ( delete $array[index] )
Deletes an element of an array. Deleting an element other than the last one doesn't change the size of the array ( the slot becomes undef ); deleting the last element shrinks the array.

Examples :
delete $cities[1];
print "@cities";
print scalar @cities;
delete $cities[$#cities];
print scalar @cities;
print "@cities";