Amit Patel
Introduction
Awk is a programming language which allows easy manipulation of structured data and
the generation of formatted reports.
The name AWK is derived from the family names of its authors: Alfred Aho, Peter
Weinberger, and Brian Kernighan.
Awk is mostly used for pattern scanning and processing. It searches one or more files
to see if they contain lines that match the specified patterns and then performs the
associated actions.
It is an excellent filter and report writer: powerful, and designed especially for
text processing.
Some of the key features of Awk are:
Like common programming languages, Awk has variables, conditionals and loops.
Awk reads from a file or from its standard input, and writes to its standard output.
AWK is simple to use: we can provide AWK commands either directly on the command
line or in a text file containing AWK commands.
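The two ways of supplying a program can be sketched as below. The function names and the temporary program file are hypothetical, chosen only for illustration. Both variants run the same one-line program, which prints the number of input lines.

```shell
# Hypothetical helper 1: the program text is given directly on the command line.
count_lines_inline() {
  awk 'END { print NR }'
}

# Hypothetical helper 2: the same program is stored in a file and loaded with -f.
count_lines_from_file() {
  local prog
  prog=$(mktemp)                          # temporary program file (illustrative)
  printf 'END { print NR }\n' > "$prog"
  awk -f "$prog"                          # reads the function's standard input
  rm -f "$prog"
}

printf 'one\ntwo\nthree\n' | count_lines_inline      # prints 3
printf 'one\ntwo\nthree\n' | count_lines_from_file   # prints 3
```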
awk Syntax
An awk program consists of a selection criterion (pattern) followed by an action:
pattern { action }
The pattern specifies when the action is performed. Like most UNIX utilities, AWK is line
oriented. That is, the pattern specifies a test that is performed with each line read as input. If
the condition is true, then the action is taken. The default pattern is something that matches
every line. This is the blank or null pattern.
If the action is missing, the matched line is printed. A rule must have either a pattern or an action.
For each line, awk tests the patterns in the order given and, whenever a pattern matches, performs the
corresponding action.
In the above syntax, either the pattern or the action may be omitted, but not both.
BEGIN specifies actions to be taken before any lines are read, and END specifies actions to be taken
after the last line is read.
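A minimal sketch of the three parts of a program, assuming a hypothetical helper named begin_end_demo: BEGIN runs once before any input, the pattern-less body runs once per line, and END runs once after the last line.

```shell
# Hypothetical helper showing BEGIN, body and END in one program.
begin_end_demo() {
  awk 'BEGIN { print "start" }        # runs once, before the first line
             { lines++ }              # runs for every input line
       END   { print "end:", lines, "lines" }'   # runs once, after the last line
}

printf 'a\nb\n' | begin_end_demo
# start
# end: 2 lines
```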
Use of awk
Initialization Processing (BEGIN)
The initialization processing is done only once, before awk starts reading the file.
The BEGIN instructions are used to initialize variables, create report headings and
perform other processing that must be completed before the file processing starts.
Body Processing
It starts when awk reads the first record (line) from the file. It then processes the
data through the body instructions.
When the end of the body instructions is reached, awk repeats the process by
reading the next record and processing it.
This means that if a file contains 50 records, the body will be executed 50 times.
End Processing (END)
The end processing is executed after all input data have been read.
At this time, information accumulated during the processing can be analysed and
printed, or other end activities can be conducted.
BEGIN and END
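The special pattern BEGIN matches before the first input line is read; END
matches after the last input line has been read.
A program consisting only of the action { print } has no pattern, so it runs for every input line and prints it.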
A field is a unit of data that has informational content. In a data file (a file made up of
organized data) there is a fixed set of fields; in the emp file, for example, the fields are
empid, ename, etc. In a text file each word becomes a field, so the number of fields varies.
Operation
Field buffer
There are as many field buffers available as there are fields in the current record of the input
file.
Each field buffer has a name, which is the dollar sign ($) followed by the field number in the
current record.
Numbering begins at one: $1 is the first field, $2 the second field, and so on.
Record buffer
There is only one record buffer available. Its name is $0, and it holds the whole record.
As long as none of the fields is changed, $0 holds exactly the same data as
found in the input file.
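What happens to $0 once a field is changed can be sketched as follows (the helper name and sample line are hypothetical). Assigning to any field makes awk rebuild $0 from the fields joined with the output field separator OFS, a single space by default, so the original spacing is lost.

```shell
# Hypothetical helper: print $0 before and after changing field 2.
rebuild_demo() {
  awk '{ print $0          # unchanged record, original spacing kept
         $2 = "X"          # changing a field rebuilds $0 using OFS
         print $0 }'
}

printf 'a   b   c\n' | rebuild_demo
# a   b   c
# a X c
```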
Example 1: Default behavior of Awk
In this example no pattern is given, so the action applies to every line.
The action print without any argument prints the whole line, so every line
of the file is printed. Actions must be enclosed in braces.
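A minimal sketch of this default behaviour, assuming a hypothetical helper print_all and made-up sample lines: with no pattern and a bare print, awk copies its input to its output, much like cat.

```shell
# Hypothetical helper: no pattern, so the action runs for every line;
# print with no argument prints the whole line ($0).
print_all() {
  awk '{ print }'
}

printf 'Thomas Manager\nJason Developer\n' | print_all
# Thomas Manager
# Jason Developer
```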
Example 2: Print the lines which match a pattern.
$ awk '/Thomas/
In this example awk prints every line which matches either ‘Thomas’ or ‘Nisha’.
It has two patterns. Awk accepts any number of patterns, but each set (a pattern and its
corresponding action) has to be separated from the next by a newline (or a semicolon).
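The two-pattern program can be sketched like this, using hypothetical sample lines and a hypothetical helper name. Each pattern-only rule prints the lines it matches, so a line that matches both patterns is printed twice.

```shell
# Hypothetical helper: two pattern-only rules, one per line.
match_two() {
  awk '/Thomas/
       /Nisha/'
}

printf 'Thomas Manager\nJason Developer\nNisha Manager\n' | match_two
# Thomas Manager
# Nisha Manager
```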
Example 3: Print only specific fields.
Awk has a number of built-in variables. For each record (i.e. line), awk splits the record on
whitespace by default and stores the pieces in the $n variables. If the line has 4 words, they
are stored in $1, $2, $3 and $4. $0 represents the whole line. NF is a built-in variable which
holds the total number of fields in a record.
Thomas $5,000
Jason $5,500
Sanjay $7,000
Nisha $9,500
Randy $6,000
Example 4: Initialization and Final Action.
Report Generated
--------------
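A report with an initialization and a final action might look like the sketch below. The helper name, the heading text and the two sample lines are hypothetical. BEGIN prints the heading, the body prints the name and salary fields of each record, and END prints the footer.

```shell
# Hypothetical report: heading in BEGIN, one line per record, footer in END.
salary_report() {
  awk 'BEGIN { print "Name Salary"; print "-----------" }
             { print $1, $2 }
       END   { print "-----------"; print "Report Generated" }'
}

printf 'Thomas $5,000\nJason $5,500\n' | salary_report
```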
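Variables
System Variables
Without these built-in variables it is very difficult to write even simple AWK code.
They are used to format the output of an AWK command, to set the input field separator, and
even to hold the current input file name for use within the script.
Variable   Description
NR         Number of the current record (line)
NF         Number of fields in the current record
FILENAME   Name of the current input file
FS         Input field separator (default: blanks and tabs)
OFS        Output field separator (default: a single space)
RS         Input record separator (default: newline)
ORS        Output record separator (default: newline)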
The NR variable, which holds the number of the current record, comes in handy when you want to print line numbers in a file.
$ cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500
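Numbering the emps records with NR can be sketched as follows, using the first two sample lines above and a hypothetical helper name.

```shell
# Hypothetical helper: NR is the current record number, so printing it
# before $0 numbers every line.
number_lines() {
  awk '{ print NR, $0 }'
}

printf '%s\n' 'Tom Jones 4424 5/12/66 543354' 'Mary Adams 5346 11/4/63 28765' | number_lines
# 1 Tom Jones 4424 5/12/66 543354
# 2 Mary Adams 5346 11/4/63 28765
```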
Example: Space as Field Separator
$ cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500
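Splitting the emps records on blanks can be sketched like this. The helper name is hypothetical, and the extra blanks in the second sample line were added for illustration. With the default field separator, runs of spaces and tabs separate fields and leading blanks are ignored, so $1 and $2 are the first and last names and $NF is the last column.

```shell
# Hypothetical helper: print last name, first name and the last column.
name_and_last_column() {
  awk '{ print $2, $1, $NF }'
}

printf '%s\n' 'Tom Jones 4424 5/12/66 543354' '  Mary   Adams 5346 11/4/63 28765' | name_and_last_column
# Jones Tom 543354
# Adams Mary 28765
```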
Example: Equivalent to sed
Print lines 2 to 4 of a file, i.e. extract lines 2 to 4 without using sed, head or tail.
Similarly, write the awk equivalent of sed -n '5,10p' emps or head -10 emps | tail -n +5.
$ cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500
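$ awk -F " " 'NR<=2 {print NR, $0}' emps      (prints the line number as well)
OR
$ awk 'NR<=2' emps      (prints only the records)
Both commands print the first two lines; for lines 2 to 4 the pattern becomes NR>=2 && NR<=4.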
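Selecting lines 2 to 4 of the emps records above can be sketched in two ways. The helper names are hypothetical. The second helper uses a range pattern, which selects every line from the one where the first condition is true through the one where the second condition is true.

```shell
# The four emps records, used as sample input.
emps_sample() {
  printf '%s\n' 'Tom Jones 4424 5/12/66 543354' 'Mary Adams 5346 11/4/63 28765' \
                'Sally Chang 1654 7/22/54 650000' 'Billy Black 1683 9/23/44 336500'
}

# Equivalent of: sed -n '2,4p' emps
lines_2_to_4() {
  awk 'NR >= 2 && NR <= 4'
}

# Same selection as a range pattern: from NR == 2 through NR == 4.
lines_2_to_4_range() {
  awk 'NR == 2, NR == 4'
}

emps_sample | lines_2_to_4
emps_sample | lines_2_to_4_range
```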
Example: Equivalent to Head filter
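A head-like filter can be sketched as below, assuming a hypothetical helper that keeps the first 3 lines (like head -3). The exit statement stops awk once line 3 has been printed, so the rest of a large input is not read.

```shell
# Hypothetical helper: equivalent of head -3.
first_three() {
  awk '{ print } NR >= 3 { exit }'
}

printf '1\n2\n3\n4\n5\n' | first_three
# 1
# 2
# 3
```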
Example: NF variable
NF gives you the total number of fields in a record.
NF is very useful for validating whether all the fields exist in a record.
Thomas $5,000
Jason $5,500
Sanjay $7,000
Nisha $9,500
Randy $6,000
In the above example $2 and $5 represent Name and Salary respectively. We can also get the Salary
using $NF, where $NF represents the last field. In the print statement the comma separates the
arguments, which are output separated by the output field separator (OFS, a space by default).
Example: NF variable
How do you print the last field without knowing the number of fields in the file?
$5,000
$5,500
$7,000
$9,500
$6,000
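Printing the last field with $NF can be sketched as follows. The helper names and the five-field sample lines are hypothetical. $NF is the field whose number equals NF, so it is always the last field, however many fields the line has.

```shell
# Hypothetical helpers: the last field, and the number of fields.
last_field()  { awk '{ print $NF }'; }
field_count() { awk '{ print NF }'; }

printf '1 Thomas Manager Sales $5,000\n2 Jason Developer Technology $5,500\n' | last_field
# $5,000
# $5,500
```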
Example: FILENAME variable
This variable contains the name of the file that awk is currently processing.
This comes in handy when you want to print the number of lines present in a
given file.
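Printing a file's name together with its line count can be sketched as below. The helper name and the temporary sample file are hypothetical. The sketch relies on the behaviour of common implementations such as gawk and mawk, where FILENAME still names the last file read inside END and NR holds the total number of records.

```shell
# Hypothetical helper: print the file name and its number of lines.
count_file_lines() {
  awk 'END { print FILENAME, NR }' "$1"
}

sample=$(mktemp)                     # hypothetical sample file
printf 'a\nb\nc\n' > "$sample"
count_file_lines "$sample"           # prints the file name followed by 3
rm -f "$sample"
```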
Example: FS variable
By default awk splits input fields on spaces and tabs, and uses a space as the output
separator.
If your file uses some other character as the separator, awk does not recognize it
unless you tell it to.
As you can see, the entire file is displayed, which indicates that awk does not understand
the db.txt field separator “,”. We have to tell awk what the field separator is.
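Telling awk about a comma separator can be sketched in two equivalent ways. The helper names and the comma-separated sample lines are hypothetical. The -F, option and the assignment FS = "," in BEGIN both take effect before the first line is split.

```shell
# Hypothetical helpers: print the first comma-separated column.
first_column_csv()       { awk -F, '{ print $1 }'; }
first_column_csv_begin() { awk 'BEGIN { FS = "," } { print $1 }'; }

printf 'John,29,MS,IBM\nBarbi,45,MD,JHH\n' | first_column_csv
# John
# Barbi
```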
Example: OFS variable
This variable specifies the output field separator, which separates the fields in the
output.
Example: Display only the 1st and 4th columns, with $ as the separator between these
columns in the output.
John $ IBM
Barbi $ JHH
Mitch $ BofA
Tim $ DELL
Lisa $ SmartDrive
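One way to produce output like the above is sketched below. The comma-separated sample lines and the helper name are hypothetical, since the original input file is not shown. OFS is placed between the arguments of print, so setting it to " $ " joins columns 1 and 4 with that string.

```shell
# Hypothetical helper: comma-separated input, " $ " between output columns.
output_with_dollar() {
  awk 'BEGIN { FS = ","; OFS = " $ " } { print $1, $4 }'
}

printf 'John,29,MS,IBM\nBarbi,45,MD,JHH\n' | output_with_dollar
# John $ IBM
# Barbi $ JHH
```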
Example: RS variable
The record separator (RS) defines the separator between records (rows) in a file. By default
awk uses a newline as the record separator. We can change this with the RS built-in variable.
Example: Convert a sentence to one word per line. We can use the RS variable to do it.
echo "This is how it works" | awk 'BEGIN{RS=" "}{print $0}'
This
is
how
it
works
Example: ORS variable
This variable defines the output record separator, which awk writes after each record it
prints. By default ORS is set to a newline.
Example: Print all the company names, which are in the 4th column, on a single line.
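A sketch of this, assuming hypothetical comma-separated sample lines and a hypothetical helper name: ORS is written after every print, so setting it to a space puts all company names on one line, and a final printf ends that line.

```shell
# Hypothetical helper: print column 4 of every line on a single line.
companies_one_line() {
  awk 'BEGIN { FS = ","; ORS = " " } { print $4 } END { printf "\n" }'
}

printf 'John,29,MS,IBM\nBarbi,45,MD,JHH\n' | companies_one_line
# IBM JHH   (followed by a trailing space before the newline)
```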
Printf Statement
printf is similar to the AWK print statement, but its advantage is that it can format
the output in a desired manner.
For example, to print the values of column 3 as integers, one per line:
awk '{printf "%d\n", $3}' example.txt
Type of data
printf is useful for specifying the data type, such as integer, decimal, octal etc. Below is
the list of some of the format specifiers available in AWK.
Format     Meaning
%i or %d   Decimal integer
%o         Octal
%x         Hexadecimal
%s         String
%f         Floating-point number
Make sure that you pass values of the right type for the formats you use.
If you pass a string that does not start with a number to a decimal format
such as %d, it prints 0 instead of that string.
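The format specifiers can be sketched with one value each. The helper name is hypothetical. The last two lines show how a string is converted when passed to %d: leading digits are used, otherwise the result is 0.

```shell
# Hypothetical helper exercising several printf formats.
format_demo() {
  awk 'BEGIN {
    printf "%d %o %x %s %.2f\n", 255, 255, 255, "awk", 3.14159
    printf "%d\n", "abc"       # no leading number: printed as 0
    printf "%d\n", "42abc"     # leading digits are used: 42
  }'
}

format_demo
# 255 377 ff awk 3.14
# 0
# 42
```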
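Padding Between Columns
We can format the columns to specify the number of characters each column can use. The
following padding formats are available with the printf statement.
Format   Description
-n.m     Left-justify the value in a field n characters wide (padding with spaces on the
         right); for integers, m is the minimum number of digits, padded with leading zeros.
n.m      Right-justify the value in a field n characters wide (padding with spaces on the
         left); for integers, m is the minimum number of digits, padded with leading zeros.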
Print each value in a column 5 characters wide, zero-filled to 5 digits.
|00021|00078|00084|
|00023|00056|00058|
|00025|00021|00038|
|00025|00087|00097|
|00024|00055|00030|
Make each column element 4 digits and 7 characters wide, with the number printed on the left-hand side (format %-7.4d).
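The width and precision combinations can be sketched side by side. The helper name is hypothetical. The vertical bars make the padding visible.

```shell
# Hypothetical helper showing width.precision padding for integers.
padding_demo() {
  awk 'BEGIN {
    printf "|%5.5d|\n", 21     # width 5, at least 5 digits: zero-filled
    printf "|%7.4d|\n", 21     # width 7, 4 digits, right-justified
    printf "|%-7.4d|\n", 21    # width 7, 4 digits, left-justified
  }'
}

padding_demo
# |00021|
# |   0021|
# |0021   |
```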
Awk Operators
OPERATOR               SYMBOL
Ternary Operator       ?:
Unary Operator         +, -
String Concatenation   (space)
Array Membership       in
Addition
It is represented by the plus (+) symbol, which adds two numbers.
Subtraction
It is represented by the minus (-) symbol, which subtracts the second number from the first.
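The arithmetic operators can be sketched on two fixed values. The helper name is hypothetical. Division is floating-point in awk, so 50 / 20 prints 2.5.

```shell
# Hypothetical helper exercising the arithmetic operators.
arith_demo() {
  awk 'BEGIN {
    a = 50; b = 20
    print "a + b =", a + b
    print "a - b =", a - b
    print "a * b =", a * b
    print "a / b =", a / b       # floating-point division
    print "a mod b =", a % b     # remainder
  }'
}

arith_demo
# a + b = 70
# a - b = 30
# a * b = 1000
# a / b = 2.5
# a mod b = 10
```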
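Pre-increment
It is represented by ++ placed before the operand. It increments the value of the operand by 1
and then returns the new value. For instance, with a = 10, the statement b = ++a sets both a
and b to 11, giving the following result: a = 11, b = 11
Pre-decrement
It is represented by -- placed before the operand. It decrements the value of the operand by 1
and then returns the new value.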
Post-increment
It is represented by ++ placed after the operand. It increments the value of the operand by 1,
but returns the value the operand had before the increment. For instance, with a = 10, the
statement b = a++ sets a to 11 and b to 10, giving the following result: a = 11, b = 10
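All four increment and decrement forms can be sketched side by side. The helper name is hypothetical. Each line resets a to 10 and prints a followed by b.

```shell
# Hypothetical helper: prefix forms return the new value,
# postfix forms return the old value.
incdec_demo() {
  awk 'BEGIN {
    a = 10; b = ++a; print "++a:", a, b
    a = 10; b = a++; print "a++:", a, b
    a = 10; b = --a; print "--a:", a, b
    a = 10; b = a--; print "a--:", a, b
  }'
}

incdec_demo
# ++a: 11 11
# a++: 11 10
# --a: 9 9
# a--: 9 10
```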
AWK supports the following assignment operators:
Simple assignment
$ awk 'BEGIN { name = "Jerry"; print "My name is", name }'
On executing the above code, you get the following result: My name is Jerry
Shorthand addition
$ awk 'BEGIN { cnt=10; cnt += 10; print "Counter =", cnt }'
On executing the above code, you get the following result: Counter = 20
Shorthand exponential
$ awk 'BEGIN { cnt=2; cnt **= 4; print "Counter =", cnt }'
On executing the above code, you get the following result: Counter = 16
Note that **= is not part of POSIX awk (gawk accepts it); the portable spelling is cnt ^= 4.
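The remaining shorthand assignment operators can be sketched on one counter. The helper name is hypothetical. Only POSIX operators are used, including ^= for exponentiation.

```shell
# Hypothetical helper: each line updates cnt and prints the new value.
assign_demo() {
  awk 'BEGIN {
    cnt = 10
    cnt -= 4;  print cnt    # 6
    cnt *= 3;  print cnt    # 18
    cnt /= 2;  print cnt    # 9
    cnt %= 5;  print cnt    # 4
    cnt ^= 2;  print cnt    # 16
  }'
}

assign_demo
```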
Relational Operators
Equal to
It is represented by ==. It returns true if both operands are equal, otherwise it returns false.
Less than
It is represented by <. It returns true if the left-hand operand is less than the right-hand operand,
otherwise it returns false.
$ awk 'BEGIN { a = 10; b = 20; if (a < b) print "a < b" }'
On executing the above code, you get the following result: a < b
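All six relational operators can be sketched on the same pair of values. The helper name is hypothetical. Only the comparisons that are true print a line.

```shell
# Hypothetical helper: try every relational operator on a = 10, b = 20.
compare_demo() {
  awk 'BEGIN {
    a = 10; b = 20
    if (a == b) print "a == b"
    if (a != b) print "a != b"
    if (a <  b) print "a < b"
    if (a <= b) print "a <= b"
    if (a >  b) print "a > b"
    if (a >= b) print "a >= b"
  }'
}

compare_demo
# a != b
# a < b
# a <= b
```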
Logical Operators
Logical AND
It is represented by &&. It evaluates to true if both expr1 and expr2 evaluate to true, otherwise it evaluates to false.
expr2 is evaluated if and only if expr1 evaluates to true. For instance, the example below checks
whether a given single-digit number is in octal format (0 to 7).
$ awk 'BEGIN {num = 5; if (num >= 0 && num <= 7) printf "%d is in octal format\n", num }'
On executing the above code, you get the following result: 5 is in octal format
Ternary Operator
We can easily implement a conditional expression using the ternary operator. Below is its syntax:
condition ? expr1 : expr2
When the condition is true, expr1 is evaluated, otherwise expr2 is evaluated. For instance, the
example below finds the maximum of two numbers.
$ awk 'BEGIN { a = 10; b = 20; max = (a > b) ? a : b; print "Max =", max }'
On executing the above code, you get the following result: Max = 20
String concatenation operator
Writing two strings next to each other, separated by a space, concatenates (merges) them. The simple
example below illustrates this:
$ awk 'BEGIN { str1="Hello, "; str2="World"; str3 = str1 str2; print str3 }'
On executing the above code, you get the following result: Hello, World
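Array membership operator
It is represented by in. It is used when working with array elements: (key in array) tests whether an
element exists, and for (key in array) loops over the elements. The simple example below prints the array
elements using this operator; the order in which for (key in array) visits the elements is unspecified,
so the lines may appear in a different order.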
$ awk 'BEGIN { arr[0] = 1; arr[1] = 2; arr[2] = 3; for (i in arr) printf "arr[%d] = %d\n", i, arr[i] }'
arr[0] = 1
arr[1] = 2
arr[2] = 3
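The membership test itself can be sketched as below. The helper name is hypothetical. Unlike a plain reference such as arr["y"], the test (key in arr) does not create the element, so the array still has one element afterwards.

```shell
# Hypothetical helper: test membership without creating elements.
in_demo() {
  awk 'BEGIN {
    arr["x"] = 1
    if ("x" in arr) print "x present"
    if (!("y" in arr)) print "y absent"
    n = 0; for (k in arr) n++
    print n                  # 1: the in test did not add arr["y"]
  }'
}

in_demo
# x present
# y absent
# 1
```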
Regular Expression Operators
This section explains the two regular expression operators with suitable examples.
Match (~)
It is represented as ~. It tests whether a field (or the whole line) contains a match for the
regular expression. For instance, the example below prints the lines which contain the pattern 9.
2) Rahul Maths 90
5) Hari History 89
Print the list of employees in the Technology department
The department name is available as the fourth field, so we need to check whether $4 matches the
string “Technology”, and if so, print the line.
To count those employees instead, the following rule checks whether the department is Technology and,
if so, increments in its action the count variable, which was initialized to zero in the BEGIN section.
$4 ~ /Technology/ { count++; }
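A complete counting program along these lines is sketched below. The employee lines, the helper name and the message text are hypothetical. Department is field 4 in the sample data.

```shell
# Hypothetical helper: count the employees whose 4th field matches Technology.
count_technology() {
  awk 'BEGIN { count = 0 }
       $4 ~ /Technology/ { count++ }
       END { print "Number of employees in Technology Dept =", count }'
}

printf '100 Thomas Manager Sales\n200 Jason Developer Technology\n300 Sanjay Sysadmin Technology\n' | count_technology
# Number of employees in Technology Dept = 2
```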
Not match (!~)
It is represented as !~. It looks for a field that does not contain a match for the regular expression.
For instance, the example below prints the lines which do not contain the pattern 9.
1) Amit Physics 80
3) Shyam Biology 87
4) Kedar English 85
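Both regular expression operators can be sketched on a marks list. The list is reconstructed from the two outputs shown above, and the helper name is hypothetical. The first command reproduces the ~ output and the second the !~ output.

```shell
# Sample marks data, reconstructed from the outputs above.
marks() {
  printf '%s\n' '1) Amit Physics 80' '2) Rahul Maths 90' '3) Shyam Biology 87' \
                '4) Kedar English 85' '5) Hari History 89'
}

marks | awk '$0 ~ /9/'      # lines 2 and 5
marks | awk '$0 !~ /9/'     # lines 1, 3 and 4
```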
Example
Print the first field of each line:  { print $1 }
For lines whose first field matches URGENT, print fields 3 and 2:  $1 ~ /URGENT/ { print $3, $2 }
Sum up column 2 and print the total:  {total += $2} END { print total}
Print lines which have 7 fields and begin with "Name:":  NF==7 && /^Name:/
Search for a word and print the lines which contain either Juila or juila:  /[Jj]uila/
Print all the column 4 values between lines which contain Frank and Low:  /Frank/,/Low/{print $4}
Print lines 3 to 6:  NR>=3 && NR<=6
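The range pattern /Frank/,/Low/ from the list can be sketched on made-up sample lines, using a hypothetical helper name. The range starts at a line matching /Frank/ and ends, inclusively, at the next line matching /Low/.

```shell
# Hypothetical helper: column 4 of every line from /Frank/ through /Low/.
frank_to_low() {
  awk '/Frank/,/Low/ { print $4 }'
}

printf 'a b c 1\nFrank b c 2\nx y z 3\nLow b c 4\nq r s 5\n' | frank_to_low
# 2
# 3
# 4
```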
Awk Control Flow
Like other programming languages, AWK provides conditional statements to control the flow
of the program. This section explains AWK's conditional statements with suitable examples.
If statement
It simply tests the condition and performs certain actions depending on the result. Below is
the syntax of the if statement:
if (condition)
    action
We can also use a pair of curly braces, as given below, to execute multiple actions:
if (condition)
{
    action-1
    action-2
    ...
    action-n
}
For instance, the simple example below checks whether a number is even:
$ awk 'BEGIN {num = 10; if (num % 2 == 0) printf "%d is even number.\n", num }'
On executing the above code, you get the following result: 10 is even number.
If Else Statement
In the if-else syntax we can also provide the actions to be performed when the condition
is false.
if (condition)
action-1
else
action-2
Print only the odd-numbered lines of file “emp”
$ awk '{ x = NR % 2;
         if (x == 1)
             print }' emp
Here we use the NR system variable, which holds the current record number, check whether it is
odd or even using the if statement, and print the odd-numbered records.
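The same selection can also be written as a bare pattern, sketched below with a hypothetical helper name. When a rule has no action, awk prints the line whenever NR % 2 == 1 is true.

```shell
# Hypothetical helper: print only odd-numbered lines.
odd_lines() {
  awk 'NR % 2 == 1'
}

printf 'l1\nl2\nl3\nl4\nl5\n' | odd_lines
# l1
# l3
# l5
```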
In the if-else syntax, action-1 is performed when the condition evaluates to true and action-2 is
performed when it evaluates to false. For instance, the simple example below checks
whether a number is even or odd:
$ awk 'BEGIN {num = 11; if (num % 2 == 0) printf "%d is even number.\n", num; else printf "%d is odd number.\n", num }'
On executing the above code, you get the following result: 11 is odd number.
If-Else-If Ladder
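We can easily create an if-else-if ladder by using multiple if-else statements. The simple
example below illustrates this:
$ awk 'BEGIN {
a = 30;
if (a == 10)
    print "a = 10";
else if (a == 20)
    print "a = 20";
else if (a == 30)
    print "a = 30";
}'
On executing the above code, you get the following result: a = 30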
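Some regular expression metacharacters:
. - any character
(regex1|regex2) - alternation
NOTE: the interval syntax {m,n} and its variations {m} and {m,} is specified by POSIX, but some awk
implementations (for example, older versions of mawk) do not support it.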
Awk loop
Like conditional statements, AWK also provides looping statements. They are used to execute a set of
actions repeatedly. The loop execution continues as long as the loop condition is true.
This section explains AWK's loops with suitable examples.
For Loop
Below is the syntax of the for loop:
for (initialisation; condition; increment/decrement)
    action
Initially the for statement performs the initialisation action, then it checks the condition; if the
condition is true, it executes the actions, after which it performs the increment or decrement operation.
The loop execution continues as long as the condition is true. For instance, the example below
prints the numbers 1 to 5 using a for loop:
$ awk 'BEGIN { for (i = 1; i <= 5; ++i) print i }'
1
2
3
4
5
Awk Associative Array
Print a file in reverse order (last line first)
$ cat reverse.awk
{ line[NR] = $0 }
END {
    for (i = NR; i > 0; i--)
        print line[i]
}
While Loop
The while loop keeps executing the action as long as a particular logical condition evaluates to true.
Given below is the syntax of the while loop:
while (condition)
    action
AWK first checks the condition; if the condition is true, it executes the action. This process
repeats as long as the loop condition evaluates to true.
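Printing the numbers 1 to 5 with a while loop can be sketched as follows, using a hypothetical helper name. The counter is incremented inside the braces, and the loop stops once the condition i <= 5 becomes false.

```shell
# Hypothetical helper: print 1 to 5 with a while loop.
while_demo() {
  awk 'BEGIN { i = 1; while (i <= 5) { print i; ++i } }'
}

while_demo
# 1
# 2
# 3
# 4
# 5
```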
Do-While Loop
The do-while loop is similar to the while loop, except that the test condition is evaluated at
the end of the loop. Given below is the syntax of the do-while loop:
do
    action
while (condition)
In a do-while loop the action gets executed at least once, even when the condition
evaluates to false.
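A do-while sketch with two hypothetical helpers follows. The first prints 1 to 5. The second starts with a condition that is already false and still runs its body once.

```shell
# Hypothetical helper: print 1 to 5 with a do-while loop.
do_while_demo() {
  awk 'BEGIN { i = 1; do { print i; ++i } while (i <= 5) }'
}

# The body runs once even though 10 <= 5 is false from the start.
once_demo() {
  awk 'BEGIN { i = 10; do { print i; ++i } while (i <= 5) }'
}

do_while_demo    # 1 to 5, one per line
once_demo        # 10
```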
Break Statement
As the name suggests, break is used to end the loop execution. Here is an example which ends the loop
when sum becomes greater than 50.
$ awk 'BEGIN {sum = 0; for (i = 0; i < 20; ++i) { sum += i; if (sum > 50) break; else print "Sum =", sum } }'
Sum = 0
Sum = 1
Sum = 3
Sum = 6
Sum = 10
Sum = 15
Sum = 21
Sum = 28
Sum = 36
Sum = 45
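Continue Statement
The continue statement skips the rest of the current loop iteration and starts the next one. The example
below prints only the even numbers between 1 and 15.
$ awk 'BEGIN {for (i = 1; i <= 15; ++i) {if (i % 2 == 0) print i ; else continue} }'
2
4
6
8
10
12
14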
Exit
It is used to stop the execution of the script. It accepts an integer as an argument, which will
be the exit status code of the AWK process. If no argument is supplied, exit returns status
zero. Here is an example which stops the execution when sum becomes greater than 10.
$ awk 'BEGIN {sum = 0; for (i = 0; i < 10; ++i) { sum += i; if (sum > 10) exit(10); else print "Sum =", sum } }'
Sum = 0
Sum = 1
Sum = 3
Sum = 6
Sum = 10
$ echo $?
10
Awk Arrays
AWK arrays are associative. An element is assigned with array_name[index] = value,
where array_name is the name of the array, index is the array index, and value is the value
assigned to that element of the array.
There is no need to declare the size of an array in advance: an array can expand and shrink at
runtime.
Awk arrays differ from the arrays used in other programming languages in the following ways:
They are not formally declared; an array is created at the moment it is used.
Array elements are initialized with zero or an empty string unless initialized explicitly.
An array expands automatically.
In the following program, we use an array to store the totals of the basic pay, DA, HRA and gross pay of
the sales and marketing employees.
The T[ ] array stores the total of each element of the pay and also the gross pay.
BEGIN { FS = "|" }   # assumed "|"-separated emp file, as in the length example
/SALES|MARKETING/ {
    DA = 0.25 * $6; HRA = 0.50 * $6; GP = $6 + HRA + DA
    T[1] += $6; T[2] += DA; T[3] += HRA; T[4] += GP
    C++
}
END {
    print T[1], T[2], T[3], T[4]
}
Awk Array
For insertion we use the assignment operator. Similarly, we can use the delete statement to remove
an element from the array. Below is the syntax of the delete statement:
delete array_name[index]
The example below deletes the element orange, so the final print statement outputs only an empty line.
$ awk 'BEGIN {
fruits["mango"]="yellow";
fruits["orange"]="orange";
delete fruits["orange"];
print fruits["orange"]
}'
Awk Multi-Dimensional Array
Awk only supports single-dimensional arrays, but we can easily simulate a multi-dimensional
array using a single-dimensional array.
Consider a 2 x 3 array in which array[0][0] stores 100, array[0][1] stores 200, and so on. To store 100 at
array location [0][0] we can use the following syntax:
array["0,0"] = 100
Although we have given 0,0 as the index, these are not two indexes. In reality it is just one index:
the string 0,0.
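POSIX awk also has a built-in form of this technique, sketched below with a hypothetical helper name. Writing arr[i, j] joins the indexes with the built-in separator SUBSEP into one string index, (i, j) in arr tests for such an element, and split(key, parts, SUBSEP) recovers the parts.

```shell
# Hypothetical helper: simulated 2-D array using the comma/SUBSEP syntax.
subsep_demo() {
  awk 'BEGIN {
    arr[0, 0] = 100                  # index is "0" SUBSEP "0"
    arr[0, 1] = 200
    if ((0, 1) in arr) print "arr[0,1] =", arr[0, 1]
    for (k in arr) {
      split(k, idx, SUBSEP)          # recover the two index parts
      if (idx[1] == 0 && idx[2] == 0) print "found", idx[1], idx[2]
    }
  }'
}

subsep_demo
# arr[0,1] = 200
# found 0 0
```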
The simple example below simulates a 2-D array:
$ awk 'BEGIN {
array["0,0"] = 100;
array["0,1"] = 200;
array["0,2"] = 300;
array["1,0"] = 400;
array["1,1"] = 500;
array["1,2"] = 600;
print "array[0,0] = " array["0,0"];
print "array[0,1] = " array["0,1"];
print "array[0,2] = " array["0,2"];
print "array[1,0] = " array["1,0"];
print "array[1,1] = " array["1,1"];
print "array[1,2] = " array["1,2"];
}'
An element of an associative array is assigned with array[string] = value, where array is the name of the
array, string is the index of the element you are assigning a value to, and value is the value assigned to
that element.
The subscript is often called the key, and it is associated with the value assigned to the
corresponding array element.
The array elements are not stored in sequential order, and when the contents of the array are
displayed, they may not be in the order we expect.
Awk Associative Array
In the first example array, names are associated with ages: the name identifies each person’s age.
In the second, department numbers are associated with the sales for that department; the
department numbers in this array are strings.
The associative array structure has several design constraints that we must remember when we use
it:
Indexes must be unique: each index can be associated with an array value only once.
There is no ordering imposed on the indexes. If we create an associative array and print it,
there is no guarantee that the elements will be printed in the order in which they were
added.
To gain insight into arrays, let us create and access the elements of an array.
$ awk 'BEGIN {
fruits["mango"]="yellow";
fruits["orange"]="orange";
print fruits["orange"] "\n" fruits["mango"]
}'
On executing the above code, you get the following result:
orange
yellow
In the above example we have declared an array named fruits, whose index is the fruit name and whose
value is the colour of the fruit. To access an array element we use the array_name[index] format.
Awk Associative Array: For loop
The special for loop is used to read through an associative array when strings
are used as subscripts or the subscripts are not consecutive numbers.
The special for loop uses the subscript as a key into the value associated with
it. Its syntax is:
for (index in array)
    action
The index variable can be data, or a sequence number such as the line number
as a file is read.
Using an associative array, create a sales report by department from
the following sales file. Also print the total sales of all departments.
In the sales file, $2 contains the name of the department and $3 contains the sales.
$ cat sales.awk
{ deptSales[$2] += $3 }
END {
    for (x in deptSales)
        print x, ":", deptSales[x]
}
computer : 21482
supplies : 2242
textbooks : 36774
clothing : 6393
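The per-department report with a grand total can be sketched as follows. The sales lines and the helper name are hypothetical. The for (x in ...) order is unspecified, so the department lines may come out in any order (pipe them through sort if order matters), while the total is always printed last.

```shell
# Hypothetical helper: sales per department ($2) plus a grand total.
dept_report() {
  awk '{ deptSales[$2] += $3; total += $3 }
       END {
         for (x in deptSales) print x, ":", deptSales[x]
         print "Total :", total
       }'
}

printf '1 clothing 100\n2 computer 250\n3 clothing 50\n' | dept_report
```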
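Awk Built-In Functions
Awk has several built-in functions performing both arithmetic and string operations.
The parameters are passed to a function delimited by commas and enclosed in a matched pair
of parentheses.
FUNCTION    DESCRIPTION
length      Returns the length of a string
index       Returns the position of a substring within a string
substr      Extracts a substring from a string
split       Splits a string into the elements of an array
sub, gsub   Substitute the first (sub) or every (gsub) match of a regular expression
match       Returns the starting position of a regular expression match
toupper     Converts a string to uppercase
length
Syntax: length(string)
Used without an argument (length), it returns the length of the whole record, $0. For example,
the pattern length > 57 locates the records whose length exceeds 57 characters.
Similarly, in the emp.txt file the 2nd field contains the name of the employee, and length($2) < 10
lists the employees whose name is shorter than 10 characters.
To locate the lines in a file which contain between 100 and 150 characters:
awk -F "|" 'length > 100 && length < 150' emp.txt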
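Both uses of length can be sketched on made-up "|"-separated lines. The helper names, the sample lines and the threshold of 20 are hypothetical. A bare length measures the whole record, while length($2) measures only the name field.

```shell
# Hypothetical helpers for "|"-separated lines of the form id|name|department.
long_records() { awk 'length > 20'; }                          # whole record
short_names()  { awk -F "|" 'length($2) < 10 { print $2 }'; }  # name field only

printf '1|Amit|Sales\n2|Christopher Jones|Technology\n' | short_names    # Amit
printf '1|Amit|Sales\n2|Christopher Jones|Technology\n' | long_records   # second line
```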
index
Syntax: index(string, substring)
It returns the position of the first occurrence of substring within string.
If the substring is found, it returns its position; if not, it returns zero.
For example, for a field which contains the string “abcde”, we can use this function to find
out whether “b” is present:
x = index("abcde", "b")
Output: 2
substr
It extracts a substring from a string. It has two formats:
substr(string, start)
substr(string, start, length)
For example, for a field which contains the string “abcde”, we can use this function to extract
part of it:
x = substr("abcde", 2)
Output: bcde
x = substr("abcde", 2, 2)
Output: bc
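The index and substr calls above can be run together, as sketched below with a hypothetical helper name. Positions start at 1, and index returns 0 when the substring is absent.

```shell
# Hypothetical helper running the index and substr examples.
string_demo() {
  awk 'BEGIN {
    s = "abcde"
    print index(s, "b")       # 2
    print index(s, "z")       # 0 (not found)
    print substr(s, 2)        # bcde
    print substr(s, 2, 2)     # bc
  }'
}

string_demo
```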
split
Syntax: split(string, array, fieldsep)
It divides string into pieces separated by fieldsep, stores the pieces in array, and returns the
number of pieces. (gawk also accepts a fourth argument, seps, an array in which the separator
strings are stored.)
The first piece is stored in array[1], the second in array[2], and so on. The end of each piece
is identified by the field separator. If fieldsep is omitted, the current value of FS is used.
Before splitting the string, split() deletes any previously existing elements in the array.
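A short split sketch follows, assuming a hypothetical helper and a made-up date string. The return value is the number of pieces, and the pieces are stored starting at index 1.

```shell
# Hypothetical helper: split a date on "-".
split_demo() {
  awk 'BEGIN {
    n = split("2024-01-15", part, "-")
    print n                           # 3
    print part[1], part[2], part[3]   # 2024 01 15
  }'
}

split_demo
```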
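sub
Syntax: sub(regexp, replacement, target)
It substitutes replacement for the first match of regexp in target ($0 if target is omitted) and
returns the number of substitutions made (0 or 1). The following action prints the number and
text of each line in which no substitution was made:
{ result = sub(/hello/, "hi", $0);
  if (result == 0)
      print NR, $0 }
NOTE: sub substitutes only the first match. To substitute every match, use the gsub function.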
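The difference between sub and gsub can be sketched on one input line, using a hypothetical helper name. Each copy of the line is modified in place, and the return value is the number of replacements made.

```shell
# Hypothetical helper: compare sub (first match) with gsub (all matches).
sub_gsub_demo() {
  awk '{
    a = $0; b = $0
    n1 = sub(/hello/, "hi", a)     # replaces the first match only
    n2 = gsub(/hello/, "hi", b)    # replaces every match
    print n1, a
    print n2, b
  }'
}

echo "hello hello" | sub_gsub_demo
# 1 hi hello
# 2 hi hi
```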
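match
Syntax: match(string, regexp)
It returns the starting position of the first match of regexp in string, or 0 if there is no match.
It also sets the built-in variables RSTART (the starting position) and RLENGTH (the length of the match).
toupper
Syntax: toupper(string)
It returns a copy of string with all lowercase letters converted to uppercase. For example,
print toupper("amit") produces:
Output: AMIT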
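The match and toupper functions can be sketched together. The helper name and the sample string are hypothetical. After a successful match, substr(s, RSTART, RLENGTH) extracts exactly the matched text; tolower is the companion of toupper.

```shell
# Hypothetical helper: locate a number with match, then change case.
match_demo() {
  awk 'BEGIN {
    s = "order 1234 shipped"
    if (match(s, /[0-9]+/))
      print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)   # 7 4 1234
    print toupper("amit"), tolower("AMIT")                 # AMIT amit
  }'
}

match_demo
```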
Built-In Arithmetic Functions
Function Description
split(s, a, fs) split s into array a on field separator fs, return number of fields