Awk

Amit Patel
Introduction
 Awk is a programming language which allows easy manipulation of structured data and
the generation of formatted reports.
 The name AWK is derived from the family names of its authors: Alfred Aho, Peter
Weinberger, and Brian Kernighan.
 Awk is mostly used for pattern scanning and processing. It searches one or more files
to see if they contain lines that match the specified patterns and then performs the
associated actions.
 It is an excellent filter and report writer. It is very powerful and specifically designed for
text processing.
 Some of the key features of Awk are:
 Awk views a text file as records and fields.
 Like common programming languages, Awk has variables, conditionals and loops.
 Awk has arithmetic and string operators.
 Awk can generate formatted reports
 Awk reads from a file or from its standard input, and outputs to its standard output.
 AWK is simple to use: we can provide AWK commands either directly on the command
line or in a text file containing AWK commands.
awk Syntax
Selection criteria (pattern) {action}
awk [options] 'script' file(s)
awk [options] -f scriptfile file(s)
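The two invocation forms can be compared with a minimal sketch. It assumes a POSIX shell with awk on the PATH and mktemp available. The file names names.txt and hello.awk, and their contents, are made up for illustration.

```shell
# Work in a scratch directory (the file names below are only illustrative).
tmp=$(mktemp -d)

# Hypothetical sample input: one name per line.
printf 'Tom\nMary\n' > "$tmp/names.txt"

# Form 1: the program is given inline, in single quotes.
inline=$(awk '{ print "Hello,", $1 }' "$tmp/names.txt")

# Form 2: the same program is stored in a file and passed with -f.
cat > "$tmp/hello.awk" <<'EOF'
{ print "Hello,", $1 }
EOF
fromfile=$(awk -f "$tmp/hello.awk" "$tmp/names.txt")

echo "$inline"
echo "$fromfile"
rm -rf "$tmp"
```

Keeping the program in a file with -f avoids shell-quoting problems once a script grows beyond a line or two.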

Syntax of awk
 AWK program follows the form:
pattern { action }
 The pattern specifies when the action is performed. Like most UNIX utilities, AWK is line
oriented. That is, the pattern specifies a test that is performed with each line read as input. If
the condition is true, then the action is taken. The default pattern is something that matches
every line. This is the blank or null pattern.
 If the pattern is missing, the action is applied to all lines.
 If the action is missing, the matched line is printed. A rule must have a pattern, an action, or both.
Example: awk '/for/' testfile
 prints all lines containing the string “for” in testfile
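The sketch below shows the three rule shapes described above. It assumes a POSIX shell, and the input text is invented for illustration.

```shell
# Hypothetical input text.
input='for loop
while loop
for each item'

# Pattern only: matching lines are printed (the default action).
pattern_only=$(printf '%s\n' "$input" | awk '/for/')

# Action only: the action runs for every line (the empty pattern matches all).
action_only=$(printf '%s\n' "$input" | awk '{ print NR }')

# Pattern and action: the action runs only for matching lines.
both=$(printf '%s\n' "$input" | awk '/for/ { print $2 }')

echo "$pattern_only"; echo "$action_only"; echo "$both"
```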

Working Methodology
 Awk reads the input files one line at a time.
 For each line, it tests the patterns in the given order and, whenever a pattern matches, performs the
corresponding action.
 If no pattern matches, no action is performed.
 In the above syntax, either the pattern or the action may be omitted, but not both.
 Statements within an action are separated by semicolons or newlines.

Syntax of awk Script
 Every awk script can be divided into 3 parts: BEGIN, BODY and END.
 BEGIN specifies actions to be taken before any lines are read; END specifies actions to be taken after the last line
is read.
Use of awk
Initialization Processing (BEGIN)
 The initialization processing is done only once, before awk starts reading the file.
 It is identified by the keyword BEGIN, and its instructions are enclosed in a set of
braces.
 The BEGIN instructions are used to initialize variables, create report headings and
perform other processing that must be completed before the file processing starts.
Body Processing
 The body is a loop that processes the data in a file.
 It starts when awk reads the first record (line) from the file. It then processes the
data through the body instructions.
 When the end of the body instructions is reached, awk repeats the process by
reading the next record or line and processing it.
 This means that if a file contains 50 records, the body will be executed 50 times.
End Processing (END)
 The end processing is executed after all input data have been read.
 At this time, information accumulated during the processing can be analysed and
printed or other end activities can be conducted.
BEGIN and END
 The special pattern BEGIN matches before the first input line is read; END
matches after the last input line has been read.
 This allows for initial and wrap-up processing.
BEGIN { print "NAME RATE HOURS"; print "" }
{ print }
END { print "total number of employees is", NR }
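The BEGIN/body/END program above can be run end to end as shown below. This sketch assumes a POSIX shell. The three NAME RATE HOURS records are hypothetical sample data, not the document's data file.

```shell
# Hypothetical sample data in the NAME RATE HOURS layout.
data='Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10'

report=$(printf '%s\n' "$data" | awk '
BEGIN { print "NAME RATE HOURS"; print "" }        # runs once, before any input
      { print }                                     # runs for every input line
END   { print "total number of employees is", NR }  # runs once, after the last line
')
echo "$report"
```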

Operation
 The awk utility views a file as a collection of records and fields.
 A field is a unit of data that has informational content. In a data file (a file made up of
organized data), there is a fixed set of fields; in the emp file, for example, the fields are
empid, ename, etc. In a text file each word becomes a field, so the number of fields
varies from line to line.
 Each line in a text file is considered a record.
 It provides two types of buffer:
 Record
 Field
Field buffer
 There are as many field buffers available as there are fields in the current record of the input
file.
 Each field buffer has a name, which is the dollar sign ($) followed by the field number in the
current record.
 Numbering begins at one: $1 is the first field, $2 the second field, and so on.
Record buffer
 There is only one record buffer available. Its name is $0, and it holds the whole record.
 As long as none of the fields is changed, $0 holds exactly the same data as
found in the input file.
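The sketch below checks the record-buffer rule, using an invented input line and assuming a POSIX shell. $0 keeps its original spacing until a field is assigned. After that, awk rebuilds $0 from the fields joined by OFS.

```shell
# Hypothetical input line with extra spaces between the words.
line='a   b   c'

# Printing $0 untouched keeps the original spacing.
before=$(echo "$line" | awk '{ print $0 }')

# Assigning to any field (even $2 = $2) makes awk rebuild $0,
# joining the fields with OFS (a single space by default).
after=$(echo "$line" | awk '{ $2 = $2; print $0 }')

# Changing a field value while using a different OFS.
changed=$(echo "$line" | awk 'BEGIN { OFS = "-" } { $2 = "X"; print }')

echo "$before"; echo "$after"; echo "$changed"
```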
Example 1: Default behavior of Awk
By default Awk prints every line from the file.
$ awk '{print;}' employee.txt
100 Thomas Manager Sales $5,000
200 Jason Developer Technology $5,500
300 Sanjay Sysadmin Technology $7,000
400 Nisha Manager Marketing $9,500
500 Randy DBA Technology $6,000
In the above example, no pattern is given, so the action applies to all the lines.
The print action without any argument prints the whole line by default, so every line of the
file is printed. Actions have to be enclosed within braces.
Example 2: Print the lines which match the pattern. OR
Implement: egrep 'Thomas|Nisha' employee.txt
$ awk '/Thomas/
> /Nisha/' employee.txt
100 Thomas Manager Sales $5,000
400 Nisha Manager Marketing $9,500
In the above example, awk prints all the lines which match ‘Thomas’ or ‘Nisha’.
It has two patterns. Awk accepts any number of pattern-action rules; here each rule is
written on its own line.
Example 3: Print only specific fields.
Awk has a number of built-in variables. For each record (i.e., line), awk splits the record on
whitespace by default and stores the pieces in the $n variables. If the line has 4 words, they are
stored in $1, $2, $3 and $4. $0 represents the whole line. NF is a built-in variable which
represents the total number of fields in a record.
$ awk '{print $2,$5;}' employee.txt
Thomas $5,000
Jason $5,500
Sanjay $7,000
Nisha $9,500
Randy $6,000
Example 4: Initialization and Final Action.
Print employee detail report with heading and ending information.
$ awk 'BEGIN {print "Name\tDesignation\tDepartment\tSalary";}
> {print $2"\t"$3"\t"$4"\t"$5;}
> END{print "Report Generated\n--------------"; }' employee.txt
Name Designation Department Salary
Thomas Manager Sales $5,000
Jason Developer Technology $5,500
Sanjay Sysadmin Technology $7,000
Nisha Manager Marketing $9,500
Randy DBA Technology $6,000
Report Generated
--------------
Variable
 There are two types of variables
 System Variables
 User Defined Variables
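The sketch below shows user-defined variables in use, assuming a POSIX shell and made-up data. limit is passed in with the -v option, while count is created the first time it is used and needs no declaration.

```shell
# Hypothetical sample data: name and hours worked.
data='Tom 10
Mary 25
Sally 40'

# "limit" comes from the command line via -v; "count" starts out empty/zero
# and is created on first use inside the program.
result=$(printf '%s\n' "$data" | awk -v limit=20 '
$2 > limit { count++ }
END        { print count + 0, "workers above", limit }
')
echo "$result"
```

Adding 0 in END forces a numeric 0 to be printed even if no line matched.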

System Variable:
 Built-in variables make it much easier to write even simple AWK code.
 These variables are used to format the output of an AWK command, to set the input field separator,
and even to hold the name of the current input file for use within the script.
 AWK built-in variables are listed in the table below.

Variable   Description
NR         Number of input records read so far (the current record number).
NF         Number of fields in the current record.
FILENAME   Name of the current input file.
FNR        Record number within the current input file.
FS         Input field separator (default: runs of blanks and tabs).
RS         Input record separator (default: newline).
OFS        Output field separator (default: a single space).
ORS        Output record separator (default: newline).

Example: NR variable
 This variable holds the number of the current record (line).
 This will come handy when you want to print line numbers in a file.
$ cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500
$ awk '{print NR, $0}' emps    # similar to cat -n emps

1 Tom Jones 4424 5/12/66 543354
2 Mary Adams 5346 11/4/63 28765
3 Sally Chang 1654 7/22/54 650000
4 Billy Black 1683 9/23/44 336500
Example: Space as Field Separator
$ cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500
$ awk '{print NR, $1, $2, $5}' emps

1 Tom Jones 543354
2 Mary Adams 28765
3 Sally Chang 650000
4 Billy Black 336500
Example: Equivalent to SED.
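Print lines 2 to 4 of a file, i.e. extract lines 2 to 4 without using sed, head or tail.
OR Write the awk equivalent of: sed -n '2,4p' emps OR head -4 emps | tail -n +2
$ cat emps
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500
$ awk 'NR==2, NR==4 {print NR, $0}' emps

2 Mary Adams 5346 11/4/63 28765
3 Sally Chang 1654 7/22/54 650000
4 Billy Black 1683 9/23/44 336500
The range pattern NR==2, NR==4 selects every record from the one where NR==2 is true through
the one where NR==4 is true. Note the ==: writing NR=2 would assign to NR instead of testing it.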
Example: Equivalent to Head filter
Print the first 2 lines OR
Write the following command using awk: head -2 emps.
$ awk 'NR<=2 {print NR, $0}' emps (prints the line numbers too)
OR
$ awk 'NR<=2' emps (prints only the records)
1 Tom Jones 4424 5/12/66 543354
2 Mary Adams 5346 11/4/63 28765
Example: Equivalent to tail filter
Print all lines from line 2 onward OR
Write the following command using awk: tail -n +2 emps.
$ awk 'NR>=2' emps
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500
Example: NF variable
 NF gives you the total number of fields in a record.
 NF is very useful for validating whether all the fields exist in a record.
 The last field of a record can be referenced as $NF.
$ awk '{print $2,$NF;}' employee.txt
Thomas $5,000
Jason $5,500
Sanjay $7,000
Nisha $9,500
Randy $6,000
 In the above example, $2 and $5 represent Name and Salary respectively. We can also get the Salary
using $NF, where $NF represents the last field. In the print statement, the comma separates the
arguments; print joins them with the output field separator (OFS, a single space by default).
Example: NF variable
 How do you print the last field without knowing the number of fields in each record?
$ awk '{print $NF;}' employee.txt
$5,000
$5,500
$7,000
$9,500
$6,000
 In the above example $NF represents last field.
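The sketch below applies the validation idea mentioned above. It assumes a POSIX shell, and the sample records are hypothetical, with one of them deliberately missing a field. NF can flag such malformed lines:

```shell
# Hypothetical records in the employee.txt layout; line 2 lacks the department.
data='100 Thomas Manager Sales $5,000
200 Jason Developer $5,500
300 Sanjay Sysadmin Technology $7,000'

# Report every record that does not have exactly 5 fields.
bad=$(printf '%s\n' "$data" | awk 'NF != 5 { print "line " NR ": " NF " fields" }')
echo "$bad"
```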
Example: FILENAME variable
 This variable contains the name of the file that awk is currently processing.
Example: Print the filename for each line in a given file.
awk '{print FILENAME, NR, $0}' abc.txt

Output:
abc.txt 1 Jones 2143 78 84 77
abc.txt 2 Gondrol 2321 56 58 45
abc.txt 3 RinRao 2122234 38 37
abc.txt 4 Edwin 253734 87 97 95
abc.txt 5 Dayan 24155 30 47
Example: FNR variable
 This variable holds the record (line) number within the current input file.
 In an END action it still holds the number of the last record read, so it can be used to print
the number of lines in a file.
 Used this way, the command is similar to the wc -l command.
Example: Print the total number of lines in a given file.
awk 'END{print FNR}' db.txt
Output: the number of lines (records) in db.txt.
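The difference between NR and FNR only shows up when awk reads more than one file. This sketch assumes a POSIX shell with mktemp, and the two scratch files are invented for illustration.

```shell
# Two small hypothetical files in a scratch directory.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one.txt"
printf 'c\nd\ne\n' > "$tmp/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
# FILENAME is printed without its directory by stripping everything up to the last "/".
counts=$(awk '{ f = FILENAME; sub(/.*\//, "", f); print f, NR, FNR }' "$tmp/one.txt" "$tmp/two.txt")

# In END, NR holds the total records read (like wc -l over both files),
# while FNR holds the count for the last file only.
totals=$(awk 'END { print NR, FNR }' "$tmp/one.txt" "$tmp/two.txt")

echo "$counts"; echo "$totals"
rm -rf "$tmp"
```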
Example: FS variable
 This variable holds the input field separator.
 By default, AWK splits input fields on runs of spaces and tabs, and print separates output
fields with a single space.
 If your file uses some other character as the separator, AWK does not recognize it
automatically.
 For example, the Unix password file uses ‘:’ as its separator. To specify such an input field
separator, we use this built-in variable (or the -F option).
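The sketch below shows both ways to set the input field separator, assuming a POSIX shell. The two passwd-style lines are made up rather than read from the real /etc/passwd.

```shell
# Two hypothetical lines in /etc/passwd format
# (name:password:UID:GID:comment:home:shell).
data='root:x:0:0:root:/root:/bin/sh
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Setting FS in BEGIN and using -F are equivalent ways to split on ":".
via_begin=$(printf '%s\n' "$data" | awk 'BEGIN { FS = ":" } { print $1, $7 }')
via_flag=$(printf '%s\n' "$data" | awk -F: '{ print $1, $7 }')

echo "$via_begin"; echo "$via_flag"
```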
Print first column data from db.txt file.
$ cat db.txt
John,29,MS,IBM,M,Married
Barbi,45,MD,JHH,F,Single
Mitch,33,BS,BofA,M,Single
Tim,39,Phd,DELL,M,Married
Lisa,22,BS,SmartDrive,F,Married
$ awk '{print $1}' db.txt
Output:
John,29,MS,IBM,M,Married
Barbi,45,MD,JHH,F,Single
Mitch,33,BS,BofA,M,Single
Tim,39,Phd,DELL,M,Married
Lisa,22,BS,SmartDrive,F,Married
As you can see, the entire line is displayed. This shows that AWK does not know that db.txt uses
“,” as its field separator: the lines contain no blanks, so each whole line is field $1. We have to tell
AWK what the field separator is.
$ awk 'BEGIN{FS=","}{print $1}' db.txt

John
Barbi
Mitch
Tim
Lisa
Example: OFS variable
 This variable specifies the output field separator, which separates the fields in the
output data.
 Example: Display only the 1st and 4th columns, with " $ " as the separator between them in
the output.
awk 'BEGIN{FS=",";OFS=" $ "}{print $1,$4}' db.txt
John $ IBM
Barbi $ JHH
Mitch $ BofA
Tim $ DELL
Lisa $ SmartDrive
Example: RS variable
 The record separator defines what separates the records (rows) in a file. By default, AWK uses a
newline as the record separator. We can change this with the RS built-in variable.
 Example: Convert a sentence to one word per line using the RS variable.
echo "This is how it works" | awk 'BEGIN{RS=" "}{print $0}'
This
is
how
it
works
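POSIX awk also gives RS a special meaning when it is the empty string ("paragraph mode"): records are then separated by one or more blank lines. A minimal sketch, assuming a POSIX shell and invented input:

```shell
# Hypothetical address-book text: one blank line between entries.
data='Tom Jones
4424

Mary Adams
5346'

# RS = "" selects paragraph mode: each blank-line-separated block is one record.
# With the default FS, the newlines inside a block also separate fields.
out=$(printf '%s\n' "$data" | awk 'BEGIN { RS = "" } { print NR ": " $1, $3 }')
echo "$out"
```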
Example: ORS variable
 This variable defines the record separator for the AWK command output. By
default, ORS is set to a newline.
 Example: Print all the company names from the 4th column on a single line.
awk -F',' 'BEGIN{ORS=" "}{print $4}' db.txt
IBM JHH BofA DELL SmartDrive
Printf Statement
 printf is similar to the AWK print statement, but it lets you format the output exactly as
desired.
 printf syntax is similar to the C printf statement.
 Syntax: printf format, arguments
 For example, to print the values of column 3 as decimal integers:
awk '{printf "%d\n", $3}' example.txt
 printf can do two things that print cannot:
1) Define the type of data.
2) Pad columns.
Type of data
 printf is useful for specifying the data type, such as integer, decimal, octal, etc. Below is
a list of some of the formats available in AWK.
Format     Specification
%i or %d   Decimal integer
%o         Octal
%x         Hexadecimal
%c         Character (a number prints the character with that code)
%s         String
%f         Floating-point number
 Make sure you pass values of the right type for each format. If you pass a non-numeric string
to a decimal format such as %d, awk prints 0 instead of that string.
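The conversions in the table, and the string-to-zero behaviour, can be checked with a short sketch. It assumes a POSIX shell, and the values are arbitrary.

```shell
# One value per conversion from the table (arbitrary illustrative values).
specs=$(awk 'BEGIN { printf "%d %o %x %c %s %f\n", 255, 255, 255, 65, "awk", 2.5 }')

# A non-numeric string passed to %d is converted to the number 0.
zero=$(awk 'BEGIN { printf "%d\n", "hello" }')

echo "$specs"; echo "$zero"
```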
Padding Between Columns
 We can specify how many characters each column uses. The following padding formats are
available with the printf statement.
Format   Description
-n       Left-justify in a field n characters wide (pad with spaces on the right).
n        Right-justify in a field n characters wide (pad with spaces on the left).
.m       For integers, print at least m digits, adding zeros on the left.
-n.m     Left-justify in a field n characters wide, with at least m digits (zeros added on the left).
n.m      Right-justify in a field n characters wide, with at least m digits (zeros added on the left).
 Left-justify each column in a field 5 characters wide (padding on the right).
awk '{printf "%-5d%-5d%-5d\n", $2,$3,$4}' db.txt
21   78   84
23   56   58
25   21   38
25   87   97
24   55   30
 Right-justify each column in a field 5 characters wide (padding on the left).
awk '{printf "|%5d|%5d|%5d|\n", $2,$3,$4}' db.txt
|   21|   78|   84|
|   23|   56|   58|
|   25|   21|   38|
|   25|   87|   97|
|   24|   55|   30|
 Add zeros on the left of each column element to make it a 5-digit number.
awk '{printf "|%.5d|%.5d|%.5d|\n", $2,$3,$4}' db.txt
|00021|00078|00084|
|00023|00056|00058|
|00025|00021|00038|
|00025|00087|00097|
|00024|00055|00030|
Print each column element with at least 4 digits in a field 7 characters wide, left-justified.
awk '{printf "|%-7.4d|%-7.4d|%-7.4d|\n", $2,$3,$4}' db.txt
|0021   |0078   |0084   |
|0023   |0056   |0058   |
|0025   |0021   |0038   |
|0025   |0087   |0097   |
|0024   |0055   |0030   |
 Print each column element with at least 4 digits in a field 7 characters wide, right-justified.
awk '{printf "|%7.4d|%7.4d|%7.4d|\n", $2,$3,$4}' db.txt
|   0021|   0078|   0084|
|   0023|   0056|   0058|
|   0025|   0021|   0038|
|   0025|   0087|   0097|
|   0024|   0055|   0030|
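Width and precision are usually combined to build aligned reports. The sketch below assumes a POSIX shell, and the names and amounts are hypothetical.

```shell
# Hypothetical name/amount pairs.
data='Tom 4424
Mary 5346
Sally 16540'

# %-8s  : left-justify the name in 8 columns
# %8.2f : right-justify the amount in 8 columns with 2 decimals
report=$(printf '%s\n' "$data" | awk '{ printf "%-8s|%8.2f|\n", $1, $2 }')
echo "$report"
```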
Awk Operators
OPERATOR               DESCRIPTION
Arithmetic Operator    Addition(+), Subtraction(-), Multiplication(*), Division(/), Modulus(%), Exponentiation(^)
Increment Decrement    Pre-increment, Pre-decrement, Post-increment, Post-decrement
Assignment Operator    Simple Assignment(=), Shorthand Addition(+=), Shorthand Subtraction(-=),
                       Shorthand Multiplication(*=), Shorthand Division(/=), Shorthand Modulus(%=),
                       Shorthand Exponent(^=)
Relational Operator    ==, !=, <, <=, >, >=
Logical Operator       &&, ||, !
Ternary Operator       ?:
Unary Operator         +, -
String Concatenation   (space: operands written side by side)
Array Membership       in
Regular Expression     ~, !~
Addition
 It is represented by the plus (+) symbol, which adds two numbers. The example below
illustrates this:
$ awk 'BEGIN { a = 50; b = 20; print "(a + b) =", (a + b) }'
 On executing the above code, you get the following result: (a + b) = 70
Subtraction
 It is represented by the minus (-) symbol, which subtracts the second number from the first.
The example below illustrates this:
$ awk 'BEGIN { a = 50; b = 20; print "(a - b) =", (a - b) }'
 On executing the above code, you get the following result: (a - b) = 30
Increment and Decrement Operators
 AWK supports the following increment and decrement operators:
Pre-increment
 It is represented by ++. It increments the value of its operand by 1. This operator first
increments the value of the operand and then returns the incremented value. For instance, in the
example below this operator sets both a and b to 11.
awk 'BEGIN { a = 10; b = ++a; printf "a = %d, b = %d\n", a, b }'
 On executing the above code, you get the following result: a = 11, b = 11
Pre-decrement
 It is represented by --. It decrements the value of its operand by 1. This operator first
decrements the value of the operand and then returns the decremented value. For instance, in the
example below this operator sets both a and b to 9.
$ awk 'BEGIN { a = 10; b = --a; printf "a = %d, b = %d\n", a, b }'
 On executing the above code, you get the following result: a = 9, b = 9
Post-increment
 It is represented by ++. It increments the value of its operand by 1, but it first returns the
original value and then increments it. For instance, the example below sets a to 11 and b to 10.
$ awk 'BEGIN { a = 10; b = a++; printf "a = %d, b = %d\n", a, b }'
 On executing the above code, you get the following result: a = 11, b = 10
 AWK supports the following assignment operators:
Simple assignment
 It is represented by =. The example below illustrates this:
$ awk 'BEGIN { name = "Jerry"; print "My name is", name }'
 On executing the above code, you get the following result: My name is Jerry
Shorthand addition
 It is represented by +=. The example below illustrates this:
$ awk 'BEGIN { cnt=10; cnt += 10; print "Counter =", cnt }'
 On executing the above code, you get the following result: Counter = 20
Shorthand exponentiation
 It is represented by ^=. The example below illustrates this:
$ awk 'BEGIN { cnt=2; cnt ^= 4; print "Counter =", cnt }'
 On executing the above code, you get the following result: Counter = 16
 Some implementations, such as GNU awk (gawk), also accept **= as an alternative spelling; it is
not part of POSIX awk.
$ awk 'BEGIN { cnt=2; cnt **= 4; print "Counter =", cnt }'
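The remaining shorthand operators listed in the operator table follow the same pattern. A sketch, assuming a POSIX shell:

```shell
# Each shorthand operator is applied to a fresh counter.
out=$(awk 'BEGIN {
    c = 10; c -= 3;  print "c -= 3 ->", c   # subtraction
    c = 10; c *= 3;  print "c *= 3 ->", c   # multiplication
    c = 10; c /= 4;  print "c /= 4 ->", c   # division (awk numbers are floating point)
    c = 10; c %= 3;  print "c %= 3 ->", c   # modulus (remainder)
    c = 2;  c ^= 4;  print "c ^= 4 ->", c   # exponentiation (POSIX spelling)
}')
echo "$out"
```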
Relational Operators: Equal to
 It is represented by ==. It returns true if both operands are equal; otherwise it returns false.
(A single = is assignment, not comparison.) The example below illustrates this:
$ awk 'BEGIN { a = 10; b = 10; if (a == b) print "a == b" }'
 On executing the above code, you get the following result: a == b
Less than
 It is represented by <. It returns true if the left operand is less than the right operand;
otherwise it returns false.
$ awk 'BEGIN { a = 10; b = 20; if (a < b) print "a < b" }'
 On executing the above code, you get the following result: a < b
Logical Operators: Logical AND
 It is represented by &&. Below is the syntax of Logical AND operator.
expr1 && expr2
 It evaluates to true if both expr1 and expr2 evaluate to true; otherwise it evaluates to false.
expr2 is evaluated if and only if expr1 evaluates to true. For instance, the example below checks
whether a single-digit number is a valid octal digit (0 to 7).
$ awk 'BEGIN {num = 5; if (num >= 0 && num <= 7) printf "%d is in octal format\n", num }'
 On executing the above code, you get the following result: 5 is in octal format
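The other two logical operators, || and !, can be sketched the same way. The sketch assumes a POSIX shell, and the values are arbitrary.

```shell
# Logical OR: true if either operand is true (the right side is evaluated only if needed).
or_out=$(awk 'BEGIN { ch = "y"; if (ch == "y" || ch == "Y") print "yes" }')

# Logical NOT: inverts a condition; here it reports a number that is not an octal digit.
not_out=$(awk 'BEGIN { num = 9; if (!(num >= 0 && num <= 7)) print num, "is not an octal digit" }')

echo "$or_out"; echo "$not_out"
```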
Ternary Operator
 We can easily implement a conditional expression using the ternary operator. Below is its
syntax:
condition ? expression1 : expression2
 When condition is true, the whole expression takes the value of expression1; otherwise it takes
the value of expression2. For instance, the example below finds the maximum of two numbers.
$ awk 'BEGIN { a = 10; b = 20; max = (a > b) ? a : b; print "Max =", max }'
 On executing the above code, you get the following result: Max = 20
String concatenation operator
 Concatenation has no operator symbol: writing two expressions side by side, separated by a space,
merges them into one string. The simple example below illustrates this:
$ awk 'BEGIN { str1="Hello, "; str2="World"; str3 = str1 str2; print str3 }'
 On executing the above code, you get the following result: Hello, World
Array membership operator
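 It is represented by in. It is used to loop over the indexes of an array and to test whether an
index exists. The simple example below prints the array elements using this operator. Note that
for (i in arr) visits the indexes in an unspecified order, so some awk implementations may print
these lines in a different order.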
$ awk 'BEGIN { arr[0] = 1; arr[1] = 2; arr[2] = 3; for (i in arr) printf "arr[%d] = %d\n", i, arr[i] }'
 On executing the above code, you get the following result:
arr[0] = 1
arr[1] = 2
arr[2] = 3
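Besides looping, in can test whether a key exists. The sketch below assumes a POSIX shell. It shows why (key in arr) is safer than comparing arr[key] with "", which silently creates the element.

```shell
out=$(awk 'BEGIN {
    arr["x"] = 1

    # (key in arr) only tests membership; it never creates an element.
    if ("y" in arr) print "y found"; else print "y not found"

    # Referencing arr["z"] in an expression silently creates an empty element.
    if (arr["z"] == "") print "z is empty"

    n = 0
    for (k in arr) n++      # the visiting order is unspecified, but the count is not
    print n, "elements"
}')
echo "$out"
```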
Regular Expression Operators
 There are two regular expression operators, explained below with suitable examples.
Match (~)
 It is represented as ~. It tests whether the left operand (a field, the whole record, or any string)
matches the regular expression on the right. For instance, the example below prints the lines that
contain the pattern 9.
$ awk '$0 ~ 9' marks.txt
2) Rahul Maths 90
5) Hari History 89
Print the list of employees in Technology department
The department name is available as the fourth field, so we need to check whether $4 matches the
string “Technology”; if it does, print the line.
$ awk '$4 ~/Technology/' employee.txt
200 Jason Developer Technology $5,500
300 Sanjay Sysadmin Technology $7,000
500 Randy DBA Technology $6,000

Print number of employees in Technology department
The example below checks whether the department is Technology; if it is, the action increments
the count variable, which was initialized to zero in the BEGIN section.
$ awk 'BEGIN { count=0;}
$4 ~ /Technology/ { count++; }
END { print "Number of employees in Technology Dept =",count;}' employee.txt
Number of employees in Technology Dept = 3

Not match (!~)
 It is represented as !~. It tests whether the left operand does not match the regular expression
on the right.
 For instance, the example below prints the lines that do not contain the pattern 9.
$ awk '$0 !~ 9' marks.txt
1) Amit Physics 80
3) Shyam Biology 87
4) Kedar English 85
Example
EXAMPLE                                                        COMMAND
Print first field of each line                                 { print $1 }
Print all lines containing pattern                             /pattern/
Print first field of lines containing pattern                  /pattern/ { print $1 }
Select records having more than 2 fields                       NF > 2
For lines whose first field matches URGENT, print $3 and $2    $1 ~ /URGENT/ { print $3, $2 }
Print number of lines matching pattern                         /pattern/ { x++ } END { print x }
Sum up column 2 and print the total                            { total += $2 } END { print total }
Print lines shorter than 20 characters                         length($0) < 20
Print lines of 7 fields beginning with "Name:"                 NF == 7 && /^Name:/
Print the lines containing either Juila or juila               /[Jj]uila/
Print column 4 from a line containing Frank through the next line containing Low    /Frank/,/Low/ { print $4 }
Print lines 3 to 6                                             NR >= 3 && NR <= 6
Print only lines whose third column contains 29                $3 ~ /29/

Example
 Print fields in reverse order, one per line:
{
for (i=NF; i>=1; i--)
print $i;
}
Awk Control Flow
 Like other programming languages, AWK provides conditional statements to control the flow
of the program. This section explains AWK's conditional statements with suitable examples.
If statement
 It simply tests the condition and performs certain action depending upon condition. Below is
the syntax of the if statement:
if (condition)
action
 We can also use pair of curly braces as given below to execute multiple actions:
if (condition)
{
action-1
action-2
...
action-n
}
 For instance, the simple example below checks whether a number is even:
$ awk 'BEGIN {num = 10; if (num % 2 == 0) printf "%d is even number.\n", num }'
 On executing the above code, you get the following result: 10 is even number.
If Else Statement
 In if-else syntax we can provide the list of actions to be performed when the condition
becomes false.
 Below is the syntax of the if-else statement:
if (condition)
action-1
else
action-2
 Print only the odd-numbered lines of file “emp”
$ awk '{ x = NR % 2;
if (x == 1)
print
}' emp
 This code uses the NR system variable to get the record number, checks with an if statement
whether it is odd, and prints only the odd-numbered records.
 In the above syntax, action-1 is performed when the condition evaluates to true and action-2 is
performed when it evaluates to false. For instance, the simple example below checks
whether a number is even or odd:
$ awk 'BEGIN {num = 11; if (num % 2 == 0) printf "%d is even number.\n", num; else printf "%d is odd number.\n", num }'
 On executing the above code, you get the following result: 11 is odd number.
If-Else-If Ladder
 We can easily create an if-else-if ladder by chaining multiple if-else statements. The simple
example below illustrates this:
$ awk 'BEGIN {
a=30;
if (a==10)
print "a = 10";
else if (a == 20)
print "a = 20";
else if (a == 30)
print "a = 30";
}'
 On executing the above code, you get the following result: a = 30

Regular Expressions in Awk
 Awk uses extended regular expressions, the same kind used by egrep:
 ^ $ - beginning/end of the string being matched (the whole record, or a field when used with ~)
 . - any character
 [abcd] - character class
 [^abcd] - negated character class
 [a-z] - range of characters
 (regex1|regex2) - alternation
 * - zero or more occurrences of preceding expression
 + - one or more occurrences of preceding expression
 ? - zero or one occurrence of preceding expression
 NOTE: interval expressions such as {m,n}, {m} and {m,} are part of POSIX, but some older awk implementations do not support them, so check your awk before relying on them.
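The sketch below shows a few of these operators in action. It assumes a POSIX shell and an invented word list, and it deliberately avoids interval expressions because of the portability note above.

```shell
# Hypothetical word list.
words='apple
banana
avocado
Cherry
42'

starts_a=$(printf '%s\n' "$words" | awk '/^a/')                # ^ anchors at the start
ends_o=$(printf '%s\n' "$words" | awk '/o$/')                  # $ anchors at the end
digits=$(printf '%s\n' "$words" | awk '/^[0-9]+$/')            # the whole line is digits
alt=$(printf '%s\n' "$words" | awk '/^(banana|[Cc]herry)$/')   # alternation plus a class

echo "$starts_a"; echo "$ends_o"; echo "$digits"; echo "$alt"
```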
Awk loop
 Like conditional statements, AWK also provides looping statements. They are used to execute a set of
actions repeatedly. The loop execution continues as long as the loop condition is true.
This section explains AWK's loops with suitable examples.
For Loop
 Below is syntax of the for loop:
for (initialisation; condition; increment/decrement)
action
 Initially the for statement performs the initialisation action, then it checks the condition; if the
condition is true, it executes the action and then performs the increment or decrement operation.
 The loop execution continues as long as the condition is true. For instance, the example below
prints the numbers 1 to 5 using a for loop:
$ awk 'BEGIN { for (i = 1; i <= 5; ++i) print i }'
1
2
3
4
5
Awk Associative Array
 Print a file in reverse order (last line first)
$ cat reverse.awk
{ line[NR] = $0 }
END {
for (i = NR; i > 0; i--)
print line[i]
}
$ awk -f reverse.awk emps
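The corrected reverse.awk can be exercised as follows. This sketch assumes a POSIX shell with mktemp, and lines.txt is a made-up three-line file.

```shell
# Hypothetical three-line input in a scratch directory.
tmp=$(mktemp -d)
printf 'first\nsecond\nthird\n' > "$tmp/lines.txt"

# Store every line in an array indexed by its record number,
# then walk the indexes downwards in END.
cat > "$tmp/reverse.awk" <<'EOF'
{ line[NR] = $0 }
END {
    for (i = NR; i > 0; i--)
        print line[i]
}
EOF

reversed=$(awk -f "$tmp/reverse.awk" "$tmp/lines.txt")
echo "$reversed"
rm -rf "$tmp"
```

The whole file is held in memory, so this approach suits files that fit comfortably in RAM.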
While Loop
 The while loop keeps executing the action as long as a particular logical condition evaluates to true.
Given below is the syntax of the while loop:
while (condition)
action
 AWK first checks the condition; if it is true, it executes the action. This process
repeats as long as the loop condition evaluates to true. For instance, the example below prints the
numbers 1 to 5 using a while loop:
$ awk 'BEGIN {i = 1; while (i < 6) { print i; ++i } }'
 On executing the above code, you get the following result (one number per line): 1 2 3 4 5

Do-While Loop
 The do-while loop is similar to the while loop, except that the test condition is evaluated at
the end of the loop. Given below is the syntax of the do while loop:
do
action
while (condition)
 In a do-while loop the action is executed at least once, even when the condition
evaluates to false. For instance, the example below prints the numbers 1 to 5 using a do-while loop:
$ awk 'BEGIN {i = 1; do { print i; ++i } while (i < 6) }'
 On executing the above code, you get the following result (one number per line): 1 2 3 4 5

Break
Break Statement
 As the name suggests, it is used to end the loop execution. Here is an example that ends the loop
when the sum becomes greater than 50.
$ awk 'BEGIN {sum = 0; for (i = 0; i < 20; ++i) { sum += i; if (sum > 50) break; else print "Sum =", sum } }'
Sum = 0
Sum = 1
Sum = 3
Sum = 6
Sum = 10
Sum = 15
Sum = 21
Sum = 28
Sum = 36
Sum = 45

Continue
 The continue statement is used inside a loop to skip to the next iteration. It is useful when we
wish to skip processing of some data inside the loop. For instance, the example below uses the
continue statement to print the even numbers between 1 and 15.
$ awk 'BEGIN {for (i = 1; i <= 15; ++i) {if (i % 2 == 0) print i ; else continue} }'
2 4 6 8 10 12 14 (one number per line)
Exit
 It is used to stop the execution of the script. It accepts an integer as an argument, which becomes
the exit status code of the AWK process. If no argument is supplied, exit returns status
zero. Here is an example that stops the execution when the sum becomes greater than 10.
$ awk 'BEGIN {sum = 0; for (i = 0; i < 10; ++i) { sum += i; if (sum > 10) exit(10); else print "Sum =", sum } }'
Sum = 0
Sum = 1
Sum = 3
Sum = 6
Sum = 10
 Let us check the return status of the script.
$ echo $?
 On executing the above code, you get the following result: 10

AWK ARRAY
 An array is a variable that can store a set of values or elements.
 Syntax of the array: array_name[index]=value
 Where array_name is the name of the array, index is the array index, and value is the value
assigned to that element of the array.
 Also there is no need to declare the size of array in advance - array can expand/shrink at
runtime.
 Each element is accessed by a subscript called the index.
 Awk arrays differ from the arrays of many other programming languages in the following ways:
 They are not formally declared; an array comes into existence the moment it is used.
 Array elements are initialized to zero or an empty string unless initialized explicitly.
 Arrays expand automatically.
 The index can be virtually anything; it can even be a string.
 Consider the following program.
 In the following program, we use an array to store the totals of the basic pay, DA, HRA and gross pay of
the Sales and Marketing employees.
 The “T[ ]” array stores the total of each element of the pay, as well as the gross pay.
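BEGIN { printf "%46s\n", "BASIC DA HRA GROSS" }
/SALES|MARKETING/ {
# $6 holds the basic pay; DA and HRA are derived from it
DA = 0.25 * $6; HRA = 0.50 * $6; GP = $6 + HRA + DA
T[1] += $6; T[2] += DA; T[3] += HRA; T[4] += GP
C++   # count of Sales/Marketing records
}
END {
# divide each total by C so that the line labelled Average shows averages
if (C > 0)
printf "\t Average %5d %5d %5d %5d\n", T[1]/C, T[2]/C, T[3]/C, T[4]/C
}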
Awk Array
 For insertion we used the assignment operator. Similarly, we can use the delete statement to remove
an element from the array. Below is the syntax of the delete statement:
delete array_name[index]
 The example below deletes the element orange, so the final print statement outputs only an empty line.
$ awk 'BEGIN {
fruits["mango"]="yellow";
fruits["orange"]="orange";
delete fruits["orange"];
print fruits["orange"]
}'
Awk Multi-Dimensional Array
 Awk only supports one-dimensional arrays, but we can easily simulate a multi-dimensional
array using a one-dimensional array.
 For instance, below is a 3x3 two-dimensional array:
100 200 300
400 500 600
700 800 900
 In the above example, array[0][0] stores 100, array[0][1] stores 200, and so on. To store 100 at
array location [0][0] we can use the following syntax:
array["0,0"] = 100
 Though we have given 0,0 as the index, these are not two indexes. In reality it is just one index,
the string 0,0.
 The simple example below simulates a 2-D array:
$ awk 'BEGIN {
array["0,0"] = 100;
array["0,1"] = 200;
array["0,2"] = 300;
array["1,0"] = 400;
array["1,1"] = 500;
array["1,2"] = 600;
# print array elements
print "array[0,0] = " array["0,0"];
 :
}'

array[0,0] = 100
array[0,1] = 200
array[0,2] = 300
array[1,0] = 400
array[1,1] = 500
array[1,2] = 600
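POSIX awk also has built-in syntax for this kind of simulation. array[i, j] joins the subscripts with the SUBSEP character into one string key, and (i, j) in array tests membership. A sketch, assuming a POSIX shell:

```shell
out=$(awk 'BEGIN {
    # Fill a 2x3 table using native (i, j) subscripts;
    # internally each key is the string i SUBSEP j.
    v = 100
    for (i = 0; i < 2; i++)
        for (j = 0; j < 3; j++) {
            array[i, j] = v
            v += 100
        }

    # Print row by row with explicit loops, since for (k in array) has no fixed order.
    for (i = 0; i < 2; i++) {
        row = ""
        for (j = 0; j < 3; j++)
            row = row (j ? " " : "") array[i, j]
        print row
    }

    # A membership test for a two-part key uses parentheses.
    if ((1, 2) in array) print "array[1,2] exists"
}')
echo "$out"
```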
Associated Array
 An associative array is one of the most powerful features of awk.
 An associative array is an array that uses strings as indices instead of integers.
 Syntax: array[string] = value
 Where array is the name of the array, string is the index of the element you are
assigning a value to, and value is the value being assigned to that element.
 The subscript is often called the key and is associated with the value assigned to the
corresponding array element.
 The keys and values are stored internally in a table.
 The array elements are not stored in sequential order, and when the contents of the array are
displayed, they may not be in the order we expect.
Awk Associative Array
 The first array associates names with ages. In this array, the name identifies each person's age.
 The second array associates department numbers with the sales for each department. The department numbers in this array are strings.
 The associative array structure has several design constraints that we must remember when we use it:
 An index must be unique: each index can be associated with an array value only once.
 Data values may be duplicated.
 The association of an index with its value is guaranteed.
 There is no ordering imposed on the indexes. If we create an associative array and print it, there is no guarantee that the elements will be printed in the order in which they were created.
 An array index cannot be sorted, but the data values can be sorted.
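The first two constraints can be seen directly: assigning to an existing index replaces its value, while different indexes may hold the same value. Since a for-in loop visits elements in no fixed order, a common approach is to print the pairs and let `sort` order them. The names and ages below are hypothetical.

```shell
out=$(awk 'BEGIN {
    age["Asha"] = 30
    age["Ravi"] = 30      # data values may repeat
    age["Asha"] = 31      # same index again: the old value is replaced
    for (name in age) n++
    print n, age["Asha"], age["Ravi"]
}')
printf '%s\n' "$out"

# Order the output by value with sort, since for-in order is unspecified.
sorted=$(awk 'BEGIN {
    age["Asha"] = 31; age["Ravi"] = 30; age["Meena"] = 25
    for (name in age) print age[name], name
}' | sort -n)
printf '%s\n' "$sorted"
```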

Awk Array Creation
 To see how arrays work, let us create an array and access its elements.
$ awk 'BEGIN {
fruits["mango"]="yellow";
fruits["orange"]="orange"
print fruits["orange"] "\n" fruits["mango"]
}'
 On executing the above code, you get the following result:
orange
yellow
 In the above example we declared an array named fruits, whose index is the fruit name and whose value is the colour of the fruit. To access an array element we used the array_name[index] format.
Awk Associative Array – for Loop
 The special for loop is used to read through an associative array when strings are used as subscripts or when the subscripts are not consecutive numbers.
 The special for loop uses each subscript in turn as a key into the value associated with it.
for (index_variable in array_name)
    action
The index variable takes each subscript in turn. A subscript can be a data value or a sequence number, such as the line number recorded as a file is read.
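The sketch below stores each input line under its line number NR. A for-in loop visits every index exactly once but in no guaranteed order, so it is used here only to count; a numeric loop from NR down to 1 then prints the lines in reverse order. The three input words are arbitrary sample data.

```shell
out=$(printf '%s\n' first second third | awk '
{ line[NR] = $0 }                # index = line number as the file is read
END {
    for (k in line) n++          # for-in: visits every index once, order unspecified
    print n " lines stored"
    for (i = NR; i >= 1; i--)    # numeric loop: guaranteed order
        print line[i]
}')
printf '%s\n' "$out"
```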
Using an associative array, create a sales report by department from the following sales file. Also print the total sales of all departments.
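 In the above file, $2 contains the department name and $3 contains the sales amount.
$ cat sales.awk
{ deptSales[$2] += $3 }          # process each input line
END {
    for (x in deptSales) {
        print x, ":", deptSales[x]
        total += deptSales[x]
    }
    print "Total Sales :", total
}
$ awk -f sales.awk sales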

 Output
computer : 21482
supplies : 2242
textbooks : 36774
clothing : 6393
Total Sales : 66891
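Since the original sales file is not reproduced here, the sketch below runs the same script on a few hypothetical records with the department in $2 and the sales amount in $3. The department lines may appear in any order; only the total line is guaranteed to come last.

```shell
# Hypothetical sample: $1 = unused id, $2 = department, $3 = sales.
out=$(printf '%s\n' "1 clothing 100" "1 computer 250" "2 clothing 50" \
                    "2 computer 300" "2 textbooks 75" | awk '
{ deptSales[$2] += $3 }
END {
    for (x in deptSales) {
        print x, ":", deptSales[x]
        total += deptSales[x]
    }
    print "Total Sales :", total
}')
printf '%s\n' "$out"
```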

Process each input line
93
94
95
Awk Built-In Functions
 Awk has several built-in functions performing both arithmetic and string operations.
 The parameters passed to a function are delimited by commas and enclosed in a matched pair of parentheses.
FUNCTION          DESCRIPTION
int(x)            Returns the integer value of x.
sqrt(x)           Returns the square root of x.
length            Returns the length of the complete record.
length(x)         Returns the length of x.
substr(s1,s2,s3)  Returns the portion of string s1 of length s3, starting at position s2.
index(s1,s2)      Returns the position of s2 in string s1.
split(s,a)        Splits string s into array a and returns the number of fields.

length
 It determines the length of its argument.
 Syntax: length(string)
 If no argument is given, it uses the entire line ($0) as the argument.
 The following command locates the records whose length exceeds 57 characters:
awk -F "|" 'length > 57' emp.txt
 In emp.txt the 2nd field contains the employee name. To list the employees whose names are shorter than 10 characters:
awk -F "|" 'length($2) < 10' emp.txt
 To locate the lines whose length is between 100 and 150 characters:
awk -F "|" 'length > 100 && length < 150' emp.txt
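The name filter can be tried without emp.txt. The sketch below feeds two hypothetical records in the assumed emp.txt layout ($2 is the name) and also prints the length that passed the test.

```shell
# Hypothetical records; only the first name is shorter than 10 characters.
out=$(printf '%s\n' \
    "101|Asha Rao|manager|SALES|12/03/1980|6000" \
    "102|Ravikumar Subramanian|clerk|MARKETING|05/07/1985|4000" |
    awk -F "|" 'length($2) < 10 { print $2, length($2) }')
printf '%s\n' "$out"
```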
index
 It determines the first position of a string within a larger string.
 Syntax: index(string, substring)
 If substring is found, it returns its position; if not, it returns zero.
 For example, if a field contains the string "abcde", we can use this function to find out whether "b" is present:
x = index("abcde", "b")
 Output: 2
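Positions in awk start at 1, which is why 0 can safely mean "not found". The sketch below shows a single-character match, a multi-character substring, and a miss.

```shell
out=$(awk 'BEGIN {
    print index("abcde", "b")    # found at position 2
    print index("abcde", "cd")   # position where "cd" starts
    print index("abcde", "z")    # not found
}')
printf '%s\n' "$out"
```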
substr
 It extracts a substring from a string. It has two forms:
 Syntax: substr(string, position)
substr(string, position, length)
 If length is specified, it returns up to length characters starting at position; otherwise it returns the rest of the string from position onward.
 For example, for the string "abcde":
x = substr("abcde", 2)
Output: bcde
x = substr("abcde", 2, 2)
Output: bc
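A typical use is pulling fixed-position parts out of a field. The sketch below assumes the date in $5 is written as dd/mm/yyyy (a hypothetical record) and also shows that asking for more characters than remain simply returns what is left.

```shell
out=$(echo "101|Asha Rao|manager|SALES|12/03/1980|6000" | awk -F "|" '{
    print substr($5, 7)           # year: from position 7 to the end
    print substr($5, 4, 2)        # month: 2 characters from position 4
    print substr("abcde", 4, 10)  # length past the end is simply truncated
}')
printf '%s\n' "$out"
```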
split
 It divides a string into pieces separated by a field separator and stores the pieces in an array. (gawk also accepts an optional fourth argument, seps, which receives the separator strings.)
 Syntax: split(string, array)
split(string, array, fieldsep)
 In the first form, the fields of the string are copied into the array. The first piece is stored in array[1], the second in array[2], and so on. The end of each field is identified by the current field separator, FS.
 In the second form, the field separator is given as the third argument.
 split() returns the number of pieces. Before splitting the string, it deletes any previously existing elements in array.
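The sketch below shows both forms and the return value. With the default FS, runs of blanks separate the pieces and leading or trailing blanks are ignored; the last call confirms that split() empties the array before refilling it.

```shell
out=$(awk 'BEGIN {
    n = split("2024-01-15", d, "-")     # explicit separator
    print n, d[1], d[2], d[3]
    m = split("  red  green blue ", c)  # default FS: runs of blanks, ends ignored
    print m, c[1], c[m]
    n = split("x", d)                   # d is emptied before being refilled
    has2 = (2 in d)
    print n, has2
}')
printf '%s\n' "$out"
```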
sub
 It substitutes one string for another.
 Syntax: sub(regexp, replacement_string, in_string)
 The text you want to replace is described by the regular expression regexp.
 It returns 1 if the substitution is made, otherwise 0.
 For example, the following script substitutes "hi" for "hello" and prints the lines in which no substitution was made:
{ result = sub(/hello/, "hi", $0)
  if (result == 0)
      print NR, $0 }
NOTE: sub replaces only the first (leftmost) match. To replace every match, use the gsub function.
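The difference between sub and gsub is easiest to see on a line with two matches. The sketch below works on copies of $0 so that both functions start from the same text, and prints the count each one returns.

```shell
out=$(echo "hello world, hello awk" | awk '{
    a = $0; b = $0
    n1 = sub(/hello/, "hi", a)    # replaces only the first match
    n2 = gsub(/hello/, "hi", b)   # replaces every match
    print n1 ": " a
    print n2 ": " b
}')
printf '%s\n' "$out"
```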
match
 It returns the starting position of the first match of regexp in string.
 Syntax: match(string, regexp)
 If there is no match, it returns 0.
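Besides returning the position, match sets the built-in variables RSTART (start of the match) and RLENGTH (its length, or -1 when nothing matches), which pair naturally with substr. The sample string and pattern below are made up for illustration.

```shell
out=$(awk 'BEGIN {
    s = "order id: AB1234 shipped"
    if (match(s, /[A-Z]+[0-9]+/))
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    p = match(s, /xyz/)
    print p, RSTART, RLENGTH
}')
printf '%s\n' "$out"
```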

toupper
 It returns a copy of string converted to upper case.
 Syntax: toupper(string)
 For example: toupper("amit")
 Output: AMIT
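toupper has a companion, tolower, and both combine well with substr. The sketch below capitalises the first letter of one field and upper-cases another.

```shell
out=$(echo "amit patel" | awk '{
    # capitalise the first letter of $1, upper-case $2
    print toupper(substr($1, 1, 1)) substr($1, 2), toupper($2)
    print tolower("AWK Report")
}')
printf '%s\n' "$out"
```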
Built-In Arithmetic Functions
Function     Return Value
atan2(y,x)   arctangent of y/x (-π to π)
cos(x)       cosine of x, with x in radians
sin(x)       sine of x, with x in radians
exp(x)       exponential of x, e^x
int(x)       integer part of x
log(x)       natural (base e) logarithm of x
rand()       random number between 0 (inclusive) and 1 (exclusive)
srand(x)     new seed x for rand()
sqrt(x)      square root of x
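The sketch below exercises several of these functions. It uses the common idiom atan2(0, -1) to obtain π, prints floating-point results with printf so the digits are predictable, and seeds rand() with a fixed value; the exact random number depends on the awk implementation, so only its range is printed.

```shell
out=$(awk 'BEGIN {
    pi = atan2(0, -1)                  # atan2(0, -1) is pi
    printf "%.5f\n", pi
    printf "%.5f %.5f\n", sin(pi / 2), cos(0)
    printf "%.4f %.4f\n", exp(1), log(exp(2))
    print int(3.99), int(-3.99)        # int truncates toward zero
    srand(1)                           # fixed seed: repeatable sequence
    r = rand()
    ok = (r >= 0 && r < 1)
    print ok
}')
printf '%s\n' "$out"
```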
Built-In String Functions
Function                 Description
gsub(r, s)               substitute s for r globally in $0, return number of substitutions made
gsub(r, s, t)            substitute s for r globally in string t, return number of substitutions made
index(s, t)              return first position of string t in s, or 0 if t is not present
length(s)                return number of characters in s
match(s, r)              test whether s contains a substring matched by r, return index or 0
sprintf(fmt, expr-list)  return expr-list formatted according to format string fmt
split(s, a)              split s into array a on FS, return number of fields
split(s, a, fs)          split s into array a on field separator fs, return number of fields
sub(r, s)                substitute s for the leftmost longest substring of $0 matched by r
sub(r, s, t)             substitute s for the leftmost longest substring of t matched by r
substr(s, p)             return suffix of s starting at position p
substr(s, p, n)          return substring of s of length n starting at position p
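sprintf accepts the same format specifiers as printf but returns the formatted string instead of printing it, so the result can be stored, measured or concatenated. The field values below are arbitrary.

```shell
out=$(awk 'BEGIN {
    # sprintf builds the string without printing it
    s = sprintf("%-8s|%6.2f|%03d", "rate", 7.5, 42)
    print s
    print length(s)
}')
printf '%s\n' "$out"
```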
