

Unit-6
Text Manipulation

INSPECTING FILES

Let us first look at commands that allow us to inspect files without altering them.
For example, we might want to find out how many words there are in a file, or we
might want to locate places in the file which contain a particular text expression.
Before going further, we must be clear as to what a text file is. This is a file which
contains only printable characters and which is organised around lines. Although
in some cases we can alter the files, these commands are really meant to let us
look at the files or to find out about them.

File Statistics

The wc command tells you the number of characters, words and lines in a text
file.

% wc quotation

8 43 227 quotation

This means that quotation has 8 lines, 43 words and 227 characters. A word is a
string of characters delimited by any combination of one or more spaces, tabs or
newlines. If you wish you can make wc operate on the standard input,
whereupon you will not find any filename displayed in the output.

% wc

No generalisation is ever wholly true, including this one.

The problem with equality is that we desire it only with our superiors.

^D

2 22 130

This also means you can use wc in a pipe, either to read from or write to. Thus


% cat quotation | wc

8 43 227

or

% who | wc

9 45 333

In both cases the method used is perhaps not the most natural one. For
example, to find out the number of users in a system you could say

% who -q

You can do a wc on several files at a time and then you get an additional line of
output giving the total figures.
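For instance, running wc on quotation together with the places file that appears later in this unit would produce one line per file and a final total line; the figures shown for places here are only illustrative:

% wc quotation places

8 43 227 quotation
8 8 56 places
16 51 283 total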

If you wish, you can find only the number of lines in the input by using the -l
option, only the number of words by using -w and only the number of characters
by saying -c. These options can be combined in any order. So

% wc -cl quotation

227 8 quotation

You can see that

% wc -lwc

is the same as wc.

Searching for Patterns

We can now come to a few commands which help in locating patterns in files.
One such program is grep (for global regular expression printer). It takes one
regular expression which you want it to search for, and looks for it line by line in
all of the specified files. Whenever grep finds a line in a file that contains the
pattern, it prints the line on the standard output. If more than one file was given to
grep to search, the line is preceded by the name of the file in which the match was
found, followed by a colon. If only one file was to be searched then only the line
is printed.

A word on regular expressions is in order. A regular expression is a way of
specifying a template or pattern which can match several text strings according
to certain rules. For specifying the template, some characters are used with a
certain meaning. Such characters are called metacharacters. Thus a dot (.)
matches any single character. We will not go into the details of the rules
governing regular expressions here, because you must have learnt about them
in your compiler design course. Regular expressions are used there to specify
languages consisting of legal sentences from an alphabet. From such a
specification you must have learnt how to construct a lexical analyser which
accepts only valid sentences, that is, sentences of the language specified by the
regular expression. In the present context, our alphabet is the set of printable
characters, and the language is the set of all the text strings that match the regular
expression. You should refer to your UNIX manual to find out the exact rules for
constructing regular expressions for grep.

Since the C-shell itself attaches a special meaning to many of the
metacharacters, you will need to tell the shell not to interpret the regular
expression which you are trying to pass to grep. Single quotes are the safest way
of telling the shell this. So the regular expression argument to grep should be
enclosed in single quotes, although double quotes also do work in many cases.
We will examine this matter in the next unit on Shell Programming.

Unfortunately the meaning attached to metacharacters in different utilities of
UNIX is not always consistent. For example, in grep, as we just saw, an arbitrary
single character is matched by a period (.), while in the C-shell this is done by the
question mark (?). This is a potential source of confusion, and all the more so
because a beginner can find it hard to construct or even interpret a regular
expression anyway. However, with practice this difficulty reduces somewhat.
Moreover, not all utilities support regular expressions in their fullest
manifestation, and actually the degree of support varies amongst them.

By now you will be complaining because you want to see some real examples,
not endless commentary on the command. So here we go

% grep Gupta payfile

tells you where the string "Gupta" occurs in the file payfile. As shown here grep
is matching a text string exactly. Every line in payfile that contains the given
string anywhere will be printed. You can give more than one file as an argument.

% grep Thomas custfile orderfile

If you want to know the line number in the file of the line on which the matches
were found, say

% grep -n Australia country


To count the number of lines which matched, just say

% grep -c India prodfile

This will not print the line and only the count will be shown. You can invert the
sense of a match like this

% grep -v India prodfile

This command will print lines that do not include the string India. Remember that
grep looks for only one regular expression but can look at more than one file. So
do not try

% grep Ram Kumar users

grep: can't open Kumar

to look for Ram Kumar in a file users. The command as shown will look for a
string Ram in the two files called Kumar and users. Instead you should say

% grep "Ram Kumar. users

whereupon Ram Kumar will be searched for in the file users. There is also an
option to turn off case sensitivity. So

% grep -i "Ram Kumar" users

will find any occurrence of Ram Kumar irrespective of case. Thus this would
report RAM KuMAr as a match. What if there could be occurrences of the string
in the file with an unknown number of spaces between the two words? You will
now need to use regular expressions.

% grep "Ram *Kumar" users

matches Ram Kumar in this case. The * metacharacter specifies a closure,
meaning that the preceding pattern is to be matched 0 or more times, which is
what we want here.

Grep is line oriented and patterns are not matched across line boundaries. The
metacharacters ^ and $ stand for the beginning and the end of a line
respectively. So to look for an empty line, say

% grep '^$' users

But if you are looking for blank lines, say


% grep '^[ ^I]*$' users

The [ ^I] is a character class consisting of a space and a tab (here ^I stands for
the tab character), and the * metacharacter is a closure which looks for 0 or more
occurrences of these.

To see whether khanz is a valid login name, say

% grep "^khanz" /etc/passwd

because the login name is the first field in the passwd file. You can get every line
in a file with line numbering by saying

% grep -n . letter

This is like a cat on the file but with the line numbers displayed too (empty lines,
which have no character for the . to match, will not appear). To find lines
containing a number, say

% grep '[0-9]' table

which will find every line containing at least one digit.

We have seen that grep cannot search for more than one regular expression at a
time. There is another utility called egrep which can handle regular expressions
with alternations. We will not look at it here but you should study the manual
entry for it.
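To give a flavour of what an alternation does, the following sketch looks for either of two strings in one pass over prodfile (China is just an illustrative second string):

% egrep 'India|China' prodfile

Every line containing either India or China will be printed.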

There is another utility in this family called fgrep which does not handle regular
expressions. Since it handles only fixed text strings, however, it is faster. Thus
you can say

% fgrep "Ram Kumar" empfile custfile

Another advantage to this command is that you can store a list of words in a file,
say search, one word per line. You can then look for the occurrence of any of
those words in a file like this

% fgrep -f search story
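Here the file search might contain, one word per line, something like this (the words are purely illustrative):

% cat search

Ram
Shyam
Kumar

The fgrep command above will then print every line of story containing any of these words.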

Usually grep is sufficient for everyday use but whenever needed you can make
use of fgrep or egrep.


Comparing Files

We will now look at a group of utilities which help us compare two files. While
talking of cp in #2.4.8, we did not know of commands which could help us
ascertain whether the original and the copy indeed had the same contents. First
let us make a copy of the passwd file in our directory and then examine the two

% cp /etc/passwd ~

% cmp /etc/passwd ~/passwd

The cmp command takes two filenames as arguments and prints on the standard
output the character offset and the line number of the first position where the two
files differ. It is useful in comparing two binary files to see whether they are the
same. It is not of much help in comparing two text files to see how they differ,
because if you add or delete even one character in one of them, every position
from there on will hold different characters in the two files. So you can try something like

% cmp /bin/ls /bin/cp

/bin/ls /bin/cp differ: char 27, line 1

to see that they differ.

To look at all the differences in two files say

% cmp -l /bin/ls /bin/cp

and you should be flooded with several thousand lines of output, each line
containing the byte offset, the character in the first file (ls) represented in octal
and the character in the second file (cp) also in octal, for every byte position
where the two files differ (almost all in this case, one would imagine) until one or
both the files end.

If one file is shorter than the other but no differences are detected in the two up to
the point the shorter file ends, cmp reports end of file on the relevant file.

Now let us turn our attention back to text files. Suppose we have two text files
which are sorted in ascending order. Now try

% comm file1 file2

This produces on the standard output three columns of text. The first column
contains lines that are to be found only in the first file and not in the second. The
second column likewise contains lines present only in the second file. The third
column contains lines common to both files. This output could be all jumbled up


if the files are not sorted. You can suppress the printing of one or more columns
like this

% comm -1 file1 file2

This suppresses the printing of lines only in the first file. So

% comm -3 file1 file2

will print lines only in file1 and those only in file2 but not those that are common
to both files (column 3). You can suppress two columns as well. Thus to print
only lines in file1, say

% comm -23 file1 file2

As you would expect

% comm -123 file1 file2

will print nothing.

cmp and comm are simple commands and it would be easy to write a program to
accomplish what they do. We will now take a quick look at a utility which is far
more complex. The diff command takes two text files as arguments and brings
out the smallest set of differences between them. It can also produce output
which can be used by the text editor ed to produce the second file from the first.
At the heart of diff is a complex algorithm to find the longest common
subsequence in two blocks of text. Let us look a little more at how these utilities
can help you. Suppose you have a file containing the names of a few places you
would like to visit, as follows

% cat places

Agra

Cochin

Delhi

Goa

Guwahati

Jhansi

Puri


Secunderabad

^D

Also let there be another file containing the names of places a friend of yours
would like to visit.

% cat moreplaces

Agra

Guwahati

Goa

Gwalior

Kochi

Madras

Udaipur

^D

Now you want to plan out an itinerary after discussion. In this discussion if you
both agree about wanting to visit a place, there is no difficulty. Otherwise you will
have to decide what to do. So first you need to know whether you disagree at all.
We will assume here that both the files are sorted, and later in this unit we shall
see how this can be easily done. So to find out about the disagreements, we can
say

% cmp places moreplaces

places moreplaces differ: char 6, line 2

Well, it was too much to expect complete agreement. To find out the differences
you can use comm.

% comm places moreplaces

Here column 3 will tell you about the places you both agree upon. Now you only
have to discuss columns 1 and 2 to arrive at an agreement. But we have still not
talked of diff. Say


% diff places moreplaces

This indicates the differences between the files in three ways: a, d and c. The a
stands for lines which have been added, d for lines deleted and c for lines
changed between the two files. The symbol < refers to the first file and > to the
second. We will not discuss the command at length but will see a few options to diff.

% diff -e places moreplaces

This produces output in a form suitable for the editor ed. You can save this output to a
file and apply the change file to the first file to produce the second. If you are
wondering why one would want to do such a thing, you should wait until the unit
on programming tools, where we discuss version control. The essence of it is
that instead of storing every version of a file completely, one stores only the
initial version and all the changes to it. One can always recreate any version by
applying the appropriate set of changes to the initial version. There is another
option to diff, -b, which ignores trailing white space on a line and treats other
runs of white space as equivalent

% diff -b places moreplaces

diff can handle files of a limited size only. There is a command called bdiff which
can be used for large files, but it just uses diff after breaking up the files into
manageable chunks. So differences across chunk boundaries may not come out
optimally.

Another command is sdiff, which works like diff but places the output from the
two files side by side. Lines that are present in one file but not in the other are
marked with < or > respectively. Lines that are present in both files but differ
somewhat are shown separated by a pipe (|). This command can be used to
merge two files into one, keeping the common portion intact and incorporating
the differing parts of both files.
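If your version of sdiff supports the -o option, such a merge can be done interactively; the following is only a sketch, with itinerary as an assumed name for the merged file:

% sdiff -o itinerary places moreplaces

Lines common to both files are copied into itinerary automatically, and at each difference sdiff prompts you to choose which side to keep.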

OPERATING ON FILES

We can now look at several utilities which will allow us to alter files in some way.
The utilities in the previous section, in contrast, allowed us to look at the files
without manipulating their contents. However, most of these utilities are filters
and can write out the changed file only to the standard output, which can then be
redirected to another disk file. Very few commands allow you to change a file
in place.


Printing Files

If there is a long file and you cat it to the screen, the output is difficult to
understand because there are no page breaks, headers and the like. If you
redirect the output to a printer, the resulting file is a long stream of lines without
regard to the page length of your stationery. To get a formatted output, you can
use the pr command.

% pr places

pr breaks up the file into pages with a header, text and footer area. The header
contains a line giving the date, the name of the file and the page number. The
length of the page can be altered by the -l option and the header can be set by
the -h option. Thus if you want to print Itinerary as the heading instead of the
filename places, say

% pr -h Itinerary places

The header and footer can be suppressed by the -t option. You can expand tabs
to any desired number of spaces by using the -e option followed by the number.
Thus to expand tabs to 4 spaces instead of the default of 8, say

% pr -e4 places

You can give a left margin to your output by using the -o option followed by the
number of characters you want to use for the margin. Thus to have a 5 character
margin, say

% pr -o5 places

You can also set up double space printing by using the -d option. If you want to
print in more than one column, just use the -n option where n is the number of
columns you want. So to print in two column format, use

% pr -2 places

The column separator is a tab by default but can be changed by the -s option to
whatever single character you want. You only have to put your desired separator
after the -s. The width of the output can be changed by using the -w option. For
example, if you are using 132 column stationery, you can say

% pr -4 -w132 places

which will print the file in 4 column format with the width being 132 characters. If
you want to merge several files, you can use the -m option. Thus


% pr -2m places moreplaces

will print the two files, one per column. You can use the -p option to pause after
every page if the output is to a terminal. Thus it could be some sort of a
substitute for more or pg, although pr will not provide the several other features
that more has (pattern matching, for instance). The output from pr is usually
redirected to a printer to produce a hard copy. It is rarely useful to just look at a
formatted file on the terminal.
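For example, to print places in two columns under the heading Itinerary and send the result to the printer, you might say something like the following, assuming lpr is the print spooler on your system (on others it may be lp):

% pr -2 -h Itinerary places | lpr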

Rearranging Files

There are two commands which will enable you to obtain a vertical section of a
text file. This is like implementing the projection of a database relation. Let us say
that we have a file studfile containing the names of students and the marks they
have obtained in some examination.

From this we want to create a file containing only the names. The cut command
is well suited to perform such a task. Let us look at a small portion of the file

Ajay Sapra 87
Pappu Ahmed 85
Vinod Bhalla 91

You can see that the names extend from column 1 to 20 and the marks are in
columns 21 and 22. To obtain the names alone we can cut out those columns like this

% cut -c 1-20 studfile

Ajay Sapra

Pappu Ahmed

Vinod Bhalla

This gives us the columns 1 to 20 of the file studfile on the standard output.
Similarly to get the marks alone (for some analysis, for example) you can say

% cut -c21,22 studfile

or

% cut -c21-22 studfile

or


in this case, even

% cut -c21- studfile

This last command cuts out all the columns starting from column number 21.

Remember that cut does not affect the original file in any way. It does the
transformation only onto the standard output, which can be redirected as always
if you want it in a disk file. Now suppose you want the surnames of all the
students in surfile. Can you do this with what you already know?

You will find that you cannot achieve the desired result because the first 20
columns, which contain the name, are actually only one fixed length field (name)
of studfile as currently organised. The first name and the surname take up an
arbitrary number of columns out of these 20. In other words, the first name and
the surname are not of fixed length. So there are no parameters you can give to
the -c option of cut which will be correct for all records in this file.

In such a case you must tell cut to work with variable length fields rather than
column positions. So say

% cut -f1,2 studfile

to try and get the names alone. You might be a trifle surprised at the result
because there will be no effect. If so, it was because you expected that the field
separator would be a space. But actually cut expects the fields to be separated
by tabs by default. To tell it to consider a space (or any other character) as the
separator, use the -d flag before specifying the field numbers

% cut -d" " -f2 studfile surfile; cat surfile

Sapra

Ahmed

Bhalla

You can create another file containing only the first names

% cut -d" " -f1 studfile > firfile

and we might put the marks into a file as well

% cut -d" " -f3- studfile marksfile


Since every space is now considered to delimit a field, we have to cut out every
field from the 3rd field onwards. That is why you will find it necessary to give the
hyphen after f3. We have now separated studfile into three files, each containing
one of the fields of the file. Let us now see how we can put the fields back in a
different order.

Suppose we want the marks list but with the names given as surname followed
by a comma and the first name, and then the marks secured. We have all
the components available with us in the three files we just created. To put them
back we can use the paste command like this

% paste -d", " surfile fiffile marksfile

Sapra, Ajay 87
Ahmed, Pappu 85

Bhalla, Vinod 91

What does this command do? It writes lines to the standard output and
constructs each line by concatenating lines from the files specified with the field
separator for that field. Thus the first line of the output consists of the first line of
the files in the order they are specified on the command line, with the first
delimiter being used after the first field, the second after the second, and so on. If
only one delimiter is given, it is used to delimit all fields. The default delimiter is a
tab character.

We could have achieved this result using only two intermediate files because cut
and paste are both filters.

% cut -d" " -f2 studfile 1 paste -d". " - firfde marksfile

Whenever a command accepts multiple filenames, one can use - to specify that
the standard input be used at that point. So we could also have achieved our
result by using only two intermediate files like this

% cut -d" " -f1 studfile | paste -d", " surfile - marksfile


Sorting Files

While cut and paste allow you to rearrange a file vertically, it is very common to
want to rearrange a file horizontally, that is, to sort it in some order. UNIX has an
elaborate sort command which allows you to sort files in various ways with a
variety of options. Here we will look at some of the features of the sort command.
Consider a file empfile containing the first name, the surname, the date of joining
the company, the employee number and the basic salary

Ram Gupta 24/03/84 2038 15200.00

Harish Gupta 18/10/89 5496 4300.00

Thomas Robinson 04/07/87 3562 4800.00

Gopal Das 28/02/91 8764 4400.00

Anil Jain 13/09/85 2867 6500.00

The UNIX sort is based on fields of variable length and the field delimiter can be
specified. The default is the space character. Let us see the result of sorting
empfile

% sort empfile

Anil Jain 13/09/85 2867 6500.00

Gopal Das 28/02/91 8764 4400.00

Harish Gupta 18/10/89 5496 4300.00

Ram Gupta 24/03/84 2038 15200.00

Thomas Robinson 04/07/87 3562 4800.00

As you can see the result is written to the standard output. Sort can read from the
standard input and is thus a filter. The default mode of sorting is in the collating
sequence of the machine, ASCII for example, and in ascending order starting
from the first character of the line.
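Because sort is a filter it can be used in a pipe just as wc was earlier in this unit. For instance, to see the logged in users listed in alphabetical order, you could say

% who | sort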

To sort on the surname, which is the second field, you can say

% sort +1 empfile

Gopal Das 28/02/91 8764 4400.00


Ram Gupta 24/03/84 2038 15200.00

Harish Gupta 18/10/89 5496 4300.00

Anil Jain 13/09/85 2867 6500.00

Thomas Robinson 04/07/87 3562 4800.00

So the +1 means that sorting starts at the second field. To sort on multiple field
ranges, you can give the field number to stop sorting at

% sort +1 -2 empfile

Gopal Das 28/02/91 8764 4400.00

Harish Gupta 18/10/89 5496 4300.00

Ram Gupta 24/03/84 2038 15200.00

Anil Jain 13/09/85 2867 6500.00

Thomas Robinson 04/07/87 3562 4800.00

Let us now try

% sort +4 empfile

Ram Gupta 24/03/84 2038 15200.00

Harish Gupta 18/10/89 5496 4300.00

Gopal Das 28/02/91 8764 4400.00

Thomas Robinson 04/07/87 3562 4800.00

Anil Jain 13/09/85 2867 6500.00

How has Ram Gupta with the highest basic salary of Rs 15,200.00 come at the
beginning of the list? This is because sort sorts from left to right in the ASCII
collating sequence and 1 is smaller than any other digit in this case. So the field
starting with 1 appears at the beginning. In other words sort looks at the
dictionary order rather than the numeric value of the field. To make sort use the
numeric order for numeric fields, say

% sort -n +4 empfile


whereupon the record for Ram Gupta will appear at the end.

Let us now see how to sort on portions of fields. A practical example is when you
have the dates given in dd/mm/yy form as above and you want to sort in the
ascending order of the date. If the dates were in yy/mm/dd order there would
have been no problem. So we need to sort on the 7th and 8th characters of the
third field followed by the 4th and 5th characters and the 1st and 2nd characters.
Note that including the constant "/" character in between the keys will not make a
difference, but to illustrate the syntax we will exclude this character.

% sort +2.6 -2.9 +2.3 -2.6 +2.0 -2.3 empfile

Ram Gupta 24/03/84 2038 15200.00

Anil Jain 13/09/85 2867 6500.00

Thomas Robinson 04/07/87 3562 4800.00

Harish Gupta 18/10/89 5496 4300.00

Gopal Das 28/02/91 8764 4400.00

The field delimiter can be any character other than the default space, in which
case it has to be specified with the -t option

% sort -t"|" +2 -3 +0 -1 testfile

will sort on the 3rd and 1st fields of testfile considering the "|" character to be the
field delimiter.

If there is more than one record with the same value, you can get unique records
by using the -u option, and duplicate records will not be repeated in the output.
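As a small sketch, empfile has two employees with the surname Gupta. Sorting on the surname field with -u keeps only one line per surname; which of the two Gupta records survives depends on their order in the input:

% sort -u +1 -2 empfile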

You can specify an output file where the output is to be written or you can
redirect the output if you want.

% sort +2.6 -2.9 +2.3 -2.6 +2.0 -2.3 empfile -o emp.out

writes the result to the file "emp.out". The sort command is one of the few utilities
which can work in place. So the output file can be the same as the input, but do
not try this using redirection unless the shell variable noclobber is set!

% sort +2.6 -2.9 +2.3 -2.6 +2.0 -2.3 empfile -o empfile

Now empfile will have changed after the command completes. This method is
particularly useful when you are sorting a large file and do not have space to


keep both the unsorted and sorted files on the disk. But remember that sort uses
temporary space in the directory /usr/tmp, so there must be enough space there
or your sort will abort, though your source file will not be overwritten unless the
sort has completed successfully.

The sort command is not limited to sorting one file. You can sort several files in
the same manner simultaneously by giving the names of all the files on the
command line, but remember that the output will then be all in one file. You can
check if a file is sorted in a particular manner by giving the sort command with
the -c (check) option. You can sort in reverse order by using the -r option.
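A couple of quick sketches of these two options on empfile:

% sort -c empfile

% sort -rn +4 empfile

The first says nothing if empfile is already sorted in the default order and complains about the first out of order line otherwise; the second lists the employees from the highest basic salary to the lowest.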

To merge two or more files that are already sorted, use sort with the -m (merge)
option. This is of course much faster than sorting the files from scratch.
Incidentally the UNIX sort is not a very efficient one.

Splitting Files

Sometimes one wants to split files into pieces. We will take a practical example
later, and first see how we can do the job. Suppose there is a large file called
"stores" consisting of the stores transaction data of a large organisation. If the file
is a large one, say with 324532 records, you might at times want to split it. For
this you can say

% split stores

and the file will be split into 1000 line pieces. Each piece will be stored in a file.
The last piece will have whatever is left after the penultimate piece has been
created, which in this case will mean that the last piece has 532 lines.

What are the pieces called, you might ask. The files are by default named xaa,
xab, xac, ..., xaz, xba, xbb and so on up to xzz. So you cannot split a file into
more than 676 pieces using the split command.

If you want to, you can specify a prefix different from x to name each of the
portions produced by indicating it on the command line. Thus if you want to call
the pieces partaa, partab and so on, you can say

% split stores part

You can also change the number of lines that are put into each piece by giving
this number on the command line. So

% split -10000 stores part


will split the file into 10000 line pieces instead of the default of 1000. Note that
the split is done based on lines in each piece rather than on the size in bytes of
each piece. Also there is no way of automatically telling split to produce a
specified number of pieces. Thus if you need to split a file into exactly 20 pieces,
you will have to first determine the number of lines in the file. Then work out a
piece size which will give you the number of pieces you want (20 in this case). It
should be easy to see that there can be more than one piece size which will
produce a specified number of pieces from a file, because the size of the last
piece can vary depending on what is left over. However, keep in mind that split
will also work if the number of characters in a line is not fixed, that is, when you
have variable length lines.
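As a sketch with the stores file mentioned above: 324532 lines divided into 20 pieces needs at least 324532 / 20 = 16226.6 lines per piece, so you could round up to 16227.

% wc -l stores

324532 stores

% split -16227 stores piece

This gives 19 pieces of 16227 lines each, named pieceaa to pieceas, and a twentieth piece pieceat holding the remaining 16219 lines.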

There can be various situations where it might be necessary or desirable to split
a file into several parts. Let us look at one such situation. Imagine a data file of
100 MB in a partition on the hard disk which has 150 MB of free space left. The
file has 100000 records of 1000 bytes each, including the newline. Also assume
that the partition which contains the /usr/tmp directory has only 80 MB of space
free. We want to sort our 100 MB data file. How can we do this?

You know that the sort command uses temporary work space in the /usr/tmp
directory and that it needs about 1.2 times the size of the file as work space. So
to sort a 100 MB file the /usr/tmp partition must have at least 120 MB of free
space. Since this is not so in this case, we cannot sort the file directly although
there is enough space to hold the sorted file.

What you can do is to split the source file into two parts of 50 MB containing
50000 records each. Now sort each piece separately in place. There is enough
space in /usr/tmp to sort a 50 MB file. Then merge the two pieces together using
the -m option to the sort command. This option does not need much work space.
By reducing the size of the files to be sorted you can accomplish your goal of
sorting the file.
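A sketch of the whole sequence, assuming the data file is called bigdata and that the pieces are given the prefix part:

% split -50000 bigdata part

% sort partaa -o partaa

% sort partab -o partab

% sort -m partaa partab -o bigdata.sorted

% rm partaa partab

Each 50 MB piece is sorted in place, so /usr/tmp never needs more than about 60 MB of work space at a time, and the final merge with -m needs very little.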

Translating Characters

There is a very useful command to translate characters in a text file. Suppose we
have a file quotation

% cat quotation

Chess, like music, like love, has the power to make men happy

and we want to change all letters to capitals or upper case. We can do this easily
by using the tr command


% tr '[a-z] " [A-Z]' quotation QUOTA77ON

CHESS, LIKE MUSIC, LIKE LOVE, HAS THE POWER TO MAKE MEN HAPPY

Notice that letters that are already upper case are not affected because there is
no translation specified for them. The tr command takes two arguments which
specify character sets. Every character from the first set is replaced by the
corresponding character from the second set.

The command is a filter and takes input from the standard input and writes to the
standard output. If you want to use the command on disk files you will need to
redirect the input and the output accordingly, as has been done in the example
above.

The arguments, that is, the character sets can be specified either by enumerating
them or as ranges. In the example given both the arguments have been specified
as ranges. For this to be possible the characters must be in the ascending order
of the ASCII collating sequence without any gaps.

To implement Caesar's cipher, for instance, you can use tr on the source file. In
this primitive cipher, every letter of the Roman alphabet is shifted forward by
three characters. Thus a becomes d, b becomes e and z becomes c. So try

% tr '[a-z]' 'defghijklmnopqrstuvwxyzabc' < plaintext > ciphertext

Here we have specified the first character set as a range but the same cannot be
done for the second. So the second character set has been enumerated in full.
The command given will not encipher upper case letters. If you want to change
these too, or also want to change digits, you can modify the command
appropriately.

As usual, if a character in the command has special meaning to the shell, it
needs to be escaped. Here we have used single quotes to escape the square
brackets although double quotes could have worked as well.

What happens if the number of characters in the two character sets does not
tally? Well, if there are more characters in the second set than in the first, there is
no problem because there will never be occasion to translate to them. If there are
more characters in the first set, the extra characters are ignored. Thus

% tr '[0-9]' '[a-f]' < srcfl > targetfl

will change 0 to a, 1 to b, and so on, making a 5 into an f. The digits 6, 7, 8 and 9
will not be changed because there is no translation specified for them.

The command has some other facilities. We can delete any set of characters
from the input by using the -d option of tr and specifying only one character set.
So if you want to get rid of punctuation marks like a semicolon, a colon, a dash
and a comma, you can say

% tr -d ';:,-' < srcfl > targetfl

The characters can also be specified by giving their octal representation after a
backslash. Thus to delete all tab characters, you can say

% tr -d '\011' < srcfl > targetfl

There is also the -s or squeeze option, using which you can collapse or squeeze
multiple (consecutive) occurrences of a character to a single occurrence of that
character. Thus to replace multiple spaces by a single space, you can say

% tr -s ' ' < srcfl > targetfl

Finally we will look at the complement option specified by -c. This option
complements or inverts the character set you specify. While

% tr -d '[0-9]' < srcfl > targetfl

will delete all digits from srcfl and write the result out into targetfl, using the -c
option with this will delete everything except digits from the file

% tr -cd '[0-9]' < srcfl > targetfl

leaves only digits from srcfl in targetfl.
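Since tr is a filter, you can also combine it with wc from the beginning of this unit. For example, to count how many digits srcfl contains, you could say

% tr -cd '[0-9]' < srcfl | wc -c

Everything except digits is deleted and wc then counts the characters that remain.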

