
Unit 2: Lexical Analysis

Q.1 Construct an NFA for the following regular expression using Thompson's construction and then
convert it into a DFA: aa*(b | c)a*c# (Dec 2012) (7 marks)
Answer:
NFA

Conversion of NFA into DFA


Initial state = ε-closure({0})
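The two steps (Thompson's construction, then subset construction starting from ε-closure of the start state) can be sketched in Python as below. This is a minimal sketch under stated assumptions: textbook Thompson rules merge states when concatenating, but here fragments are joined with an extra ε-edge, so the state numbers will not match the figure; the endmarker # is left out; and the names `Thompson`, `eps_closure`, `move`, `subset_construction` are hypothetical.

```python
from itertools import count

EPS = None  # label of an ε-edge


class Thompson:
    """Builds an ε-NFA as (src, label, dst) edges; a fragment is (start, accept)."""

    def __init__(self):
        self.edges, self._ids = [], count()

    def _new(self):
        return next(self._ids)

    def sym(self, a):
        s, t = self._new(), self._new()
        self.edges.append((s, a, t))
        return s, t

    def cat(self, *frags):
        # join each fragment's accept state to the next fragment's start by an ε-edge
        for (_, t1), (s2, _) in zip(frags, frags[1:]):
            self.edges.append((t1, EPS, s2))
        return frags[0][0], frags[-1][1]

    def alt(self, f1, f2):
        s, t = self._new(), self._new()
        self.edges += [(s, EPS, f1[0]), (s, EPS, f2[0]), (f1[1], EPS, t), (f2[1], EPS, t)]
        return s, t

    def star(self, f):
        s, t = self._new(), self._new()
        self.edges += [(s, EPS, f[0]), (s, EPS, t), (f[1], EPS, f[0]), (f[1], EPS, t)]
        return s, t


def eps_closure(edges, states):
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return frozenset(seen)


def move(edges, states, a):
    return {dst for src, label, dst in edges if src in states and label == a}


def subset_construction(edges, start, accept, alphabet):
    d_start = eps_closure(edges, {start})      # initial DFA state = ε-closure({start})
    dstates, dtran, work = {d_start}, {}, [d_start]
    while work:
        T = work.pop()
        for a in alphabet:
            U = eps_closure(edges, move(edges, T, a))
            if not U:
                continue                         # missing entry means the dead state
            dtran[T, a] = U
            if U not in dstates:
                dstates.add(U)
                work.append(U)
    final = {T for T in dstates if accept in T}
    return d_start, dtran, final


def accepts(dfa, w):
    state, dtran, final = dfa
    for ch in w:
        state = dtran.get((state, ch))
        if state is None:
            return False
    return state in final


# aa*(b|c)a*c
th = Thompson()
start, accept = th.cat(th.sym("a"), th.star(th.sym("a")),
                       th.alt(th.sym("b"), th.sym("c")),
                       th.star(th.sym("a")), th.sym("c"))
dfa = subset_construction(th.edges, start, accept, "abc")
print([w for w in ["abc", "aacac", "acc", "ac", "bc", "abca"] if accepts(dfa, w)])
# prints ['abc', 'aacac', 'acc']
```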

Prepared BY: Prof. Trupti Kodinariya


(Aits-Rajkot)


The resulting DFA is as follows:

Transition table


Q.2 Construct a DFA for the following regular expression without constructing an NFA, and find the
minimized DFA: a*b*a(a | b)*b*a# (Dec 2012) (7 marks)
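The minimization step can be sketched with partition refinement (start from accepting vs. non-accepting states and split blocks until they are stable). Since (a | b)*b* = (a | b)*, the expression describes strings over {a, b} that contain at least two a's and end in a. The 4-state DFA below is a hand-built assumption for that language, with hypothetical state names s0 to s3 that need not match the figure; the sketch shows s1 and s3 merging, leaving 3 states.

```python
# Hand-built DFA (an assumption) for a*b*a(a|b)*b*a: at least two a's, ending in a
STATES = ["s0", "s1", "s2", "s3"]
ALPHABET = "ab"
DELTA = {
    ("s0", "a"): "s1", ("s0", "b"): "s0",   # s0: no a seen yet
    ("s1", "a"): "s2", ("s1", "b"): "s1",   # s1: exactly one a seen
    ("s2", "a"): "s2", ("s2", "b"): "s3",   # s2: two or more a's, last symbol a
    ("s3", "a"): "s2", ("s3", "b"): "s3",   # s3: two or more a's, last symbol b
}
FINALS = {"s2"}


def minimize(states, alphabet, delta, finals):
    """Split blocks until all states in a block agree, for every input symbol,
    on which block they move to."""
    partition = [b for b in (set(finals), set(states) - set(finals)) if b]
    while True:
        block_of = {s: i for i, block in enumerate(partition) for s in block}
        refined = []
        for block in partition:
            groups = {}
            for s in block:
                key = tuple(block_of[delta[s, a]] for a in alphabet)
                groups.setdefault(key, set()).add(s)
            refined.extend(groups.values())
        if len(refined) == len(partition):   # nothing was split: partition is stable
            return partition
        partition = refined


blocks = minimize(STATES, ALPHABET, DELTA, FINALS)
print(sorted(sorted(b) for b in blocks))   # prints [['s0'], ['s1', 's3'], ['s2']]
```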


Q.3 Convert the following NFA-ε into an equivalent NFA without ε-transitions. Here ε (also written ^)
denotes a null transition. (May 2012) (7 marks)

Answer:


Transition Table


Equivalent NFA:
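The conversion rule can be sketched as follows: δ'(q, a) = ε-closure(δ(ε-closure(q), a)), and q becomes final if its ε-closure contains a final state. The sketch uses a small hypothetical ε-NFA (states q0, q1, q2, accepting 0*1*2*), not the NFA of the question, and the function names are illustrative only.

```python
EPS = "ε"

# Hypothetical ε-NFA for 0*1*2*: q0 -0-> q0, q0 -ε-> q1, q1 -1-> q1, q1 -ε-> q2, q2 -2-> q2
STATES = ["q0", "q1", "q2"]
ALPHABET = ["0", "1", "2"]
DELTA = {
    ("q0", "0"): {"q0"}, ("q0", EPS): {"q1"},
    ("q1", "1"): {"q1"}, ("q1", EPS): {"q2"},
    ("q2", "2"): {"q2"},
}
FINALS = {"q2"}


def eps_closure(delta, states):
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def remove_epsilon(states, alphabet, delta, finals):
    """δ'(q, a) = ε-closure(δ(ε-closure(q), a)); q is final if ε-closure(q) meets F."""
    new_delta = {}
    for q in states:
        for a in alphabet:
            reached = set()
            for p in eps_closure(delta, {q}):
                reached |= delta.get((p, a), set())
            target = eps_closure(delta, reached)
            if target:                      # omit empty entries
                new_delta[q, a] = target
    new_finals = {q for q in states if eps_closure(delta, {q}) & finals}
    return new_delta, new_finals


new_delta, new_finals = remove_epsilon(STATES, ALPHABET, DELTA, FINALS)
for (q, a), target in sorted(new_delta.items()):
    print(f"δ'({q}, {a}) = {sorted(target)}")
print("final states:", sorted(new_finals))
```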

Q.4 Construct a DFA for the given regular expression (010+00)*(10)* (May 2012) (7 marks)
Answer:


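The initial state of the DFA is firstpos(root) = {1, 4, 6, 8}, with positions numbered on the augmented
expression (010+00)*(10)*#.

Finding transitions:
Move({1,4,6,8}, 0) = followpos(1) ∪ followpos(4) = {2, 5}
Move({1,4,6,8}, 1) = followpos(6) = {7}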
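The whole direct construction (nullable, firstpos, lastpos, followpos, then DFA states as sets of positions) can be sketched as below. The tuple-based tree and the helpers `sym`, `cat`, `alt`, `star`, `annotate`, `direct_dfa` are hypothetical names for this sketch; positions are assigned left to right, so they match the numbering 1 to 8 used above.

```python
from collections import defaultdict
from itertools import count

positions = {}            # position number -> symbol at that leaf
_next_pos = count(1)


def sym(c):
    p = next(_next_pos)
    positions[p] = c
    return ("sym", p)


def cat(*nodes):
    tree = nodes[0]
    for n in nodes[1:]:
        tree = ("cat", tree, n)
    return tree


def alt(left, right):
    return ("or", left, right)


def star(child):
    return ("star", child)


def annotate(node, followpos):
    """Return (nullable, firstpos, lastpos) of node, filling in followpos."""
    kind = node[0]
    if kind == "sym":
        return False, {node[1]}, {node[1]}
    if kind == "star":
        _, first, last = annotate(node[1], followpos)
        for p in last:                 # star: lastpos loops back to firstpos
            followpos[p] |= first
        return True, first, last
    n1, f1, l1 = annotate(node[1], followpos)
    n2, f2, l2 = annotate(node[2], followpos)
    if kind == "or":
        return n1 or n2, f1 | f2, l1 | l2
    for p in l1:                       # concatenation: lastpos(c1) is followed by firstpos(c2)
        followpos[p] |= f2
    return n1 and n2, (f1 | f2) if n1 else f1, (l1 | l2) if n2 else l2


def direct_dfa(tree, alphabet):
    followpos = defaultdict(set)
    _, first, _ = annotate(tree, followpos)
    end = next(p for p, c in positions.items() if c == "#")
    start = frozenset(first)
    dstates, dtran, work = {start}, {}, [start]
    while work:
        S = work.pop()
        for a in alphabet:
            U = frozenset(q for p in S if positions[p] == a for q in followpos[p])
            if U:                      # an empty U is the dead state; it is left out
                dtran[S, a] = U
                if U not in dstates:
                    dstates.add(U)
                    work.append(U)
    accepting = {S for S in dstates if end in S}
    return start, dtran, accepting, dstates


# (010+00)*(10)*#
tree = cat(star(alt(cat(sym("0"), sym("1"), sym("0")), cat(sym("0"), sym("0")))),
           star(cat(sym("1"), sym("0"))), sym("#"))
start, dtran, accepting, dstates = direct_dfa(tree, "01")
print(sorted(start), sorted(dtran[start, "0"]), sorted(dtran[start, "1"]), len(dstates))
# prints [1, 4, 6, 8] [2, 5] [7] 5
```

With this sketch the DFA has five states, {1,4,6,8}, {2,5}, {7}, {3} and {6,8}, of which the two containing position 8 are accepting.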

Q.5 Convert the following regular expression into a deterministic finite automaton:
(a+b)*abb(a+b)* (Dec 2011) (4 marks)
Answer:
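A table-driven sketch of a DFA for (a+b)*abb(a+b)*, assuming each state records how much of abb has been matched so far (state numbers are illustrative and need not match the diagram). This four-state machine is the minimal DFA for the language, since state 3 is absorbing once abb has been seen.

```python
# State = length of the longest prefix of "abb" matched so far; 3 means abb was found
DELTA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 3, "b": 3},
}
ACCEPTING = {3}


def accepts(w):
    state = 0
    for ch in w:
        state = DELTA[state][ch]
    return state in ACCEPTING


print([w for w in ["abb", "babba", "aabbb", "abab", "", "ab"] if accepts(w)])
# prints ['abb', 'babba', 'aabbb']
```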



Q.6 Draw a deterministic finite automaton for binary strings ending with 10. (Nov 2013) (4 marks)

Answer:
(Figure: DFA for binary strings ending with 10.)
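A sketch of one such DFA, assuming three states: A (start), B (last symbol read was 1) and C (input so far ends with 10, accepting). Only the name C survives from the figure; A and B are hypothetical labels.

```python
DELTA = {
    "A": {"0": "A", "1": "B"},   # no useful suffix seen yet
    "B": {"0": "C", "1": "B"},   # last symbol was 1
    "C": {"0": "A", "1": "B"},   # input so far ends with 10 (accepting)
}


def ends_with_10(w):
    state = "A"
    for ch in w:
        state = DELTA[state][ch]
    return state == "C"


print([w for w in ["10", "0110", "100", "", "1", "11010"] if ends_with_10(w)])
# prints ['10', '0110', '11010']
```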

Q.7 Draw deterministic finite automata for: (May 2014) (7 marks)

1. (0+1)*101(0+1)*
2. 10(0+1)*1
Answer:
1. (0+1)*101(0+1)*


2. 10(0+1)*1
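Both automata can be sketched as transition tables. The state names below are assumptions chosen for this sketch: in part 1 the state counts how much of 101 has been matched; in part 2, q1 means "read 1", q2 means "read 10..., last symbol 0", q3 means "read 10..., last symbol 1" (accepting), and dead is a trap state.

```python
def make_runner(delta, start, accepting):
    """Return a function that runs the DFA given by delta on a string."""
    def accepts(w):
        state = start
        for ch in w:
            state = delta[state][ch]
        return state in accepting
    return accepts


# Part 1: (0+1)*101(0+1)*
contains_101 = make_runner({
    0: {"0": 0, "1": 1},
    1: {"0": 2, "1": 1},
    2: {"0": 0, "1": 3},
    3: {"0": 3, "1": 3},
}, 0, {3})

# Part 2: 10(0+1)*1
starts_10_ends_1 = make_runner({
    "q0": {"0": "dead", "1": "q1"},
    "q1": {"0": "q2", "1": "dead"},
    "q2": {"0": "q2", "1": "q3"},
    "q3": {"0": "q2", "1": "q3"},
    "dead": {"0": "dead", "1": "dead"},
}, "q0", {"q3"})

print(contains_101("0010100"), starts_10_ends_1("10011"))   # prints True True
```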

Q.8 Find the regular expression for each of the following languages, all subsets of {0,1}*.
(May 2012) (4 marks)
1. The language of all strings containing at least one 0 and at least one 1:
(0+1)*0(0+1)*1(0+1)* + (0+1)*1(0+1)*0(0+1)*
2. The language of all strings in which the number of 0s and the number of 1s are both even:
(00+11+(01+10)(00+11)*(01+10))*
3. The language of all strings containing at most one pair of consecutive 1s:
(0+10)*(ε+1+11)(0+01)*
4. The language of all strings that do not end with 01:
(0+1)*(00+10+11) + 0 + 1 + ε
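One way to gain confidence in answers like these is to compare each expression against a direct description of the language on every binary string up to some length. The sketch below transcribes the four answers into Python `re` syntax (+ becomes |, ε becomes an empty alternative) and assumes that "pair of consecutive 1s" counts overlapping occurrences of 11; the helper names are hypothetical.

```python
import re
from itertools import product

# name: (answer transcribed to Python regex syntax, predicate describing the language)
CASES = {
    "at least one 0 and one 1": (
        r"[01]*0[01]*1[01]*|[01]*1[01]*0[01]*",
        lambda s: "0" in s and "1" in s),
    "even 0s and even 1s": (
        r"(?:00|11|(?:01|10)(?:00|11)*(?:01|10))*",
        lambda s: s.count("0") % 2 == 0 and s.count("1") % 2 == 0),
    "at most one pair of consecutive 1s": (
        r"(?:0|10)*(?:|1|11)(?:0|01)*",
        lambda s: sum(s[i:i + 2] == "11" for i in range(len(s) - 1)) <= 1),
    "does not end with 01": (
        r"[01]*(?:00|10|11)|0|1|",
        lambda s: not s.endswith("01")),
}


def mismatches(pattern, predicate, max_len=10):
    """All strings up to max_len on which the regex and the predicate disagree."""
    bad = []
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(f"(?:{pattern})", s)) != predicate(s):
                bad.append(s)
    return bad


for name, (pattern, predicate) in CASES.items():
    print(name, "->", "OK" if not mismatches(pattern, predicate) else "MISMATCH")
```

A brute-force check like this is not a proof, but it quickly exposes a missing ε or a wrong alternative.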
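Q.9 Write a regular definition for the language of all strings of 0s and 1s with an even number of 0s
and an odd number of 1s. (Dec 2011) (4 marks)
Answer:
R1 → (00 | 01(11)*10)
R2 → (1 | 01(11)*0)
R3 → (1 | 0(11)*10)
R4 → 0(11)*0
R → R1* R2 (R4 | R3 R1* R2)*
Here R1 is the loop at the start state (even 0s, even 1s), R4 is the loop at the accepting state (even
0s, odd 1s), R2 leads from the start state to the accepting state, and R3 leads back.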
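The regular definition can be checked mechanically by substituting R1 to R4 into R as Python regex fragments and comparing with a direct parity test. This sketch assumes R1 is 00 | 01(11)*10, which equals 0(11)*0: from the start state the first 0 moves to the odd-0s states, where only 11 pairs keep the 0-parity odd until a second 0 returns.

```python
import re
from itertools import product

# Regular definitions written as Python regex fragments
R1 = r"(?:00|01(?:11)*10)"
R2 = r"(?:1|01(?:11)*0)"
R3 = r"(?:1|0(?:11)*10)"
R4 = r"(?:0(?:11)*0)"
R = f"{R1}*{R2}(?:{R4}|{R3}{R1}*{R2})*"


def matches(s):
    return re.fullmatch(R, s) is not None


def in_language(s):
    return s.count("0") % 2 == 0 and s.count("1") % 2 == 1


ok = all(
    matches("".join(t)) == in_language("".join(t))
    for n in range(11) for t in product("01", repeat=n)
)
print("regular definition agrees with the language up to length 10:", ok)
```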
Q.10 Write down the regular expression for binary strings of even length. (Nov 2013) (3 marks)
Answer:
((0+1)(0+1))* or ((0+1)^2)* or (00+11+10+01)*


Q.11 How do the parser and scanner communicate? Explain the communication between them with a block
diagram. Also explain input buffering. (May 2012) (7 marks)
Write a short note on input buffering. (Nov 2013) (7 marks) (May 2014) (7 marks)
Answer:
Commonly, the interaction is implemented by having the parser call the lexical analyzer. The call, suggested
by the getNextToken command, causes the lexical analyzer to read characters from its input until it can
identify the next lexeme and produce for it the next token, which it returns to the parser.
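The pull-style interaction (the parser asks, the lexical analyzer reads just enough input to return one token) can be sketched as below. The token names, patterns, and the `Lexer`/`get_next_token`/`parse` names are hypothetical; `parse` merely stands in for a real parser and collects tokens until EOF.

```python
import re

# Hypothetical token patterns for a tiny language
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[=+*]"),
    ("WS", r"[ \t\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


class Lexer:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def get_next_token(self):
        """Called by the parser each time it needs one more token."""
        while self.pos < len(self.text):
            m = MASTER.match(self.text, self.pos)
            if m is None:
                raise SyntaxError(f"unexpected character {self.text[self.pos]!r}")
            self.pos = m.end()
            if m.lastgroup != "WS":        # whitespace is consumed, never returned
                return (m.lastgroup, m.group())
        return ("EOF", "")


def parse(lexer):
    """Stand-in for a parser: pulls tokens on demand until EOF."""
    tokens = []
    while (tok := lexer.get_next_token())[0] != "EOF":
        tokens.append(tok)
    return tokens


print(parse(Lexer("position = initial + rate * 60")))
```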

Interactions between the lexical analyzer and the parser

Input Buffering
Buffer Pair
Because of the amount of time taken to process characters and the large number of characters that
must be processed during the compilation of a large source program, specialized buffering techniques
have been developed to reduce the amount of overhead required to process a single input character.
An important scheme involves two buffers that are alternately reloaded, as shown in the following figure.

Each buffer is of the same size N, and N is usually the size of a disk block, e.g., 4096 bytes.

Read N characters into each half of the buffer with one system read command.
If fewer than N characters remain in the input, a special eof character is placed in the buffer after the
input characters.
Two pointers to the input buffer are maintained.
The string of characters between the two pointers is the current lexeme.
Initially, both pointers point to the first character of the next lexeme to be found.
The forward pointer scans ahead until a match for a pattern is found.
Once the next lexeme is determined, the forward pointer is set to the character at its right
end.

If the forward pointer is about to move past the halfway mark, the right half is filled with
N new input characters.
If the forward pointer is about to move past the right end of the buffer, the left half is
filled with N new characters and the forward pointer wraps around to the beginning of the
buffer.

Algorithm to advance forward pointer
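A minimal sketch of advancing the forward pointer in the buffer-pair scheme, assuming a small half size N so that reloads are easy to observe. It tracks only the forward pointer (not the pointer to the start of the lexeme), represents eof by an empty string, and uses hypothetical class and function names. Note the two tests made on every advance.

```python
import io

EOF = ""   # marks eof: slots past the last input character hold this value


class BufferPair:
    """Buffer pair: halves [0, N) and [N, 2N) are reloaded alternately (no sentinels)."""

    def __init__(self, source, n=4):
        self.src, self.n = source, n
        self.buf = [EOF] * (2 * n)
        self._load(0)                  # fill the first half
        self.forward = 0

    def _load(self, start):
        chunk = self.src.read(self.n)  # one read of up to N characters
        for i in range(self.n):
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF

    def char(self):
        return self.buf[self.forward]

    def advance(self):
        if self.forward == self.n - 1:          # test 1: at end of first half?
            self._load(self.n)                  # reload second half
            self.forward += 1
        elif self.forward == 2 * self.n - 1:    # test 2: at end of second half?
            self._load(0)                       # reload first half
            self.forward = 0                    # wrap around to the beginning
        else:
            self.forward += 1


def scan_all(text, n=4):
    """Read the whole input one character at a time through the buffer pair."""
    bp = BufferPair(io.StringIO(text), n)
    out = []
    while bp.char() != EOF:
        out.append(bp.char())
        bp.advance()
    return "".join(out)


print(scan_all("position = rate * 60"))   # prints position = rate * 60
```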


Disadvantage of this scheme:
This scheme works well most of the time, but the amount of lookahead is limited.
This limited lookahead may make it impossible to recognize tokens in situations where the
distance that the forward pointer must travel is more than the length of the buffer.
For example, in the PL/1 statement DECLARE (ARG1, ARG2, ..., ARGn), the lexical analyzer cannot
determine whether DECLARE is a keyword or an array name until it sees the character that follows the
right parenthesis.

Sentinels:
In the previous scheme, each time the forward pointer is advanced we must check that it has not moved
off one half of the buffer; if it has, the other half must be reloaded.
Therefore each advance of the forward pointer requires two tests, one for the end of each buffer half.
We can reduce the two tests to one by extending each buffer half to hold a sentinel character
at its end.
The sentinel is a special character that cannot be part of the source program (the eof character is
used as the sentinel).


With sentinels, most of the time only one test is performed: whether forward points to an eof.
Further tests are needed only when forward reaches the end of a buffer half or the real end of the input.
Since N input characters are encountered between eofs, the average number of tests per input
character is very close to 1.

Algorithm to advance forward pointer using sentinel
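A minimal sketch of the sentinel version, assuming each half of N characters is followed by a sentinel slot and that the NUL character never occurs in the source, so it can serve as the eof sentinel. On the common path `next_char` makes a single comparison; the extra tests run only when an eof is seen. Class and function names are hypothetical.

```python
import io

EOF = "\0"   # sentinel; assumed never to occur in the source text


class SentinelBuffer:
    """Halves at [0, N) and [N+1, 2N+1); slots N and 2N+1 always hold the sentinel."""

    def __init__(self, source, n=4):
        self.src, self.n = source, n
        self.buf = [EOF] * (2 * n + 2)
        self._load(0)
        self.forward = -1              # next_char() advances before reading

    def _load(self, start):
        chunk = self.src.read(self.n)
        for i in range(self.n):        # never overwrites the sentinel slot start + n
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF

    def next_char(self):
        """Advance forward; return the character there, or None at end of input."""
        self.forward += 1
        if self.buf[self.forward] == EOF:          # the only test on the common path
            if self.forward == self.n:             # sentinel ending the first half
                self._load(self.n + 1)
                self.forward += 1
            elif self.forward == 2 * self.n + 1:   # sentinel ending the second half
                self._load(0)
                self.forward = 0
            else:                                  # eof inside a half: end of input
                return None
            if self.buf[self.forward] == EOF:      # the reloaded half was empty
                return None
        return self.buf[self.forward]


def scan_all(text, n=4):
    sb = SentinelBuffer(io.StringIO(text), n)
    out = []
    while (ch := sb.next_char()) is not None:
        out.append(ch)
    return "".join(out)


print(scan_all("position = rate * 60"))   # prints position = rate * 60
```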


Q.12 Write the two methods used in the lexical analyzer for buffering the input. Which technique is
used for speeding up the lexical analyzer? (Dec 2011) (7 marks)
Answer:
The two methods are the buffer pair scheme and the buffer pair with sentinels; both are explained under
Q.11 (input buffering).
The sentinel technique speeds up the lexical analyzer, because it reduces the number of tests per input
character to about one.
Q.13 List the phases of a compiler. Write a brief note on the lexical analyzer. (May 2014) (6 marks)
Answer:

Phases of a Compiler
1. Lexical analysis (scanning)
o Reads in program, groups characters into tokens
2. Syntax analysis (parsing)
o Structures token sequence according to grammar rules of the language.
3. Semantic analysis
o Checks semantic constraints of the language.
4. Intermediate code generation
o Translates to lower level representation.
5. Code optimization
o Improves code quality.
6. Final code generation
7. Symbol table management
8. Error handling

Lexical Analyzer
The main task of the lexical analyzer is to read the input characters of the source program, group them into
lexemes, and produce as output a sequence of tokens for each lexeme in the source program.
The stream of tokens is sent to the parser for syntax analysis. It is common for the lexical analyzer to interact
with the symbol table as well. When the lexical analyzer discovers a lexeme constituting an identifier, it needs
to enter that lexeme into the symbol table. In some cases, information regarding the kind of identifier may be
read from the symbol table by the lexical analyzer to assist it in determining the proper token it must pass to
the parser.
These interactions are shown in the figure. Commonly, the interaction is implemented by having the parser call
the lexical analyzer.

Interactions between the lexical analyzer and the parser


The call, suggested by the getNextToken command, causes the lexical analyzer to read characters from its
input until it can identify the next lexeme and produce for it the next token, which it returns to the parser.
The lexical analyzer may also perform certain other tasks besides identifying lexemes:
Removing whitespace (blanks, tabs, newline characters)
Removing comments
Keeping track of line numbers (so that error messages generated by the compiler can be correlated with
the source program)

For each lexeme, the lexical analyzer produces as output a token of the form
(token-name, attribute-value)
In the token, the first component token-name is an abstract symbol that is used during syntax
analysis, and the second component attribute-value points to an entry in the symbol table for this
token (in the case of an identifier).
For example, suppose a source program contains the assignment statement
position = initial + rate * 60
1. position is an identifier that is mapped into the token (id, 1), where id is an abstract
symbol standing for identifier and 1 points to the symbol-table entry for position.
2. = is an assignment operator that is mapped into the token (=); this token needs no
attribute-value.

3. initial is an identifier that is mapped into the token (id, 2), where 2 points to the symbol-table entry for initial.
4. + is an addition operator that is mapped into the token (+).
5. rate is an identifier that is mapped into the token (id, 3), where 3 points to the symbol-table entry for rate.
6. * is a multiplication operator that is mapped into the token (*).
7. 60 is a numeric constant that is mapped into the token (NUM, 60).
Output of the lexical analysis phase:
id1=id2+id3*60
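The mapping above can be reproduced with a short sketch that builds (token-name, attribute-value) pairs and a symbol table whose 1-based indexes are the attribute values of id tokens. The lexeme pattern is an assumption covering only identifiers, integers and the operators =, +, *; other characters (such as blanks) are simply skipped by `re.findall`.

```python
import re


def tokenize(source):
    """Return (tokens, symbol_table); an id token's attribute is its 1-based table index."""
    symbol_table, tokens = [], []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[=+*]", source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif lexeme.isdigit():
            tokens.append(("NUM", int(lexeme)))
        else:
            tokens.append((lexeme,))       # operators need no attribute-value
    return tokens, symbol_table


def summary(tokens):
    """Render the token stream in the compact id1=id2+... form."""
    parts = []
    for t in tokens:
        if t[0] == "id":
            parts.append(f"id{t[1]}")
        elif t[0] == "NUM":
            parts.append(str(t[1]))
        else:
            parts.append(t[0])
    return "".join(parts)


tokens, table = tokenize("position = initial + rate * 60")
print(tokens)
print(table, summary(tokens))   # prints ['position', 'initial', 'rate'] id1=id2+id3*60
```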

Token
A sequence of characters having a collective meaning is known as a token.
Pattern
A rule describing the set of strings (lexemes) that can represent a particular token in the source
program.
Lexeme
A lexeme is a sequence of characters in the source program that is matched by the pattern for a
token.
DFA for identifier
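A sketch of an identifier DFA, assuming the usual definition letter (letter | digit)* with ASCII letters and digits only (the exact alphabet of the original figure is not known): state 0 is the start state, state 1 is accepting, and any missing transition goes to an implicit dead state.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


def is_identifier(s):
    """DFA for letter (letter | digit)*: state 0 = start, state 1 = accepting."""
    state = 0
    for ch in s:
        if state == 0 and ch in LETTERS:
            state = 1
        elif state == 1 and (ch in LETTERS or ch in DIGITS):
            state = 1
        else:
            return False          # no transition: the implicit dead state
    return state == 1


print([w for w in ["rate", "x1", "1x", "", "a_b"] if is_identifier(w)])
# prints ['rate', 'x1']
```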

DFA for floating number

Regular definition for floating number
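A sketch of a regular definition for unsigned numbers, assuming the common textbook form shown in the comments (the original definition is not reproduced here); note that this form also accepts integers such as 60. Each definition becomes a Python regex fragment, so `number` can be tested directly.

```python
import re

# digit            -> 0 | 1 | ... | 9
# digits           -> digit digit*
# optionalFraction -> . digits | ε
# optionalExponent -> ( E ( + | - | ε ) digits ) | ε
# number           -> digits optionalFraction optionalExponent
digit = r"[0-9]"
digits = rf"{digit}{digit}*"
optional_fraction = rf"(?:\.{digits})?"
optional_exponent = rf"(?:E[+-]?{digits})?"
number = rf"{digits}{optional_fraction}{optional_exponent}"


def is_number(s):
    return re.fullmatch(number, s) is not None


print([w for w in ["60", "3.14", "6.336E4", "1.89E-4", "1.", ".5", "E5"] if is_number(w)])
# prints ['60', '3.14', '6.336E4', '1.89E-4']
```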
