
Q1. What is the difference between weakly typed and strongly typed languages?

How does a compiler handle type checking and type conversion? Ans :- A strongly typed language does not allow you to use a value of one type as another. In C you can pass a data element of the wrong type and the compiler will not complain. Strong typing means that variables have a well-defined type and that there are strict rules about combining variables of different types in expressions. For example, if A is an integer and B is a float, then the rule for A+B might be that A is cast to a float and the result returned as a float. If A is an integer and B is a string, then the rule might be that A+B is not valid. Strongly typed means a value will not be automatically converted from one type to another. Weak typing implies that the compiler does not enforce a typing discipline, or perhaps that enforcement can easily be subverted. Weakly typed is the opposite: Perl can use a string like "123" in a numeric context by automatically converting it into the int 123.

Differences between strongly typed and weakly typed languages:
1. A language is strongly typed if type annotations are associated with variable names, rather than with values. If types are attached to values, it is weakly typed.
2. A language is strongly typed if it contains compile-time checks for type constraint violations. If checking is deferred to run time, it is weakly typed.
3. A language is strongly typed if there are compile-time or run-time checks for type constraint violations. If no checking is done, it is weakly typed.
4. A language is strongly typed if conversions between different types are forbidden. If such conversions are allowed, it is weakly typed.
5. A language is strongly typed if conversions between different types must be indicated explicitly. If implicit conversions are performed, it is weakly typed.
6. A language is strongly typed if there is no language-level way to disable or evade the type system. If there are casts or other type-evasive mechanisms, it is weakly typed.
7. A language is strongly typed if it has a complex, fine-grained type system with compound types. If it has only a few types, or only scalar types, it is weakly typed.
8. A language is strongly typed if the type of its data objects is fixed and does not vary over the lifetime of the object. If the type of a datum can change, the language is weakly typed.
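As a small illustration, Python is dynamically but strongly typed in the sense of criteria 2 and 5 above: unrelated types are never mixed implicitly, while conversions between related numeric types happen automatically.

```python
def demo():
    results = {}
    try:
        "1" + 2                      # unrelated types: no implicit conversion
    except TypeError:
        results["str_plus_int"] = "TypeError"
    results["int_plus_float"] = 1 + 2.5    # related types: int widened to float
    results["explicit"] = int("123") + 2   # explicit conversion is always allowed
    return results

print(demo())
```

The string-plus-int case raises an error at run time, while the int-plus-float case silently widens the integer, matching the "strict rule" for A+B described above.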

Weak versus strong


The main difference, roughly speaking, between a strongly typed language and a weakly typed one is that a weakly typed one makes conversions between unrelated types implicitly, while a strongly typed one typically disallows implicit conversions between unrelated types.

Furthermore, a strongly typed language requires an explicit conversion (using the cast operator) between related types when there is a possibility of data loss, while a weakly typed one carries out the conversion regardless. Weakly typed can also mean that the programmer does not need to state data types when declaring variables; e.g., in classic Visual Basic a variable can be declared simply as "Dim a" (making it a Variant). Strongly typed means that the programmer provides the data type while declaring the variables; e.g., in Java, variables are declared as int, short, long, etc.

Handling of Type checking and type conversion in a compiler


Type checking

Type checking can occur at either compile time or run time. Statically typed languages, such as C++ and Java, do type checking at compile time. Dynamically typed languages, such as Smalltalk and Python, handle type checking at run time. As a dynamically typed language, ActionScript 3.0 has run-time type checking, but it also supports compile-time type checking with a special compiler mode called strict mode. In strict mode, type checking occurs at both compile time and run time; in standard mode, type checking occurs only at run time. Dynamically typed languages offer tremendous flexibility when you structure your code, but at the cost of allowing type errors to manifest at run time. Statically typed languages report type errors at compile time, but at the cost of requiring that type information be known at compile time.

Compile-time type checking


Compile-time type checking is often favored in larger projects because as the size of a project grows, data type flexibility usually becomes less important than catching type errors as early as possible.

Run-time type checking


Run-time type checking occurs in ActionScript 3.0 whether you compile in strict mode or standard mode. Consider a situation in which the value 3 is passed as an argument to a function that expects an array. In strict mode, the compiler will generate an error, because the value 3 is not compatible with the data type Array. If you disable strict mode, and run in standard mode, the compiler does not complain about the type mismatch, but run-time type checking results in a run-time error.
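A rough Python analogue of this situation (the function name is invented for illustration): code written as if its argument were an array is accepted happily, and the mismatch only surfaces as an error when the code runs.

```python
def first_element(arr):
    # Written as if arr were a list; nothing is checked until the code runs.
    return arr[0]

print(first_element([10, 20, 30]))   # 10

try:
    first_element(3)                 # 3 is not an array...
except TypeError as err:
    print("run-time type error:", err)   # ...so the mismatch surfaces at run time
```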

Type conversions
A type conversion is said to occur when a value is transformed into a value of a different data type. Type conversions can be either implicit or explicit. Implicit conversion, which is also called coercion, is sometimes performed at run time. For example, if the value 2 is assigned to a variable of the Boolean data type, the value 2 is converted to the Boolean value true before assigning the value to the variable. Explicit conversion, which is also called casting, occurs when your code instructs the compiler to treat a variable of one data type as if it belongs to a different data type. When primitive values are involved, casting actually converts values from one data type to another. To cast an object to a different type, you wrap the object name in parentheses and precede it with the name of the new type. For example, the following code takes a Boolean value and casts it to an integer:
var myBoolean:Boolean = true;
var myINT:int = int(myBoolean);
trace(myINT); // 1
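The same pair of conversions can be mirrored in Python (a sketch; the variable names are invented): coercion to Boolean happens implicitly in a Boolean context, while casting wraps the value in the target type's name.

```python
# Implicit conversion (coercion): a non-Boolean used in a Boolean context.
my_flag = bool(2)     # the value 2 coerces to True
coerced = False
if 2:                 # same coercion, applied implicitly by the if-statement
    coerced = True

# Explicit conversion (casting): wrap the value in the target type's name.
my_int = int(True)    # 1, mirroring the ActionScript cast above
quantity = int("3")   # 3
```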

Implicit conversions
Implicit conversions happen at run time in a number of contexts:

- In assignment statements
- When values are passed as function arguments
- When values are returned from functions
- In expressions using certain operators, such as the addition (+) operator

For user-defined types, implicit conversions succeed when the value to be converted is an instance of the destination class or of a class that derives from the destination class. If an implicit conversion is unsuccessful, an error occurs. For example, the following code contains a successful implicit conversion and an unsuccessful implicit conversion:
class A {}
class B extends A {}

var objA:A = new A();
var objB:B = new B();
var arr:Array = new Array();

objA = objB; // Conversion succeeds.
objB = arr;  // Conversion fails.

For primitive types, implicit conversions are handled by calling the same internal conversion algorithms that are called by the explicit conversion functions.

Explicit conversions
It's helpful to use explicit conversions, or casting, when you compile in strict mode, because there may be times when you do not want a type mismatch to generate a compile-time error. This may be the case when you know that coercion will convert your values correctly at run time. For example, when working with data received from a form, you may want to rely on coercion to convert certain string values to numeric values. The following code generates a compile-time error in strict mode even though it would run correctly in standard mode:
var quantityField:String = "3";
var quantity:int = quantityField; // compile-time error in strict mode

If you want to continue using strict mode, but would like the string converted to an integer, you can use explicit conversion, as follows:
var quantityField:String = "3";
var quantity:int = int(quantityField); // Explicit conversion succeeds.

Q2. Discuss the process of code generation. Also elaborate on the problems faced in this process.
Ans:- Code generation is the process by which a compiler's code generator converts some intermediate representation of source code into a form (e.g., machine code) that can be readily executed by a machine (often a computer). The input to the code generator typically consists of a parse tree or an abstract syntax tree. The tree is converted into a linear sequence of instructions, usually in an intermediate language such as three-address code. Further stages of compilation may or may not be referred to as "code generation", depending on whether they involve a significant change in the representation of the program. (For example, a peephole optimization pass would not likely be called "code generation", although a code generator might incorporate a peephole optimization pass.)

Major tasks in code generation

Tasks which are typically part of a sophisticated compiler's "code generation" phase include:

- Instruction selection: which instructions to use.
- Instruction scheduling: in which order to put those instructions. Scheduling is a speed optimization that can have a critical effect on pipelined machines.
- Register allocation: the allocation of variables to processor registers.
- Debug data generation, if required, so the code can be debugged.

In a compiler that uses an intermediate language, there may be two instruction-selection stages: one to convert the parse tree into intermediate code, and a second phase, much later, to convert the intermediate code into instructions from the instruction set of the target machine. This second phase does not require a tree traversal; it can be done linearly, and typically involves a simple replacement of intermediate-language operations with their corresponding opcodes. However, if the compiler is actually a language translator (converting one high-level language into another rather than into machine code), the second phase will instead generate constructs of the target language.
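A minimal sketch of that linear second phase, assuming a tuple-based three-address IR and hypothetical opcode names (ADD, MUL, ...); each IR operation is simply replaced by its corresponding opcode:

```python
# Hypothetical opcode table; the IR below is a list of
# (destination, operator, operand1, operand2) quadruples.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def select_instructions(quadruples):
    """Linearly replace each IR operation with its target-machine opcode."""
    target = []
    for dest, op, a1, a2 in quadruples:
        target.append(f"{OPCODES[op]} {dest}, {a1}, {a2}")
    return target

ir = [("t1", "*", "4", "i"), ("t2", "+", "t1", "a")]
for line in select_instructions(ir):
    print(line)
```

Note that no tree traversal is needed: one pass over the list suffices, exactly as described above.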

Runtime code generation

When code generation occurs at run time, as in just-in-time (JIT) compilation, it is important that the entire process be efficient with respect to space and time. For example, when regular expressions are interpreted and used to generate code at run time, a nondeterministic finite state machine is often generated instead of a deterministic one, because usually the former can be created more quickly and occupies less memory space than the latter. Despite generally generating less efficient code, JIT code generation can take advantage of profiling information that is available only at run time.

CODE GENERATION

The final phase in our compiler model is the code generator. It takes as input an intermediate representation of the source program and produces as output an equivalent target program. The requirements traditionally imposed on a code generator are severe: the output code must be correct and of high quality, meaning that it should make effective use of the resources of the target machine. Moreover, the code generator itself should run efficiently.

ISSUES IN THE DESIGN OF A CODE GENERATOR

While the details are dependent on the target language and the operating system, issues such as memory management, instruction selection, register allocation, and evaluation order are inherent in almost all code-generation problems.

Input to the Code Generator

The input to the code generator consists of the intermediate representation of the source program produced by the front end, together with information in the symbol table that is used to determine the run-time addresses of the data objects denoted by the names in the intermediate representation. There are several choices for the intermediate language, including: linear representations such as postfix notation, three-address representations such as quadruples, and graphical representations such as syntax trees and DAGs.

Target Programs

The output of the code generator is the target program. The output may take on a variety of forms: absolute machine language, relocatable machine language, or assembly language. Producing an absolute machine language program as output has the advantage that it can be placed in a fixed location in memory and immediately executed; a small program can be compiled and executed quickly.

Memory Management

Mapping names in the source program to addresses of data objects in run-time memory is done cooperatively by the front end and the code generator. We assume that a name in a three-address statement refers to a symbol-table entry for the name.

Instruction Selection

The nature of the instruction set of the target machine determines the difficulty of instruction selection. The uniformity and completeness of the instruction set are important factors. If the target machine does not support each data type in a uniform manner, then each exception to the general rule requires special handling.

Register Allocation

Instructions involving register operands are usually shorter and faster than those involving operands in memory. Therefore, efficient utilization of registers is particularly important in generating good code. The use of registers is often subdivided into two subproblems:
1. During register allocation, we select the set of variables that will reside in registers at a point in the program.
2. During a subsequent register assignment phase, we pick the specific register that a variable will reside in.
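The two subproblems can be shown with a deliberately simplified sketch. Real allocators use liveness analysis and graph coloring or linear scan; the heuristic here, keeping the most frequently used variables in registers, is only for illustration.

```python
def allocate(use_counts, num_registers):
    """Register allocation: choose WHICH variables get to live in registers."""
    ranked = sorted(use_counts, key=use_counts.get, reverse=True)
    return ranked[:num_registers]

def assign(chosen):
    """Register assignment: choose the SPECIFIC register for each variable."""
    return {var: f"R{i}" for i, var in enumerate(chosen)}

counts = {"i": 10, "prod": 8, "t1": 2, "t5": 1}
chosen = allocate(counts, 2)
print(chosen)          # ['i', 'prod']
print(assign(chosen))  # {'i': 'R0', 'prod': 'R1'}
```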

Q3. Discuss the techniques that can be used to reduce the size or running time of a program.
Ans:- There are several techniques for reducing the amount of data space required to represent objects in object-oriented programs. These techniques optimize the representation of both the programmer-defined fields within each object and the header information used by the run-time system:

Field Reduction: A bit-width analysis computes the range of values that each field can hold; the compiler then transforms the program to reduce the size of the field to the smallest type capable of storing that range of values.

Unread and Constant Field Elimination: If the bit-width analysis finds that a field always holds the same constant value, the compiler eliminates the field. It removes each write to the field and replaces each read with the constant value. Fields without executable reads are also removed.

Static Specialization: The analysis finds classes with fields whose values do not change after initialization, even though different instances of the class may have different values for these fields. It then generates specialized versions of each class which omit these fields, substituting accessor methods which return constant values.

Field Externalization: The analysis uses profiling to find fields that almost always hold the same default value. It then removes these fields from their enclosing class, using a hash table to store only values of the field that differ from the default value. It replaces writes to the field with an insertion into the hash table (if the written value is not the default value) or a removal from the hash table (if the written value is the default value). It replaces reads with hash table lookups; if the object is not present in the hash table, the lookup simply returns the default value.

Class Pointer Compression: Each object header contains a field, commonly called claz, which holds a pointer to the class data for that object, such as inheritance information and method dispatch tables. Rapid type analysis computes an upper bound on the number of classes that the program may instantiate, and the compiler uses the results of the analysis to replace the pointer with a smaller index into a table of pointers to the class data.

Byte Packing: All of the above transformations may reduce or eliminate the amount of space required to store each field in the object or object header. A byte-packing algorithm then arranges the fields in the object to minimize the object size.
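Field externalization in particular maps directly onto a hash table. This Python sketch (the names are invented) keys the table by object identity, so only objects holding a non-default value occupy any space:

```python
DEFAULT = 0
_extern = {}   # hash table holding only non-default field values

def write_field(obj, value):
    if value == DEFAULT:
        _extern.pop(id(obj), None)   # writing the default removes the entry
    else:
        _extern[id(obj)] = value     # anything else is inserted

def read_field(obj):
    # A missing entry simply means the field holds the default value.
    return _extern.get(id(obj), DEFAULT)

class Node:
    pass

n = Node()
print(read_field(n))   # 0: default, no table entry needed
write_field(n, 7)
print(read_field(n))   # 7
write_field(n, 0)
print(len(_extern))    # 0: the entry was removed again
```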

Q4. Define a regular expression. How can it be converted into finite automata? Explain with the help of an example.
Ans:- A regular expression is a set of pattern-matching rules encoded in a string according to certain syntax rules. A regular expression provides a concise and flexible means to "match" (specify and recognize) strings of text, such as particular characters, words, or patterns of characters. Abbreviations for "regular expression" include "regex" and "regexp". The concept of regular expressions was first popularized by utilities provided by Unix distributions, in particular the editor ed and the filter grep. A regular expression is written in a formal language that can be interpreted by a regular expression processor, which is a program that either serves as a parser generator or examines text and identifies parts that match the provided specification. Regular expressions are used by many text editors, utilities, and programming languages to search and manipulate text based on patterns. Regular expressions consist of constants and operators that denote sets of strings and operations over these sets, respectively.

Simple Regular Expressions

Simple Regular Expressions is a syntax that may be used by historical versions of application programs, and may be supported within some applications for the purpose of providing backward compatibility.

Converting a Regular Expression into a Deterministic Finite Automaton

The task of a scanner generator, such as JLex, is to generate the transition tables or to synthesize the scanner program given a scanner specification (in the form of a set of REs). So it needs to convert the REs into a single DFA. This is accomplished in two steps: first it converts the REs into a nondeterministic finite automaton (NFA), and then it converts the NFA into a DFA. An NFA is similar to a DFA, but it also permits multiple transitions over the same character and transitions over ε (the empty string). In the case of multiple transitions from a state over the same character, when we are at this state and we read this character, we have more than one choice; the NFA succeeds if at least one of these choices succeeds. An ε-transition doesn't consume any input characters, so you may jump to another state for free. Clearly DFAs are a subset of NFAs. But it turns out that DFAs and NFAs have the same expressive power. The problem is that when converting an NFA to a DFA we may get an exponential blowup in the number of states. We will first learn how to convert an RE into an NFA. This is the easy part. There are only 5 rules, one for each type of RE:

As can be shown inductively, the above rules construct NFAs with only one final state. For example, the third rule indicates that, to construct the NFA for the RE AB, we construct the NFAs for A and B, which are represented as two boxes, each with one start state and one final state. The NFA for AB is then constructed by connecting the final state of A to the start state of B using an ε-transition. For example, the RE (a|b)c is mapped to the following NFA:
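Two of these rules, concatenation and alternation (this RE-to-NFA scheme is known as Thompson's construction), can be sketched in Python, together with a direct NFA simulation to check the result. The fragment representation here is invented for illustration, with ε-transitions encoded as None:

```python
import itertools

_ids = itertools.count()

def _state():
    return next(_ids)

def symbol(c):
    """NFA for a single character c."""
    s, f = _state(), _state()
    return {"start": s, "final": f, "edges": [(s, c, f)]}

def concat(a, b):
    """NFA for AB: connect A's final state to B's start with an ε-edge."""
    edges = a["edges"] + b["edges"] + [(a["final"], None, b["start"])]
    return {"start": a["start"], "final": b["final"], "edges": edges}

def union(a, b):
    """NFA for A|B: fresh start/final states, ε-edges into and out of A and B."""
    s, f = _state(), _state()
    edges = a["edges"] + b["edges"] + [
        (s, None, a["start"]), (s, None, b["start"]),
        (a["final"], None, f), (b["final"], None, f)]
    return {"start": s, "final": f, "edges": edges}

def accepts(nfa, text):
    """Simulate the NFA directly, tracking the set of reachable states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            st = stack.pop()
            for p, c, q in nfa["edges"]:
                if p == st and c is None and q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen
    current = closure({nfa["start"]})
    for ch in text:
        current = closure({q for p, c, q in nfa["edges"]
                           if p in current and c == ch})
    return nfa["final"] in current

# The RE (a|b)c from the example:
nfa = concat(union(symbol("a"), symbol("b")), symbol("c"))
print(accepts(nfa, "ac"), accepts(nfa, "bc"), accepts(nfa, "ab"))
```

The rules for Kleene star and for ε itself follow the same pattern of gluing fragments with ε-edges and are omitted here.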

The next step is to convert the NFA into a DFA (a process called subset construction). Suppose that you assign a number to each NFA state. The DFA states generated by subset construction are labeled with sets of numbers, instead of just one number. For example, a DFA state may be labeled with the set {5, 6, 8}. This indicates that arriving at the state labeled {5, 6, 8} in the DFA is the same as arriving at state 5, state 6, or state 8 in the NFA when parsing the same input. (Recall that a particular input sequence, when parsed by a DFA, leads to a unique state, while when parsed by an NFA it may lead to multiple states.) First we need to handle transitions that lead to other states for free (without consuming any input): these are the ε-transitions. We define the closure of an NFA node as the set of all the nodes reachable from this node using zero, one, or more ε-transitions. For example, the closure of node 1 in the left figure below

is the set {1, 2}. The start state of the constructed DFA is labeled by the closure of the NFA start state. For every DFA state labeled by some set {s1,..., sn} and for every character c in the language alphabet, you find all the states reachable from s1, s2, ..., or sn using c-arrows and you union together the closures of these nodes. If this set is not the label of any other node in the DFA constructed so far, you create a new DFA node with this label. For example, the node {1, 2} in the DFA above has an arrow to {3, 4, 5} for the character a, since NFA node 3 can be reached from 1 on an a, and nodes 4 and 5 can be reached from 2. The b-arrow for node {1, 2} goes to the error node, which is associated with the empty set of NFA nodes. The following NFA recognizes (a|b)*(abb | a+b), even though it wasn't constructed with the above RE-to-NFA rules. It has the following DFA:
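The closure computation and the subset construction described above can be sketched as follows. The small NFA used to exercise it mirrors the {1, 2} / {3, 4, 5} example, assuming an ε-transition from state 1 to state 2:

```python
def eps_closure(states, eps):
    """All NFA states reachable from `states` via zero or more ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def subset_construction(start, delta, eps, alphabet):
    """delta maps (state, char) -> set of states; eps maps state -> set of states."""
    start_set = eps_closure({start}, eps)
    dfa, worklist = {}, [start_set]
    while worklist:
        S = worklist.pop()
        if S in dfa:
            continue
        dfa[S] = {}
        for c in alphabet:
            moved = set().union(*(delta.get((s, c), set()) for s in S))
            T = eps_closure(moved, eps)
            dfa[S][c] = T
            if T not in dfa:
                worklist.append(T)
    return start_set, dfa

# NFA fragment with an ε-transition from 1 to 2, as in the closure example:
eps = {1: {2}}
delta = {(1, "a"): {3}, (2, "a"): {4, 5}}
start, dfa = subset_construction(1, delta, eps, "ab")
print(start == frozenset({1, 2}))               # DFA start state is {1, 2}
print(dfa[start]["a"] == frozenset({3, 4, 5}))  # a-arrow goes to {3, 4, 5}
print(dfa[start]["b"] == frozenset())           # b-arrow goes to the error node
```

The empty frozenset plays the role of the error node: once reached, every arrow leads back to it.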

Q5. How is memory managed at the time of execution of a program? Discuss with reference to any language of your choice.
Ans :- Processes in a system share the CPU and main memory with other processes. In order to manage memory more efficiently and with fewer errors, modern systems provide an abstraction of main memory known as virtual memory (VM). Virtual memory is an elegant interaction of hardware exceptions, hardware address translation, main memory, disk files, and kernel software that provides each process with a large, uniform, and private address space. Virtual memory provides three important capabilities: (1) it uses main memory efficiently by treating it as a cache for an address space stored on disk, keeping only the active areas in main memory and transferring data back and forth between disk and memory as needed; (2) it simplifies memory management by providing each process with a uniform address space; (3) it protects the address space of each process from corruption by other processes.

Physical and Virtual Addressing

The main memory of a computer system is organized as an array of M contiguous byte-sized cells. Each byte has a unique physical address (PA). The first byte has an address of 0, the next byte an address of 1, the next byte an address of 2, and so on. We call this approach physical addressing. When the CPU executes a load instruction, it generates an effective physical address and passes it to main memory over the memory bus. The main memory fetches the word starting at that physical address and returns it to the CPU, which stores it in a register.

Address Space

Modern processors use a form of addressing known as virtual addressing. With virtual addressing, the CPU accesses main memory by generating a virtual address (VA), which

is converted to the appropriate physical address before being sent to the memory. The task of converting a virtual address to a physical one is known as address translation. Like exception handling, address translation requires close cooperation between the CPU hardware and the operating system. Dedicated hardware on the CPU chip called the memory management unit (MMU) translates virtual addresses on the fly, using a look-up table stored in main memory whose contents are managed by the operating system.

Management of Memory in the C++ Language

C++ is a (weakly) object-oriented language. The standard facilities for memory management in C++ are the new and delete operators. C++ manual memory management is inherited from C without changes, and manual memory management interacts badly with features such as exceptions and operator overloading. The most common solution is copying, since it is dangerous to point to an object which can die before we are done with it. An alternative to copying is using "smart" pointer classes, which can emulate automatic memory management by maintaining a reference count. The language is notorious for fostering large numbers of memory management bugs, including:

- Using stack-allocated structures beyond their lifetimes;
- Using heap-allocated structures after freeing them;
- Neglecting to free heap-allocated objects when they are no longer required;
- Excessive copying by copy constructors;
- Unexpected sharing due to insufficient copying by copy constructors;
- Allocating insufficient memory for the intended contents;
- Accessing arrays with indexes that are out of bounds.
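Returning to address translation: the MMU's job of mapping a virtual address to a physical one can be sketched as a page-table lookup. The parameters here are toy values; real systems use multi-level tables and hardware TLBs.

```python
# Toy single-level page table: virtual page number -> physical page number.
PAGE_SIZE = 256
page_table = {0: 7, 1: 3}

def translate(virtual_address):
    """MMU-style translation: split the VA, look up the page, re-attach the offset."""
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    if vpn not in page_table:
        # In a real system this traps to the OS, which loads the page from disk.
        raise RuntimeError("page fault")
    return page_table[vpn] * PAGE_SIZE + offset

print(translate(260))   # virtual page 1, offset 4 -> physical 3*256 + 4 = 772
```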

Q6. Discuss the following Terms:


a) Polymorphic Functions: Polymorphism is a programming language feature that allows values of different data types to be handled using a uniform interface. Polymorphic functions are functions whose operands (actual parameters) can have more than one type. Consider the type of the function sumlist, which takes a list of numbers and adds them up. Written recursively, it must produce an answer for the empty list, where the value is zero, and for a list made by ``:'' (cons), where the result is the sum of the tail of the list, computed recursively, plus the value of the head element:
sumlist [] = 0
sumlist (h:t) = h + sumlist t

The type of this function is [Int] -> Int, because the argument is a list whose elements are added, and the sum is the result. Now consider instead a function len that computes the length of a list: it does not perform any operation on the elements of the list, it just takes the list apart and counts its way along. In fact len can be applied to any list; it is said to be polymorphic, which means the same operation applied to different

types of value. The way Haskell types a polymorphic function is to use a type variable where the ``any'' type would be, so len is of type:
len:: [a] -> Int

Type variables are ``a'', ``b'', ``c'', etc., and they can only be used in type descriptions. A type variable in a type means the function has a range of types, a polytype: those which can be obtained by substituting a type for the variable uniformly throughout the signature. So len has types:
[Int] -> Int
[Char] -> Int
[Bool] -> Int
[[Char]] -> Int
[[Int]] -> Int
... and infinitely many more.
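The same idea can be expressed in Python with a type variable from the typing module (a sketch; len is built in, so the function is called length here):

```python
from typing import Sequence, TypeVar

a = TypeVar("a")   # plays the role of Haskell's type variable ``a''

def length(xs: Sequence[a]) -> int:
    """Counts elements without ever inspecting them, so any element type works."""
    count = 0
    for _ in xs:
        count += 1
    return count

print(length([1, 2, 3]), length("abc"), length([[True], [False]]))
```

One definition covers [Int] -> Int, [Char] -> Int, and every other instance of the polytype.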

b) DAG Representation of a Program: A directed acyclic graph (DAG) is a directed graph that contains no cycles. A rooted tree is a special kind of DAG, and a DAG is a special kind of directed graph. For example, a DAG may be used to represent common subexpressions in an optimising compiler.
[Figure: the expression a*b+f(a*b) drawn first as a tree, in which the subexpression a*b appears twice, and then as a DAG, in which both occurrences point to a single shared a*b node.]

expression: a*b+f(a*b)

The DAG Representation of Basic Blocks

Directed acyclic graphs (DAGs) give a picture of how the value computed by each statement in a basic block is used in the subsequent statements of the block.

Definition: a DAG for a basic block is a directed acyclic graph with the following labels on nodes:
- Leaves are labeled by unique identifiers, either variable names or constants; from the operator applied to a name we determine whether the l-value or r-value of the name is needed. Leaves represent initial values of names, which are subscripted with 0.
- Interior nodes are labeled by an operator symbol and represent computed values.
- Nodes are also (optionally) given a sequence of identifiers for labels; the identifiers in the sequence have that node's value.

Example of DAG Representation

t1 := 4*i
t2 := a[t1]
t3 := 4*i
t4 := b[t3]
t5 := t2 * t4
t6 := prod + t5
prod := t6
t7 := i + 1
i := t7
if i <= 20 goto 1

Three-address code

Corresponding DAG

[Figure: the corresponding DAG. The repeated computation 4*i becomes a single * node labeled t1, t3 over the leaves 4 and i0; the [] nodes labeled t2 and t4 index a and b with it; a * node labeled t5 multiplies them; a + node labeled t6, prod adds prod; a + node labeled t7, i adds i0 and 1; and a <= node compares that value with 20 for the jump to statement (1).]
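The sharing shown in the DAG falls out of a simple hashing scheme (essentially value numbering): before creating a node, look up its (operator, operands) triple. A sketch, using a simplified version of the block above (the array-indexing statements are omitted, and t5 is recast as a sum of t1 and t3 purely for illustration):

```python
def build_dag(code):
    """code: list of (dest, op, arg1, arg2) quadruples; returns labels and nodes."""
    nodes = {}     # (op, left, right) -> node id
    value = {}     # name -> node id currently holding that name's value
    for dest, op, a1, a2 in code:
        # Operands that already name a node use it; otherwise they are leaves.
        key = (op, value.get(a1, a1), value.get(a2, a2))
        if key not in nodes:
            nodes[key] = len(nodes)    # create a new interior node
        value[dest] = nodes[key]       # attach dest to the node's label list
    return value, nodes

block = [("t1", "*", "4", "i"),
         ("t3", "*", "4", "i"),
         ("t5", "+", "t1", "t3")]
value, nodes = build_dag(block)
print(value["t1"] == value["t3"])   # True: one shared node for 4*i
print(len(nodes))                   # 2 interior nodes, not 3
```

Because t1 and t3 label the same node, a later optimization pass can drop the recomputation of 4*i entirely.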
