Extending a Proxy to Analyze and Modify Data Streams based on the Oracle Database Protocol

Internship Technical Report, Software Systems Engineering

Author: Pinaet Phoonsarakun
Mentor: Alexander Adam, Dipl.-Inf.
Internship Place: Dimensio Informatics GmbH, Chemnitz, Germany
Time period: April 1 to September 30, 2011

Internship Technical Report

Topic Name: Extending a Proxy to Analyze and Modify Data Streams Based on the Oracle Database Protocol
Author Name: Pinaet Phoonsarakun
Personal Number: 51-5708-007-7
Department: Software Systems Engineering
Mentor Name: Alexander Adam
Organization: dimensio informatics GmbH
Organization Address: Brückenstraße 4, 09111 Chemnitz, Germany
Time Period: April 1st to September 30th, 2011

This internship report is submitted in partial fulfillment of the requirements for the degree of Master of Software Systems Engineering at the Sirindhorn International Thai-German Graduate School of Engineering, King Mongkut's University of Technology North Bangkok.

Internship Project Description

Project Title: Extending a Proxy to Analyze and Modify Data Streams Based on the Oracle Database Protocol

Student: Mr. Pinaet Phoonsarakun

Organization Mentor: Alexander Adam, Dipl.-Inf.
Development Director, dimensio informatics GmbH
Brückenstraße 4, 09111 Chemnitz, Germany
Phone: +49 (0) 371 / 26 20 19 - 0
Fax: +49 (0) 371 / 26 20 19 - 10
Email: alexander.adam@dimensio-informatics.com

TGGS Advisors: Dr. Kamol Limtanyakul
Coordinator of Software Systems Engineering, TGGS at KMUTNB Campus
1518 Pracharaj soi 1 Road (Pibulsongkram Road), Bangsue, Bangkok 10800, Thailand
Phone: +66 (0) 2 587 2904
Email: kamoll@gmail.com

Description: A production database, for instance a banking database, has to handle very large amounts of data and complex queries. A proxy server is an efficient way to improve query performance and to find out which queries take minutes, hours or, in uncommon cases, even one or more nights. Such a proxy is produced by dimensio informatics GmbH. This report discusses that proxy server and its functionality. During my internship I developed extensions that improve the performance of queries against the Oracle database. The tasks ranged from an initial extraction of the SQL statements sent to the database up to answers generated by the proxy itself, independently of the database.

Table of Contents
Abstract
1. Introduction
1.1 Background for proxy servers
1.2 Database drivers
1.3 The TNS protocol
1.4 Basic and extended functionality
2. Project Description
2.1 SQL statement extracting function
2.2 SQL statement changing function
2.3 Bind values encoding and decoding function
2.4 Direct answer generating function
3. Conclusion
Appendix A: List of Abbreviations and Acronyms
References

Abstract
This report discusses the internship task, which was to extend the functionality of a proxy server in order to improve and analyze the performance of queries against the Oracle database. The proxy server acts as a man in the middle. It receives requests from a client and sends them to a destination server, then receives the answers and sends them back to the client. Before passing a request or an answer on, the proxy server may observe, extract, modify, filter, redirect, or analyze it for specific purposes. The goal of this internship was to implement functions that extract, modify, and analyze SQL statements and to integrate those functions into the proxy server of dimensio informatics GmbH.

1. Introduction
Over four months, the internship concentrated on extending the functionality of a proxy server that is one of the products of dimensio informatics GmbH. The main theme of the company's products is to speed up databases, as its motto "the spirit of speed" states [Dim11], and a proxy server is one way to achieve this. The internship task was to implement extended functionality in the proxy server. It consisted of four tasks:
- implementing a function for extracting SQL statements from a packet sent by a client.
- developing a function for changing SQL statements in a packet sent by a client.
- implementing a function for encoding and decoding bind values in a data stream based on the Oracle database protocol.
- developing a function for generating direct answers to a client.
This report is divided into three sections. The first section is an introduction to the internship; it covers the background for proxy servers, database drivers, the TNS protocol, and the basic and extended functionality of the proxy. The second section describes the project and the four tasks that were worked on. The last section contains the conclusion of the report.

1.1 Background for proxy servers


A proxy server is a server in a computer network that acts as an intermediary for requests from clients seeking resources from other servers. A client connects to the proxy server to request some service, such as a file, a connection, a web page, a database view or table, or another resource that is available from a different server. The proxy server evaluates the request according to its functionality. For instance, it may filter traffic by IP address or protocol according to the rules of its filtering function. If the request passes the filter, the proxy provides the resource by connecting to the relevant server and requesting the service on behalf of the client. A proxy server may optionally alter the client's request or the server's response, and sometimes it may serve the request without contacting the specified server. In this case, it caches responses from the remote server and answers subsequent requests for the same content directly. For example, web proxies are commonly used to cache pages from a web server to speed up access to resources [Tho06]. The proxy concept was invented in the early days of distributed systems [Sha86] as a way to simplify and control their complexity. Figure 1.1 illustrates in a simple way how a proxy works. In this figure, the server in the middle, shown in red, acts as a proxy server. It receives a request from Charles, which acts as a client. The proxy then forwards the request to Jonas, which acts as the destination server. Jonas takes some time to process the request and then sends an answer to the proxy, which forwards it to Charles.

Figure 1.1 Two-computer communication connected through a proxy server (shown in red)

A proxy such as the one in Figure 1.1 can serve a wide range of purposes, including: to block undesired requests; to log or audit usage; to bypass security or parental controls; to scan transmitted content for viruses and malware before forwarding; to pass queries and answers unmodified, acting as a tunneling proxy; and to store answers from the destination server for frequent requests (caching). Proxy servers can be placed on the user's local computer or at various points between the user and the destination servers on the Internet. Moreover, a proxy server can be used as a front end that controls and protects access to a server on a private network, commonly also performing tasks such as load balancing, authentication, decryption or caching. Such a proxy is called a reverse proxy or an Internet-facing proxy.

1.2 Database drivers


To add extended functionality to the proxy server, one has to observe the physical packets in the network traffic. These packets carry the data streams based on the Oracle database protocol. Before database queries are sent from a client to a destination database server, they are encoded into packets, that is, into a format the destination server can understand. When the encoding is finished, the client sends the packets to the destination server. The server receives the packets and processes the replies, which are in turn encoded into packets and sent back to the client. Finally, the client receives the replies and decodes them into a format it can understand. The software routine that encodes and decodes the queries and answers is the database driver. In other words, database drivers are client-side adapters (installed on the client machine, not on the server) that convert requests from application programs into a protocol that the DBMS can understand.

To connect to a database, each database driver requires its own API (application programming interface) [Api08]: for instance, the JDBC driver requires the JDBC API, the ODBC driver requires the ODBC API, and the OCCI driver requires the OCCI API [Oci94]. Java Database Connectivity, commonly referred to as JDBC [Jdbc05], is an API for the Java programming language that defines how a client may access a database. It provides methods for querying and updating data in a database and is oriented towards relational databases. The JDBC driver provides the connection to the database and implements the protocol for transferring queries and results between client and database. Open Database Connectivity, commonly known as ODBC [Odbc95], is an API for accessing database management systems. The designers of ODBC aimed to make it independent of programming languages, database systems, and operating systems. ODBC achieves this platform and language independence by using an ODBC driver. The Oracle C++ Call Interface, commonly referred to as OCCI [Occi05], is provided by Oracle Corporation. It offers C++ programmers a comfortable interface for accessing Oracle databases. The OCCI classes have parameters reminiscent of SQL statements. The interface has been available since Oracle release 9i. The OCCI driver provides OCCI with the connection to the database and implements the protocol for transferring queries and results between client and database. Figure 1.2 gives an overview of the database driver process described above.

Figure 1.2 Database drivers convert requests into a form that the database can understand

The four functions developed during this internship have to be flexible enough to support all of these database drivers and the way each of them formats the data streams between clients and servers.

1.3 The TNS protocol


Since the functions developed in this internship deal with Oracle, it is necessary to understand the Transparent Network Substrate protocol, known as TNS. In the Oracle network architecture (shown in Figure 1.3), the task of TNS is to select the Oracle Protocol Adapter, wrapping the communication in one of the supported transport protocols, including TCP/IP with or without SSL, Named Pipes, and the Sockets Direct Protocol (SDP), which enables communication over InfiniBand high-speed networks [Dav07].

Figure 1.3 The Oracle network architecture

Moreover, TNS is the protocol on which SQL*Net is based [Ora11]. SQL*Net is the program that establishes connections between local Oracle database clients and the related Oracle database instance, or between two different Oracle database instances through a database link. TNS is a generic logical protocol, so it can run on top of all common physical network protocols, for instance TCP/IP. TNS is based on individual logical packets that are mapped transparently to physical packets. Every TNS packet starts with an 8-byte global header, which is composed of the following fields:

Size   Bytes   Field
WORD   00 00   Total packet size
WORD   00 00   Checksum of packet
BYTE   00      Type of packet
BYTE   00      Flags or reserved
WORD   00 00   Checksum of global header

The first two bytes (WORD) of the header hold the packet length, that is, the total packet size. All values in the TNS header are big-endian. The next two bytes hold the packet checksum; by default they are 0x0000. The next byte indicates the packet type. The next byte holds the header flags. Normally the flags are unused and reserved, but the 10g client may set the value to 0x04 [Dav07]. Finally, the last two bytes hold the header checksum. They are generally unused and set to 0x0000. The kind of a TNS packet is determined by the type field, and the packet body differs accordingly. Currently 12 different packet types are in use. They can be roughly divided into three groups as follows:

Connection
  Connect packet      Type 1
  Accept packet       Type 2
  Ack packet          Type 3
  Refuse packet       Type 4
  Redirect packet     Type 5
Data Transfer
  Data packet         Type 6
  NULL packet         Type 7
Control Flow
  Abort packet        Type 9
  Resend packet       Type 11
  Marker packet       Type 12
  Attention packet    Type 13
  Control packet      Type 14
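The header layout and the type table above can be sketched in Java as follows. This is a minimal illustration, not the proxy's actual code: the class name `TnsHeader`, its field names, and the `typeName` helper are assumptions made for this example; only the field sizes, the big-endian byte order, and the type numbers come from the description above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative parser for the 8-byte TNS global header (hypothetical class name). */
final class TnsHeader {
    static final int SIZE = 8;

    final int packetLength;   // bytes 0-1: total packet size
    final int packetChecksum; // bytes 2-3: 0x0000 by default
    final int packetType;     // byte 4: packet type, e.g. 1 = Connect, 6 = Data
    final int flags;          // byte 5: flags or reserved
    final int headerChecksum; // bytes 6-7: 0x0000 by default

    private TnsHeader(ByteBuffer buf) {
        // Mask each value so that it is read as unsigned.
        packetLength = buf.getShort() & 0xFFFF;
        packetChecksum = buf.getShort() & 0xFFFF;
        packetType = buf.get() & 0xFF;
        flags = buf.get() & 0xFF;
        headerChecksum = buf.getShort() & 0xFFFF;
    }

    /** Reads the header from the first eight bytes of a packet. */
    static TnsHeader parse(byte[] packet) {
        if (packet.length < SIZE) {
            throw new IllegalArgumentException("a TNS header needs at least 8 bytes");
        }
        // All values in the TNS header are big-endian.
        return new TnsHeader(ByteBuffer.wrap(packet).order(ByteOrder.BIG_ENDIAN));
    }

    /** Maps a type number to the packet name used in the table above. */
    static String typeName(int type) {
        switch (type) {
            case 1: return "Connect";
            case 2: return "Accept";
            case 3: return "Ack";
            case 4: return "Refuse";
            case 5: return "Redirect";
            case 6: return "Data";
            case 7: return "NULL";
            case 9: return "Abort";
            case 11: return "Resend";
            case 12: return "Marker";
            case 13: return "Attention";
            case 14: return "Control";
            default: return "Unknown";
        }
    }
}
```

For a header starting with the bytes 00 3A 00 00 01, `parse` yields a packet length of 58 and `typeName` reports a Connect packet.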

This is how they work. First, a local Oracle database client sends a packet of type 1, a Connect packet, that specifies the service name of the Oracle database instance it wants to connect to. If the Listener knows this service, one of two things can happen: it either sends the client an Accept packet, or it sends a Redirect packet that redirects the client to another port. In the first case, the client proceeds to authentication. In the second case, the client sends a Connect packet to the port named in the Redirect packet and requests access to the service. If everything is fine, the destination server sends the client an Accept packet and authentication takes place. All packets sent for authentication are Data packets. If the Listener does not know the service, it sends the client a packet of type 4, a Refuse packet. Once the client and the server have authenticated, the client sends requests and the server sends replies. All request and reply packets are Data packets. While requests and replies are being exchanged, a Marker packet may be sent as an interrupt. For instance, the server sends the client a Marker packet if it wants the client to stop sending data. Refuse packets are particularly informative because they point to an obvious error, for example a logon denied error such as "ORA-01017 invalid user name / password". With these errors, the 54th byte indicates the problem: a 3 means an invalid password, and a 2 means no such user [Dav07]. This knowledge of how and when each type of TNS packet is sent is applied in the second section of this report, which explains the four tasks in more detail.
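The connection phase described above can be read as a small state machine. The following sketch is only an illustration of that reading: the class `TnsHandshake`, its state names, and the `next` method are hypothetical and not part of the proxy; the packet type numbers are the ones listed in the type table.

```java
/** Sketch of the TNS connection phase as a state machine (illustrative names). */
final class TnsHandshake {
    enum State { AWAIT_CONNECT, AWAIT_LISTENER_REPLY, AUTHENTICATING, REFUSED }

    // Packet type numbers from the TNS type table.
    static final int CONNECT = 1;
    static final int ACCEPT = 2;
    static final int REFUSE = 4;
    static final int REDIRECT = 5;

    /** Returns the state that follows after a packet of the given type was seen. */
    static State next(State state, int packetType) {
        switch (state) {
            case AWAIT_CONNECT:
                // The client opens the dialogue with a Connect packet.
                return packetType == CONNECT ? State.AWAIT_LISTENER_REPLY : state;
            case AWAIT_LISTENER_REPLY:
                if (packetType == ACCEPT) {
                    return State.AUTHENTICATING; // authentication uses Data packets
                }
                if (packetType == REDIRECT) {
                    return State.AWAIT_CONNECT;  // client reconnects on the other port
                }
                if (packetType == REFUSE) {
                    return State.REFUSED;        // the Listener does not know the service
                }
                return state;
            default:
                // Later phases (authentication, requests, replies) are not modelled here.
                return state;
        }
    }
}
```

A tracker like this would let a proxy decide when a connection has reached the phase in which Data packets can carry SQL statements; Marker packets and the other control-flow types are left out of the sketch.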

1.4 Basic and extended functionality


The internship did not build a proxy server of its own but used a proxy server provided by dimensio informatics GmbH. The main task of the internship was to extend the functionality of that proxy server so that it can analyze and modify data streams based on the Oracle database protocol. Before starting the task, I had to study the base functionality of the proxy in order to know where to implement the extensions. As a simple proxy server, the proxy has the following basic functions:
- pass-through of a connection: initializing a connection between a client and a server.
- forwarding packets: receiving packets from clients and sending them to a destination server, then receiving the answer and sending it back to the client.
- balancing connections: load-balancing client connections when the proxy connects to a server instance.
The extended functionality is implemented in the function that forwards packets. Four functions were developed there:
- a function for extracting SQL statements from a packet sent by a client.
- a function for changing SQL statements in a packet sent by a client.
- a function for encoding and decoding bind values in a data stream based on the Oracle database protocol.
- a function for generating direct answers to a client.
Figure 1.4 summarizes the basic and extended functionality of the proxy server.


Figure 1.4 Basic and extended functionality of the proxy server

The tasks are described further in the project description in section two.

2. Project Description
This section explains the tasks worked on during the internship, namely the development of:
- a function for extracting SQL statements from a packet sent by a client.
- a function for changing SQL statements in a packet sent by a client.
- a function for encoding and decoding bind values in a data stream based on the Oracle database protocol.
- a function for generating direct answers to a client.
The following software was used for developing and testing these four functions:
1. Oracle database
2. Wireshark
3. QtCreator
4. SQLDeveloper
5. SQL*Plus
6. Toad for Oracle
The following subsections explain the details.

2.1 SQL statement extracting function


First of all, the SQL statement extracting function had to be completed, because it is the basis for the more advanced functions. The extracting function has to capture every SQL statement sent to the database without missing a single one. Moreover, it has to work with the different database drivers JDBC, ODBC, and OCCI. To develop this function, I had to observe which of the packets sent between a client and an Oracle database server contain database queries, and to locate where in those packets the queries are. This process consumed most of the time of the internship. First I had to gain knowledge of the Oracle database protocol and architecture. Next, I monitored and analyzed the packets in the network traffic using Wireshark. I also had to observe how the packets differ between the applications SQLDeveloper, SQL*Plus, and Toad for Oracle, and between the different database drivers. After that I worked with the proxy server of dimensio informatics GmbH. Finally I developed the extended functionality that lets the proxy extract SQL statements. Figure 2.1 shows how a simple proxy of dimensio informatics GmbH works.

Figure 2.1 How a simple proxy of dimensio informatics GmbH works

In Figure 2.1, the client application no longer connects directly to the database server (shown in blue) but connects to this server through the proxy server (shown in green). What the proxy receives from the client is usually not a complete TNS packet, because the TNS packet is fragmented into many small packets. Consequently, the proxy has to collect all of these fragments and reassemble them into a complete TNS packet before it can extract SQL statements and deliver the packet. Collecting the fragments, reassembling them, and extracting the SQL statement are all done in the SQL statement extracting function. To extract an SQL statement from a complete TNS packet, the function first has to recognize the pattern of byte blocks in the body of the TNS packet, determine which type of database driver produced it, and decide whether the packet contains an SQL statement. If it does, the function determines the offset of the SQL statement, extracts the statement, and finally forwards the packet to the database server. To make the function more useful, a more advanced function for changing SQL statements was developed, as explained in the following section.
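The reassembly step of this section can be sketched as follows. This is a hedged illustration rather than the proxy's implementation: the class `TnsReassembler` and its `feed` method are names invented for this example, and the sketch relies only on the fact stated in section 1.3 that the first two header bytes carry the total packet size in big-endian order. Recognizing driver-specific byte patterns and locating the SQL text are not covered.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Collects received fragments and cuts them into complete TNS packets (illustrative). */
final class TnsReassembler {
    private static final int HEADER_SIZE = 8;
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    /** Adds one received fragment and returns every TNS packet that is now complete. */
    List<byte[]> feed(byte[] fragment) {
        pending.write(fragment, 0, fragment.length);
        byte[] data = pending.toByteArray();
        List<byte[]> complete = new ArrayList<>();
        int pos = 0;
        while (data.length - pos >= HEADER_SIZE) {
            // Total packet size: first two header bytes, big-endian.
            int length = ((data[pos] & 0xFF) << 8) | (data[pos + 1] & 0xFF);
            if (length < HEADER_SIZE) {
                throw new IllegalStateException("invalid TNS packet length: " + length);
            }
            if (data.length - pos < length) {
                break; // the rest of this packet has not arrived yet
            }
            byte[] packet = new byte[length];
            System.arraycopy(data, pos, packet, 0, length);
            complete.add(packet);
            pos += length;
        }
        // Keep only the bytes that belong to a still incomplete packet.
        pending.reset();
        pending.write(data, pos, data.length - pos);
        return complete;
    }
}
```

Each call to `feed` returns an empty list until enough bytes for a whole packet have arrived, so the caller can hand complete packets to the extraction logic and keep forwarding in the original order.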

2.2 SQL statement changing function


The SQL statement changing function is used to modify and optimize specific database queries for a specific purpose. The function for extracting SQL statements has to be finished first, because the changing function analyzes and modifies the results of that function, namely the SQL statement and the complete TNS packet. Figure 2.2 shows the steps of the SQL statement changing function.

Figure 2.2 The flow chart of the SQL statement changing function

As shown in Figure 2.2, the function first receives the results of the extracting function, which are an SQL statement and a complete TNS packet. It then compares that SQL statement with the specific SQL statement it wants to modify. If the statement does not match, the function forwards the packet to the server unchanged. If it matches, the function analyzes the packet for the necessary values, such as the total packet length, packet type, type of statement, number of variables, type of SQL, SQL length, and altering flag. After analyzing the packet, it generates the new SQL statement that replaces the old one. Next it modifies the necessary values in the packet, removes the old SQL, and inserts the new SQL in its place. Finally it hands the modified packet to the connection forwarding function, which forwards it to the destination server. To modify the significant values in the packet, the function has to know in which byte order they are stored. There are three types of byte order: Big Endian (BE), Little Endian (LE), and Middle Endian (ME) [BLE05]. In BE, the byte with the most significant bits is stored at the lowest memory address. In LE, the byte with the least significant bits is stored at the lowest memory address. In ME, the bytes are stored in a mixed order that is neither purely BE nor purely LE, for example with the 16-bit halves of a 32-bit value in one order and the bytes within each half in the other. (See Figure 2.3 for an example.)
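The byte orders and the replacement step can be illustrated with a short Java sketch. It rests on several assumptions that are not taken from the proxy: the class `SqlRewriter` and both method names are hypothetical, the caller is assumed to already know the offset and length of the old SQL text, the SQL is assumed to be ASCII, and only the big-endian total length in the TNS header is updated. The driver-specific fields named above (SQL length, number of variables, and so on) would need the same treatment in a real implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Illustrative helpers for byte order handling and SQL replacement. */
final class SqlRewriter {
    /** Returns the four bytes of a 32-bit value in the given byte order. */
    static byte[] toBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    /**
     * Replaces the SQL text stored at [sqlOffset, sqlOffset + sqlLength) and
     * rewrites the total packet size in the first two header bytes (big-endian).
     */
    static byte[] replaceSql(byte[] packet, int sqlOffset, int sqlLength, String newSql) {
        byte[] sql = newSql.getBytes(StandardCharsets.US_ASCII); // assumed charset
        int newLength = packet.length - sqlLength + sql.length;
        if (newLength > 0xFFFF) {
            throw new IllegalArgumentException("packet does not fit a 16-bit length");
        }
        byte[] out = new byte[newLength];
        // Bytes before the old SQL, the new SQL, then the bytes after the old SQL.
        System.arraycopy(packet, 0, out, 0, sqlOffset);
        System.arraycopy(sql, 0, out, sqlOffset, sql.length);
        System.arraycopy(packet, sqlOffset + sqlLength, out, sqlOffset + sql.length,
                packet.length - sqlOffset - sqlLength);
        // Most significant byte first, because TNS header values are big-endian.
        out[0] = (byte) (newLength >>> 8);
        out[1] = (byte) newLength;
        return out;
    }
}
```

With this helper, `toBytes(0x1A2B3C4D, ByteOrder.BIG_ENDIAN)` gives the bytes 1A 2B 3C 4D and `ByteOrder.LITTLE_ENDIAN` gives 4D 3C 2B 1A. Java's `ByteOrder` has no middle-endian constant, so that order would have to be assembled by hand.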

Figure 2.3 Example of three types of byte order for value 0x1A2B3C4D

The next section explains the function of the proxy for encoding and decoding bind values.

2.3 Bind values encoding and decoding function


The encoding and decoding function was developed to deal with prepared statements, that is, SQL statements that take parameters, and to prepare for the development of the direct answer generating function. Sometimes client applications use a prepared statement object instead of a regular statement object to send SQL statements to the database. This usually reduces execution time if the statement has to be executed many times. Because a prepared statement can take parameters, the same statement can be supplied with different values each time it is executed. The following sample code, which assumes an open java.sql.Connection con and a schema name dbName, creates an SQL statement with a prepared statement object and supplies it with values.

    1: String updateString;
    2: updateString = "update " + dbName + ".COFFEES " + "set SALES = :1 where COF_NAME = :2";
    3: PreparedStatement updateSales = con.prepareStatement(updateString);
    4: updateSales.setInt(1, 5);
    5: updateSales.setString(2, "Dark Coffee");
    6: updateSales.executeUpdate();

This code creates a prepared statement object (lines 1-3), supplies two input parameters (lines 4-5), and finally executes the statement (line 6). Values must be supplied in place of the two colon-marked placeholders before the prepared statement is executed. This is done by calling the setter methods. The first argument of each setter method specifies the placeholder, :1 or :2. In this example, setInt() sets the first placeholder and setString() sets the second. After that, executeUpdate() is called to execute the SQL statement with these two parameters in the database. The task of the bind values encoding and decoding function is to decode the parameter values of prepared statements from a data stream based on the Oracle database protocol. Figure 2.4 shows the flow of the bind values encoding and decoding function.

Figure 2.4 The flow chart of the bind values encoding and decoding function

In Figure 2.4, the bind values encoding and decoding function first receives the results of the SQL statement extracting function. Second, it analyzes the TNS Data packet for the necessary information, such as the position of the statement type, the position of the bind values, and the position of the number of parameters. Third, it checks whether the SQL statement is a regular statement or a prepared statement, and whether it carries parameter values. If the check shows that there are bind values to decode, the function continues with the fourth step; otherwise it skips the fourth step and moves on to the fifth. In the fourth step, it decodes the bind values based on the results of the previous steps, especially the analysis of the TNS packet. To decode the bind values, the function first counts the number of bind values in the packet. Next it determines the type of each value. Four bind value types are currently supported: integer, float, string, and date. Then the function locates the bind values. Finally it decodes the data in the packet into the corresponding bind values. Fifth, the function adds the SQL statement and its bind values to a queue for analysis purposes. Finally it hands the TNS packet to the connection forwarding function, which forwards it to the destination database server.

This function can also encode bind values or parameters of each type back into bytes. This is very useful for developing a function that generates an answer independently of the database, as explained in the following section.
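The decoding steps of this section (count the values, determine each type, locate the value, decode it) and the reverse encoding can be illustrated with a round trip in Java. The byte layout used here is purely hypothetical and is NOT the Oracle wire format, which this report does not specify: one count byte, then for each value a tag byte, a length byte, and the payload. The class `BindValueCodec` and its tag constants are likewise invented for this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/** Round-trip sketch for bind values in a made-up layout (not the Oracle format). */
final class BindValueCodec {
    // Hypothetical type tags for the four supported bind value types.
    static final byte INT = 1, FLOAT = 2, STRING = 3, DATE = 4;

    /** Encodes values as: count, then tag, length, payload per value. */
    static byte[] encode(List<Object> values) {
        ByteBuffer buf = ByteBuffer.allocate(1024); // big-endian by default
        buf.put((byte) values.size());
        for (Object v : values) {
            if (v instanceof Integer) {
                buf.put(INT).put((byte) 4).putInt((Integer) v);
            } else if (v instanceof Double) {
                buf.put(FLOAT).put((byte) 8).putDouble((Double) v);
            } else if (v instanceof String) {
                byte[] s = ((String) v).getBytes(StandardCharsets.UTF_8);
                buf.put(STRING).put((byte) s.length).put(s); // strings up to 255 bytes
            } else if (v instanceof Date) {
                buf.put(DATE).put((byte) 8).putLong(((Date) v).getTime());
            } else {
                throw new IllegalArgumentException("unsupported type: " + v.getClass());
            }
        }
        byte[] out = new byte[buf.position()];
        buf.flip();
        buf.get(out);
        return out;
    }

    /** Decodes the layout written by encode back into Java values. */
    static List<Object> decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = buf.get() & 0xFF;          // step 1: number of bind values
        List<Object> values = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            byte tag = buf.get();              // step 2: type of the value
            int length = buf.get() & 0xFF;     // step 3: where the value ends
            switch (tag) {                     // step 4: decode the payload
                case INT: values.add(buf.getInt()); break;
                case FLOAT: values.add(buf.getDouble()); break;
                case STRING:
                    byte[] s = new byte[length];
                    buf.get(s);
                    values.add(new String(s, StandardCharsets.UTF_8));
                    break;
                case DATE: values.add(new Date(buf.getLong())); break;
                default: throw new IllegalArgumentException("unknown tag: " + tag);
            }
        }
        return values;
    }
}
```

Encoding the two parameters of the sample statement, 5 and "Dark Coffee", and decoding the result returns the same two values, which is the property the proxy needs when it later builds answers of its own.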

2.4 Direct answer generating function


The direct answer generating function accelerates service requests by retrieving content saved from a previous request made by the same client or even by other clients. It extends the proxy so that it keeps local copies of frequently requested resources, which allows large organizations to significantly reduce their upstream bandwidth usage and costs while significantly increasing performance. It can also generate its own answers, independently of the database server. The task of the function is to accelerate database queries in two ways. The first is to save the answers to frequent queries. The second is to modify or generate answers on its own, independently of the database. Figure 2.5 shows the flow of the direct answer generating function.

Figure 2.5 The flow chart of the direct answer generating function

In Figure 2.5, the generating function first receives the results of the extracting function, an SQL statement and a complete TNS packet. Next it compares that SQL statement with the frequent SQL statements it has been prepared to accelerate. If the SQL statement matches, the generating function generates an answer or retrieves the answer saved from a previous similar database query; otherwise it hands the packet to the connection forwarding function. After generating or retrieving the answer, the generating function analyzes the answer packet for all values and positions that it has to modify to match the current situation, such as the packet cursor and the answer for the specific SQL statement. Then the function modifies them. Finally the function sends the modified answer packet to the client.
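The lookup part of this flow, saving answers to frequent queries and retrieving them later, can be sketched as a small in-memory cache. This is an assumption-laden illustration: the class `DirectAnswerCache` and its methods are hypothetical, statements are matched by exact text after trimming, and the adaptation of the saved packet to the current situation (for example the packet cursor) is left to the caller.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative store for answer packets of frequent SQL statements. */
final class DirectAnswerCache {
    private final Map<String, byte[]> answers = new HashMap<>();

    /** Saves the answer packet the database returned for a statement. */
    void remember(String sql, byte[] answerPacket) {
        answers.put(sql.trim(), answerPacket.clone());
    }

    /**
     * Returns a copy of the saved answer, or an empty Optional if the statement
     * is unknown and the packet should go to the connection forwarding function.
     */
    Optional<byte[]> lookup(String sql) {
        byte[] saved = answers.get(sql.trim());
        // A copy is returned so the caller can modify it without changing the saved one.
        return saved == null ? Optional.empty() : Optional.of(saved.clone());
    }
}
```

Returning a copy matters here because the flow above modifies the answer packet before sending it to the client; the stored original stays intact for the next request.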

3. Conclusion
This internship report is divided into three sections: an introduction to the proxy, which is one of the products of dimensio informatics GmbH; a project description covering the four tasks; and this conclusion. The first task was the function for extracting SQL statements from a packet sent by a client. The second task was the function for changing SQL statements in a packet sent by a client. The third task was the function for encoding and decoding bind values in a packet. The fourth task was the function for generating direct answers to the client, independently of the database. All four functions, which are built specifically for the Oracle database protocol, work as intended. They give good results with the applications SQLDeveloper, SQL*Plus, and Toad for Oracle and with the database drivers JDBC, ODBC, and OCCI. The fourth function is an exception: because of the limited time, it only works with the application SQLDeveloper and the JDBC driver. Looking beyond the current scope, many client applications with different database drivers access the Oracle database and still need to be covered by all four functions. In particular, the direct answer generating function should be developed further to cover the applications SQL*Plus and Toad for Oracle and the database drivers ODBC and OCCI. Moreover, the function for encoding and decoding bind values can lead to further functions that measure the time used by each SQL statement separately, for instance how long a statement takes to bind values and execute, and how long each fetch from the database takes. Further topics are the performance optimization of the proxy, a user interface for customizing the proxy at run time, and a proxy structure that can easily be adapted to different database protocols.
In summary, all of these can make the proxy an effective means of achieving the main theme of the company's products, which is to speed up databases, as the company motto states: "the spirit of speed".


Appendix A: List of Abbreviations and Acronyms

Abbreviation   Full Form
API            Application Programming Interface [Api08]
BE             Big Endian [BLE05]
DBMS           Database Management System [Rag02]
JDBC           Java Database Connectivity [Jdbc05]
LE             Little Endian [BLE05]
ME             Middle Endian [BLE05]
OCI            Oracle Call Interface [Oci94]
OCCI           Oracle C++ Call Interface [Occi05]
ODBC           Open Database Connectivity [Odbc95]
SQL            Structured Query Language [Sql10]
TNS            Transparent Network Substrate [Dav07]


References
[Api08] Joshua Bloch. (2008). Effective Java (2nd edition). Addison-Wesley. pp. 259-312. ISBN: 978-0321356680.
[BLE05] Bertrand Blanc, Bob Maaraoui. (2005). Endianness or Where is Byte 0?. Retrieved 4 September 2011, from http://3bc.bertrand-blanc.com/endianness05.pdf
[Dav07] David Litchfield. (2007). The Oracle Hacker's Handbook: Hacking and Defending Oracle. John Wiley & Sons. ISBN: 978-0470080221.
[Dim11] dimensio informatics GmbH. (2011). dimensio informatics GmbH. Retrieved 4 September 2011, from http://dimensio-informatics.com/index.html
[Jdbc05] R.M. Menon. (2005). Expert Oracle JDBC Programming. Apress. ISBN: 9781590594070.
[Occi05] Roza Leyderman. (2005). Oracle C++ Call Interface Programmer's Guide, 10g Release 2 (10.2). Oracle Publishing Ltd. B14294-01.
[Oci94] Tom Smith. (1994). Programmer's Guide to the Oracle Call Interface: Release 7.1. Oracle Publishing Ltd. ASIN: B0036BZOCE.
[Odbc95] Robert Signore, John Creamer, Michael O. Stegman. (1995). The ODBC Solution: Open Database Connectivity in Distributed Environments. McGraw-Hill. ISBN: 9780079118806.
[Ora11] Oracle Wiki. (n.d.). SQL*Net. Retrieved 4 September 2011, from http://www.orafaq.com/wiki/SQL*Net
[Rag02] Raghu Ramakrishnan, Johannes Gehrke. (2002). Database Management Systems. McGraw-Hill Higher Education. ISBN: 978-0071231510.
[Sha86] Marc Shapiro. (1986). Structure and encapsulation in distributed systems: the Proxy Principle. Int. Conf. on Distributed Computer Sys., pp. 198-204. Retrieved 4 September 2011.
[Sql10] Karen Morton, Kerry Osborne, Robyn Sands et al. (2010). Pro Oracle SQL (Expert's Voice in Oracle). Apress. ISBN: 978-1430232285.
[Tho06] Keir Thomas. (2006). Beginning Ubuntu Linux: From Novice to Professional. Apress. ISBN: 978-1590596272. "A proxy server helps speed up Internet access by storing frequently accessed pages"