Internship Technical Report, Software Systems Engineering
Author: Pinaet Phoonsarakun
Mentor: Alexander Adam, Dipl.-Inf.
Internship Place: dimensio informatics GmbH, Chemnitz, Germany
Time period: April 1 to September 30, 2011
Topic: Extending a Proxy to Analyze and Modify Data Streams Based on the Oracle Database Protocol
Name: Pinaet Phoonsarakun
Student ID: 51-5708-007-7
Program: Software Systems Engineering
Mentor: Alexander Adam
Organization: dimensio informatics GmbH
Organization Address: Brückenstraße 4, 09111 Chemnitz, Germany
Time Period: April 1st to September 30th, 2011
This internship report is submitted in partial fulfillment of the requirements for the degree of Master of Software Systems Engineering, Sirindhorn International Thai-German Graduate School of Engineering, King Mongkut's University of Technology North Bangkok.
Project Title: Extending a Proxy to Analyze and Modify Data Streams Based on the Oracle Database Protocol
Student: Pinaet Phoonsarakun
Organization Mentor: Alexander Adam, Dipl.-Inf.
  Development Director, dimensio informatics GmbH
  Brückenstraße 4, 09111 Chemnitz, Germany
  Phone +49 (0) 371 / 26 20 19 - 0
  Fax +49 (0) 371 / 26 20 19 - 10
  Email alexander.adam@dimensio-informatics.com
TGGS Advisors: Dr. Kamol Limtanyakul
  Coordinator of Software Systems Engineering, TGGS at KMUTNB
  Campus 1518 Pracharaj soi 1 Road (Pibulsongkram Road), Bangsue, Bangkok 10800, Thailand
  Phone +66 (0) 2 587 2904
  Email kamoll@gmail.com
Description: A real database, for instance a banking database, has to deal with very large amounts of data and complex database queries. A proxy server is an efficient way to improve the performance of these queries and to analyze which of them take minutes, hours, or, in uncommon cases, even one or more nights. Such a proxy is produced by dimensio informatics GmbH. This paper discusses that proxy server and its functionality. During my internship I developed extensions to improve the performance of database queries for the Oracle database. The tasks ranged from the initial extraction of the SQL statements sent to the database up to generating answers independently of the database.
Table of Contents
Abstract
1. Introduction
   1.1 Background for proxy servers
   1.2 Database drivers
   1.3 The TNS protocol
   1.4 Basic and extended functionality
2. Project Description
   2.1 SQL statement extracting function
   2.2 SQL statement changing function
   2.3 Bind values encoding and decoding function
   2.4 Direct answer generating function
3. Conclusion
Appendix A: List of Abbreviations and Acronyms
References
Abstract
This report discusses the internship task of extending the functionality of a proxy server in order to improve and analyze the performance of database queries for the Oracle database. The proxy server acts as a man in the middle: it receives requests from a client and sends them to a destination server, then receives the answer and sends it back to the client. Before sending a request or an answer on to the destination server or the client, the proxy server may observe, extract, modify, filter, redirect, or analyze it for specific purposes. The goal of this internship was to implement functions that extract, modify, and analyze SQL statements, so as to improve and analyze the performance of database queries for the Oracle database, and then to integrate those functions into the proxy server from dimensio informatics GmbH.
1. Introduction
During the internship (April 1 to September 30, 2011), the work concentrated on extending the functionality of a proxy server that is one of the products of dimensio informatics GmbH. The main theme of the company's products is speeding up databases, as the company motto "the spirit of speed" states [Dim11], and a proxy server is one way to achieve this. The internship task was to implement extended functionality in this proxy server. There were four tasks:
1. implement a function for extracting SQL statements from a packet sent by a client;
2. develop a function for changing SQL statements in a packet sent by a client;
3. implement a function for encoding and decoding bind values in data streams based on the Oracle database protocol;
4. develop a function for generating direct answers to the client.
This report is divided into three sections. The first section is an introduction to the internship, covering background on proxy servers, database drivers, the TNS protocol, and the basic and extended functionality of the proxy. The next section describes the project, including the four tasks that were worked on. The last section contains the conclusion of this report.
Figure 1.1 Two-computer communication connected through a proxy server (shown in red)
A proxy such as the one in Figure 1.1 can serve a wide variety of purposes, including: blocking undesired requests; logging or auditing usage; bypassing security or parental controls; scanning transmitted content for viruses and malware before forwarding it; passing queries and answers through unmodified, acting as a tunneling proxy; and storing answers from the destination server for frequent requests (caching). Proxy servers can be placed on the user's local computer or at various points between the user and the destination servers on the Internet. Moreover, proxy servers can be used as a front end that controls and protects access to a server on a private network, commonly also performing tasks such as load balancing, authentication, decryption, or caching, either as a reverse proxy or as an Internet-facing proxy.
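The man-in-the-middle role can be sketched in a few lines. The following Python is a minimal illustration, not the dimensio informatics implementation: the class `InterceptingProxy`, its hook lists, and the in-process `backend` callable (standing in for the network connection to the destination server) are all hypothetical. Each hook may observe, modify, or block a message, which covers purposes such as logging and blocking from the list above.

```python
from typing import Callable, List, Optional

# A hook inspects a message and returns it (possibly modified), or None to block it.
Hook = Callable[[bytes], Optional[bytes]]


class InterceptingProxy:
    """Man-in-the-middle relay with request and response hooks (illustrative only)."""

    def __init__(self, backend: Callable[[bytes], bytes]) -> None:
        self.backend = backend  # stands in for the connection to the destination server
        self.request_hooks: List[Hook] = []
        self.response_hooks: List[Hook] = []

    def handle(self, request: bytes) -> Optional[bytes]:
        # Client -> proxy: every request hook may observe, modify or block the request.
        for hook in self.request_hooks:
            request = hook(request)
            if request is None:
                return None  # blocked: nothing reaches the server
        response = self.backend(request)
        # Server -> proxy: response hooks run before the answer goes back to the client.
        for hook in self.response_hooks:
            response = hook(response)
            if response is None:
                return None
        return response


log: List[bytes] = []


def audit(msg: bytes) -> bytes:
    log.append(msg)  # log / audit usage
    return msg


def block_drop(msg: bytes) -> Optional[bytes]:
    # Block undesired requests, here anything starting with DROP.
    return None if msg.upper().startswith(b"DROP") else msg


proxy = InterceptingProxy(backend=lambda req: b"OK: " + req)
proxy.request_hooks.extend([audit, block_drop])
print(proxy.handle(b"SELECT 1"))      # b'OK: SELECT 1'
print(proxy.handle(b"DROP TABLE t"))  # None
```

Keeping the hooks as plain functions means purposes can be combined freely; a real proxy would run the same pipeline on socket data instead of an in-process callable.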
To connect to individual databases, different database drivers are used through different APIs (application programming interfaces) [Api08]: for instance, a JDBC driver is used through the JDBC API, an ODBC driver through the ODBC API, and an OCCI driver through the OCCI API [Oci94]. Java Database Connectivity, commonly referred to as JDBC [Jdbc05], is an API for the Java programming language that defines how a client may access a database. It provides methods for querying and updating data in a database, and it is oriented towards relational databases. The JDBC driver provides the connection to the database and implements the protocol for transferring queries and results between client and database. Open Database Connectivity, commonly known as ODBC [Odbc95], is an API for accessing database management systems. The designers of ODBC aimed to make it independent of programming languages, database systems, and operating systems; ODBC achieves this platform and language independence by using an ODBC driver. The Oracle C++ Call Interface, commonly referred to as OCCI [Occi05], is provided by Oracle Corporation. It offers C++ programmers a convenient interface for accessing Oracle databases. The OCCI classes have parameters reminiscent of SQL statements, and the interface has been available since Oracle release 9i. The OCCI driver provides the connection to the database and implements the protocol for transferring queries and results between client and database. The database driver process described above is illustrated in Figure 1.2.
Figure 1.2 Database drivers convert requests into a form the database can understand
The four functions developed during this internship have to be flexible enough to support all of these database drivers when processing the data streams between clients and servers.
including TCP/IP with or without SSL, Named Pipes, and Sockets Direct Protocol (SDP), which enables communication over InfiniBand high-speed networks [Dav07].
Figure 1.3 The Oracle network architecture
Moreover, TNS is the protocol on which SQL*Net is based [Ora11]. SQL*Net is the software that establishes connections between local Oracle database clients and the corresponding Oracle database instance, or between two different Oracle database instances through a database link. Because TNS is a generic logical protocol, it can run on top of all common physical network protocols, such as TCP/IP. TNS is based on individual logical packets, which are mapped transparently onto physical packets. Every TNS packet begins with an 8-byte global header, composed as follows:

Bytes  Size  Field
0-1    WORD  Total packet size
2-3    WORD  Checksum of packet
4      BYTE  Type of packet
5      BYTE  Flags (reserved)
6-7    WORD  Checksum of global header
The first two bytes (WORD) of the header hold the packet length, i.e. the total packet size. All values in the TNS header are big-endian. The next two bytes hold the packet checksum, which is 0x0000 by default. The next byte indicates the packet type, and the byte after it holds the header flags. Normally the flags are unused and reserved, but a 10g client may set the value to 0x04 [Dav07]. Finally, the last two bytes hold the header checksum, which is generally unused and set to 0x0000. The type field determines the kind of TNS packet, and the packet body differs accordingly. Twelve different packet types are currently in use, and they can be roughly divided into three groups as follows:

Group           Packet      Type
Connection      Connect     1
                Accept      2
                Ack         3
                Refuse      4
                Redirect    5
Data Transfer   Data        6
                NULL        7
Control Flow    Abort       9
                Resend      11
                Marker      12
                Attention   13
                Control     14
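As a sketch of how a proxy might read this header, the Python below unpacks the 8-byte global header with the standard `struct` module (big-endian, as stated above) and maps the type field onto the table. `parse_tns_header` and `TnsHeader` are illustrative names, and the sketch assumes the 2-byte length field described in this report.

```python
import struct
from typing import NamedTuple

# 8-byte global header, big-endian: total length, packet checksum, type, flags, header checksum.
TNS_HEADER = struct.Struct(">HHBBH")

# Packet type -> (name, group), following the table above.
PACKET_TYPES = {
    1: ("Connect", "Connection"),
    2: ("Accept", "Connection"),
    3: ("Ack", "Connection"),
    4: ("Refuse", "Connection"),
    5: ("Redirect", "Connection"),
    6: ("Data", "Data Transfer"),
    7: ("NULL", "Data Transfer"),
    9: ("Abort", "Control Flow"),
    11: ("Resend", "Control Flow"),
    12: ("Marker", "Control Flow"),
    13: ("Attention", "Control Flow"),
    14: ("Control", "Control Flow"),
}


class TnsHeader(NamedTuple):
    length: int           # total packet size, header included
    packet_checksum: int  # 0x0000 by default
    packet_type: int
    flags: int            # reserved; a 10g client may set 0x04
    header_checksum: int  # 0x0000 by default

    @property
    def type_name(self) -> str:
        return PACKET_TYPES.get(self.packet_type, ("Unknown", "Unknown"))[0]

    @property
    def group(self) -> str:
        return PACKET_TYPES.get(self.packet_type, ("Unknown", "Unknown"))[1]


def parse_tns_header(data: bytes) -> TnsHeader:
    """Decode the first 8 bytes of a TNS packet."""
    if len(data) < TNS_HEADER.size:
        raise ValueError("a TNS header needs 8 bytes")
    return TnsHeader(*TNS_HEADER.unpack_from(data))


header = parse_tns_header(bytes.fromhex("003a000001000000"))
print(header.length, header.type_name, header.group)  # 58 Connect Connection
```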
This is how they work. First, a local Oracle database client sends a packet of type 1, a Connect packet, to specify the service name while connecting to an Oracle database instance. If the Listener knows the requested service, one of two things happens: it either sends the client an Accept packet, or it sends a Redirect packet that redirects the client to another port. In the first case, the client proceeds to authentication. In the second case, the client sends a Connect packet to the port given in the Redirect packet and requests access to the service; if everything is fine, the destination server sends the client an Accept packet and authentication takes place. All packets sent during authentication are Data packets. If the Listener does not know the service, it sends the client a packet of type 4, a Refuse packet. After the client and the server have authenticated, the client sends requests and the server sends replies, and all request and reply packets are Data packets. While requests and replies are being exchanged, a Marker packet may be sent as an interrupt; for instance, the server sends the client a Marker packet if it wants the client to stop sending data. The packets that most clearly provide useful information are Refuse packets, because they point to an obvious error. For example, a logon denied error comes with ORA-01017 "invalid user name / password". With these errors, the 54th byte indicates the problem: a 3 means an invalid password, and a 2 means that no such user exists [Dav07]. This knowledge of how and when each type of TNS packet is sent is applied in the second section of this report, which explains the four tasks in more detail.
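The connection sequence and the Refuse-packet check can be condensed into a small decision function. This is a hypothetical sketch: the function names and returned strings are invented, the packet type is read from the fifth header byte as described earlier, and "the 54th byte" is assumed to use 1-based counting, i.e. index 53 of the packet.

```python
ACCEPT, REFUSE, REDIRECT = 2, 4, 5

# Meaning of the 54th byte in a logon-denied (ORA-01017) reply, per [Dav07].
LOGON_FAILURE_REASONS = {3: "invalid password", 2: "no such user"}


def next_client_step(reply: bytes) -> str:
    """Decide what a client does after the Listener answers its Connect packet."""
    packet_type = reply[4]  # the fifth header byte holds the packet type
    if packet_type == ACCEPT:
        return "authenticate"
    if packet_type == REDIRECT:
        return "send Connect packet to redirected port"
    if packet_type == REFUSE:
        return "service unknown: connection refused"
    return "unexpected packet type %d" % packet_type


def logon_failure_reason(reply: bytes) -> str:
    """Read the 54th byte (index 53, assuming 1-based counting) of a logon-denied reply."""
    if len(reply) < 54:
        return "unknown"
    return LOGON_FAILURE_REASONS.get(reply[53], "unknown")
```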
Figure 1.4 Basic and extended functionality of the proxy server
The tasks are described further in the project description in Section 2.
2. Project Description
This section explains the tasks worked on during the internship, namely the development of:
1. a function for extracting SQL statements from a packet sent by a client;
2. a function for changing SQL statements in a packet sent by a client;
3. a function for encoding and decoding bind values in data streams based on the Oracle database protocol;
4. a function for generating direct answers to a client.
The software used for developing and testing these four functions included:
1. Oracle database
2. Wireshark
3. Qt Creator
4. SQL Developer
5. SQL*Plus
6. Toad for Oracle
The following subsections explain each function in more detail.
Moreover, this functionality has to work with different database drivers such as JDBC, ODBC, and OCCI. To develop this function, I had to observe which of the packets sent between a client and an Oracle database server contain database queries, and locate the queries within them. This process consumed most of the time of the internship. First, I had to gain knowledge of the Oracle database protocol and architecture. Next, I monitored and analyzed packets in the network traffic using Wireshark. I also had to observe how the packets differ between the applications SQL Developer, SQL*Plus, and Toad for Oracle, and between the different database drivers. After that, I worked with the proxy server of dimensio informatics GmbH. Finally, I developed extended functionality for the proxy to extract SQL statements. Figure 2.1 shows how a simple proxy of dimensio informatics GmbH works.
Figure 2.1 How a simple proxy of dimensio informatics GmbH works
In Figure 2.1, the client application no longer connects directly to the database server (shown in blue) but connects to it through the proxy server (shown in green). When the proxy receives data sent by the client, it usually does not arrive as a complete TNS packet but is fragmented into many smaller packets. Consequently, the proxy has to collect all of these packets and reconstruct the complete TNS packet before it can extract SQL statements and deliver the packet. Collecting and re-arranging the packets and extracting the SQL statement are all done in the SQL statement extracting function. To extract an SQL statement from a complete TNS packet, the function first has to recognize the pattern of byte blocks in the body of the TNS packet, determine which type of database driver it is based on, and decide whether the packet contains an SQL statement. If it does, the function analyzes the offsets needed to reach and extract the SQL statement, and finally forwards the packet to the database server. To make the function more useful, a more advanced function for changing SQL statements was also developed, as explained next.
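The collecting step can be sketched as a small buffer that uses the length field of the TNS header to cut the TCP byte stream back into complete packets. This Python sketch is an assumption about how such reassembly could work, not the proxy's actual code; `TnsReassembler` and the `make_packet` test helper are hypothetical, the packet bodies are placeholders, and the driver-specific recognition of the SQL text is left out.

```python
import struct
from typing import List

HEADER_SIZE = 8


class TnsReassembler:
    """Buffers raw TCP chunks until whole TNS packets are available."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add a chunk and return every TNS packet it completed (possibly none)."""
        self._buffer.extend(chunk)
        packets: List[bytes] = []
        while len(self._buffer) >= HEADER_SIZE:
            # First WORD of the header = total packet length, big-endian.
            (length,) = struct.unpack_from(">H", self._buffer, 0)
            if length < HEADER_SIZE:
                raise ValueError("corrupt TNS header: length %d" % length)
            if len(self._buffer) < length:
                break  # wait for the rest of this packet
            packets.append(bytes(self._buffer[:length]))
            del self._buffer[:length]
        return packets


def make_packet(packet_type: int, body: bytes) -> bytes:
    """Build a TNS packet with zero checksums and flags (test helper)."""
    return struct.pack(">HHBBH", HEADER_SIZE + len(body), 0, packet_type, 0, 0) + body


stream = make_packet(6, b"SELECT * FROM dual") + make_packet(6, b"COMMIT")
reassembler = TnsReassembler()
complete: List[bytes] = []
for i in range(0, len(stream), 5):  # deliver the stream in 5-byte fragments
    complete.extend(reassembler.feed(stream[i:i + 5]))
print(len(complete))  # 2
```

Only complete packets leave the buffer, so the later extraction step can always rely on reading a whole TNS packet.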
Figure 2.2 Flow chart of the SQL statement changing function
As shown in Figure 2.2, the changing function first receives the results of the extracting function: an SQL statement and a complete TNS packet. It then compares that SQL statement with the specific SQL statement it is meant to modify. If they match, it analyzes the packet for the necessary values, such as the total packet length, packet type, statement type, number of variables, SQL type, SQL length, and altering flag; otherwise it forwards the packet to the server. After analyzing the packet, it generates the new SQL statement that replaces the old one. Next, it modifies the necessary values in the packet and replaces the old SQL with the new SQL. Finally, it hands the modified packet to the connection forwarding function, which forwards it to the destination server. To modify significant values in the packet, the function has to know the byte order in which they are stored. There are three types of byte order: Big Endian (BE), Little Endian (LE), and Middle Endian (ME) [BLE05]. In BE, the byte with the most significant bits is stored at the lowest memory address. In LE, the byte with the least significant bits is stored at the lowest memory address. In ME, the bytes are stored in an order that is neither big- nor little-endian; for example, a 32-bit value may be stored as two 16-bit halves in big-endian order, with the bytes inside each half in little-endian order. (See Figure 2.3 for an example.)
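The byte-level side of the changing function can be sketched as follows: `encode_u32` writes a value in each of the three byte orders, and `replace_sql` splices a new statement into a packet and rewrites the big-endian TNS length. Both functions are hypothetical. In particular, the optional 4-byte SQL-length field, its offset, and its byte order are placeholders, because the real position and encoding of the SQL length depend on the driver and are not specified here. The middle-endian variant shown is the PDP-11 layout, one of several possible ME orders.

```python
import struct
from typing import Optional


def encode_u32(value: int, order: str) -> bytes:
    """Encode an unsigned 32-bit value in big-, little- or middle-endian byte order."""
    if order == "be":
        return struct.pack(">I", value)
    if order == "le":
        return struct.pack("<I", value)
    if order == "me":
        # PDP-11-style middle endian: high 16-bit half first, each half little-endian.
        return struct.pack("<HH", value >> 16, value & 0xFFFF)
    raise ValueError("unknown byte order: " + order)


def replace_sql(packet: bytes, old_sql: bytes, new_sql: bytes,
                sql_len_offset: Optional[int] = None, sql_len_order: str = "be") -> bytes:
    """Swap old_sql for new_sql and patch the length fields that depend on it."""
    start = packet.find(old_sql, 8)  # search the body only, not the 8-byte header
    if start < 0:
        return packet  # statement not present: forward the packet unchanged
    out = bytearray(packet[:start] + new_sql + packet[start + len(old_sql):])
    if len(out) > 0xFFFF:
        raise ValueError("modified packet no longer fits the 16-bit length field")
    struct.pack_into(">H", out, 0, len(out))  # TNS total length is big-endian
    if sql_len_offset is not None:
        # Hypothetical 4-byte SQL-length field located before the SQL text.
        if not 8 <= sql_len_offset <= start - 4:
            raise ValueError("SQL-length field must lie between header and SQL text")
        out[sql_len_offset:sql_len_offset + 4] = encode_u32(len(new_sql), sql_len_order)
    return bytes(out)


print(encode_u32(0x1A2B3C4D, "be").hex())  # 1a2b3c4d
print(encode_u32(0x1A2B3C4D, "le").hex())  # 4d3c2b1a
print(encode_u32(0x1A2B3C4D, "me").hex())  # 2b1a4d3c
```

Locating the SQL by a plain byte search is a simplification; the extracting function described above works with driver-specific offsets instead.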
Figure 2.3 Example of the three types of byte order for the value 0x1A2B3C4D
The next part explains the functionality of the proxy for encoding and decoding bind values.
that it calls executeUpdate() to execute the SQL statement with these two parameters in the database. The task of the bind values encoding and decoding function is to decode the parameter values of prepared statements from a data stream based on the Oracle database protocol. Figure 2.4 shows the flow of the bind values encoding and decoding function.
Figure 2.4 Flow chart of the bind values encoding and decoding function
In Figure 2.4, the bind values encoding and decoding function first receives the results of the SQL statement extracting function. Second, it analyzes the TNS Data packet for the necessary information, such as the positions of the statement type, of the bind values, and of the number of parameters. Third, it checks whether the SQL statement is a regular statement or a prepared statement, with or without parameter values. If the check is satisfied, the function proceeds to the fourth step; otherwise it skips the fourth step and moves on to the fifth. In the fourth step, it decodes the bind values using the results of the previous steps, especially the TNS packet analysis. To decode bind values, the function first counts the number of bind values in the packet. Next, it analyzes the type of each value; four bind value types are currently supported: integer, float, string, and date. Then it locates the bind values, and finally it decodes the data in the packet into the corresponding bind values. Fifth, the function adds the SQL statement and its bind values to a queue for analysis. Finally, it hands the TNS packet to the connection forwarding function, which forwards it to the destination database server.
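A minimal sketch of the decoding step for two of the supported kinds of values is shown below. It assumes, without confirmation from this report, that integer and decimal bind values arrive in Oracle's internal NUMBER format (an exponent byte followed by base-100 digits) and dates in Oracle's 7-byte DATE format, and that the caller has already located each value's bytes in the packet. The function names are hypothetical; strings and binary floating-point types are not covered.

```python
from datetime import datetime
from decimal import Decimal


def decode_oracle_number(raw: bytes) -> Decimal:
    """Decode Oracle's internal NUMBER format (exponent byte + base-100 digits)."""
    if raw == b"\x80":
        return Decimal(0)
    head, body = raw[0], list(raw[1:])
    if head >= 0x80:
        # Positive: head = 193 + exponent, each digit stored as digit + 1.
        sign, exponent = 1, head - 193
        digits = [d - 1 for d in body]
    else:
        # Negative: head = 62 - exponent, digits stored as 101 - digit,
        # usually followed by a terminator byte 102.
        sign, exponent = -1, 62 - head
        if body and body[-1] == 102:
            body = body[:-1]
        digits = [101 - d for d in body]
    value = sum(Decimal(d) * Decimal(100) ** (exponent - i) for i, d in enumerate(digits))
    return sign * value


def decode_oracle_date(raw: bytes) -> datetime:
    """Decode the 7-byte DATE: century+100, year+100, month, day, hour+1, minute+1, second+1."""
    c, y, mon, day, h, mi, s = raw[:7]
    return datetime((c - 100) * 100 + (y - 100), mon, day, h - 1, mi - 1, s - 1)


def encode_oracle_date(value: datetime) -> bytes:
    """Inverse of decode_oracle_date (encoding direction)."""
    return bytes([value.year // 100 + 100, value.year % 100 + 100, value.month, value.day,
                  value.hour + 1, value.minute + 1, value.second + 1])


print(decode_oracle_number(bytes([0xC2, 0x02, 0x18])))  # 123
print(decode_oracle_date(bytes([120, 111, 9, 4, 1, 1, 1])))  # 2011-09-04 00:00:00
```

The `encode_oracle_date` inverse illustrates the encoding direction of the same function, which matters when answers are generated without the database.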
This function can also encode bind values (parameters) of each type back into their byte representation. This is very useful for the function that generates answers independently of the database, explained next.
Figure 2.5 Flow chart of the direct answer generating function
In Figure 2.5, the generating function first receives the results of the extracting function: an SQL statement and a complete TNS packet. Next, it compares that SQL statement with a set of frequently used SQL statements for which it prepares answers in order to increase performance. If the statement matches, the generating function generates an answer or retrieves the answer saved from a previous similar query; otherwise it hands the packet to the connection forwarding function. After generating or retrieving the answer for the SQL statement, the generating function analyzes the answer packet for all values and positions that must be modified to match the current situation, such as the packet cursor and the answer for the specific SQL statement. It then modifies them, and finally sends the modified answer packet to the client.
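The lookup-and-patch logic of this function can be sketched as a cache keyed by a normalized SQL string. In this Python sketch the class `DirectAnswerCache` is hypothetical, the whitespace normalization is a simplification, and `cursor_offset` stands for a one-byte cursor field whose real position and size in an Oracle answer packet are not given here.

```python
import re
from typing import Dict, Optional


def normalize_sql(sql: str) -> str:
    """Collapse runs of whitespace so trivially different spellings share one key.
    (Simplification: this would also collapse whitespace inside string literals.)"""
    return re.sub(r"\s+", " ", sql).strip()


class DirectAnswerCache:
    """Stores answer packets for prepared SQL statements and re-targets them per request."""

    def __init__(self, cursor_offset: int) -> None:
        # Hypothetical: position of a one-byte cursor number inside the answer packet.
        self.cursor_offset = cursor_offset
        self._answers: Dict[str, bytes] = {}

    def store(self, sql: str, answer_packet: bytes) -> None:
        self._answers[normalize_sql(sql)] = answer_packet

    def answer_for(self, sql: str, cursor: int) -> Optional[bytes]:
        """Return a patched answer, or None if the statement must go to the database."""
        saved = self._answers.get(normalize_sql(sql))
        if saved is None:
            return None  # no prepared answer: hand the packet to the forwarding function
        patched = bytearray(saved)
        patched[self.cursor_offset] = cursor  # make the answer match the current cursor
        return bytes(patched)


cache = DirectAnswerCache(cursor_offset=10)
cache.store("SELECT sysdate\n  FROM dual", bytes(16))
print(cache.answer_for("SELECT sysdate FROM dual", cursor=7)[10])  # 7
print(cache.answer_for("SELECT 1 FROM dual", cursor=7))  # None
```

Patching a copy of the saved packet leaves the stored answer untouched, so it can be reused for the next request with a different cursor.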
3. Conclusion
This internship report consists of three sections: an introduction to a proxy that is one of the products of dimensio informatics GmbH, a project description covering the four tasks, and the conclusion. The first task was a function for extracting SQL statements from packets sent by a client. The second task was a function for changing SQL statements in packets sent by a client. The third task was a function for encoding and decoding bind values in packets. Finally, the fourth task was a function for generating direct answers to the client, independently of the database. All four functions, which are based on the Oracle database protocol, work as intended with the applications SQL Developer, SQL*Plus, and Toad for Oracle, and with the database drivers JDBC, ODBC, and OCCI. The exception is the fourth function: due to limited time, it only works with the application SQL Developer and the JDBC driver. Looking further ahead, many other client applications and database drivers that access the Oracle database still need to be covered by all four functions. In particular, the direct answer generating function should be extended to cover the applications SQL*Plus and Toad for Oracle, and the database drivers ODBC and OCCI. Moreover, the bind values encoding and decoding function could serve as the basis for functions that measure the time taken by each SQL statement separately, for instance how long a statement takes to bind values and execute, and how long each fetch from the database takes. Further topics include optimizing the proxy's performance, a user interface for customizing the proxy at run time, and a proxy structure that can easily be adapted to different database protocols.
In summary, all of these could make the proxy an even more effective means of achieving the main theme of the company's products, speeding up databases, in line with the company motto: "the spirit of speed".
Appendix A: List of Abbreviations and Acronyms

Abbreviation  Full Form                          Reference
API           Application Programming Interface  [Api08]
BE            Big Endian                         [BLE05]
DBMS          Database Management System         [Rag02]
JDBC          Java Database Connectivity         [Jdbc05]
LE            Little Endian                      [BLE05]
ME            Middle Endian                      [BLE05]
OCI           Oracle Call Interface              [Oci94]
OCCI          Oracle C++ Call Interface          [Occi05]
ODBC          Open Database Connectivity         [Odbc95]
SQL           Structured Query Language          [Sql10]
TNS           Transparent Network Substrate      [Dav07]
References
[Api08] Joshua Bloch. (2008). Effective Java (2nd edition). Addison-Wesley. pp. 259-312. ISBN: 978-0321356680.
[BLE05] Bertrand Blanc, Bob Maaraoui. (2005). Endianness or Where is Byte 0?. Retrieved 4 September 2011, from http://3bc.bertrand-blanc.com/endianness05.pdf.
[Dav07] David Litchfield. (2007). The Oracle Hacker's Handbook: Hacking and Defending Oracle. John Wiley & Sons. ISBN: 978-0470080221.
[Dim11] dimensio informatics GmbH. (2011). dimensio informatics GmbH. Retrieved 4 September 2011, from http://dimensio-informatics.com/index.html
[Jdbc05] R.M. Menon. (2005). Expert Oracle JDBC Programming. Apress. ISBN: 9781590594070.
[Occi05] Roza Leyderman. (2005). Oracle C++ Call Interface Programmer's Guide, 10g Release 2 (10.2). Oracle Publishing Ltd. B14294-01.
[Oci94] Tom Smith. (1994). Programmer's Guide to the Oracle Call Interface: Release 7.1. Oracle Publishing Ltd. ASIN: B0036BZOCE.
[Odbc95] Robert Signore, John Creamer, Michael O. Stegman. (1995). The ODBC Solution: Open Database Connectivity in Distributed Environments. McGraw-Hill. ISBN: 9780079118806.
[Ora11] Oracle Wiki. (n.d.). SQL*Net. Retrieved 4 September 2011, from http://www.orafaq.com/wiki/SQL*Net
[Rag02] Raghu Ramakrishnan and Johannes Gehrke. (2002). Database Management Systems. McGraw-Hill Higher Education. ISBN: 978-0071231510.
[Sha86] Marc Shapiro. (1986). Structure and encapsulation in distributed systems: the Proxy Principle. Int. Conf. on Distributed Computer Systems: 198-204. Retrieved 4 September 2011.
[Sql10] Karen Morton, Kerry Osborne, Robyn Sands et al. (2010). Pro Oracle SQL (Expert's Voice in Oracle). Apress. ISBN: 978-1430232285.
[Tho06] Keir Thomas. (2006). Beginning Ubuntu Linux: From Novice to Professional. Apress. ISBN: 978-1590596272. "A proxy server helps speed up Internet access by storing frequently accessed pages"