The business impact of this growing complexity is stark: multicore and multiprocessor software projects are 4.5X more expensive, have 25% longer schedules, and require almost 3X as many software engineers.1 One area where this growing complexity can have a dramatic impact on cost and schedule overruns is software testing and code inspection. A multicore/multiprocessor environment can add exponential complexity to effectively identifying errors in software. Two classes of problem in particular have the ability to drag the productivity of a software team through the floor: concurrency errors and endian incompatibilities. This whitepaper discusses these issues in detail, explains how Klocwork's source code analysis engine, Klocwork Truepath, can be used to address them, and walks through two examples of these problems in prominent open source projects.
Figure 1 | Processing Architecture Used in the Current Project and Expected in Next Two Years (Percent of Respondents)
Current project: single processor 61.8%; multiprocessor 20.8%; multicore 9.3%; multicore and multiprocessor 5.2%; don't know 2.9%. Expected in two years: multicore and multiprocessor 19.4%; don't know 8.5%.
VDC Research, Next Generation Embedded Hardware Architectures: Driving Onset of Project Delays, Costs
Figure 2 | Klocwork Truepath tool chain provides the concurrency analysis engine after build emulation and control flow graph analysis.
Compile: emulate native build; build control flow graph. Symbolic logic: analyze control flow graph; perform dataflow analysis. Concurrency: analyze lock dependencies.
In this figure you can see that data relating to lock lifecycles is gathered by the normal analysis engine. Once this has been produced for all modules in the system, the whole program space is then analyzed by the new concurrency analysis engine to find loops in the lifecycle graph, which equate to deadlocks. Consider a function that operates as follows:
lock_t Lock1, Lock2;

void foo(int x)
{
    if( x & 1 ) {
        lock(Lock1);
        lock(Lock2);
    }
    else
        lock(Lock1);
}
You can easily see by inspection that when passed an odd number as its parameter, this function defines a dependency of Lock2 upon Lock1. Given an even parameter, Lock1 is still reserved, but this time there is no dependency of Lock2 upon Lock1 at the local scope, although that dependency (or another) may still exist at an inter-procedural scope. Therefore, we have two discrete types of questions to ask when performing the analysis:

1. Symbolic logic questions:
   a. Is there a valid control flow that gets us to call function foo() with an odd parameter?
   b. Is there a valid control flow that results in foo() being called with an even parameter, followed by a call to another function that results in another lock (e.g. Lock2) being reserved before Lock1 is released?

2. Lock dependency questions:
   a. If either of these is so, is there any other situation in the program's natural control flow whereby a counter-dependency of Lock1 upon Lock2 can be reached, potentially resulting in a deadlock?

The first type of question is answered by Klocwork Truepath's symbolic logic engine during the normal course of program analysis, just as any other type of defect is analyzed for inter-procedural data flows that can or cannot occur. The second type of question is then answered by the concurrency analysis engine, fed by the collection of all possible dependencies within the program space. The result tends to be a small set of deadlock scenarios that are incredibly difficult to find manually and insanely difficult to understand without a tool, but that developers can triage and fix very quickly within the natural course of their implementation tasks.
Endian Incompatibilities
Whilst it may be true that there are 10 kinds of people in the world, a switch from a little endian platform to a big endian platform will muddy that impression considerably. An advisor of ours recently informed me with glee that he'd finally set his MSB (having passed his 64th birthday), but store that in nibble representation on an unexpected endian architecture and he'd be regressing to the nursery once more. In short, endian representations affect how the host processor stores integral types in memory. Considering 32-bit integers, each of which consists of four bytes of memory, the processor can choose to read and write those four bytes in a variety of orders, although traditionally only two are used:

- Little endian, in which the bytes are written in the order 0, 1, 2, 3
- Big endian, in which the bytes are written in the order 3, 2, 1, 0
This picture becomes slightly muddied if the processor actually writes words at a time (this is mostly a historical representation now, but we mention it for completeness) and applies its endian assumptions to each word:

- Little endian still writes bytes in the order 0, 1, 2, 3
- Big endian, however, may now write bytes in the order 1, 0, 3, 2
However the processor stores and reads such types is entirely at its own discretion and the business of nobody else. Until, that is, the developer directs the processor to write such data into a medium for transmission, as opposed to storage in memory. Transmission media, which could be sockets, files, pipes, or any other interprocessor vector (e.g. interrupts that cause data to be written to the PCI-Express interface, or to the serial bus, or ...), are addressed by the processor in exactly the same way as memory unless specifically told to do otherwise.
Thus, a big endian processor will write a 32-bit integer onto a socket in byte order 3, 2, 1, 0. If the CPU on the other end of the socket uses a little endian architecture, then obviously a value written onto the socket will be interpreted completely differently when read. For example, a 16-bit value of 209 (0x00D1), written by a big endian processor and read by a little endian processor, will be interpreted as 53,504 (0xD100), which is not a small correction by any means. Preparing a program for use with heterogeneous processor architectures therefore involves finding every integral type that ever hits a transmission vector that could legitimately target another processor, and ensuring that the read/write operation involved transforms the data into/from a neutral representation that both sides agree on. In a program of any size at all, this is obviously a non-trivial task. Klocwork Truepath can help developers in this task as it now includes the ability to validate type representation usage symmetrically as those types cross transmission vector boundaries. That is, the data flow engine within Klocwork Truepath automatically validates that types written directly to a transmission vector are subject to host-to-neutral format transformation before the write operation takes place. Likewise, integral types read from a transmission vector are tracked to ensure that they are appropriately transformed prior to the first attempted usage on the host. For example, consider the following function:
void foo(int sock)
{
    int x;

    for( x = 0; x < 256; x++ )
        if( send(sock, &x, sizeof(int), 0) < (ssize_t)sizeof(int) )
            return;
}
This simple function makes the basic assumption that the reader on the other end of its socket has the same processor architecture as the sender. This might be true, or more accurately it might be true today, but what designer can ever look far enough into the future to know that it will always be true, regardless of market shifts, great ideas that marketing interns have, and so on? Klocwork Truepath, upon analysis of this function, will point out:

Value 'x' is used in host byte order, but should be used in environment/network byte order.

A developer versed in inter-architectural development will naturally modify this function to transform the value of the variable x prior to transmission:
void foo(int sock)
{
    int x, xt;

    for( x = 0; x < 256; x++ ) {
        xt = htonl(x); // or some other suitable transform
        if( send(sock, &xt, sizeof(int), 0) < (ssize_t)sizeof(int) )
            return;
    }
}
Likewise when it comes to reading information across a transmission vector, Klocwork Truepath traces the data flow of any received integral types to ensure, in exactly the opposite way to sending, that any such values are transformed to host format prior to their first usage.
Now I can call enter() multiple times, simulating some of the capabilities of a true recursive lock, and as long as I remember to call leave() an equal number of times, the lifecycle of the underlying non-recursive lock is managed correctly:
void foo()
{
    // real lock is reserved
    enter();

    if( i-really-want-to ) {
        // only the reference count is affected
        enter();
        leave();
    }

    // now the real lock is released
    leave();
}
Now consider the requirement to implement an abstraction over thread-specific data storage. To ensure safety when allocating such a structure, the database engine uses the singleton recursive lock described above to protect its activities with an implementation that simplifies as follows:
int tlsCreated = 0;

data_t* create_data()
{
    static data_t* tls;

    enter();
    if( tlsCreated == 0 )
        tls = create_thread_data();
    tlsCreated = 1;
    leave();

    init_data(tls);
    return tls;
}
To simple inspection, this appears quite correct: it calls leave() the same number of times as enter() and thus should be considered well behaved. Unfortunately, life in the parallel world is rarely simple to analyze, and this case is certainly more complicated than it first appears. Consider a two core CPU executing two threads, both calling create_data() at very slight offsets in time. The first thread (let's call our threads Thread 1 and Thread 2) begins executing create_data() and successfully calls the enter() function. This results in the underlying lock, lock 2, being reserved to Thread 1:
Thread 1
  create_data()
    enter()
      refCount == 0
      reserve(lock1)
      reserve(lock2)
      release(lock1)
      refCount = 1
Now let's assume that Thread 2 begins its execution of create_data() during the time that Thread 1 is active, and before it releases lock 1:
Thread 1                      Thread 2
  create_data()
    enter()
      refCount == 0
      reserve(lock1)
      reserve(lock2)          create_data()
      release(lock1)            enter()
One further assumption makes the scenario whole: Thread 1 is at this moment interrupted by the operating system, losing its time on chip. Crucially, this happens before the reference count is updated. (Check the implementation of enter() and you'll see that the author unfortunately left the reference count update outside of the lock that is supposed to guard access to it.) As the reference count will therefore still read zero for Thread 2, it will attempt to reserve lock 2, resulting in Thread 2 blocking (as lock 2 is already owned by Thread 1):
Thread 2
  create_data()
    enter()
      refCount == 0
      reserve(lock1)
      reserve(lock2)  (blocked: lock 2 is owned by Thread 1)
Upon return from interrupt, Thread 1 resumes execution where it left off, incrementing the reference count and returning from the enter() function. Its execution of create_data() continues, leading to a call to the leave() function, which unfortunately attempts to reserve lock 1 before doing anything else:
Thread 1
  create_data()
    enter()
      refCount == 0
      reserve(lock1)
      reserve(lock2)
      release(lock1)
      (interrupted)
      refCount = 1
      return
    leave()
      reserve(lock1)  (blocked)

Thread 2
  (blocked on lock 2, holding lock 1)
Because Thread 2 is currently blocked waiting on lock 2, and currently owns lock 1, Thread 1 will now block on its own attempt to reserve lock 1. In short, this is a classic lock-order inversion: contention caused by a poorly guarded data item which, when subject to a race condition (being read by one thread whilst in the process of being updated by another), causes one thread to reserve locks in order while the other thread attempts to reserve them out of order, resulting in a deadlock. With the race condition fixed, this singleton will operate correctly, although as previously described the author actually chose to completely rewrite this module, providing a more useful re-entrant mutual exclusion capability for multiple threads, i.e. removing the singleton semantic.
In this example, it's simple to see the assumption in all its glory: the data member msg.msg_hdr.m_size is read and used directly off the wire, in what could be, but isn't in this case, network order. Now let's assume that a new generation of designers revisits this decision and instead places emphasis on scale and flexibility over ease of implementation, deciding to place the statistics collector process on an arbitrary node in the hardware design, rather than on the same node as the kernel process. With this decision in place, the assumption that network byte order and host byte order are the same can no longer be made in general. Porting to this new assumption set could take significant time, both for developers and for the test crew, faced with putting together a matrix of CPUs/hosts that embody the plethora of representations we can expect to support in the field. Using a tool-driven approach, however, this entire effort can be collapsed to a single analysis pass, taking minutes in total, to see a report of what's involved. In this case, the designers would be faced with the following endian vulnerabilities that would need to be addressed (along with the obvious logistical issues around how to place the process on the right host/CPU, of course):

pgstats.c: line 1988: function pgstat_recvbuffer()
  Value 'msg.msg_hdr.m_size' is used in network order.
pgstats.c: line 1443: function pgstat_send()
  Value '*msg' is used in host byte order.
These two simple issues might be thought of as the whole problem domain. However, looking further into what this module is capable of, certain information can be persisted across sessions using a statistics file. If we further our decision to allow the process to be spawned on heterogeneous hardware, we might well continue that spread by allowing different instantiations of said process to occur on heterogeneous hardware, thus requiring persistent data to be endian safe:

pgstats.c: line 2556: function pgstat_read_statsfile()
  Value 'format_id' is used in environment byte order. Similar errors can be found on line(s): 2610, 2684, 2717, 2740.
pgstats.c: line 2312: function pgstat_write_statsfile()
  Value 'format_id' is used in host byte order. Similar errors can be found on line(s): 2351, 2384, 2411, 2412.
Armed with this information, the designer can make all required updates to remove endian vulnerability from their code in one pass.
Conclusion
The complexity of this problem domain is vast, so there's no one solution, tool, or approach that will address all your problems. Development teams need to equip themselves with good tools, smart design assumptions, and even smarter developers to reconcile the feature race being demanded by the market with the underlying platform complexity that implies. When it comes to selecting a tool, source code analysis should be on your shortlist, as it offers a compelling mix of scalability, flexibility, and the ability to address a broad set of issues that will help you ensure the overall quality and security of your code.
About Klocwork
Klocwork offers a portfolio of software development productivity tools designed to ensure the security, quality and maintainability of complex code bases. Using proven static analysis technology, Klocwork's tools identify critical security vulnerabilities and quality defects, optimize peer code review, and help developers create more maintainable code. Klocwork's tools are an integral part of the development process for over 700 customers in the consumer electronics, mobile devices, medical technologies, telecom, military and aerospace sectors.