Precis

Current programming practices using multithreading are error prone. Deadlocks are a well-known multithreaded programming fault. Moreover, due to the state-explosion problem, it is difficult to guarantee deadlock-free code. Thus, it is not uncommon for deadlocks to occur in production systems.

Deadlocked threads cannot make further progress, and frequently tie up resources requested by still other threads, causing more and more threads to come to a standstill. Thus, a deadlock should not remain undetected and uncorrected for a long time. If deadlock-detection processes are run too frequently, however, valuable system resources may be wasted. Therefore, it is important to choose the right interval between successive deadlock detections.

Deadlock recovery must follow deadlock detection to release held resources in the cyclic wait. It is desirable that programmers be able to implement fine-grained recovery actions such as releasing a resource currently not in use in addition to restarting the entire system. Such fine-grained recovery actions often require the knowledge of program contexts and deadlock states. Unfortunately, modern programming languages lack language-level support for signaling deadlock conditions and for structuring resolution code.

My thesis is that, under the assumption that the time to first deadlock in the system (after a system restart) follows an exponential distribution, a reinforcement-learning approach is effective in scheduling deadlock detection for a restart-oriented system, and that runtime exceptions are a programming abstraction that allows programmers to write fine-grained deadlock recovery code.

My approach to deadlock-detection scheduling as reinforcement learning estimates the deadlock rate and then performs an optimization to find the detection interval that maximizes system utility. It is theoretically proved that this technique finds the best tradeoff, and experimental results suggest that it is a reasonable approximation to assume that the time to first deadlock in the system (after a system restart) follows an exponential distribution

It is natural to consider deadlock occurrences as runtime exceptions because at runtime it is relatively easy to detect actual deadlock occurrences, which represent not only abnormal states but also fatal errors. Thus, exception handlers can be used to resolve deadlock occurrences based on deadlock states and program contexts. Furthermore, because exceptions are a widely used language concept, the technique of deadlock resolution via exceptions is intuitive and practical.

This dissertation study was motivated by my programming practice with Java. Consequently, the use of both techniques are demonstrated via implementations in Java and illustrated by some sample Java programs. However, the two techniques are not restricted to Java. Rather, they can be easily adapted for other modern programming languages supporting exception handling and multithreading.