A few thoughts on Parallelism
It’s been a while. I am sure many people experienced connectivity issues with my server; I believe these have now been resolved. While searching through my old archives, I stumbled across an essay that I had written a while ago as a small part of a bigger exam at university in France. It seemed like an interesting read. It concerns parallelism and how it affects development, and it also includes a few thoughts about Automatic Parallelisation.
Parallelisation, how it shapes programming mentality and code reuse / scalability
Parallelism is a huge concept. I have been involved with it in a number of different ways during my years of working with software, and I am sure that it has changed the way I think about software in general.
In my opinion, thinking about how to divide workloads and specific tasks among processors / threads or even individual computers has an immediate impact on the way a typical programmer builds software, no matter the framework or technology used. Thinking in software design terms, and taking an example from the software engineering project developed as part of the module, I found myself building the sequential solution to the problem in a different way, just so that the implementation could later be extended with a parallel approach. Instead of iterating through all the pieces of a structure to be examined and storing the results locally in a method (not a good design, but this approach exists and is being used), I chose to encapsulate operations and return values in data structures (dictionaries), which in turn had to be stored in other shared data structures (arrays), and to serialise access to the shared structures to avoid race conditions (two threads trying to access the same resource at the same time).
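A minimal sketch of that design follows. Python is used purely for illustration, since the essay does not depend on the language of the original project, and every name here (`examine`, `worker`, `examine_all`) is hypothetical. Each result is wrapped in a dictionary, and a lock serialises appends to the shared list.

```python
import threading

def examine(piece):
    """Hypothetical unit of work: wraps its result in a dictionary."""
    return {"piece": piece, "length": len(piece)}

def worker(pieces, results, lock):
    for piece in pieces:
        outcome = examine(piece)   # computed outside the lock
        with lock:                 # only one thread at a time touches the shared list
            results.append(outcome)

def examine_all(pieces, n_threads=2):
    results = []                   # shared structure holding the result dictionaries
    lock = threading.Lock()
    # Deal the pieces out round-robin, one slice per thread.
    chunks = [pieces[i::n_threads] for i in range(n_threads)]
    threads = [
        threading.Thread(target=worker, args=(chunk, results, lock))
        for chunk in chunks
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because `examine` keeps no state of its own, the same function serves the sequential version (`n_threads=1`) and the threaded one; only the order of the entries in `results` may differ.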
In addition, while studying parallelism, I found myself grappling with fundamental notions that were necessary to implement a decent multithreaded application. Today’s frameworks all provide classes that move the developer as far away as possible from the low-level implementation of parallel computing (for example, classes for asynchronous downloading on Android, iOS and Windows). However, there are questions that should be answered even when implementing such a system:
- How many background threads should run?
- Based on what criteria? Answering this question requires an understanding of the hardware capabilities. Assigning four download operations to a single processor tends to be OK, but assigning two DNA analysis operations to one processor is likely to bring the system to a halt.
- How much data is going to be processed? If the memory used by the processors / threads is shared, then mutual exclusion mechanisms must be set up correctly. If the memory is distributed, how are we going to take advantage of the messaging system efficiently, so that the thread set-up is as lightweight as possible? If the mechanism is taken care of by the OS, which of the two approaches does it use (or does it use a hybrid approach), and how does this play a role?
- Where should all data be stored, before and after processing? I found out that this is almost as important as ensuring thread-safety through serialised access to data, since these two concepts are usually intertwined.
- How much time should we allow for an operation to finish? What would be an acceptable level? Would adding more threads take more time? The answer to this question usually depends on the hardware and the cores / processors / speed available.
- What if we want to cancel a thread's execution? Can we do it safely? Thread cancellation is usually implementation-dependent, and thread cleanup is far more complicated. Usually, cancelling a thread means sending a stop signal, which may or may not stop the thread immediately, so handling mechanisms are needed to keep the system stable.
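The cancellation question in the list above can be sketched as a cooperative stop signal. This is one common pattern, not the only one, and the names (`cancellable_worker`, `run_and_cancel`) are hypothetical. The thread is never killed from outside; it checks a flag between units of work, so it may run a little longer after the request, and its cleanup always executes.

```python
import threading
import time

def cancellable_worker(stop_requested, progress):
    """Hypothetical long-running task that checks for a stop request between steps."""
    try:
        while not stop_requested.is_set():
            progress.append(len(progress))   # one small unit of work
            time.sleep(0.01)
    finally:
        progress.append("cleaned up")        # cleanup runs however the loop ends

def run_and_cancel(run_for=0.05):
    stop_requested = threading.Event()
    progress = []
    thread = threading.Thread(
        target=cancellable_worker, args=(stop_requested, progress)
    )
    thread.start()
    time.sleep(run_for)
    stop_requested.set()        # ask the thread to stop; it is not killed
    thread.join(timeout=1.0)    # bounded wait for the thread to notice
    return progress, thread.is_alive()
```

The bounded `join` is the "handling mechanism": if the thread is still alive after the timeout, the caller knows the stop request has not been honoured yet and can decide what to do next.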
I had to become familiar with all these concepts before I was able to build a performant system.
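The thread-count and time-limit questions can be made concrete with a short sketch. The multiplier of four workers per core for I/O-bound work simply echoes the download example above and is an assumption, not a rule; `suggested_workers`, `fake_download` and `run_with_deadline` are hypothetical names.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor, wait

def suggested_workers(cpu_bound, cores=None):
    """Illustrative heuristic only: one worker per core for CPU-heavy work,
    several per core for work that mostly waits on the network."""
    cores = cores or os.cpu_count() or 1
    return cores if cpu_bound else cores * 4

def fake_download(seconds):
    """Stand-in for an I/O-bound operation."""
    time.sleep(seconds)
    return seconds

def run_with_deadline(durations, deadline):
    pool = ThreadPoolExecutor(max_workers=suggested_workers(cpu_bound=False))
    futures = [pool.submit(fake_download, d) for d in durations]
    # Wait for at most `deadline` seconds, then report what finished in time.
    done, not_done = wait(futures, timeout=deadline)
    pool.shutdown(wait=False)   # do not block on the stragglers here
    return len(done), len(not_done)
```

For example, `run_with_deadline([0.0, 0.0, 1.0], deadline=0.3)` should report two finished operations and one still pending. The slow operation keeps running in the background, which is exactly why the cancellation question matters.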
Moreover, I feel that multiprocessing / concurrent programming demands knowledge of the underlying architecture of the operating system and, in many cases, of the hardware on which the system is running. Since parallelising a process or an application requires distributing the work to machines / processes / threads / cores, at least a basic understanding of these concepts is necessary in order to identify bottlenecks in the code and to avoid deadlocks. While one can approach multithreading with a certain level of abstraction, especially with today’s languages and frameworks, there are cases where this approach is not enough.
An example is Audio Queues, a technology from Apple that serves as a level of abstraction over Audio Units (the lowest-level audio mechanism in iOS and OS X). Without going into too much detail, Audio Queues is a C-based framework. While sound is being processed (recording / playing / streaming), the framework produces a queue of N packets, ready to be processed one by one by a C callback. One could very easily assign the recording to a background thread (using Grand Central Dispatch, for instance) and the processing of each packet to another thread. While this approach works on most iOS devices, the iPhone 3GS will crash with it (the whole device will reboot; later devices do not have that problem), because of how the ARM processor prioritises threads and sound recording at the same time, and because it cannot process relatively large chunks of data before the next chunk arrives. I have personally stumbled across this situation during the development of a medium-scale project. The solution to this problem is non-trivial and out of the scope of this article, but the example shows that knowing the device on which the thread is going to be executed is indeed important.
I also found out that the use of multithreading can have a severe impact on the readability / maintainability of the code. The more parallel-aware an application is, the more code has to be written to manage resources, pass messages between processors and / or avoid race conditions between threads. That is why I consider parallel operations a usually necessary but dangerous tool when misused or abused.
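The shape of that problem, one thread producing packets at a fixed pace while another processes them, can be simulated without any audio hardware. The sketch below does not use the Audio Queues API and does not reproduce the iPhone 3GS crash. It is a generic bounded producer / consumer in Python, with hypothetical names, that shows what happens when processing a packet takes longer than the gap between arrivals.

```python
import queue
import threading
import time

def record(packets, buffer, dropped, interval):
    """Producer: a new packet arrives every `interval` seconds."""
    for packet in packets:
        try:
            buffer.put_nowait(packet)
        except queue.Full:
            dropped.append(packet)   # the consumer fell behind; the packet is lost
        time.sleep(interval)
    buffer.put(None)                 # sentinel: no more packets

def process(buffer, processed, cost):
    """Consumer: each packet takes `cost` seconds to handle."""
    while True:
        packet = buffer.get()
        if packet is None:
            break
        time.sleep(cost)
        processed.append(packet)

def simulate(n_packets, interval, cost, capacity=3):
    buffer = queue.Queue(maxsize=capacity)
    processed, dropped = [], []
    producer = threading.Thread(
        target=record, args=(range(n_packets), buffer, dropped, interval)
    )
    consumer = threading.Thread(target=process, args=(buffer, processed, cost))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return processed, dropped
```

When `cost` is well below `interval` the consumer keeps up. When it is well above, the bounded buffer fills and packets start to be dropped, however the threads are scheduled.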
Automatic Parallelisation: is it a realistic goal for the near future?
I personally believe that the term “realistic goal” is hard to apply to software, given the advances made in recent years, both in software itself and in the logic behind modern systems. I prefer not to give a direct answer; I would rather give an overview of how I perceive this technology and whether it is feasible by today’s standards.
The concept of automatic parallelisation is not new by the standards of modern software: it dates back to the early 90s, maybe even earlier. There are a number of automatic parallelisation tools, like Intel’s auto-parallelisation feature, or MATLAB’s automatic parallelisation (which is subject to some restrictions). These solutions are generally bound to a technology, compiler and / or language, to a specific domain (graphics, automotive), or to a specific set of problems, such as loop optimisation. However, general-purpose parallelisation of sequential code that is completely automatic, with no human interaction, still remains an open issue.
In my opinion, the biggest problem in parallelisation is the notion of “decision”. Code analysis tools have existed for years, and compilers and tools are getting better at understanding the code given to them. An example is static analysis tools, which report potential logic errors and memory leaks, not just syntax errors. So far, however, the division of the workload has been left to the programmer, as happens with many software features that involve a “decision”.
Other technical problems include, but are not limited to:
- Efficient dependency analysis for code that is to be executed
- Efficient management of shared resources
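To illustrate why dependency analysis matters, here is a small Python sketch with hypothetical function names. The first loop has independent iterations, so its work can be handed to a pool without changing the result; the second carries a value from one iteration to the next, so a tool cannot simply split it across workers.

```python
from concurrent.futures import ThreadPoolExecutor

def squares_sequential(values):
    out = []
    for v in values:
        out.append(v * v)        # iteration i reads only values[i]
    return out

def squares_parallel(values, workers=4):
    # Safe to distribute: no iteration depends on another.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda v: v * v, values))   # map preserves input order

def running_total(values):
    out, total = [], 0
    for v in values:
        total += v               # iteration i needs the total from iteration i - 1
        out.append(total)
    return out
```

An automatic tool has to prove the first property before it may rewrite `squares_sequential` into something like `squares_parallel`, and it has to detect the second in order to leave `running_total` alone or restructure it with a different algorithm.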
Aside from the technical problems, when I look at some of the most common general-purpose manual parallelisation tools today (OpenMP, OpenMPI, Grand Central Dispatch), I can’t help but notice that there is a lot of glue code that these tools could generate automatically. And when it comes to creating general-purpose applications, I can’t help but notice that the majority of applications that share some common features (synchronous networking, math computations) also share a number of common solutions. In these cases, I believe that there could be a tool that identifies common patterns and problem solutions and parallelises the code automatically. However, this approach raises some practical questions:
- If there were such a tool, would programmers be willing to give up the control they have now and leave the parallelisation completely to an automatic tool?
- Who guarantees that this tool would choose the best approach possible? There are many examples where “automatic” is a synonym for a “no-stress, but slower and more generic” approach.
- Who / what technology is going to provide the tools with the information (state of the machine, target hardware, end application goals) that is required to make the architectural decisions? Since this is usually the job of the compiler, are there any cross-platform APIs that these tools can talk to in order to make decisions?
Regardless of my personal feelings about the above concerns, I believe that these problems are solvable, or at least that compromises can be made, especially if doing so helps speed up software development. I believe that a unified, multiplatform, general-purpose solution for automatic parallelisation will be feasible at some point in the future. I would also take the risk of concluding that such a tool / language will be built on top of a virtual machine or some kind of language runtime that is able to identify the available resources at runtime and act accordingly when it comes to thread and resource management.
The real question is whether it will be as performant as the high-quality manual multithreading code that will surely exist by then, and whether it will actually convince developers to give up control over this important aspect of software development.
Links
Louis Savain – How to solve the Parallel Programming Crisis
Introduction to Parallel Computing
Automatic Parallelisation with Intel Compilers
OpenMPI Project
http://openmp.llvm.org