Programming Perl

Chapter 17. Threads


The Process Model
The Thread Model

Parallel programming is much harder than it looks. Imagine taking a recipe from a cookbook and converting it into something that several dozen chefs can work on all at the same time. You can take two approaches.

One approach is to give each chef a private kitchen, complete with its own supply of raw materials and utensils. For recipes that can be divided up into parts easily, and for foods that can be transported from kitchen to kitchen easily, this approach works well because it keeps the chefs out of each other's kitchens.

Alternatively, you can just put all the chefs into one kitchen, and let them work things out, like who gets to use the mixer when. This can get messy, especially when the meat cleavers start to fly.

These two approaches correspond to two models of parallel programming on computers. The first is the multiprocessing model typical of traditional Unix systems, in which each thread of control has its own set of resources, which taken together we call a process. The second model is the multithreading model, in which each thread of control shares resources with all other threads of control. Or doesn't share, as the case may be (and upon occasion must be).

We all know that chefs like to be in control; that's okay, because chefs need to be in control in order to accomplish what we want them to accomplish. But chefs need to be organized, one way or another.

Perl supports both models of organization. In this chapter we'll call them the process model and the thread model.

17.1. The Process Model

We'll not discuss the process model in great detail here, simply because it's pervasive throughout the rest of this book. Perl originated on Unix systems, so it is steeped in the notion that each process does its own thing. If a process wants to start some parallel processing, then logically it has to start a parallel process; that is, it must fork a new heavyweight process, which by default shares little with the parent process except some file descriptors. (It may seem like parent and child are sharing a lot more, but most of the state of the parent process is merely duplicated in the child process and not really shared in a logical sense. The operating system may of course exhibit laziness in enforcing that logical separation, in which case we call it copy-on-write semantics, but we wouldn't be doing the copy at all unless there were a logical separation first.)
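To see that duplication in action, here is a minimal sketch, assuming a platform where fork is available. The helper name counter_after_fork and the variable $counter are made up for illustration. The child changes its copy of the variable, and the parent's copy stays as it was.

```perl
use strict;
use warnings;

# Flush output immediately, so no buffered text is duplicated in the child.
$| = 1;

# Hypothetical helper: fork a child that changes its copy of $counter,
# then return the parent's value once the child has exited.
sub counter_after_fork {
    my $counter = 1;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: this assignment touches only the child's copy.
        $counter = 100;
        exit 0;
    }

    # Parent: wait for the child, then look at our own copy.
    waitpid($pid, 0);
    return $counter;
}

print "parent still sees counter = ", counter_after_fork(), "\n";
```

The parent reports 1, not 100: the child got a duplicate of $counter rather than a shared reference to it.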

Historically, this industrial-strength view of multiprocessing has posed a bit of a problem on Microsoft systems, because Windows has not had a well-developed multiprocessing model (and what it does have in that regard, it doesn't often rely on for parallel programming). It has typically taken a multithreading approach instead.

However, through heroic efforts, version 5.6 of Perl now implements the fork operation on Windows by cloning a new interpreter object within the same process. That means that most examples using fork in the rest of the book will now work on Windows. The cloned interpreter shares immutable code with other interpreters but gets its own copy of data to play with. (There can still be problems with C libraries that don't understand threads, of course.)

This approach to multiprocessing has been christened ithreads, short for "interpreter threads". The initial impetus for implementing ithreads was to emulate fork for Microsoft systems. However, we quickly realized that, although the other interpreters are running as distinct threads, they're running in the same process, so it would be easy to make these separate interpreters share data, even though they don't share by default.

This is the opposite of the typical threading model, in which everything is shared by default, and you have to take pains not to share something. But you should not view these two models as totally distinct from each other, because they are both trying to bridge the same river; they're just building from opposite shores. The actual solution to any parallel processing problem is going to involve some degree of sharing, together with some degree of selfishness.
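From the process side of the river, the sharing has to be arranged explicitly. The following is a rough sketch, not the only way to do it: parent and child share nothing but a pipe, and the child hands its answer back through it. The helper name result_from_child and the sample computation are assumptions made up for this illustration.

```perl
use strict;
use warnings;

# Flush output immediately, so no buffered text is duplicated in the child.
$| = 1;

# Hypothetical helper: run $work in a child process and hand its result
# back to the parent over a pipe.
sub result_from_child {
    my ($work) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: keep only the write end, send one line, and leave.
        close $reader;
        print {$writer} $work->(), "\n";
        close $writer;
        exit 0;
    }

    # Parent: keep only the read end; the pipe is the one shared channel.
    close $writer;
    my $line = <$reader>;
    close $reader;
    waitpid($pid, 0);

    chomp $line if defined $line;
    return $line;
}

print "child computed: ", result_from_child(sub { 6 * 7 }), "\n";
```

Everything else stays private to each process; only what is written to the pipe crosses over.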

So over the long run, the intent is to extend the ithreads model to allow as much sharing as you need or want. However, as of this writing, the only user-visible interface for ithreads is the fork call under Microsoft ports of Perl. We think that, eventually, this approach will produce cleaner programs than the standard threading approach. Basically, it's easier to run an economy where you assume everyone owns what they own, rather than assuming that everyone owns everything. Not that people aren't expected to share in a capitalist economy, or peculate[1] in a communist economy. These things tend toward the middle. Socialism happens. But with large groups of people, sharing everything by default only works when you have a "head chef" with a big meat cleaver who thinks he owns everything.

[1] peculate: v.i., to swipe the People's Property from the commons in the middle of the night; to embezzle from the public something that is not necessarily money (~L. peculiar, "not common"), cf. embrace, extend, GPL.

Of course, the actual government of any computer is run by that fascist dictator known as the operating system. But a wise dictator knows when to let the people think they're capitalists--and when to let them think they're communists.


Copyright © 2002 O'Reilly & Associates. All rights reserved.