Mac OS X: beautiful but...

The Mac OS X (Server) operating system can't be described easily. Apple:

"Mac OS X Server starts with Darwin, the same open source foundation used in Mac OS X, Apple's operating system for desktop and mobile computers. Darwin is built around the Mach 3.0 microkernel, which provides features critical to server operations, such as fine-grained multi-threading, symmetric multiprocessing (SMP), protected memory, a unified buffer cache (UBC), 64-bit kernel services and system notifications. Darwin also includes the latest innovations from the open source BSD community, particularly the FreeBSD development community."

While there are many very good ideas in Mac OS X, it reminds me a lot of fusion cooking, where you make a hotch-potch of very different ingredients. Let me explain.


Hexley the platypus, the Darwin mascot

Darwin is indeed the open Source project around the Mach kernel 3.0. This operating system is based around the idea of a microkernel, a kernel that only contains the essence of the operating system, such as protected memory, fine-grained multithreading and symmetric multiprocessing support. This in contrast to "monolithic" operating systems, which have all of the code in a single large kernel.

Everything else is located in smaller programs, servers, which communicate with each other via ports and an IPC (Inter Process Communication) system. Explaining this in detail is beyond the scope of this article (read more here). But in a nutshell, a Mach microkernel should be more elegant, easier to debug and better at keeping different processes from writing in eachother's protected memory areas than our typical "monolithic" operating systems such as Linux and Windows NT/XP/2000. The Mach microkernel was believed to be the future of all operating systems.

However, you must know that applications (in the userspace) need, of course, access to the services of the kernel. In Unix, this is done with a Syscall, and it results in two context switches (the CPU has to swap out one process for another): from the application to the kernel and back.

The relatively complicated memory management (especially if the server process runs in user mode instead of kernel) and IPC messaging makes a call to the Mach kernel a lot slower, up to 6 times slower than the monolithic ones!

It also must be remarked that, for example, Linux is not completely a monolithic OS. You can choose whether you like to incorporate a driver in the kernel (faster, but more complex) or in userspace (slower, but the kernel remains slimmer).

Now, while Mac OS X is based on Mach 3, it is still a monolithic OS. The Mach microkernel is fused into a traditional FreeBSD "system call" interface. In fact, Darwin is a complete FreeBSD 4.4 alike UNIX and thus monolithic kernel, derived from the original 4.4BSD-Lite2 Open Source distribution.

The current Mac OS X has evolved a bit and consists of a FreeBSD 5.0 kernel (with a Mach 3 multithreaded microkernel inside) with a proprietary, but superb graphical user interface (GUI) called Aqua.

Performance problems

As the mach kernel is hidden away deep in the FreeBSD kernel, Mach (kernel) threads are only available for kernel level programs, not applications such as MySQL. Applications can make use of a POSIX thread (a " pthread"), a wrapper around a Mach thread.


Mac OS X thread layering hierarchy (Courtesy: Apple)

This means that applications use slower user-level threads like in FreeBSD and not fast kernel threads like in Linux. It seems that FreeBSD 5.x has somewhat solved the performance problems that were typical for user-level threads, but we are not sure if Mac OS X has been able to take advantage of this.

In order to maintain binary compatibility, Apple might not have been able to implement some of the performance improvements found in the newer BSD kernels.

Another problem is the way threads could/can get access to the kernel. In the early versions of Mac OS X, only one thread could lock onto the kernel at once. This doesn't mean only one thread can run, but that only one thread could access the kernel at a given time. So, a rendering calculation (no kernel interaction) together with a network access (kernel access) could run well. But many threads demanding access to the memory or network subsystem would result in one thread getting access, and all others waiting.

This "kernel locked bottleneck" situation has improved in Tiger, but kernel locking is still very coarse. So, while there is a very fine grained multi-threading system (The Mach kernel) inside that monolithic kernel, it is not available to the outside world.

So, is Mac OS X the real reason why MySQL and Apache run so slow on the Mac Platform? Let us find out... with benchmarks, of course!

The G5 as Server CPU Mac OS X versus Linux
Comments Locked

116 Comments

View All Comments

  • mongo lloyd - Tuesday, June 7, 2005 - link

    At least the non-ECC RAM, that is.
  • mongo lloyd - Tuesday, June 7, 2005 - link

    Any reason for why you weren't using RAM with lower timings on the x86 processors? Shouldn't there at least have been a disclaimer?
  • jhagman - Tuesday, June 7, 2005 - link

    OK, this clears it up, thanks.

    One little thing still, what is the number you are giving in the ab results table? Is it requests per second or perhaps the transfer rate?

  • demuynckr - Tuesday, June 7, 2005 - link

    jhagman:
    As i mentioned before, we used gcc 3.3.3 for all linux, and gcc 3.3 mac compiler on apple, because that was the standard one.
    I did a second flops test with the gcc 4.0 compiler included on the Tiger cd, and the flops are much better when compiled with the -mcpu=g5 option which did not seem available when using the gcc 3.3 Apple compiler.
    As for ab i used these settings,
    ab -n 100000 -n x http://localhost/

    x for the various concurrencies: 5,20,50,100,150.
  • spinportal - Monday, June 6, 2005 - link

    Guess there's no one arguing that the PPC is not keeping its paces with the current market, but rather OS/X able to do Big Iron computing. And if rumors be true, where will you be able to get a PPC built once Apple drops IBM for Intel?
    In a Usenet debate in 93, Torvalds and Tannenbaum go roasting Mach microkernel vs. the death of Linux. Seems Linus' work will be seeing more light of day, and Mach go the way of the dodo. Will Apple rewrite OS/X for Intel x86/64? As far as practical business sense, that's like shooting off one's leg foot.
  • spinportal - Monday, June 6, 2005 - link

  • jhagman - Monday, June 6, 2005 - link

    Could you please give the exact method of testing apache with ab? It is really hard to try to redo the tests when one does not know which methodology was used. The amount of clients and switches of ab would be appreciated.

    Also an answer to why Apple's newest gcc (4.0) was not used would be an interesting one and did you _really_ use gcc 3.3.3 and not Apple's gcc?

    Other than these omissions I found the article very interesting, thanks.
  • demuynckr - Monday, June 6, 2005 - link

    Yes I have read the article, I also personally compiled the microbenchmarks on linux as well as on the PPC, and I can tell you I used gcc 3.3 on Mac for all compilation needs :).
  • webflits - Monday, June 6, 2005 - link

    demuynckr, did your read the article?

    "So, before we start with application benchmarks, we performed a few micro benchmarks compiled on all platforms with the SAME gcc 3.3.3 compiler. "


    BTW I ran the same tests using Apple's version of gcc 3.3
    As you can see my 2.0Ghz now beats the 2.5Ghz on 5 of the 8 tests, and a 2.7Ghz G5 would be on par with the Opteron 250 when you extrapolate the results.

    Lets face it, Anandtech screwed up by using a crippled compiler for the G5 tests


    ----------------------------
    GCC 3.3/OSX 10.4.1/2.0GHz G5

    FLOPS C Program (Double Precision), V2.0 18 Dec 1992

    Module Error RunTime MFLOPS
    (usec)
    1 4.0146e-13 0.0140 997.2971
    2 -1.4166e-13 0.0108 648.4622
    3 4.7184e-14 0.0089 1918.5122
    4 -1.2546e-13 0.0139 1076.8597
    5 -1.3800e-13 0.0312 928.9079
    6 3.2374e-13 0.0182 1596.1407
    7 -8.4583e-11 0.0348 344.3954
    8 3.4855e-13 0.0196 1527.6638

    Iterations = 512000000
    NullTime (usec) = 0.0004
    MFLOPS(1) = 827.5658
    MFLOPS(2) = 673.7847
    MFLOPS(3) = 1037.6825
    MFLOPS(4) = 1501.7226
  • demuynckr - Monday, June 6, 2005 - link

    Just to clear things up: on linux the gcc 3.3.3 was used, on macintosh gcc 3.3 was used (the one that was included with the OS).

Log in

Don't have an account? Sign up now