Risks and Rewards

A little over two years ago, I wrote a long post about bugs. In that post, I briefly discussed a number of reasons why not every known bug gets fixed in any particular dot-release of a product. I’m going to go into a bit more detail about one of those reasons today, so allow me to quote myself from that post:

Some bugs are easier to hit, but fixing them may have a high risk of regression and may cause another bug that is even worse. One example might be a performance bug in Excel’s recalc engine. We could fix it and make things faster, but currently the code calculates correctly (albeit slowly) and any fix might totally break all function dependency analysis. It is sad but safer to leave things alone.

Let’s look at performance as an example. Over the ten or so months that Office 2008 has been available, we’ve received a lot of feedback about the suite. One of the most common areas mentioned is performance, such as the amount of time it takes Office apps to boot, the amount of time it takes a chart to draw, or the amount of time it takes Excel to recalculate a worksheet. Each of these three areas is a place where we’ve put significant research and investment into improving Office 2008 performance. Each of the Office 2008 updates (12.0.1, 12.1, 12.1.1, 12.1.2, and 12.1.3) has contained a number of performance improvements, among many other fixes. That said, we’re continuing to make additional improvements. Some of the code changes needed are invasive, and we’ve had to evaluate the risk-vs.-reward tradeoffs and make decisions to delay these changes until a more appropriate time.
Let’s look at one of these scenarios. Pretend you are working in Excel and want to create a simple line chart with a large data set that contains as many as 30,000 data points. I’ll presume that you have a pretty good monitor, perhaps running at a resolution of 1600×1200 pixels. Now, your chart probably doesn’t actually fill the entire screen from left-to-right — after all, you need space to show the columns of data that you are working with, right? So let’s further assume that your chart is roughly 1000 pixels wide. That means you are displaying 30,000 data points in 1000 pixels, or about 30 data points per pixel. Even if not all of those data points fall directly on top of each other in the y-axis, the only possible line to draw to connect all 30 of those points in one vertical pixel slice is a single line up and down and up and down again, on top of itself many times. Your chart looks kind of smushed together. Or, let’s pretend for the sake of simplicity that all those 30 data points are approximately the same on the y-axis, and you just get one single pixel, drawn over and over and over again. And, as you may have noticed, the current charting engine that we share with Windows Office is rather sluggish as it faithfully draws that line or pixel over and over and over again.
Now, the fact that 30 data points end up in one pixel slice on the x-axis isn’t the charting engine’s fault, as it didn’t choose the data set or the size of the chart; you did, or you at least accepted the charting defaults. Now, to improve things, you could stretch the chart out to be a full 1600 pixels wide, right? But that doesn’t help much; you’re still at almost 19 data points per pixel. Or, if you had $17,207+tax on hand, you could set up a Mac Pro with four ATI Radeon HD 2600 XT video cards and eight 30″ Apple Cinema Displays (2560 pixels of horizontal space per monitor) for a total horizontal space of 20,480 pixels, which is still not enough space to achieve one data point per pixel! OK, so perhaps it’s not reasonable to expect that such a data set would ever have even one full pixel available for each data point. What would you do? How might you redesign the charting engine to handle this?
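The arithmetic in the last two paragraphs is easy to check. Here is a quick sketch in Python; the function name `points_per_pixel` is made up for illustration, and the widths are simply the ones mentioned above.

```python
def points_per_pixel(num_points: int, width_px: int) -> float:
    """Average number of data points that land in each pixel column."""
    return num_points / width_px


NUM_POINTS = 30_000

# Chart widths taken from the scenarios described in the text.
scenarios = {
    "1000-pixel chart": 1000,
    "full 1600-pixel screen": 1600,
    "eight 2560-pixel displays": 8 * 2560,
}

for label, width in scenarios.items():
    density = points_per_pixel(NUM_POINTS, width)
    print(f"{label}: {density:.2f} data points per pixel")
```

Even the eight-display setup stays above one data point per pixel, so the engine always has to squeeze several points into some pixel columns.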
One option might be to only plot at most one data point per pixel. But, how do you decide which of the 30 data points is the best one to plot? The first? The middle one? The average value? Do you add math to best-fit the overall line, which adds processing time, which is what you are trying to cut down on? I’ve only mentioned one single possible change, and there are many. The point here isn’t to actually craft a redesign in this blog post (or in the comments either…), but to demonstrate that a redesign can be a complex thing — I’ve posed some design questions and probable decision points already, and I haven’t even considered how the current code is actually designed. (Neither have you, since you don’t have access to it!)
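To make those design questions a little more concrete, here is a minimal Python sketch of the “at most one data point per pixel” idea. It is not how the Office charting engine works and it is not a proposed redesign; every name in it (`bucket_bounds`, `decimate`, and the `pick_*` strategies) is hypothetical, and it assumes the data points are evenly spaced along the x-axis.

```python
from statistics import mean
from typing import Callable, Iterator, Sequence, Tuple


def bucket_bounds(num_points: int, width_px: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index ranges, one per pixel column that holds data."""
    for px in range(width_px):
        start = px * num_points // width_px
        end = (px + 1) * num_points // width_px
        if end > start:  # skip columns that receive no data points
            yield start, end


def decimate(values: Sequence[float], width_px: int,
             pick: Callable[[Sequence[float]], object]) -> list:
    """Reduce the series to one plotted value per non-empty pixel column."""
    return [pick(values[s:e]) for s, e in bucket_bounds(len(values), width_px)]


# Candidate answers to "which of the 30 points is the best one to plot?"
def pick_first(bucket):
    return bucket[0]


def pick_middle(bucket):
    return bucket[len(bucket) // 2]


def pick_mean(bucket):
    return mean(bucket)


def pick_min_max(bucket):
    # Keeps the vertical extent of the column instead of a single value.
    return (min(bucket), max(bucket))


# A flat series of 30,000 points with one spike inside the first pixel column.
values = [0] * 30_000
values[17] = 100

for pick in (pick_first, pick_middle, pick_mean, pick_min_max):
    print(pick.__name__, decimate(values, 1000, pick)[0])
```

With this input, the first and middle strategies drop the spike entirely, the average flattens it to about 3.3, and only the min/max pair preserves it. Each choice draws a different picture from the same data, which is exactly the kind of decision that makes a redesign risky.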
This example is just a very long way of saying that whatever redesign is planned, it has to fit into the current architecture of the code or be willing to accept the risk of changing that architecture and possibly breaking something else. That breakage may in fact be worse than the current bug. If we make a major change to the code, we run the risk of introducing a variety of different possible bugs: incorrect visual representation of the data, a crash, and so on. Testing architecture changes is particularly challenging because it’s impossible to craft a series of tests that exercises every possible type of data that could be charted, and if we miss one edge case we could let a critical bug slip through that is worse than the current problem. We’d rather make the change when we have sufficient testing resources lined up for a longer period of time than we usually have for a dot-release. That way we have all the test coverage needed to validate the change, and that is why a bug fix in a feature as important as charting may be delayed.
Bear with me while I shift gears now, and talk about another issue that the MacBU has heard a lot about: Office 2008 and the OS X feature called Spaces. If you read through the links in that previous sentence, a couple of themes pop up:

  1. Mac Office 2008 doesn’t work properly with Spaces
  2. It happens most often in Word or when the Formatting Palette is open
  3. People rarely see the bug in non-Microsoft applications
  4. People assume the Mac Office 2008 code base is the cause of the problem

Let me give you some of the background of the Formatting Palette, to help explain why the problem shows up so much more readily in Office 2008 than in Office 2004 or in other applications.
When people talk about the “Formatting Palette” in Office 2008, they usually mean the Toolbox window. The Toolbox is actually two separate windows, bound together by Carbon Window Groups. The first window has the title bar and the row of buttons across the top (the buttons that toggle between the Formatting Palette, the Scrapbook, the Reference Tools, the Object Palette, and whatever else is there that I can’t remember off the top of my head.) That first window is a true floating window created by OS APIs. The second window is everything below that row of buttons, and is the instantiation of one of those toolbox items. These windows are slightly customized, in that we tell the OS to create them with no border or shadow, again through OS APIs. When the Formatting Palette is showing, you’ve actually got the root toolbox window showing first and then the FP window bound tightly to it, on top in the z-order. If you click on the Scrapbook button, the FP window is destroyed and a new window is created to hold the scrapbook, and that new window is bound against the root window. I think that Spaces and Exposé don’t take the window bindings into account (my understanding is that they manipulate windows at the Core Graphics level, which is a lower-level private system interface upon which both Cocoa and Carbon windows sit), and that is why Spaces and Exposé seem to get confused by the root floating window and the upper child window.
The reason MacBU uses this window separation is that most of these child windows are hosted in different modules of code, most of which have their origins in different architectures. The Carbon Window Group APIs allow for very rich and precise control over how windows are presented to the user, and gave us the ability to combine UI from a variety of sources in our codebase with minimal rearchitecting of each of the individual components. The Scrapbook window, for example, is a PowerPlant window because it actually lives deep in the Entourage code (due to the fact that Entourage is currently PowerPlant-based, and that was the easiest way to get access to the Entourage database). PowerPlant is very picky about owning its entire window, which is why we use a separate window here; it misbehaves rather badly if you try to put PowerPlant objects in a sub-frame of a window that is not fully under PowerPlant’s own control. The Formatting Palette is actually a special instantiation of the toolbar code, which has its own assumptions about the sort of window it lives in, and the Compatibility Report is actually an instantiation of what was originally a modeless dialog.
We have long-term plans to overhaul the entire architecture of the Toolbox and all its clients to use Cocoa, but that didn’t happen in 2008. The Cocoa AppKit window APIs do not yet contain functionality that supports the full richness of window management features that the Carbon APIs do. The Toolbox and its use of Carbon Window Groups were introduced in Office 2004 and predate both Spaces and Exposé. The Office 2004 Toolbox has the same issues with Spaces and Exposé, but you only notice it if you show the Toolbox. In Office 2004, the Formatting Palette was separate from the Toolbox, so the Toolbox was not shown by default. In order to reduce the amount of screen space the Toolbox and Formatting Palette obscured at the same time, we merged the two together early in the 2008 cycle, long before Leopard and Spaces were demoed or available for us to test with in beta.
After we received a beta of Leopard with Spaces, we tested our apps and identified a number of issues that our apps have with the feature. We had an engineer spend several days digging into these issues. He did some serious spelunking into our windowing code and determined that we were not moving the windows incorrectly in our code, so we reported the issues to Apple to investigate. Apple has fixed a number of problems with Spaces in OS X 10.5.3, but some still remain.
So, let’s circle this discussion back to the point I opened with, and the real point of this post: some changes are just too risky for dot-releases, and every company that writes software has to deal with that. Microsoft does, and certainly Apple does too. I don’t know the Spaces code; as I don’t work for Apple, I’ve never seen it. Microsoft and Apple work together to troubleshoot customer issues as situations warrant, and as part of that joint effort I’ve spent some time talking to some of the Apple Carbon and Spaces developers over the last few weeks about Spaces. We wanted to see if there’s any sort of change we could make in our code to avoid the issues. If there was anything we could reasonably change in our code at this time, I would love to do so. However, changing our windowing system to not use Carbon Window Groups entails a complete rewrite, and is not something we can feasibly do in a bug-fix release. Given the direction Apple is taking with OS X, any significant rewrite of our windowing system should be done in Cocoa, and that is a tremendous task to do in a dot-release. It would almost certainly cause other serious bugs, as I alluded to in the charting example above. The Apple developers I’ve spoken with have been unable to come up with any simple code changes that we could make to work around the issues, and have indicated that our code is generally acting correctly.
For now, you’ll have to wait for Apple to release an update to or new version of OS X where Spaces works with apps that use Carbon Window Groups. Our User Experience team has put up a brief help topic about the issue. We’re keenly aware of our customers’ frustration with Office 2008 and Spaces, and we will continue to work with Apple to help find a solution or workaround. Please continue to share your product experiences with us on Mactopia, or privately with me if you wish. As long-time software developers, it’s frustrating to me and to my peers when we’re unable to fix every problem instantly. Improving your experience with Mac Office on a continuous basis is part of our job. Sometimes we can do it with a quick fix, sometimes not. Your input helps us get it done.
(Edited to clean up the extraneous formatting tags in the last paragraph. Also fixed spelling of Mactopia.)