All MacBU

Pseudo Code

Odysseus asked about Pcode in response to my last post, and Rick Schaut answered on his blog. As Rick noted, we compiled C and C++ into Pcode, as opposed to writing in Pcode. You can think of Pcode kind of like assembly language, or better yet, like Java bytecodes. At the time I joined Microsoft, we weren’t using Pcode for any processor-agnostic reasons because, as Rick mentioned, the Windows and Mac OS API sets are so wildly divergent that there’s no point in compiling for a pseudo ‘common’ processor between them.
So, why use it as our assembly base instead of going directly to PPC assembly? Because a major advantage to Pcode is that it is small! Let me get a little geeky for you here (apologies to non-techy readers. If you want me to elaborate on some of the upcoming discussion in a later post, please ask!)
PowerPC instructions are always four bytes in size, no more and no less, so by definition the average PPC instruction is also exactly four bytes. Pcode, on the other hand, is specifically designed to make common PCPU (Pseudo-CPU) ‘instructions’ as small as possible, such that the average PCPU instruction is something like 1.2 or 1.3 bytes in length. Now, obviously instructions have to be an integral number of bytes, so this means that many instructions are only 1 or 2 bytes long, and just a few are 3 or more. The upshot is that code compiled into Pcode is on average only about 1/3 as large as PPC-native code (1.3 divided by 4).
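To make that arithmetic concrete, here is a small sketch. The mix of 1-, 2-, and 3-byte instructions is purely hypothetical (I chose fractions that happen to average 1.3 bytes); the real Pcode distribution isn't given here.

```python
# Hypothetical mix of Pcode instruction lengths (bytes -> fraction of instructions).
# These fractions are illustrative assumptions, not measured Pcode statistics.
HYPOTHETICAL_MIX = {1: 0.75, 2: 0.20, 3: 0.05}

# Every PowerPC instruction is exactly four bytes.
PPC_INSTRUCTION_BYTES = 4


def average_length(mix):
    """Weighted average instruction length, in bytes."""
    return sum(length * fraction for length, fraction in mix.items())


pcode_avg = average_length(HYPOTHETICAL_MIX)
size_ratio = pcode_avg / PPC_INSTRUCTION_BYTES

print(f"average Pcode instruction: {pcode_avg:.2f} bytes")
print(f"Pcode size relative to PPC: {size_ratio:.3f}")
```

With this assumed mix the ratio works out to roughly a third, which matches the back-of-the-envelope figure above.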
Now, since Pcode is obviously not native PPC instructions, you have to have an interpreter to run the code. The Pcode interpreter was hand-coded assembly totaling roughly 10k or so of native code. I don’t recall for sure, but I believe the average Pcode instruction required roughly 20 native instructions to run (standard fetch, decode, execute loop with common instructions optimized to be as close to native as possible.)
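For readers who haven't seen one, a fetch, decode, execute loop looks roughly like the toy below. To be clear, this is not the Pcode interpreter (that was hand-coded PPC assembly); the opcodes, their encodings, and the stack-machine design are all made up for illustration.

```python
# Toy bytecode interpreter. The opcodes and the stack-machine design are
# hypothetical; they are NOT the real Pcode instruction set.
HALT = 0x00   # 1 byte: stop and return the top of the stack
PUSH8 = 0x01  # 2 bytes: opcode followed by an 8-bit immediate value
ADD = 0x02    # 1 byte: pop two values, push their sum
MUL = 0x03    # 1 byte: pop two values, push their product


def run(code: bytes) -> int:
    stack = []
    pc = 0  # index of the next byte to fetch
    while True:
        op = code[pc]  # fetch
        pc += 1
        if op == PUSH8:  # decode, then execute
            stack.append(code[pc])
            pc += 1
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op:#04x} at offset {pc - 1}")


# (2 + 3) * 4, encoded as 5 instructions in 9 bytes.
program = bytes([PUSH8, 2, PUSH8, 3, ADD, PUSH8, 4, MUL, HALT])
result = run(program)
print(result, len(program))
```

Notice that the bytecode is just data handed to the loop, and every bytecode instruction costs several host operations (the fetch, the dispatch, the stack traffic). That per-instruction overhead is where the slowdown comes from.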
So, you have about 1/3 as much code to load, but you are running it on average 20 times slower. Why do this? Where this matters is when your hard disk and memory subsystems are so slow that it is more expensive time-wise to load data off disk than it is to run the interpreter.
Case in point: Mac OS 7.x would load the entire executable code section of your CFM app into memory before executing a single instruction of your application. One particular release (7.5.3?) had a bug where it would actually load the entire data fork into memory, not just the executable part! This meant that the larger your app was, the longer it took to boot. The VM system on Mac OS 7 and 8 was also particularly slow to page data in and out, so if you had minimal RAM and the app was too large, you’d spend inordinate amounts of time waiting for the disk to get your code ready for you. This meant that using Pcode enabled us to fit our code in far fewer pages of RAM and thus avoid the dreaded disk swap.
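The trade-off above can be put into a rough cost model: total time is the time to load the code off disk plus the time to execute it. Everything numeric in this sketch is a hypothetical assumption (code size, disk throughput, CPU speed, instructions executed), and it takes the "1/3 the size, 20 times slower" figures at face value. It only shows the shape of the argument.

```python
# Rough cost model: seconds = load time + execution time.
# All constants below are hypothetical, chosen only to illustrate the trade-off.
PCODE_SIZE_RATIO = 0.325  # Pcode is roughly 1/3 the size of native code
PCODE_SLOWDOWN = 20       # roughly 20 native instructions per Pcode instruction


def total_seconds(code_bytes, executed_instructions,
                  disk_bytes_per_sec, native_instr_per_sec,
                  size_ratio=1.0, slowdown=1.0):
    load = code_bytes * size_ratio / disk_bytes_per_sec
    execute = executed_instructions * slowdown / native_instr_per_sec
    return load + execute


CODE_BYTES = 6_000_000    # assumed size of the native code
EXECUTED = 10_000_000     # assumed number of instructions actually run
CPU = 100_000_000         # assumed native instructions per second


def compare(disk_bytes_per_sec):
    native = total_seconds(CODE_BYTES, EXECUTED, disk_bytes_per_sec, CPU)
    pcode = total_seconds(CODE_BYTES, EXECUTED, disk_bytes_per_sec, CPU,
                          size_ratio=PCODE_SIZE_RATIO, slowdown=PCODE_SLOWDOWN)
    return native, pcode


slow_native, slow_pcode = compare(1_000_000)    # slow disk: 1 MB/s
fast_native, fast_pcode = compare(20_000_000)   # fast disk: 20 MB/s

print(f"slow disk: native {slow_native:.2f}s, Pcode {slow_pcode:.2f}s")
print(f"fast disk: native {fast_native:.2f}s, Pcode {fast_pcode:.2f}s")
```

Under these assumptions Pcode wins when the disk is slow, because load time dominates, and loses once the disk is fast, because the interpreter overhead is all that's left.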
Some might ask at this point, why not just factor your app into sections, or overlays, and load and unload them dynamically to avoid having so much code in place? That would let you run all your code natively, right? Well, yes, it would, but notice the word “load” in that last sentence? That’s the whole thing we were trying to avoid — the disk! Also, doing dynamic overlays adds significant complexity to your code. (Anybody ever work on 68k Macs with CODE resources? You had to decide what code to put in what resource because the OS couldn’t handle chunks of code larger than 32k! You were forced into dynamic code, and woe be unto those who scattered their code willy-nilly across segments.) Pcode was just simpler and less prone to runtime bugs.
It’s kind of interesting that my very first major task in the MacBU (around March 1997) was turning on Pcode for Excel 98. I was well suited to it, because I was a student consultant for the Introduction to Assembly course in college. When I took it in the spring of 1993, CS314 used Motorola 68k assembly, and then when I helped teach it in 1995 and 1996 we used PPC assembly, so I was Real Good(tm) with PPC assembly and MacsBug. Ahh, MacsBug.
Anyway, we used Pcode for Office 98 and started developing Office 2001 with it too. Mac OS 9 came out a little while prior to Mac Office 2001, and both it and the new hardware Apple was shipping were much faster, so the benefits of code compression suddenly became quite a bit smaller relative to the performance hit of the interpreter. However, we were getting too close to shipping and the risk of removing Pcode was too large, so we shipped Office 2001 still using it.
One of the downsides to Pcode from a developer’s point of view is that it is a pain to debug in anything other than a source-level debugger. When you look at Pcode in MacsBug, all you really see is the interpreter instructions; the compiled Pcode is just a bytestream of data fed to the interpreter. This means that trying to track down a gnarly bug in shipping code that has had symbols removed is a real pain. That, combined with the huge improvements in OS X (real VM, anyone?), larger code caches in the new PPC chips, etc, all made the decision to remove Pcode for Office X pretty easy.
More on assembly for Intel and PPC tomorrow (perhaps).

A Brief History of Mac Office

Pip asked about my experiences with Mac Office’s previous compiler transitions. Now that I stop to recall, that is an interesting topic. I joined Microsoft in September 1996, right before Win Office 97 shipped. (I had actually been an intern the summer before, when the product was optimistically called Office 96 internally!) I worked on Win Excel for a few months, and then joined the MacBU when it was formed.
Prior to the formation of the MacBU, the Win Office team really was the entire ‘Office’ team for both Mac and Win products. The debacle known as Mac Word 6 was one of the reasons the MacBU was formed. When we sprung forth, we took a copy of the integrated Win+Mac Office source code with us, and that source became Mac Office 98.
The ‘interesting’ part of that project is that we developed Mac Office 98 using the same tools that the Win Office team used — Microsoft Visual C++ version 4.0. (Remember how I said that we inherited code that was deemed acceptable to a compiler that wasn’t incredibly standards-compliant at the time?) MSVC4 had a PowerPC cross-compiler (it was used for the PowerPC version of Windows NT) that ran on PCs. So, we edited and wrote our code on the PC, compiled it on the PC, and then pushed it over the network to our Macs and debugged it remotely. The process worked reasonably well but could be horrifically slow, especially if you were trying to view a large memory window.
Well, we managed to ship Mac Office 98 and 98-J, and then turned our attention to what eventually became Mac Office 2001. We knew there were some major limitations to our current dev setup:

  1. The MSVC compiler had a customized set of Apple OS headers, making it very hard to roll out new headers for new Apple technologies (such as the rapidly evolving QuickTime)
  2. Dev productivity was not optimal due to waiting for files to be copied over the network every time
  3. Our system required static IPs for our Macs, which ate up corporate network resources unnecessarily
  4. Various and sundry other issues
  5. We were the MacBU, for Pete’s sake! Why were we working on PCs?

Some MacBU folks began to investigate switching over to CodeWarrior (version Pro 5 or so, I think). Since I was rather new still, having been at Microsoft for less than two years at that point, I was not involved in the planning or decision-making for that switch. I didn’t actually do much of the conversion work either, being fully involved in making Mac Excel 2001 use Unicode internally. I do recall the conversion taking several months (perhaps 6 altogether?). That transition had its own set of headaches, just like our current move to Xcode.
For example, while MSVC only ran on the PC, it could be command-line driven and we had a whole set of scripts, makefiles, custom tools, and other goodies that we used to build Office. All of these had to be redone to work on the Classic Mac OS without a command line. Each custom tool had to be converted from a cmd-line tool to a CodeWarrior plugin. Interestingly enough, we’ve now converted many of them back to cmd-line (Perl and Python where we could; C/C++ still for others) to use with Xcode. Also, CodeWarrior’s parser was picky in different ways from MSVC, so yes, we had scads of errors and warnings to slog through then, too.
Overall, the MSVC -> CodeWarrior transition was shorter and perhaps smoother than our move to Xcode, but Office was a lot smaller then too. We didn’t have Entourage yet, we hadn’t spent 5 more years adding features or migrating more code from Win Office (they hadn’t yet shipped a new version after Office 97), etc. So, it is a little hard to directly compare the two transitions.
So, all the way back to Pip’s comment: yes, this is at least the 2nd major compiler transition we’ve made. I don’t know what tools were used on versions of Mac Office prior to Office 98 (Rick Schaut would know about Word 5.1 and Word 6). MPW? Maybe, but that would have been a really long time ago!

Predicting the future. Or not.

MacPredict caught my post on our Xcode status, and is asking people to predict when Microsoft will ship a Universal Binary of Mac Office. So far, I see that the ‘consensus’ is January of 2007. Heh.
I’m not going to drop any hints about when we’ll ship, but I do want to point out two things.
1) MacPredict says “it sounds like they haven’t even got it compiling in xCode yet.” That’s not really true. We do have Office building in Xcode; we’re just not 100% done. The vast majority of MacBU devs are using Xcode on a daily basis for the vast majority of their work. We just have a few tricky issues left to work out around a number of core bits of code.
2) The fact that Office builds under Xcode is not a very good predictor of when we’ll ship. We’re doing a heck of a lot more in this release than just ‘making it Universal.’ Remember, we were already more than a year into the product development cycle when Steve & Co. announced the Intel switch, with lots of planning done well before Office 2004 shipped.
However, the world just wouldn’t be right without rampant speculation and rumors, so please, go ahead and give it your best guess. Predict away, and we’ll see where we end up.
Full disclosure: I ‘predicted’ June 30, 2006. It’d be nice if we hit this internally… 🙂

A million errors? Really?

Well, yes and no. Yes, when I added up all the errors reported from each of the separate projects that I was trying to build, the total was somewhere over 950,000. However, that does include lots of duplicate errors (for example, from one header that had several errors, and that header was included over and over again), cascading errors (the parser choked on a line, causing it to get confused by the subsequent correct lines of code and thus report lots of bogus errors) and other sundry items that inflated the error count.
So, we didn’t actually have to make 1 million edits to the Office source code, but we did have to slog through a lot of output to clean things up. Including fixing lots of the warnings…
And no, I certainly didn’t do it all by myself. We have a great team of developers at MacBU who collectively tackled this transition. I’ve just been the point man. This long process has been very interesting and I’ve seen a lot of code I’d never looked at before, but I’m ready for us to complete the switch so I can get back to coding real features.
Perhaps tomorrow I’ll discuss the joys of assembly code in a multi-architecture world…

Whither Xcode?

Rick Schaut, one of the devs I work with, had an excellent post back in March about the MacBU’s transition to Xcode. Now that Apple has released Xcode 2.3, I thought I’d chime in with a little more info on our progress.
First, let me acknowledge that as an employee of the MacBU at Microsoft, I too am under NDA both with Apple and with Microsoft. As such, there’s a lot of specific detail I can’t comment on publicly.
I want to start by telling you about my role here. I’m a development lead in the MacBU, with a team of 4 devs reporting to me. (No, Rick is not one of them…) Part of my team’s responsibility is tool evaluation and maintenance. For the last few years I had been our point-of-contact and liaison with Metrowerks. My team tested their pre-releases of CodeWarrior, filed bugs with Metrowerks, updated our code to adhere to their ever-stricter-and-more-standards-compliant compiler, and rolled the updates out to the MacBU as they shipped. Given my experiences with that aspect of our tools and external dependencies, my manager asked me to take the lead dev role in our coordination with Apple as we investigated switching to Xcode.
So, we created a virtual team for me to lead (virtual in the sense that the other devs on the team did not report to me in the official ‘chain-of-command’) and we dug into our source to see what the impact would be. Apple’s announcement last summer about switching to Intel chips took us by surprise. I mean, sure, we had seen all the commentary on the rumor sites and whatnot in June 2005, but way back in early 2004 when we were planning out the schedule for the version of Mac Office to come after Office 2004, we had no inkling of the impending switch and thus had not put any time into our schedules for it! This meant that our early investigations had to not only identify what code would be affected, but how long it would take to change over and how that would affect our shipping schedule. Now, I can’t give any details here, but suffice it to say the changeover would not come for free!
The decision to switch dev tool environments was pretty much a slam dunk, especially after Metrowerks announced they were getting out of the Mac dev tool business and would not be producing any Intel tools. That then set us up to start working with Apple, getting early seeds of what was then Xcode 2.2, and running our code through it.
That was the sound of the fans in my G5 taking off after trying to compile Mac Office in Xcode! You see, Xcode uses GCC, and one of the great things about GCC is that it is incredibly standards-compliant. Unfortunately, one of the really annoying things about GCC is that it is incredibly standards-compliant. Parts of Mac Office date back to the mid-1980s (Remember Excel 1.0 for the Mac?). Office is so large that the older code hasn’t necessarily been brought up to modern styles/conventions/standards, and we routinely migrate code from Win Office into Mac Office. Win Office uses Visual Studio, which wasn’t always so compliant itself, meaning that we sometimes inherited code that needed some friendly compliance coercion. 🙂 Our first passes at compilation generated almost a million errors and several million warnings!
Then, there were the issues in the Xcode IDE itself. Now, I want to be very clear up front that Apple has done a heck of a job with Xcode. They have been incredibly helpful and have provided a ton of assistance to us as we work on the transition. Apple has done a huge amount of work on Xcode in the last couple of years. Xcode, however, is a relatively new IDE whereas CodeWarrior has been around for over 10 years and had become pretty mature. In the last year, we’ve worked with Apple on literally hundreds of feature requests, trivial bugs and critical bugs, suggestions for future releases, documentation clarifications, etc.
That last bit is actually one of the neatest things about this transition from my perspective. Apple’s tools, being so nitpicky about code standards, have helped us find and fix bugs in our code and make the next version of Office a better product. At the same time, the sheer amount of code we’ve thrown at Xcode has enabled us to help Apple make Xcode a better IDE, which in turn helps us be more efficient, write more features, and yes, make a better product! Our two companies can and do help each other improve. It’s a great concrete example of how positive our business relationship really is.
So, where are we now? Almost done. Yes, Apple has shipped Xcode 2.3. No, the MacBU is not totally done with the transition (more of those picky details I can’t actually talk about.) But we’re close! Perhaps one day soon I’ll get an email (I’m at home on parental leave right now) saying we fixed that last line of code and we’ll be fully on the Steve bandwagon.
With that, I’ll close this out for the evening. It’s 12:30am and my new baby daughter is starting to wake up and ask for her next bottle, and the bottle always has a higher priority than the blog…