Operating system survey results

Many thanks to those of you who completed the operating system survey. Of the 67 responses received, 56 were used for the final analysis. The 11 that were rejected either showed massive browser corruption of the post (many missing fields) or had answers that were in some way not suitable for inclusion. If you want to, you may (re)analyze the raw data - all email and other identifying information has been stripped, but it is otherwise intact, including even the rejected posts (which are prefixed by a "!"). Here is a description of the file's format.

The numbers were crunched and the results are presented in a series of tables.

General overview of the results

Who responded to what

In Table 4 you will see that the response rate varied considerably by operating system. While this was no big surprise, it does mean that too few responses (5 or fewer per question) were collected for DG/UX, MVS, Nonstop-UX, OS/400, and UnixWare. MPE responses were only slightly more common, averaging out to a hair over 5 responses per question. Other OSes typically had 10 or more responses per question. Since the number of responses varied with each question, you should weigh the significance of each cell in Tables 2 and 3 only after checking the corresponding counts in Table 4.

Correlations

While an attempt was made to correlate different answers, the results turned out to be less than informative, since the best correlations were often the result of a few outlier data points. If somebody with better statistical software than I have does this properly, I'll be happy to post the results (or a link to the results).
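To illustrate how a few outliers can manufacture a "good" correlation, here is a minimal Python sketch. The numbers are invented for the illustration and are not survey data, and the pearson() helper is just a plain textbook Pearson coefficient, not the method used for the analysis above.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical answers to two questions (NOT survey data): the bulk of the
# respondents show essentially no relationship between the two answers.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5, 3, 6, 2, 6, 3, 5, 2]
r_bulk = pearson(xs, ys)

# Add a single respondent who gave very large values to both questions.
r_all = pearson(xs + [100], ys + [100])

print(f"without outlier: r = {r_bulk:+.2f}")
print(f"with outlier:    r = {r_all:+.2f}")
```

One extreme respondent is enough to turn a weak relationship into an apparently very strong one, so a high coefficient alone says little until the outliers have been checked.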

Medians, Means, and Standard Deviations

Many of the questions with dollar-valued fields had very large standard deviations around the mean. In many cases these were due to small numbers of outliers, probably users with "mission critical" needs, who were willing to pay more, a lot more, for some features than the median user would. The median response is therefore usually more informative than the mean for those questions. In the A vs. B comparison questions the median values often diverged radically from the mean. For instance, for workstations, medium servers, and large servers the median values assigned to "Win32 compatibility" were $40, $200, and $100, whereas the means were $220, $1697, and $10919. That is, while some people valued this capability, and indicated a willingness to pay an amount roughly commensurate with the machine cost, the majority of respondents didn't see that this capability had very much value for larger machines.
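The effect is easy to reproduce. The following Python sketch uses made-up dollar values (not taken from the raw data) in which two "mission critical" respondents answer far above everyone else; it only assumes the standard library's statistics module.

```python
import statistics

# Hypothetical dollar values (NOT survey data): most respondents give small
# amounts, two "mission critical" respondents give very large ones.
responses = [0, 50, 100, 100, 150, 200, 200, 250, 300, 20000, 50000]

mean = statistics.mean(responses)
median = statistics.median(responses)
stdev = statistics.stdev(responses)  # sample standard deviation

print(f"mean   = {mean:.0f}")
print(f"median = {median:.0f}")
print(f"stdev  = {stdev:.0f}")
```

The two large answers drag the mean far above what a typical respondent entered and inflate the standard deviation, while the median stays with the bulk of the answers.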

General impressions

I'm not going to interpret these results (much!) for the reader - have a look at the tables and draw your own conclusions, keeping in mind the number of responses which went into each. However, there are a few things which I'd like to draw your attention to.

The median perceived values for the various operating systems on "unit" systems costing $2k, $20k, and $200k were not even close to proportional to the system costs, and were in most cases much less than the manufacturers charge for the respective operating systems. The median values and street prices on workstations for the "desktop" operating systems (Macintosh, Windows95, and OS/2) were, however, actually very close, and the margin for Windows NT and Solaris was not large. Linux is, as expected, a bargain, having a perceived value much in excess of the price of purchasing a distribution. On small servers Linux is even more of a bargain (greater difference between value and actual price), while Windows NT once again has a "street price" close to its median perceived value. Also as expected, the perceived value of Windows95 and Macintosh on larger machines is low or nil, but OS/2 retains a perceived value even on very large machines.

In terms of which operating system factors users valued most highly, it is interesting to see first what was least valuable. These were clearly the number of available applications and Win32 compatibility, with the median, mean, and low standard deviations indicating considerable agreement on these points, across all sizes of platforms. Several people wrote in to express the same sentiment - that the only thing that really matters is whether or not the key application(s) for the machine are available, because the other 999 (or 9999) applications would never be required anyway. Neither the cost of the applications nor operator retraining was seen as a very significant factor.

The operating system factors which users valued most varied a bit depending on the size of the machine, but overall, the most valuable feature seems to be reliability in its various guises, with system up time, support quality, and user level software not failing at OS upgrades being considered very valuable on machines of all sizes. Beyond that, the qualities that mattered were very dependent on the size of the machine. For instance, having a stable "cluster" was more important for larger machines than for smaller ones, and code development time was proportionally more important on workstations than on large servers.

Here is a bit of philosophy to mull over - why do people buy the operating systems that they do? After you look at these results for a while you will notice that there is a very poor correlation between market share and most of the scores and values determined in this survey. However, there is a very good correlation between market share and the scores for "manufacturer's ability to market the OS" and "software availability" (both personal and workgroup, but not enterprise). Also, there is a good correlation between market share and products whose perceived values are near their street prices.

What I learned from this exercise

Just in case anybody else wants to do a survey like this, here are a few suggestions.

So you want to crunch the data yourself...

The fields in the raw data file are comma separated, one record per line, and each record is laid out as follows.

  If the first character is "!", ignore the whole line.
  (Field numbers + 2 -> Question Number in form.)
  N99999 may appear in any field and indicates "missing data" (the
    browser did not return the field)

    Field    1         I,A,G,O  (Industry, Academia, Government, Other)
    Fields   2 -  72   Integer values
    Fields  73 - 126   E,G,A,P,U,N  (Excellent<->Unacceptable or No Opinion)
    Fields 127 - 180   Blank (no opinion) or integer
    Fields 181 - 414   E,G,A,P,U,N  (Excellent<->Unacceptable or No Opinion)
    Fields 415 - 462   Blank (no opinion) or integer
    Field  463         The browser they used (only in some records)
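For anybody who wants a starting point, the layout above can be read with a few lines of Python. This is only a sketch under some assumptions of mine: the names LAYOUT, parse_field, and parse_record are made up for the example, blank and N99999 fields are both mapped to None (which loses the distinction between "no opinion" and "missing"), and any extra commas at the end of a line are assumed to belong to the browser string.

```python
MISSING = "N99999"  # marker the format description gives for missing data

# (first_field, last_field, kind): 1-based, inclusive, from the layout above.
LAYOUT = [
    (1, 1, "sector"),          # I, A, G, O
    (2, 72, "int"),
    (73, 126, "rating"),       # E, G, A, P, U, N
    (127, 180, "int"),         # blank (no opinion) or integer
    (181, 414, "rating"),
    (415, 462, "int"),         # blank (no opinion) or integer
    (463, 463, "browser"),     # only present in some records
]

def parse_field(raw, kind):
    """Convert one raw field to int, str, or None (missing/blank)."""
    raw = raw.strip()
    if raw == "" or raw == MISSING:
        return None  # simplification: blank and missing are not distinguished
    if kind == "int":
        try:
            return int(raw)
        except ValueError:
            return None  # assumption: unparsable numbers count as missing
    return raw

def parse_record(line):
    """Return a list of 463 values, or None for rejected or empty lines."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("!"):
        return None
    fields = line.split(",")
    record = []
    for first, last, kind in LAYOUT:
        if kind == "browser":
            # Assumption: a browser string may itself contain commas,
            # so everything from field 463 onward is rejoined.
            record.append(parse_field(",".join(fields[first - 1:]), kind))
            continue
        for i in range(first - 1, last):
            raw = fields[i] if i < len(fields) else ""
            record.append(parse_field(raw, kind))
    return record
```

Calling parse_record() on each line of the file and discarding the None results should leave only the records that were used; record[n - 1] then holds field n.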

Here's hoping that somebody (else) finds this useful!

David Mathog
mathog@seqaxp.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech
11 February, 1998