
A Business Case for Windows Server Optimization
TCOnow! for Windows Server Optimization allows one to quickly and accurately compare the economics of choosing a
TCOnow! allows one to analyze the TCO for the competing platforms using the following cost categories:
While one can carry out this analysis without consolidating servers, many companies will find it an economic imperative to carry out some kind of consolidation or rationalization of their existing server environment. There are five consolidation strategies to choose from. Each strategy has its own effect on the number and type of new servers necessary, software costs, services costs, and personnel requirements. The five consolidation options available from TCOnow! include:
Depending on your workload profile and your specific situation, your benefits from consolidation may include:
If you are migrating or consolidating to Windows Server 2003, you may be able to achieve even greater savings due to such features as:
Given the information entered into CIOview’s TCOnow! software, the financial costs and benefits of deploying your chosen application using a
Total Cost of Ownership for
Total Cost of Ownership for

The Total Cost of Ownership for your chosen application using your selected platform is based upon the assumptions in Table 1:
TCO Assumptions | |||
Implement a SAN? | |||
Hours of operations | |||
Basis for costing downtime | |||
Size for current or future needs | |||
Table 1 lists the most obvious factors that can immediately affect the TCO of either solution. However, a wide range of other factors, such as the workload and the type of server, can each contribute to a significantly different TCO result. A large part of this report is dedicated to explaining which factors most influence the TCO and how they can be adjusted to ensure that the optimal platform is selected.
Costs
Table 2 depicts the cost categories that have been selected for this analysis.
Cost Category | Included? |
Servers | |
Software | |
Storage | |
Network | |
Services | |
Training | |
Facilities | |
Ongoing Personnel | |
Downtime | |
Support and Maintenance |
The comparative costs of deploying your workload using a
Cost Category | Delta | ||
Servers | |||
Software | |||
Storage | |||
Network | |||
Services | |||
Training | |||
Facilities | |||
Ongoing Personnel | |||
Downtime | |||
Support and Maintenance | |||
Total Cost of Ownership |
Table 3 shows the cost for each solution in each major cost category. It is important to recognize that many of the cost categories feed into each other. As a result, changes to one category tend to ripple throughout the entire TCO model. Attaining the lowest TCO tends to be an iterative process requiring a great deal of “what-if” analysis. The financial rewards of such an effort can be substantial and in many cases point to other IT endeavors that could benefit from a similar approach.
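The arithmetic behind a comparison like Table 3 can be sketched in a few lines. The following is only an illustration, not TCOnow!'s actual model: the category names come from Table 2, while the dollar figures and the function names `total_cost` and `cost_deltas` are hypothetical.

```python
# Cost categories as listed in Table 2 of this report.
CATEGORIES = [
    "Servers", "Software", "Storage", "Network", "Services", "Training",
    "Facilities", "Ongoing Personnel", "Downtime", "Support and Maintenance",
]


def total_cost(costs):
    """Sum the cost of every category; a missing category counts as zero."""
    return sum(costs.get(category, 0) for category in CATEGORIES)


def cost_deltas(platform_a, platform_b):
    """Per-category deltas (platform_a minus platform_b) plus the TCO delta."""
    deltas = {c: platform_a.get(c, 0) - platform_b.get(c, 0) for c in CATEGORIES}
    deltas["Total Cost of Ownership"] = total_cost(platform_a) - total_cost(platform_b)
    return deltas


# Hypothetical figures in dollars; none of these numbers come from the report.
platform_a = {"Servers": 400_000, "Software": 150_000, "Ongoing Personnel": 900_000}
platform_b = {"Servers": 550_000, "Software": 120_000, "Ongoing Personnel": 700_000}

for category, delta in cost_deltas(platform_a, platform_b).items():
    print(f"{category:26s} {delta:>12,}")
```

In this made-up case the platform with the cheaper servers still ends up with the higher total, because personnel costs dominate. That is the kind of ripple effect a “what-if” analysis is meant to expose.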
Table 4 presents the recommended configuration for the two platforms being examined in this analysis. Since the server configuration effectively represents the keystone to TCO, getting the server configuration right is fundamental to an accurate TCO analysis.
Consolidation Strategy
Your consolidation strategy will play a major role in determining the type and number of servers necessary. If you do not consolidate at all, you will be able to use your existing servers but may need to purchase a large number of inefficient servers as your application portfolio grows. If you have chosen a Server Migration, you will typically need a larger number of servers. If you have chosen a Server Virtualization strategy, you can typically purchase a smaller number of very large multiprocessor servers that can be partitioned. If you are consolidating applications or running a Multiple Workload Consolidation, you can typically reduce the server resources spent inefficiently handling small application instances and reduce the overall number of servers necessary.
Current Environment
Please note that if you have chosen to size for your current rather than your future needs, you will not see any recommended server quantities for your Current Environment. However, if you choose to size for all of your future growth, you will see TCOnow!'s estimate of the servers necessary to accommodate the growth in your Current Environment.
Category | Vendor 1: | Vendor 2: |
Mixed Workload Servers | ||
File and Print Servers | ||
Email Servers | ||
Database Servers | ||
Line of Business Application Servers | ||
Test and Development Servers | ||
QA Servers | ||
Backup Servers | ||
Other Servers |
Table 5 compares the key system metrics and Total Cost of Ownership (TCO) for the chosen vendors. It offers a summary view for reviewing the basic TCO assumptions and checking that no single item has been misrepresented. Remember, your consolidation strategy will have a major impact on your overall TCO.
Category | Vendor 1: | Vendor 2: |
Number of Existing Production and Non-production Servers | ||
Number of New Production Servers | ||
Number of New Non-production Servers | ||
Storage Dollar Cost per Megabyte Used | ||
Number of IT Full Time Equivalent (FTE) | ||
Estimated System-wide Availability | ||
Total Cost of Ownership |
TCOnow! for Windows Server Optimization uses a minimum number of questions to rapidly generate a reasonable estimate of your costs if you migrate your Windows NT4 systems to Windows Server 2003; if you consolidate with Windows 2000 Server or Windows Server 2003; or if you consolidate on Windows 2000 Datacenter or Windows Server 2003 Datacenter Edition compared to UNIX. TCOnow! will also help you understand the implications of consolidation strategies and high availability requirements.
The results of this analysis are based on the workload characteristics you specified, your current environment, and your personnel assessment. Tables 6A through 6C summarize your consolidation needs.
Category | New Environment | Current Environment |
File and print infrastructure | ||
Email infrastructure | ||
Database infrastructure | ||
Line of business application infrastructure | ||
Test and development infrastructure | ||
QA and backup infrastructure |
Category | UNIX | Windows |
What is the size, in CPUs, of the largest single SMP (Symmetric Multi-Processor) server that can run your file and print applications without a substantial reduction in processor scaling capability? | ||
What is the size, in CPUs, of the largest single SMP server that can run your email applications without a substantial reduction in processor scaling capability? | ||
What is the size, in CPUs, of the largest single SMP server that can run your database applications without a substantial reduction in processor scaling capability? | ||
What is the size, in CPUs, of the largest single SMP server that can run your line of business applications without a substantial reduction in processor scaling capability? |
Workload | % of Total | Application Users | Environment | # of Unique Instances | # of Users Accessing Largest Instance |
File and print | Current Environment | ||||
Email/Communication | Current Environment | ||||
Database | Current Environment | ||||
Line of business application | Current Environment | ||||
Your availability requirements will play a major role in determining whether or not you require redundant servers or highly available server clusters. If you require an application recovery time of 4 hours or less, you will require redundant servers. If you require an immediate application recovery time, you will require clustered servers. Using redundant servers can typically reduce your application failover time by 25% but requires you to purchase twice as many servers and to devote some minimal staff resources to maintaining these servers. Deploying a cluster will allow you to reduce your application recovery time by 60%-90% depending on the type of cluster you set up. However, deploying a clustered server environment may require you to purchase extra production servers, extra server software, special clustering software, and redundant storage. You will also require more personnel to manage your now-larger infrastructure. Table 7 summarizes your availability needs.
To see the effect of clusters on your server requirements, please refer to the Servers section of this business case. To see the effect of clusters on your downtime, please refer to the Downtime section of this business case.
Category | Value |
What are your hours of operations? | |
Will a single system application server component failure cause your application to suffer a performance decline or a full application failure? | |
If a file and print system component fails and causes a file and print application failure, what is the maximum allowable wait for a replacement file and print server part? | |
If an email system component fails and causes an email application failure, what is the maximum allowable wait for a replacement email server part? | |
If a database system component fails and causes a database application failure, what is the maximum allowable wait for a replacement database server part? | |
If a general line of business application server system component fails and causes a line of business application failure, what is the maximum allowable wait for a replacement line of business application server part? | |
Which of your chosen workloads is used most by your IT and business departments to estimate and cost system-wide availability? |
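The availability rules described above can be expressed as a small decision function. This is a rough sketch rather than TCOnow!'s logic: the 4-hour threshold, the 25% figure, and the 60%-90% range come from the text, while the function names, the "standalone" label, and the 75% cluster midpoint are assumptions made for illustration.

```python
def availability_strategy(required_recovery_hours):
    """Pick a server strategy from the required application recovery time.

    Thresholds follow the text: immediate recovery -> clustered servers,
    4 hours or less -> redundant servers, anything longer -> standalone.
    """
    if required_recovery_hours <= 0:
        return "clustered"
    if required_recovery_hours <= 4:
        return "redundant"
    return "standalone"


# Fractional reduction in recovery time for each strategy. The 25% value and
# the 60%-90% cluster range are from the text; 0.75 is an assumed midpoint.
RECOVERY_REDUCTION = {"standalone": 0.0, "redundant": 0.25, "clustered": 0.75}


def estimated_recovery_hours(baseline_hours, strategy):
    """Scale a baseline recovery time by the strategy's assumed reduction."""
    return baseline_hours * (1.0 - RECOVERY_REDUCTION[strategy])


for hours in (0, 4, 12):
    print(hours, "->", availability_strategy(hours))
```

A fuller model would also add the extra servers, clustering software, redundant storage, and personnel that each strategy requires, since those costs offset the downtime savings.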
Your TCO is also affected by infrastructure items such as the amount of storage necessary, the opportunity to consolidate storage, the type of software licenses necessary, and your expectations of future application and data growth. Tables 8A through 8C summarize these factors.
Category | Value |
How many gigabytes of storage are necessary? | |
What level of RAID do you need on your storage? | |
Will you implement a storage area network (SAN)? | |
Assuming no change to your storage infrastructure, how much of your existing storage do you expect to be able to still use if you migrate or consolidate? | |
If you are migrating or consolidating onto Windows Server 2003, can your chosen SAN hardware take advantage of Volume Shadow Copy Service? |
Category | Value |
Have you previously purchased Windows 2000 client access licenses for your user base? | |
Have you previously purchased user licenses for any email software you will consolidate? | |
Have you previously purchased systems management user licenses for your user base? |
Category | Requirement |
Annual average user growth rate for your application | |
Annual average storage growth rate | |
Will you size your servers for your initial needs or to account for all of your expected growth? |
Whether to size a system for current requirements or to include the growth envisioned for the investment period is an important decision. When sizing for future growth, acquisition costs will obviously be higher, and in the early years support, software, and perhaps even downtime costs will add to the TCO. The advantage, however, is that no interruption or large-scale hardware additions will be required later. This question is ripe for “what-if” analysis and is a good example of how the beauty of TCO lies in the details.
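To make the sizing trade-off concrete, the growth rates from the table above can be compounded over the investment period. The sketch below assumes simple compound annual growth; the function name `projected_demand` and the starting figures are hypothetical and are not taken from TCOnow!.

```python
def projected_demand(initial, annual_growth_rate, years):
    """Demand at the start and at the end of each year, assuming compound growth."""
    return [initial * (1 + annual_growth_rate) ** year for year in range(years + 1)]


# Hypothetical inputs: 1,000 users growing 10% a year over a 5-year period.
users = projected_demand(initial=1_000, annual_growth_rate=0.10, years=5)

size_for_current_needs = users[0]      # buy only what is needed today
size_for_future_growth = max(users)    # buy up front for the whole period

print(f"Size for current needs: {size_for_current_needs:,.0f} users")
print(f"Size for future growth: {size_for_future_growth:,.0f} users")
```

Under these assumptions, sizing for future growth means provisioning for roughly 60% more users on day one, which is why acquisition costs rise and why the choice deserves its own “what-if” run.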
Over time, the costs associated with your two competing platforms can also be compared. For your first selection,
Initial | Year 1 | Year 2 | Year 3 | Year 4 | Year 5 | |
Servers | ||||||
Software | ||||||
Storage | ||||||
Network | ||||||
Services | ||||||
Training | ||||||
Facilities | ||||||
Ongoing Personnel | ||||||
Downtime | ||||||
Support and Maint. | ||||||
Total By Year |
For the second platform,
Initial | Year 1 | Year 2 | Year 3 | Year 4 | Year 5 | |
Servers | ||||||
Software | ||||||
Storage | ||||||
Network | ||||||
Services | ||||||
Training | ||||||
Facilities | ||||||
Ongoing Personnel | ||||||
Downtime | ||||||
Support and Maint. | ||||||
Other A | ||||||
Other B | ||||||
Total By Year |
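Year-by-year totals such as those in the two tables above can be accumulated to see when one platform overtakes the other. The following is a minimal sketch with entirely hypothetical figures (in thousands of dollars); the helper names `cumulative` and `breakeven_period` are illustrative only.

```python
from itertools import accumulate

PERIODS = ["Initial", "Year 1", "Year 2", "Year 3", "Year 4", "Year 5"]


def cumulative(costs_by_period):
    """Running total of cost across the periods."""
    return list(accumulate(costs_by_period))


def breakeven_period(costs_a, costs_b):
    """Index of the first period where A's cumulative cost is <= B's, else None."""
    for index, (a, b) in enumerate(zip(cumulative(costs_a), cumulative(costs_b))):
        if a <= b:
            return index
    return None


# Hypothetical totals by period, in thousands of dollars.
platform_a = [500, 100, 100, 100, 100, 100]   # high up-front cost, low running cost
platform_b = [200, 200, 200, 200, 200, 200]   # low up-front cost, high running cost

index = breakeven_period(platform_a, platform_b)
print("Break-even:", PERIODS[index] if index is not None else "not within 5 years")
```

With these made-up numbers the platform with the higher initial outlay catches up in Year 3, which shows why comparing only the initial column or only the five-year total can be misleading.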
The economics of consolidating from one environment to another routinely represent annual savings of millions of dollars for many companies. As a result, it is worthwhile to spend some time understanding the major cost drivers for TCO and completing the necessary “what-if” analysis to design the optimal solution. Figuring out the best platform is inherently an iterative process that requires:
This kind of sophisticated, iterative analysis cannot be completed using a series of spreadsheets and a so-called “stubby pencil”, since there are simply too many variables to keep track of. As a result, this report was prepared using CIOview’s TCOnow! to reduce the labor involved and simplify the analysis process.
This report is designed to provide a simple side-by-side comparison of migrating to one environment over another. It looks in detail at the total costs associated with:
For this report each of these cost categories has been examined in great detail using CIOview's TCOnow! software. This platform analysis software allows a consistent evaluation approach to be in place and, perhaps most importantly, ensures different model scenarios can be easily tested under a variety of "what-if" scenarios. For example, one can see rapidly how consolidating existing IT functions, such as File/Print functions, to newer, faster servers can result in the need for fewer servers and a corresponding reduction in IT resources. Then one can proceed to drill down on the major cost categories to review and possibly change the underlying assumptions. All cost categories have at least two levels of detail that can be examined and subsequently changed. In fact, you can drill down to the level of changing the cost of power to reflect your local electricity costs as a factor in determining facilities costs.
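As an illustration of the lowest level of drill-down mentioned above, the electricity portion of facilities costs can be estimated from server count, power draw, and the local price of power. This is a simplified sketch: it assumes servers run around the clock and ignores cooling overhead, and every input value and name below is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def annual_power_cost(server_count, watts_per_server, price_per_kwh):
    """Annual electricity cost in dollars for servers running continuously."""
    kilowatt_hours = server_count * watts_per_server / 1000 * HOURS_PER_YEAR
    return kilowatt_hours * price_per_kwh


# Hypothetical what-if: consolidating 20 servers down to 8 larger ones.
before = annual_power_cost(server_count=20, watts_per_server=500, price_per_kwh=0.10)
after = annual_power_cost(server_count=8, watts_per_server=900, price_per_kwh=0.10)

print(f"Before consolidation: ${before:,.0f} per year")
print(f"After consolidation:  ${after:,.0f} per year")
```

Changing `price_per_kwh` to reflect local electricity costs shifts both figures proportionally, so the relative saving stays the same while the absolute saving grows or shrinks.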
Irrespective of whether you utilize a software product, sophisticated spreadsheets or a “stubby pencil” to appreciate the potential economic consequences of consolidation and migration, the very same factors have to be taken into account.
The phrase “server consolidation” has a multitude of meanings that are typically very different from each other. In a sense, the word consolidation can be as vague and general as the term ROI. Many companies are familiar with the problems that arise when different groups think of ROI in different ways, and with the time wasted miscommunicating proposals or embarking down the wrong project path. Considering that a consolidation often involves making changes to crucial parts of your IT infrastructure, there are many opportunities to foul up if different people assign very different meanings to the words Server Consolidation.
Your first and most important task in carrying out a server consolidation is therefore to decide what you mean by Server Consolidation and to make sure that everyone involved in your project uses the same definition. You will need to define:
Of course, it helps to have a set of consolidation strategies that already answer some of these questions for you. There are a vast number of different consolidation strategies that include server, storage, and network consolidation components. However, consolidating a network is a mammoth task separate from this analysis, and so CIOview has not included strategies such as forward and backward consolidation. The five strategies available in TCOnow! for Windows Server Optimization are listed below:
Operating System Migration - You will carry out an in-place operating system upgrade from Windows NT4 to either Windows 2000 Server or Windows Server 2003. The upgrade will be 1:1.
Server Migration - You will migrate your current Windows NT4 and/or UNIX servers onto new UNIX or Windows servers. Your migration will be 1:1, meaning that you do not reduce the total number of servers.
Server Virtualization - You will use virtualization technology to migrate your current servers onto new UNIX or Windows servers that are divided into partitions; each partition has its own operating system and application installed. UNIX server virtualization is achieved with Sun Dynamic System Domains, HP nPars, and IBM Dynamic LPAR, while Windows server virtualization (available soon) can be achieved with Windows Virtual Server or with hardware partitioning on IBM xSeries or Unisys.
Single Workload Consolidation - You will migrate to a new operating system and consolidate your current Windows NT4 and/or UNIX environment to newer UNIX or Windows servers. You will take advantage of new Windows technology to combine a single workload or application running on multiple servers onto one server or possibly a high-availability cluster.
Multiple Workload Consolidation - You will migrate a number of Windows NT4 and/or UNIX applications onto one large Windows server. Multiple workload consolidation involves using Windows System Resource Manager or third-party tools such as Unisys Server Sentinel to allocate capacity to different types of workloads on the same operating system.
If you choose OS Migration, you can only look at 1) the ongoing cost of your current Windows NT4 environment compared to 2) the cost of an in-place upgrade to Windows 2000 Server or Windows Server 2003 using your existing server hardware. This approach provides the least benefit since you will not get any consolidation benefits whatsoever; however, you also have the least risk since you do not need to incur the cost and time for a hardware replacement. An OS migration is likely to benefit your operations in the areas of software license and support costs, personnel, and downtime.
If you choose Server Migration, you can look at either the ongoing cost of your current environment, the cost to migrate to new Sun, HP PA-RISC, or IBM pSeries servers running Linux, the cost to migrate to new Intel servers running Windows 2000 Server or Windows Server 2003, or the cost to migrate workloads onto specially configured servers running Windows 2000 Datacenter Server or Windows Server 2003 Datacenter Edition. A server migration involves simply replacing existing servers with faster, more space efficient servers.
If you choose Server Virtualization, you can look at the ongoing cost of your current environment; the cost to consolidate servers using Sun Dynamic System Domains, HP-UX nPars, or IBM AIX DLPAR; the cost to migrate to new Intel servers running Windows 2000 Server or Windows Server 2003; or the cost to consolidate onto servers running Windows 2000 Datacenter Server or Windows Server 2003 Datacenter Edition that are divided into separate hardware partitions. Unlike a simple server migration, you will use available partitioning technology to divide larger servers into smaller partitions that can run your application more efficiently (due to server scaling and data latency) and reduce your hardware footprint.
While a server virtualization approach may require purchasing more expensive servers and additional operating system software, you can typically reduce the number of servers necessary as well as the complexity of your datacenter environment. Server virtualization also allows you to run different applications, or test and QA environments, all on the same server with each application or instance in its own isolated section. This can allow cluster-in-a-box functionality that has the potential to substantially reduce downtime. However, since you are not actually consolidating your applications, your virtualized environment must match your previous physical environment 1:1. You may therefore find that your personnel costs are relatively unchanged compared to a server migration unless you use proprietary virtualization technology such as IBM DLPAR.
Currently, TCOnow! for Windows Server Optimization supports server virtualization using Sun Solaris, HP-UX, and IBM AIX, as well as IBM xSeries EXA physical partitioning for Windows and Unisys's Windows CMP partitioning technology. Dell and HP ProLiant servers can be partitioned using virtualization technologies such as Microsoft Virtual Server or VMware; more information on virtual partitioning is expected soon.
If you choose Single Workload Consolidation, you can look at either the ongoing cost of your current environment, the cost to consolidate on new UNIX servers, the cost to consolidate on new Intel servers running Windows 2000 Server or Windows Server 2003, or the cost to consolidate onto specially configured servers running Windows 2000 Datacenter Server or Windows Server 2003 Datacenter Edition. You will take advantage of new technologies such as the Windows Distributed File System, the ability to run multiple Exchange post offices on one server, or the ability to run multiple SQL Server databases together and actually reduce the number of unique OS instances necessary. This strategy is sometimes referred to as File System Consolidation, Email Consolidation, or Database Consolidation.
Single Workload Consolidation allows you to reduce the number of application and operating system instances which can result in substantial personnel savings. At the same time, the workload on each server increases, meaning that you may need to purchase larger servers and you may find it useful to run a high availability cluster to reduce single points of failure. Also, there is typically a large professional services element to an application consolidation as you reorganize and re-write the way your IT systems map to your business users.
The last consolidation strategy TCOnow! allows you to select is a Multiple Workload Consolidation where you run multiple application workloads on the same server. Traditionally, UNIX and Windows operating systems do not efficiently handle running a mixed workload (for example, file and print and email) on the same operating system instance.
However, you can use new Windows operating system features such as Windows System Resource Manager to assign capacity to different applications and improve application co-existence. In addition, Unisys has taken its legacy systems experience and introduced the concept of "processor affinity" to its Windows line of servers. You can use "processor affinity" to effectively create partitions within a Unisys server that are NOT isolated from each other and that share one operating system instance. Although Windows System Resource Manager works on hardware from other Windows OEMs such as Dell, HP, and IBM, TCOnow! does not suggest running mixed workloads in a high-volume production environment; instead, a Server Virtualization strategy such as Windows Virtual Server (coming soon) is more appropriate. Multiple Workload Consolidation in a single operating system instance substantially reduces personnel costs and can allow you to consolidate almost all of your work onto a very small number of servers.
If you choose Multiple Workload Consolidation, you can compare your current server environment to a Unisys Windows 2000 Datacenter Server or Unisys Windows Server 2003 Datacenter Edition environment.
One has to recognize that server requirements will largely be driven by:
In other words, there are a great number of factors that must be taken into account when configuring servers. However, in much the same way that the horsepower of an automobile ultimately determines the performance of a car it is reasonable to expect that a similar measure will be available for servers. For non-car buffs, horsepower is simply one measure of the power output of an engine and usually determines much of a car’s performance, up to 100 miles per hour, when aerodynamics kick in as a larger factor. There are some pretty simple performance metrics that can be applied across the automobile industry in general that allow a customer to look at a few stats and get a pretty good buyer’s comparison. Since many servers cost several times the cost of even the most exotic car one can surmise that the same type of comparison shopping data will be available for configuring a computer system.
The performance tests or benchmarks that commonly are used for servers include:
TPC
SPEC
SAP
Oracle
Notes
LINPACK
ICOMP
All of these different benchmarks can be extremely useful. However, when vendors report public results one has to bear in mind that the results:
Benchmarks differ based on how output changes as the size of the server changes. Server size can be thought of as the number of processors, namely:
In a perfect world, growth in server size would relate directly to growth in server output. In other words switching from 2-way to a 4-way would result in a doubling of output. Likewise switching from a 2-way to a 16-way would mean an eightfold increase in output.
In a simplified example of server scaling, a 1-processor server would produce 1000 units of output at top performance. A 2-processor server would produce 2000 units of output, and a 72-processor server would come in at 72,000 units of output. In other words, there is an expectation that performance grows directly with the number of processors at a 1:1 ratio. However, in the real world no server scales at 1:1 or in a perfectly linear fashion. Each new processor added to your server gives only a partial increase in performance. Most commercial IT workloads show a scaling curve. In contrast to a single scaling factor, a scaling curve is a collection of scaling factors. As you keep adding new processors, the scaling factor decreases. In other words, adding new processors always runs into the problem of diminishing returns.
Imagine a second workload that shows diminishing returns:
The scaling factors in this workload change substantially as the number of processors increases. The scaling factor from 1-way to 2-way is 0.97 but the scaling factor from 1-way to 72-way is only 0.36.
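The effect of the two scaling factors quoted above can be worked through numerically. The sketch below assumes that a scaling factor is the ratio of actual output to ideal linear output, which is one reasonable reading of the text rather than a definition the report states; the 1000-unit baseline comes from the simplified example, and the function names are illustrative.

```python
BASE_OUTPUT = 1000  # units produced by a 1-processor server in the simplified example


def ideal_output(processors):
    """Output if performance grew 1:1 with the number of processors."""
    return BASE_OUTPUT * processors


def actual_output(processors, scaling_factor):
    """Output after applying a scaling factor (assumed: actual / ideal)."""
    return ideal_output(processors) * scaling_factor


# Scaling factors taken from the text: 0.97 at 2-way and 0.36 at 72-way.
for processors, factor in [(2, 0.97), (72, 0.36)]:
    print(
        f"{processors:>2}-way: ideal {ideal_output(processors):>6,} units, "
        f"actual about {actual_output(processors, factor):,.0f} units"
    )
```

Under this reading, the 2-way server delivers roughly 1,940 of a possible 2,000 units, while the 72-way server delivers roughly 25,920 of a possible 72,000, which is the diminishing-returns pattern the paragraph describes.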
If we create a graph of our first workload showing the output per server related to the number of processors, we would see:
If we create a graph of our second workload showing the output per server related to the number of processors, we would see:
The key point to remember is that different benchmarks have different scaling curves. As a result, if you select the wrong benchmark for configuration purposes, you will almost certainly under- or over-configure your servers, often severely.
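A small what-if shows how the choice of scaling curve changes the recommended server size. Both curves below are invented for illustration and do not correspond to any published benchmark; the function name `smallest_server` and the 7,000-unit target are likewise hypothetical.

```python
def smallest_server(target_output, scaling_curve, base_output=1000):
    """Smallest processor count on the curve whose output meets the target."""
    for processors in sorted(scaling_curve):
        output = base_output * processors * scaling_curve[processors]
        if output >= target_output:
            return processors
    return None  # no configuration on this curve is large enough


# Hypothetical scaling curves: processors -> scaling factor.
parallel_curve = {1: 1.00, 2: 0.99, 4: 0.98, 8: 0.97, 16: 0.95, 32: 0.93}
serial_curve = {1: 1.00, 2: 0.97, 4: 0.90, 8: 0.80, 16: 0.65, 32: 0.50}

target = 7000  # units of output the workload needs
print("Sized with the parallel curve:", smallest_server(target, parallel_curve), "processors")
print("Sized with the serial curve:  ", smallest_server(target, serial_curve), "processors")
```

If the real workload follows the serial curve but the server is sized with the parallel one, the 8-way recommendation falls short and a 16-way server is actually needed. The reverse mistake leads to paying for capacity that is never used.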
The ability of a workload to scale depends largely on two factors:
Workloads by definition carry out a very specific type of transaction, e.g., online purchases, file serving, loan approvals etc. A particular transaction typically can be described as CPU-intense or input-output (IO)-intense. A CPU-intense transaction is limited by the amount and power of the processors on your server. In other words, the transaction mostly involves processing happening in the CPU. Therefore, the number/speed of transactions will grow as soon as you add new CPU power. However, the CPU needs information to do its calculation and it needs to return an answer. After all, there is no point to carrying out an activity that has no input or output. To return to our automobile analogy for a moment, a CPU-only application would be like having an exotic sports car with no wheels, no steering, and no brakes. Excellent examples of CPU-intensive workloads would include:
The two most common CPU-intense benchmarks are SPECint2000 and SPECfp2000. However, before applying these benchmarks to your workload it is worth recalling that each benchmark is designed for specific applications.
SPECint2000 simulates:
Meanwhile, SPECfp2000 simulates:
The common characteristics of all these seemingly disparate applications are:
All of these characteristics ultimately translate into a very parallel computing environment. The chart below details the scaling associated with a truly parallel application.
Answering that question goes a long way toward selecting which benchmark to use for server evaluation purposes. Your workload will become less CPU intensive as you require:
As your workload involves more data sharing, data locking, and sequential work, your processing performance becomes limited by the ability of the CPU to retrieve and distribute information.
Once your application is constrained by data transfer considerations, it becomes input/output (I/O) intensive. An I/O intense workload is limited by the ability of the CPU to get access to the data necessary to carry out its calculation. In other words, the transaction involves very little processing happening in the CPU. Instead, the transaction involves moving information back and forth.
I/O intensive transactions suffer from diminishing CPU returns. In fact I/O workloads exhibit the traits of:
I/O intense applications largely move and manage data. Examples include:
An I/O intense transaction can be described as a serial transaction. In other words, each step happens one after the other instead of in parallel. Serial transactions require internal server bandwidth and a large amount of local RAM to move data around. Serial transactions are common when your data is unstructured, as with ad-hoc queries, online bank account access, etc.
The most commonly published benchmark, TPC-C, models a relatively I/O intense OLTP workload. TPC-C simulates a warehouse system and includes:
It is characterized by:
Serial workloads by nature have a large need for hard disk access, since the transaction involves searching, processing, and manipulating information stored in outside tables. There tends to be a large amount of shared data, since each CPU needs to access and manipulate the same exact piece of data, which means that one CPU must wait until another CPU is done. Data locking is required, since each transaction has a number of verification steps (did the transaction finish, is this the right account, etc.) that must take place before another CPU can work on the same piece of data. As a result, in a very heterogeneous server environment, not only must each CPU wait for information, but your server must also gather data from and return results to a relatively unstructured back-end database or transaction monitor. The chart below depicts the scaling efficiencies associated with a highly serial application.
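One common way to approximate the difference between parallel and serial scaling is Amdahl's law, a standard textbook model that this report does not itself use. In the sketch below, the parallel fractions chosen for the two workloads are assumptions picked purely for illustration.

```python
def amdahl_speedup(processors, parallel_fraction):
    """Speedup over one processor when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def scaling_efficiency(processors, parallel_fraction):
    """Speedup per processor; 1.0 would be perfect 1:1 scaling."""
    return amdahl_speedup(processors, parallel_fraction) / processors


# Assumed parallel fractions: a CPU-intense workload and a highly serial one.
for label, fraction in [("highly parallel", 0.99), ("highly serial", 0.50)]:
    row = ", ".join(
        f"{n}-way {scaling_efficiency(n, fraction):.2f}" for n in (2, 4, 16, 64)
    )
    print(f"{label:16s} {row}")
```

The model ignores effects the text singles out, such as data locking and waits on back-end storage, so it tends to be optimistic for serial workloads. It does, however, reproduce the general shape: efficiency stays close to 1.0 for the parallel case and falls away quickly for the serial one.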
Unfortunately, server performance is related to both the number of processors and type of processor used. To make matters difficult, there is no single metric that can be used to compare processors sold by different vendors or even processors sold by the same vendor but with different architectures. Many market participants use CPU clock speed (frequency) as a proxy for server performance but the following analysis will show how this is inadequate and can even be misleading. Instead, evaluating processor performance can be thought of as a smaller version of evaluating server performance and comparing how many calculations can be done against how much data can be moved. Comparing processor performance requires looking at the following factors:
Although not a valid proxy for performance, CPU clock speed is still an important element of processing ability. Holding everything else constant, comparing the CPU clock speeds of two different processors from the same processor family will give a rough comparison of how much work the CPU itself can do. A higher clock speed, for example 1600 MHz vs. 900 MHz, will typically deliver higher performance. However, just as an application is useless if it is not transferring data to and from a server’s processors, a CPU is useless if other parts of the processor do not move data to and from it. This effectively limits the gain you can get from increasing processor clock speeds and makes processor elements such as cache size, system bus capacity, and addressable memory size more important. It also prevents a useful CPU clock speed comparison between processors of two different architectures or processor families.
While increasing the number of calculations a CPU can perform is one solution to increasing performance, another possibility is to increase the amount of cache, or memory, that is on the processor itself. This cache can be used to store data very close to the processor and so reduce the amount of time a CPU needs to wait to get new information and perform calculations. Processors can have from 1 to 4 levels of cache – this refers to how close to the CPU the cache actually is and therefore by how much it improves performance. Increasing cache can have a substantial effect on processing performance, especially for very memory intensive applications such as Web serving and databases. A full discussion of processor design would take its own 70+-page report, but it is important to briefly note the type of cache used by RISC (Reduced Instruction Set Computing) vendors such as Sun, HP, or IBM and in chips by Intel.
RISC chips vary quite a lot from vendor to vendor but all share a few key cache characteristics: they typically have 96KB to over 2.25MB of Level 1 cache and 1-8MB of Level 2 cache. For example, Sun’s UltraSPARC III processors (used in most of their entry level servers and all of their workgroup, midframe, and high-end servers) have a Level 1 64KB instruction cache and a Level 1 32KB data cache as well as 1-8MB of Level 2 cache. HP’s PA-RISC architecture used in their rp2430 through SuperDome servers has a Level 1 0.75MB instruction cache and a Level 1 1.5MB data cache. In contrast, IBM’s POWER4 and POWER4+ architecture uses 2-processor Multi-Chip-Modules that share resources between the 2 CPUs and have a Level 1 128KB instruction cache, a Level 1 64KB data cache, 1.5MB of Level 2 cache, and 32MB of Level 3 cache.
Intel Xeon servers have a Level 1 16KB instruction cache, a Level 1 16KB data cache, and 512KB of Level 2 cache. Intel Itanium II servers, which are out of the scope of this TCO analysis, have a Level 1 16KB instruction cache, a Level 1 16KB data cache, 256KB of Level 2 cache, and 3MB of Level 3 cache.
This information is not provided to allow you to create substantive comparisons between different RISC architectures or between Intel and RISC cache sizes but to help you understand that cache sizes are not standardized and that differences in cache sizes and types preclude the simple “16 Sun processors are the same as 16 Intel processors” type of analysis seen in some TCO white papers.
The system bus capacity is also of crucial importance since the system bus is the main “thoroughfare” used to transport data to and from the CPU. RISC, Intel Xeon, and Intel Itanium II processors have very different bus capacities. The system bus capacity is related both to the system bus speed as well as the size of the bus itself. Increasing either the system bus speed or size can have very important performance implications. For example, moving from an Intel Xeon 400MHz system bus to an Intel Xeon 533MHz system bus provides a 12% performance improvement in overall server performance. Similarly, doubling the size of the system bus when moving from a 32-bit Intel Xeon architecture to a 64-bit Intel Itanium II architecture improves the system bus capacity by a factor of 2 (although performance may be increased by somewhat less).
Probably the most important difference between 32-bit and 64-bit processors is the difference in addressable memory. As CPU processing ability increases, memory becomes a more important limiter of performance. A 32-bit architecture such as the Intel Xeon architecture can address only 3GB natively and must use operating system extensions to address a maximum of 64GB of memory. Most 32-bit servers address much less memory, and using operating system extensions causes overhead. In contrast, a 64-bit address space allows you to address much more memory. Intel Itanium II servers can theoretically address up to 16TB of memory but most OEMs limit you to 8-12GB per processor. Different RISC server OEMs have chosen different memory limits depending on their particular technical implementation. For example, Sun limits its servers to 4-8GB of memory per processor, HP sets a 4GB per processor memory limit, while IBM allows 16GB of memory per 2-CPU module (or 8GB of memory per processor).
The last major difference between processors is floating point performance. Floating point performance is very important in technical computing and scientific applications that carry out a large number of numerical calculations. It is less important in traditional business applications where the CPU must typically process business logic (yes or no, etc). The current 32-bit Intel Xeon processor does not have any special floating-point execution abilities. In contrast, the Intel Itanium II and all newer RISC processors have dedicated floating-point components that are specially designed to carry out floating point computations. This means that while Intel Xeon floating point (high performance) is roughly equivalent to Intel Xeon integer (traditional business) computing, Intel Itanium II, Sun UltraSPARC, HP PA-RISC, and IBM POWER4 are substantially faster at floating point execution than integer processing.
A full analysis of processor families would cover the way that memory and disk is accessed, within-processor communication speeds, graphics accelerator modules, the difference between Level 1, Level 2, and Level 3 cache, and a host of other factors and could be its own 70+ page report. However, since TCOnow! is a total cost of ownership comparison tool and not a processor feature/function comparison tool, you will have to account for these differences yourself.
Lest the above discussion dissuade you from ever wanting to try to compare processors, you should be aware that you can make some basic within-family comparisons. TCOnow! for Windows Server Optimization makes CPU clock speed comparisons for two general processor families:
You may also be curious to see the effect of CPU clock changes on some processor families that are not in the scope of this TCO analysis but are becoming more interesting to IT shops such as:
64-bit RISC architectures are very efficient at converting a higher CPU clock speed to faster processor performance. As a general rule of thumb, 93% to 95% of the percentage increase in CPU clock speed will follow through to the processor’s performance. For example, if you were to replace Sun UltraSPARC III 1050MHz processors with 1200MHz processors, a 14% increase in CPU clock speed, you would expect to see a 13.5% increase in performance.
In contrast to RISC architectures, Intel’s 32-bit Xeon architecture is somewhat less efficient at converting CPU clock speed increases into processing improvements. In fact, as a general rule of thumb, system performance increases by only 15-50% of the percentage increase in processor speed. The more you increase your CPU clock speed, the less your performance will increase.
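The two rules of thumb above can be written out as simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the Xeon clock speeds in the usage example are invented, and it treats the pass-through percentage as constant within each range even though the text notes that Xeon gains shrink as the clock speed increase grows.

```python
# Rule-of-thumb share of a clock speed increase that reaches performance,
# as quoted in the text: (low, high) fractions per processor family.
RISC_PASS_THROUGH = (0.93, 0.95)
XEON_PASS_THROUGH = (0.15, 0.50)


def expected_performance_gain(old_mhz, new_mhz, pass_through):
    """Estimate the fractional performance gain from a clock speed change."""
    clock_gain = (new_mhz - old_mhz) / old_mhz
    return clock_gain * pass_through


def gain_range(old_mhz, new_mhz, pass_through_range):
    """Return the (low, high) estimated gain for a family's pass-through range."""
    low, high = pass_through_range
    return (
        expected_performance_gain(old_mhz, new_mhz, low),
        expected_performance_gain(old_mhz, new_mhz, high),
    )


# The UltraSPARC III example from the text: 1050MHz to 1200MHz.
risc_low, risc_high = gain_range(1050, 1200, RISC_PASS_THROUGH)
print(f"RISC 1050 -> 1200 MHz: {risc_low:.1%} to {risc_high:.1%}")

# Hypothetical Xeon clock speeds, chosen only to illustrate the wider range.
xeon_low, xeon_high = gain_range(2000, 2800, XEON_PASS_THROUGH)
print(f"Xeon 2000 -> 2800 MHz: {xeon_low:.1%} to {xeon_high:.1%}")
```

For the UltraSPARC III case this gives roughly 13.3% to 13.6%, in line with the 13.5% figure quoted above. The same 40% clock speed increase on the hypothetical Xeon pair yields only 6% to 20%.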
The Intel Itanium II processor uses an architecture that has been redesigned from the ground up. This means that at least between the two existing Itanium II processors, there is quite a good tie between faster CPU clock speed and faster performance. Moving from an Intel Itanium II 900MHz processor to an Intel Itanium II 1000MHz processor provides a 20% performance boost.
Intellectually interesting as all this may be, how does it translate into sizing a server? In order for us to size our servers correctly, we need to determine how CPU intense versus how I/O intense our application is. Once we know our application characteristics, we then can turn our application requirements (number of users, Web pages per second, etc) into infrastructure requirements (number and size of servers).
The discussion of CPU-intense versus I/O intense easily can become complicated and never-ending. Imagine instead that we have a range of CPU-intense and I/O intense workloads. They can be ordered by degree of CPU-intensity. Doing so also will reveal what applications are very parallel and what applications are very serial.
Let’s create a range of 1 to 25, with 1 equaling very CPU-intense and 25 equaling very I/O-intense. We then can use this scale to rate our application, by examining the following questions:
You then can use your application rating to place yourself on a performance curve. For simplicity purposes, CIOview uses two performance curves to bound our performance estimations. One curve is very CPU-intense and represents our estimate of scaling for a very technical computing workload. This curve represents an application rating of 1. The second curve is very I/O-intense and represents our estimate of scaling for a workload bound almost entirely by I/O considerations. This curve represents an application rating of 25. Your application rating will place you somewhere between these two curves.
Once you estimate where you fall between a CPU-intense performance curve and an I/O-intense performance curve, you can determine the best size of server to deploy (2-way, 4-way, etc). You typically would choose a server size that shows good scaling for your workload. To determine the output of your server you could use a 1-processor server as a reference to determine the potential output of your chosen server. And then finally, you would know the number of servers necessary.
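The sizing steps described above can be sketched in a few lines. Everything numeric here is an assumption made for illustration: the two curves are invented placeholders rather than CIOview's performance curves, linear interpolation between them is only one plausible way to place an application rating, and the workload figures in the usage example are hypothetical.

```python
import math

# Hypothetical relative throughput at each server size (1-CPU server = 1.0).
# Placeholder values for illustration only, not CIOview's curves.
CPU_INTENSE_CURVE = {1: 1.0, 2: 1.9, 4: 3.6, 8: 6.8}  # application rating 1
IO_INTENSE_CURVE = {1: 1.0, 2: 1.6, 4: 2.2, 8: 2.6}   # application rating 25


def relative_output(rating, cpus):
    """Place a rating between the two bounding curves by linear interpolation."""
    if not 1 <= rating <= 25:
        raise ValueError("rating must be between 1 and 25")
    weight = (rating - 1) / 24  # 0.0 at rating 1, 1.0 at rating 25
    return (1 - weight) * CPU_INTENSE_CURVE[cpus] + weight * IO_INTENSE_CURVE[cpus]


def servers_needed(required_output, one_cpu_output, rating, cpus):
    """Use a 1-processor server as the reference and round up to whole servers."""
    per_server = one_cpu_output * relative_output(rating, cpus)
    return math.ceil(required_output / per_server)


# Hypothetical workload: 10,000 units of work, 500 units per 1-CPU server.
print(servers_needed(10000, 500, rating=1, cpus=4))
print(servers_needed(10000, 500, rating=25, cpus=4))
```

With these placeholder curves, the same workload needs 6 four-way servers at a rating of 1 but 10 at a rating of 25, which is why placing your application on the scale matters before choosing a server size.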
Knowing the characteristics of the workload you are going to deploy is key. One then needs to identify where that workload falls on the serial/parallel continuum. Having done that one must select the performance benchmark most appropriate to your workload, take into consideration how your application scales and then size accordingly.
What does this have to do with my Windows Server Optimization project?
Trying to determine whether an application is serial or parallel can be quite daunting. Thankfully, this TCO analysis looks at four distinct applications in your general IT environment: file and print, email/communication, database, and line of business application. Because these applications have relatively well-defined parameters, CIOview has already placed them on a performance curve for you.
File and print is very disk-intensive. In other words most of the application processing involves moving data from disk to disk and across the network. This sort of work is very I/O intense and is not limited much by CPU performance. In fact, most file and print applications do not scale much beyond 2-4 CPU per server.
While line of business applications can vary quite a lot in purpose, they are all typically very similar to file and print applications in terms of processing characteristics. Therefore, TCOnow! uses the same performance curves for both file and print and line of business applications.
On a scale of 1-25, file and print and line of business applications can be thought of as a 25 (very serial).
There are published benchmarks showing very high file and print performance on very large servers but the fine print shows that these benchmarks typically use so much disk that you are basically paying for storage and not for a server. Also, benchmarks do not delve deeply into the effect of network traffic and network use on performance.
Another important point to remember when looking at file and print or line of business applications is that because these applications involve a large amount of client-server communication, improvements in the client or the communication protocol can substantially improve performance. For example, if all of your users are on Microsoft’s newest Windows XP client, then Microsoft Windows Server 2003 can scale effectively up to 4 CPU and even somewhat to 8 CPU. While this is a clear improvement over the 1-2 CPU scaling seen in Microsoft Windows 2000 Server, you cannot gain this performance boost unless most of your users use Windows XP. Considering that many organizations have a 2-5 year client upgrade cycle, the average organization may not benefit from this improvement and so TCOnow! does not include improved client communication in its default server sizing.
The following graph shows the processor performance models CIOview uses to estimate file and print and line of business application server performance on Sun UltraSPARC, HP PA-RISC, IBM POWER4, and Intel Xeon (Dual Processor for 1-2 CPU and Multi Processor for 4-8 CPU) based servers. The graph will also show the difference in performance between the fastest available Intel processors and the slowest available Intel processors.
Please remember that this graph assumes that your processors are utilized at 100% average prime shift utilization rates. This is obviously impossible – a much more likely UNIX average prime shift utilization rate is 15%-25% while analysts and customer studies have shown that a typical average prime shift utilization rate in a consolidated Intel environment is roughly 12%-15%. Please also remember that this graph makes certain assumptions about CPU clock speeds within each processor family and the amount of RAM accessed and does not represent the entire range of CPU clock speeds and RAM combinations you can choose.
In contrast to file and print, email is typically a more CPU-intense application. Most of the email application’s processing is devoted to your email datastore, which functions very much like a database. However, there is also quite a large amount of client-server communication that makes your email application somewhat less CPU intense. You will also find that most Intel email applications such as Microsoft Exchange or Lotus Notes do not scale well beyond 4 CPU. In a UNIX environment, you can typically run your email application on a larger server but often cannot gain much better performance past 16 CPU.
On a scale of 1-25, email can be thought of as a 5 (parallel).
There are published benchmarks showing higher performance but these benchmarks typically show a server that has been tuned to run the application and typically is overloaded with disk to improve performance.
Although some client side caching and communication protocol improvements can improve performance at 8 CPU, this benefit only applies if all of your clients are Windows XP.
The following graph shows the processor performance models CIOview uses to estimate email server performance on Sun UltraSPARC, HP PA-RISC, IBM POWER4, and Intel Xeon (Dual Processor for 1-2 CPU and Multi Processor for 4-16 CPU) based servers. The graph will also show the difference in performance between the fastest available Intel processors and the slowest available Intel processors.
Please remember that this graph assumes that your processors are utilized at 100% average prime shift utilization rates. This is obviously impossible – a much more likely UNIX average prime shift utilization rate is 45%-65% while analysts and customer studies have shown that a typical email average prime shift utilization rate in a consolidated Intel environment is roughly 15%-25%. Please also remember that this graph makes certain assumptions about CPU clock speeds within each processor family and the amount of RAM accessed and does not represent the entire range of CPU clock speeds and RAM combinations you can choose.
While file and print and email applications follow relatively standardized data and usage patterns, database applications can differ substantially in the amount of partitionable work vs shared work and the extent to which performance is limited by the CPU vs performance that is limited by memory or simply connections with other databases. In other words, on a range of 1-25, databases can fall anywhere from 1 to 25. There are five key questions that determine whether your database is parallel or serial. The more you answer “Yes” to the following five questions, the more serial your database:
Thankfully, many databases fall within a general range of requirements and thus of parallel vs serial processing. Although you can eventually drill down to the above 5 questions in TCOnow! and change the default assumptions, TCOnow! uses a collection of defaults that lead to a database performance curve resembling an on-line transaction processing application much more than a scientific computing application. The following graph shows the default processor performance models CIOview uses to estimate database performance on Sun UltraSPARC, HP PA-RISC, IBM POWER4, and Intel Xeon (Dual Processor for 1-2 CPU and Multi Processor for 4-32 CPU) based servers.
Please remember that this graph assumes that your processors are utilized at 100% average prime shift utilization rates. This is obviously impossible – a much more likely UNIX average prime shift utilization rate is 65% while analysts and customer studies have shown that a typical database average prime shift utilization rate in a consolidated Intel environment is roughly 50%. Please also remember that this graph makes certain assumptions about CPU clock speeds within each processor family and the amount of RAM accessed and does not represent the entire range of CPU clock speeds and RAM combinations you can choose.
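Because each graph assumes 100% utilization, a graph reading has to be scaled down before it is used for sizing. The sketch below collects the average prime shift utilization ranges quoted in this section and applies them as a simple linear scaling factor. The linear scaling, the function name, and the figure of 1,000 units in the usage example are assumptions made for illustration.

```python
# Average prime shift utilization ranges (low, high) quoted in the text.
# The file and print figures accompany the graph that also covers
# line of business applications.
UTILIZATION = {
    ("file_and_print", "unix"): (0.15, 0.25),
    ("file_and_print", "intel"): (0.12, 0.15),
    ("email", "unix"): (0.45, 0.65),
    ("email", "intel"): (0.15, 0.25),
    ("database", "unix"): (0.65, 0.65),
    ("database", "intel"): (0.50, 0.50),
}


def realistic_output(graph_output, workload, platform):
    """Scale a 100%-utilization graph reading by the quoted utilization range."""
    low, high = UTILIZATION[(workload, platform)]
    return graph_output * low, graph_output * high


# Hypothetical graph reading of 1,000 units for an Intel email server.
low, high = realistic_output(1000, "email", "intel")
print(f"{low:.0f} to {high:.0f} units")
```

In this example a server that the graph shows delivering 1,000 units would be expected to deliver roughly 150 to 250 units at typical consolidated Intel email utilization rates.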
For the following cost categories, please note that subcategory costs may appear even though you may have left the corresponding category unchecked when you selected your desired cost summary. The costs listed below represent what your costs would be had you selected the option. You may refer to the table on page 4 of this report to view which categories you selected.
One of the simplest ways to determine server needs is to complete an inventory or assessment of what is currently deployed. Table 11 lists the servers in use and their respective average utilization rates. This current server environment will form the base of your costs for your Current Environment and will have a very large impact on your costs if you choose not to consolidate.
Item | Qty | Model | Utilization Rate | |
1. | Production Server 1 | |||
2. | Production Server 2 | |||
3. | Production Server 3 | |||
4. | Production Server 4 | |||
5. | Production Server 5 | |||
6. | Test Server | |||
7. | Development Server | |||
8. | QA Server | |||
9. | Backup Server | |||
10. | Other Server |
If you are consolidating your server environment using Microsoft Windows Server 2003 Standard, Enterprise, or Datacenter Edition you should take note of a number of server performance improvements that can affect your overall server requirements and total cost of ownership. Unlike client-communication related benefits, these server benefits apply regardless of whether your clients are using Windows 2000, Windows XP, or even Windows 98 or Windows 95. There are five key performance benefits that CIOview has chosen to include in its TCO model. You may find that there are other important performance benefits relating to network traffic and directory replication that apply to your unique situation, but you should manually adjust your server requirements to account for these benefits.
The five major server performance benefits delivered by Windows Server 2003 are highlighted in the table below and explained on the next few pages. You can click on any of the five hyperlinks to be taken directly to an explanation of the appropriate performance improvement:
Improvements to server performance in Windows Server 2003 |
Improved Scheduling - 10% faster database performance on any edition of Microsoft Windows Server 2003. |
Hyper-threading awareness - 15% faster file and print, email, database, or domain/line of business application performance on Microsoft Windows Server 2003 Enterprise Edition and Microsoft Windows Server 2003 Datacenter Edition. |
Increased memory addressing - Potentially a 10-20% performance improvement in file and print, Microsoft Exchange 2003, database, and line of business applications running on Microsoft Windows Server 2003 Enterprise Edition and Microsoft Windows Server 2003 Datacenter Edition. |
Distributed File System - Potentially 5-6x increase in file and print server consolidation on Microsoft Windows Server 2003 Enterprise Edition and Microsoft Windows Server 2003 Datacenter Edition. |
Volume Shadow Copy Service - Potentially 35% reduction in the number of test and development, quality assurance, and backup servers. |
Typically one of the largest brakes on performance in a database environment is not your actual CPU processing capability or even other hardware such as memory or disk dependencies. Instead, the threading and scheduling technology used to handle multiple concurrent requests such as log-ins, system processes, libraries, and data manipulation can greatly affect performance since all the parallel processing capability in the world is useless if the requests cannot be distributed to your CPUs at the same time.
Improvements in the Windows Server scheduling and threading implementation have the potential to increase your database performance by 10% to 15%. If you consolidate using Microsoft Windows Server 2003, TCOnow! will automatically adjust your database performance upwards by 10% to account for better threading and scheduling capabilities in the Windows operating system.
Intel's newer 32-bit Xeon processors are equipped with a resource optimization technology called hyperthreading. Hyperthreading allows each Intel Xeon processor to create two synthetic logical processors. Each of the two logical processors can accept requests or pass on data but must share the same physical CPU resources. While this may not seem like much since the physical resources are no greater, tests have shown that using hyperthreading can allow one logical processor to carry out CPU-intense work on the physical CPU while the other processor handles memory and I/O management and other "garbage collection" using free resources. Hyper-threading can potentially increase performance by 10% to 30% depending on how multi-threaded your application is but will typically improve Intel 32-bit Xeon performance by 15%.
Microsoft Windows NT 4.0 is not hyperthreading compatible at all since it was developed long before hyperthreading existed. Microsoft Windows 2000 is "hyperthreading-compatible" but not "hyperthreading-aware." This means that although Microsoft Windows 2000 can use both logical processors associated with a physical processor, it has no way of differentiating between a physical and a logical processor. So if you have a 4-CPU server running Windows 2000 Standard Server, Windows 2000 will activate the first four processors it sees. If you try to take advantage of hyperthreading, Windows 2000 will activate your first four logical processors, which are likely to be on only 2 physical processors, and will not end up using the rest of the server. The same sort of difficulty arises when Windows 2000 has to determine which processor to send work to: it may send a request to a logical processor that does not have free resources instead of a logical processor associated with a free physical processor. For these reasons, Microsoft Windows 2000 is effectively not able to take advantage of hyperthreading's performance capabilities.
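The 4-CPU example above can be illustrated with a toy model. This is not how any version of Windows is implemented. It assumes that logical processors are enumerated one physical processor at a time, which is the ordering the text describes as likely, and that a hyperthreading-aware system counts its processor limit against physical processors. All function names are hypothetical.

```python
def logical_processors(physical_count, threads_per_processor=2):
    """List logical processors as (physical_id, thread_id), one physical CPU at a time."""
    return [
        (physical, thread)
        for physical in range(physical_count)
        for thread in range(threads_per_processor)
    ]


def unaware_activation(processors, limit):
    """Activate the first `limit` processors seen, ignoring the physical layout."""
    return processors[:limit]


def aware_activation(processors, limit):
    """Apply the limit to physical processors and keep all of their logical processors."""
    allowed = sorted({physical for physical, _ in processors})[:limit]
    return [lp for lp in processors if lp[0] in allowed]


server = logical_processors(physical_count=4)

unaware = unaware_activation(server, limit=4)
aware = aware_activation(server, limit=4)

print("unaware uses physical CPUs:", sorted({p for p, _ in unaware}))
print("aware uses physical CPUs:  ", sorted({p for p, _ in aware}))
```

Under these assumptions the unaware case ends up on physical processors 0 and 1 only, leaving half of the server idle, while the aware case uses all four physical processors and all eight logical processors.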
In contrast, Microsoft Windows Server 2003 Enterprise Edition and Datacenter Edition are hyperthreading aware and can differentiate between a physical processor and a logical processor. If you consolidate using Microsoft Windows Server 2003 Enterprise Edition or Microsoft Windows Server 2003 Datacenter Edition then TCOnow! will automatically adjust your application performance upwards by 15% to account for the performance boost provided by hyperthreading.
Microsoft Windows 2000 and Microsoft Windows 2003 both use a technology called Address Windowing Extensions (AWE) that allows you to assign more than 4GB of memory to an application or to a server. IT shops are finding more and more that memory has become as much of a limitation as CPU processing, if not more, especially in database processing. In fact, a general rule of thumb used by most Intel-hardware performance experts is that memory must increase directly in line with CPU count, so moving from a 2 CPU server to a 4 CPU server would require you to double the amount of RAM. Taking advantage of more memory through AWE can therefore boost performance substantially.
Microsoft Windows 2000 can support 4GB of memory using Standard Server, 8GB of memory using Advanced Server, and 32GB of memory using Datacenter Server. However, if you use Microsoft Exchange 2000 you can only access up to 3GB of memory. While Microsoft Windows 2003 Standard Edition can also only support 4GB of memory, Windows 2003 Enterprise Edition can access 32GB of memory on a 32-bit server while Windows 2003 Datacenter Edition can access 64GB of memory on a 32-bit server. If you migrate to Exchange 2003 you can take advantage of more memory. This is absolutely necessary to be able to use 8+ CPU servers in your computing environment. If you consolidate using Microsoft Windows 2003 Enterprise Edition or Microsoft Windows 2003 Datacenter Edition, the Servers screens in TCOnow! will allow you to define how much extra RAM you wish to purchase and to see the effect of varying your memory upgrades on your performance.
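The memory limits above, combined with the rule of thumb that RAM should grow in line with CPU count, give a quick way to see which editions can hold a given server configuration. The sketch below is a rough check only. The 2GB per CPU figure in the usage example is hypothetical, the function names are invented, and the check ignores each edition's limit on the number of CPUs.

```python
# Maximum supported memory in GB on a 32-bit server, as quoted in the text.
MEMORY_LIMIT_GB = {
    "Windows 2000 Standard Server": 4,
    "Windows 2000 Advanced Server": 8,
    "Windows 2000 Datacenter Server": 32,
    "Windows 2003 Standard Edition": 4,
    "Windows 2003 Enterprise Edition": 32,
    "Windows 2003 Datacenter Edition": 64,
}


def required_ram_gb(cpus, gb_per_cpu):
    """Rule of thumb from the text: RAM grows directly in line with CPU count."""
    return cpus * gb_per_cpu


def editions_that_fit(cpus, gb_per_cpu):
    """Return the editions whose memory limit covers the required RAM."""
    needed = required_ram_gb(cpus, gb_per_cpu)
    return [name for name, limit in MEMORY_LIMIT_GB.items() if limit >= needed]


# Hypothetical configuration: an 8-CPU server at 2GB of RAM per CPU.
for edition in editions_that_fit(cpus=8, gb_per_cpu=2):
    print(edition)
```

At 2GB per CPU, an 8-CPU server needs 16GB, which rules out every edition limited to 4GB or 8GB.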
Typically, file and print server consolidation has been limited by the number of separate roots, or high-level data hierarchies, necessary. Each root might have a certain number of root targets, each with its own user access rights and file information. For example, you might require a separate root target for different groups within sales, marketing, HR, manufacturing, etc. (by product, geography, etc.).
Windows NT 4.0 effectively required a separate server for each root or root target. Windows 2000 allows each file and print server to have up to 16 different root targets but only one root. In contrast, Microsoft Windows 2003 Enterprise Edition or Microsoft Windows 2003 Datacenter Edition allow you to host multiple roots on one server and theoretically up to 5,000 root targets on each server. This can greatly increase the efficiency of a Single workload consolidation project. If you choose to carry out a Single workload consolidation project using Microsoft Windows 2003 Enterprise Edition or Microsoft Windows 2003 Datacenter Edition then TCOnow! will help you estimate the extent to which you can consolidate your application on the User and Workload Allocation screen and associated file and print pop-up window.
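The root and root target limits above imply a floor on the number of file and print servers, which the sketch below works out. It is a lower bound only: it ignores disk, network, and CPU capacity, it reads the Windows NT 4.0 limit as one server per root target, and the environment in the usage example is hypothetical. The function and platform names are invented for illustration.

```python
import math

WIN2000_TARGETS_PER_SERVER = 16     # one root per server, up to 16 root targets
WIN2003_TARGETS_PER_SERVER = 5000   # theoretical limit, multiple roots per server


def min_file_servers(targets_per_root, platform):
    """Lower bound on server count from root and root target limits alone."""
    if platform == "nt4":
        # Assumption: one server for each root target.
        return sum(targets_per_root)
    if platform == "win2000":
        # Each server hosts a single root, so roots cannot share a server.
        return sum(
            math.ceil(targets / WIN2000_TARGETS_PER_SERVER)
            for targets in targets_per_root
        )
    if platform == "win2003_enterprise_or_datacenter":
        # Multiple roots may share a server, so only the total matters.
        return math.ceil(sum(targets_per_root) / WIN2003_TARGETS_PER_SERVER)
    raise ValueError(f"unknown platform: {platform}")


# Hypothetical environment: four roots with 12, 20, 8, and 30 root targets.
roots = [12, 20, 8, 30]
for platform in ("nt4", "win2000", "win2003_enterprise_or_datacenter"):
    print(platform, min_file_servers(roots, platform))
```

For this hypothetical environment the floor falls from 70 servers to 6 and then to 1. In practice capacity limits, not root limits, will usually determine how far you can consolidate.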
One of the most time-consuming and inefficient processes in many IT environments is the backup and restore function. Backing up a production server and transferring the information to a test or backup server can take anywhere from 2 days for a 300-400GB database to 4 days for an 800GB to 1TB datastore and even more time for a larger amount of data. This effectively means that if your application development and testing staff wish to be able to run tests and scripts more than once every 2-4+ days, they must maintain a non-production environment that duplicates a very large portion of your production environment. By maintaining duplicate test, development, quality assurance, and backup servers your application developers and testers can run a test on one non-production server, use a different duplicate server for their next test, etc. until they are able to get access to the original data during your IT organization's regularly scheduled backup window. Of course, if you do not maintain enough non-production servers then you may have application developers or testers who lose a day to a full week of productive time while waiting for a new copy of the data they need to test on. This can slow application development and can force developers who are approaching a deadline that cannot be moved to skip crucial tests rather than wait for a new backup of your production information.
Microsoft Windows Server 2003 Standard Edition, Enterprise Edition, and Datacenter Edition use a technology called Volume Shadow Copy Service (VSS) that allows you to freeze an application and let your storage area network (SAN) hardware and software carry out a point-in-time backup of your application. Because Volume Shadow Copy Service freezes your application and then transfers the frozen information to your backup utility you can avoid backup issues related to files that are open as the backup is being carried out. Volume Shadow Copy Service also improves the efficiency of restores since an on-disk restore over a SAN now takes 90 seconds. VSS's ability to take point-in-time snapshots while your production server is operating means that application development and testing staff can take a snapshot of the data they need when they need it instead of maintaining a whole host of duplicate non-production servers that get refreshed only when your production servers are backed up. In a very large-scale environment this can reduce your non-production server needs by up to 35% and increase the ratio of production to non-production servers from 1:1 up to 1.5:1.
However, to take advantage of Volume Shadow Copy Service your SAN hardware and software must be Volume Shadow Copy Service compatible. Although many new SAN products are VSS-compatible you should check with your storage vendor to find out if you can take advantage of VSS.
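The relationship between the 35% reduction and the 1.5:1 ratio quoted above is simple arithmetic, sketched below. The function name and the server counts in the usage example are hypothetical.

```python
def production_ratio_after_vss(production, non_production, reduction=0.35):
    """Return the production to non-production ratio after removing servers."""
    remaining_non_production = non_production * (1 - reduction)
    return production / remaining_non_production


# Hypothetical environment that starts at a 1:1 ratio with 20 servers of each kind.
ratio = production_ratio_after_vss(production=20, non_production=20)
print(f"{ratio:.2f}:1")
```

Removing 35% of the non-production servers from a 1:1 environment leaves a ratio of about 1.54:1, which is the source of the "up to 1.5:1" figure.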
Table 12 lists the server costs that would be required for the server consolidation strategy you have chosen. The server requirements are very much driven by the type of workload that is being migrated, as well as the average utilization rate of the existing and new servers.
Item | Delta | |||
1 | Total Allocated Server Hardware Cost |
Item | Cost Differential | |||
1 | ||||
2 | ||||
3 | ||||
4 | ||||
5 | ||||
6 | ||||
7 | ||||
8 | ||||
9 | ||||
10 | ||||
11 | ||||
12 | ||||
13 | ||||
14 | ||||
15 | ||||
16 | ||||
17 | ||||
18 | ||||
19 |
Your decisions on clustering influence your server requirements and therefore your total costs. TCOnow! allows you to choose between creating two types of cluster. They are:
Types of Clusters |
Active/Active Clusters: All nodes in your cluster run copies of your application. They are all active at all times (other than a node failure of course). If a failure occurs then your application processing is distributed across any free nodes in the cluster. |
Active/Passive Clusters: Only some nodes in your cluster run your application. These active nodes are clustered with a set of passive standby nodes that are idle most of the time. If a failure occurs then the application processing on your failed node is shifted in its entirety to one of your passive nodes. |
An efficient active/active cluster is the nirvana of clustering. After all, if you could simply take your production server environment and institute failover policies at no extra cost in server hardware, redundant software, personnel, etc., you would have a very cheap way of sharply reducing your cost of downtime. Unfortunately, there are three limits on active/active clusters that, if not managed correctly, can either create clusters that cannot handle the overall application or create a big hole in your IT budget. They are:
Application Availability
The first obstacle to creating a costless active/active cluster is that your application must be able to function in an active/active state. Active/active clustering requires all nodes to share disk and application resources and requires each clustered node to be able to handle any work currently being run on another node. This creates clear complications since many applications do not have a mechanism for simultaneously distributing work to many servers while making sure that no two servers are trying to work on the same piece of data. So you must determine whether your application can be tuned or adapted to an active/active state. Alternatively, you can purchase a cluster-ready software package from your ISV but this will typically be more expensive. For example, Oracle’s Real Application Clusters ability can be purchased for an additional 50% premium over your Oracle database. Applications that can be run in an active/active environment include:
You will notice that Microsoft SQL Server 2000 cannot be part of an active/active cluster.
Node Scalability and Cluster Efficiency
Once you have purchased or custom-developed the code necessary to allow your application to run in an active/active cluster, you are faced with the second obstacle: node scaling. An active/active cluster requires constant communication and sharing of work across the nodes in the cluster. This is very similar in concept to a symmetric multiprocessor (SMP) server tha