Following my earlier post (http://philipflint.com/2011/12/29/optimise-xenapp-ram-and-cpu/), I have done some digging. Thanks to the hard work done by Jeremy at J House Consulting and the advice he gives (http://www.jhouseconsulting.com/2008/05/13/processor-scheduling-20), I can now give what I believe is a correct explanation of why RAM and CPU are optimised the way they are in the majority of XenApp deployments.
In a 32-bit Windows 2003 environment, server memory can be optimised either for System Cache or for Programs. The LargeSystemCache registry value (found at HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management) specifies whether the system maintains a standard-size or a large file system cache, and influences how often the system writes changed pages to disk.
Increasing the size of the file system cache generally improves server performance, but it reduces the physical memory space available to applications and services. Similarly, writing system data less frequently minimizes use of the disk subsystem, but the changed pages occupy memory that might otherwise be used by applications.
The current value can be viewed in the server GUI by opening the Performance settings on the Advanced tab of the System Properties screen. There, the Memory usage option Programs corresponds to 0 and System cache to 1. The two possible values are:
| Value | Effect |
|---|---|
| 0 | Establishes a standard size file-system cache of approximately 8 MB. The system allows changed pages to remain in physical memory until the number of available pages drops to approximately 1,000. This setting is recommended for servers running applications that do their own memory caching, such as Microsoft SQL Server, and for applications that perform best with ample memory, such as Internet Information Services (IIS). |
| 1 | Establishes a large system cache working set that can expand to physical memory, minus 4 MB, if needed. The system allows changed pages to remain in physical memory until the number of available pages drops to approximately 250. This setting is recommended for most computers running Windows Server 2003 on large networks. |
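To put the table's figures into perspective, the sketch below converts them into megabytes for a hypothetical server with 4 GB of RAM. It assumes the standard 4 KB x86 page size, the helper name `cache_profile` is hypothetical, and the large-cache ceiling is only the upper bound from the table; it ignores the 32-bit kernel address-space limits.

```python
PAGE_SIZE_KB = 4  # assumption: standard 4 KB x86 pages

# Figures transcribed from the LargeSystemCache table.
PROFILES = {
    0: {"cache_mb": 8, "available_pages_threshold": 1000},     # standard cache
    1: {"cache_mb": None, "available_pages_threshold": 250},   # large cache: up to RAM - 4 MB
}


def cache_profile(value: int, physical_mb: int) -> dict:
    """Approximate cache ceiling and page-flush threshold in MB (hypothetical helper)."""
    if value not in PROFILES:
        raise ValueError(f"LargeSystemCache must be 0 or 1, got {value}")
    profile = PROFILES[value]
    # Value 1 lets the cache working set grow to physical memory minus 4 MB.
    cache_mb = profile["cache_mb"] if profile["cache_mb"] is not None else physical_mb - 4
    # Changed pages stay in RAM until available pages fall to roughly this level.
    threshold_mb = profile["available_pages_threshold"] * PAGE_SIZE_KB / 1024
    return {"cache_mb": cache_mb, "flush_when_available_mb": threshold_mb}


for v in (0, 1):
    print(v, cache_profile(v, physical_mb=4096))
```

On the assumed 4 GB server, value 0 keeps about 3.9 MB of pages available before changed pages are written out, whereas value 1 lets availability fall to under 1 MB while the cache may grow to 4,092 MB.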
The system file cache is allocated from the kernel memory area, which is shared with paged pool memory and system page table entries (PTEs). In a 32-bit environment, paged pool memory is limited to a maximum of approximately 650 MB regardless of how much physical RAM is installed on the server, and similar limits apply to system page table entries. A large system cache leaves less room for either of these, which in a Citrix environment may lead to increased paging and poorer performance.
In a 32-bit Citrix Presentation Server environment it is therefore recommended that memory usage is optimised for Programs rather than System Cache, so that as much of the limited kernel memory area as possible remains available for paged pool and system page table entries.
In a 64-bit environment these limits are considerably higher, and although the LargeSystemCache value still exists in the registry, its use is deprecated.
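On a running server the setting can be checked from a script rather than in the GUI. The sketch below is an illustrative example, not a Citrix or Microsoft tool: it uses Python's standard `winreg` module (available on Windows only) to read LargeSystemCache from the key quoted above. The helper `check_memory_setting` is hypothetical and applies the 32-bit/64-bit recommendation from this post. The caller must supply `is_64bit` to match the server's OS.

```python
import sys
from typing import Optional

# Key quoted in the post, relative to HKEY_LOCAL_MACHINE.
MEMORY_KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"


def read_large_system_cache() -> Optional[int]:
    """Return LargeSystemCache from the local registry, or None if it is not set."""
    if sys.platform != "win32":
        raise OSError("the Windows registry is only available on Windows")
    import winreg  # standard library, Windows-only

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, MEMORY_KEY) as key:
        try:
            value, _value_type = winreg.QueryValueEx(key, "LargeSystemCache")
        except FileNotFoundError:
            return None
    return value


def check_memory_setting(value: Optional[int], is_64bit: bool) -> str:
    """Compare a LargeSystemCache value with this post's recommendation (hypothetical helper)."""
    if is_64bit:
        # Deprecated on 64-bit Windows, so there is nothing to enforce.
        return "ok: LargeSystemCache is deprecated on 64-bit"
    if value == 0:
        return "ok: memory usage optimised for Programs"
    return f"review: LargeSystemCache is {value!r}, expected 0 (Programs) on 32-bit"


if sys.platform == "win32":
    # Assumption: set is_64bit to match the server's operating system.
    print(check_memory_setting(read_large_system_cache(), is_64bit=False))
```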
In a Windows 2003 32 bit environment processor scheduling can be optimised either for Programs or Background Services.
Multi-tasking operating systems switch rapidly between tasks using various algorithms and heuristics. This gives the impression that everything runs at once, when in fact each processor executes one thread at a time, in turn. How long each thread is allowed to run is determined by the CPU scheduler, and the choice of scheduling behaviour can be immensely important to the performance of a server-based computing system such as Citrix Presentation Server.
Changing the processor scheduling option modifies the Win32PrioritySeparation value under the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\PriorityControl key. That value is interpreted as 6 bits, made up of three 2-bit fields laid out as AABBCC:
- Where AA (timeslice interval) =
  - 01 – longer timeslice interval
  - 10 – shorter timeslice interval
- Where BB (timeslice length) =
  - 01 – timeslice has variable length
  - 10 – timeslice has fixed length
- Where CC (foreground/background separation) =
  - 00 – foreground and background processes have the same priority
  - 01 – foreground process gets a 2 x boost compared to background processes
  - 10 – foreground process gets a 3 x boost compared to background processes
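The bit layout can be decoded mechanically. The following sketch splits a value into its three fields. The function name and the short labels are hypothetical. Bit patterns that the list does not describe are reported as "unlisted" rather than guessed. It is run against the three values this post discusses: 38, 24 and 40.

```python
from typing import Tuple

# Labels for the bit patterns described in the list; anything else is "unlisted".
INTERVAL = {0b01: "longer", 0b10: "shorter"}          # AA
LENGTH = {0b01: "variable", 0b10: "fixed"}            # BB
SEPARATION = {0b00: "equal", 0b01: "2x", 0b10: "3x"}  # CC


def decode_priority_separation(value: int) -> Tuple[str, str, str]:
    """Split a 6-bit Win32PrioritySeparation value into its AA, BB and CC fields."""
    if not 0 <= value <= 0b111111:
        raise ValueError(f"expected a 6-bit value (0-63), got {value}")
    aa = (value >> 4) & 0b11  # bits 5-4: timeslice interval
    bb = (value >> 2) & 0b11  # bits 3-2: variable or fixed timeslice length
    cc = value & 0b11         # bits 1-0: foreground boost
    return (
        INTERVAL.get(aa, "unlisted"),
        LENGTH.get(bb, "unlisted"),
        SEPARATION.get(cc, "unlisted"),
    )


for v in (38, 24, 40):
    print(v, format(v, "06b"), decode_priority_separation(v))
```

This prints `('shorter', 'variable', '3x')` for 38, `('longer', 'fixed', 'equal')` for 24 and `('shorter', 'fixed', 'equal')` for 40.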
When you set the processor scheduling performance option in the GUI, you see only two choices for how processor time is allocated, each of which writes a fixed combination of these bits:
- Programs – registry value 38 decimal (0x26), binary 100110 = shorter intervals, variable timeslice length, 3 x boost
- Background Services – registry value 24 decimal (0x18), binary 011000 = longer intervals, fixed timeslice length, no boost
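Going the other way, the two GUI values can be rebuilt from their fields, which is a useful sanity check on the binary given for each. This is a small sketch with a hypothetical helper name. The zero-padded hexadecimal form is included because that is how the registry editor displays a DWORD.

```python
def encode_priority_separation(aa: int, bb: int, cc: int) -> int:
    """Pack three 2-bit fields into the AABBCC layout (hypothetical helper)."""
    for field in (aa, bb, cc):
        if not 0 <= field <= 0b11:
            raise ValueError(f"each field must fit in two bits, got {field}")
    return (aa << 4) | (bb << 2) | cc


# Field values taken from the two GUI options.
programs = encode_priority_separation(0b10, 0b01, 0b10)    # shorter, variable, 3 x boost
background = encode_priority_separation(0b01, 0b10, 0b00)  # longer, fixed, no boost

for name, value in (("Programs", programs), ("Background Services", background)):
    print(f"{name}: {value} decimal, {value:06b} binary, 0x{value:08x}")
```

This prints `Programs: 38 decimal, 100110 binary, 0x00000026` and `Background Services: 24 decimal, 011000 binary, 0x00000018`.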
Neither of these settings is optimal for a Remote Desktop Services server, although Programs is the better of the two, simply because short timeslices are essential for interactive responsiveness in a Terminal Server environment.
However, in a Citrix Presentation Server environment the situation changes, because Presentation Server provides functionality to optimise the environment over and above that provided by RDS.
The CPU Utilization Management feature was introduced in Citrix Presentation Server 4.0 (Enterprise Edition and above) and is enabled by default. When it is enabled, the 3 x boost of the default Programs value makes the feature somewhat less effective, and a fixed timeslice length is often preferable to a variable one. Under these conditions it is suggested that the optimum value may in fact be 40 decimal (0x28), binary 101000. That gives short, fixed-length timeslices with no foreground boost. The CPU Utilization Management feature can then do its job efficiently, giving each user a fair share of the CPU by modifying the operating system's normal priority scheduling.
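A toy model helps show why the 3 x boost sits awkwardly with a fair-share scheme. The sketch below is a deliberate simplification and does not reflect how the Windows dispatcher or Citrix's CPU Utilization Management actually work. It assumes every thread is permanently runnable and models the boost simply as a foreground quantum three times the length of the background quantum. The base quantum of 2 time units is hypothetical. It then measures how the CPU ends up divided.

```python
from itertools import cycle
from typing import Dict


def round_robin(quanta: Dict[str, int], horizon: int) -> Dict[str, float]:
    """Toy round-robin scheduler.

    quanta maps each always-runnable thread to its timeslice length in abstract
    time units. Threads run in turn until `horizon` units have been handed out;
    the result is each thread's share of the CPU.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    if any(q <= 0 for q in quanta.values()):
        raise ValueError("every quantum must be positive")
    used = {name: 0 for name in quanta}
    clock = 0
    for name in cycle(quanta):
        if clock >= horizon:
            break
        run = min(quanta[name], horizon - clock)  # the final slice may be cut short
        used[name] += run
        clock += run
    return {name: t / horizon for name, t in used.items()}


BASE = 2  # hypothetical base quantum, in abstract time units

# One foreground thread and two background threads, all CPU-bound.
boosted = round_robin({"fg": 3 * BASE, "bg1": BASE, "bg2": BASE}, horizon=1200)
equal = round_robin({"fg": BASE, "bg1": BASE, "bg2": BASE}, horizon=1200)
print(boosted)
print(equal)
```

With the 3:1 quantum ratio, the foreground thread takes 60% of the CPU and each background thread 20%. With equal fixed quanta, each thread gets a third. The even split is the behaviour a per-user fair-share mechanism would want to start from.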
Setting the Win32PrioritySeparation value to 40 decimal will produce a warning similar to the following in the Application event log the next time the “Citrix CPU Utilization Mgmt/Resource Mgmt” service is restarted:
Event Type: Warning
Event Source: CTXCPUUtilMgmt
Event Category: (1)
Event ID: 1591
Time: 12:34:28 AM
Windows is using a custom priority separation value and CPU Utilization Management performance may be degraded. To optimize CPU Utilization Management performance, on the Advanced tab of the System Properties dialog, open Performance Options and select Background Services. Then restart the CPU Utilization Management service.
From the above it can be seen that Citrix best practice, when CPU Utilization Management is enabled, is to set processor scheduling to favour Background Services. If processor performance remains a bottleneck, or a very high number of context switches is observed, it may be worthwhile changing the value to 40.
In short, the “default” settings in a XenApp environment should be as follows. On 32-bit servers, optimise memory usage for Programs and processor scheduling for Background Services. On 64-bit servers, optimise processor scheduling for Background Services. In both cases the processor scheduling advice applies UNLESS CPU Utilization Management has been disabled.
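The recommendations can be condensed into a small decision function, which may be handy in a build checklist or audit script. This is a sketch, not Citrix guidance, and all names are hypothetical. One choice is an assumption: when CPU Utilization Management is disabled, the function returns Programs (38). That follows from the earlier observation that Programs is the better of the two GUI options for a Terminal Server.

```python
from dataclasses import dataclass
from typing import Optional

PROGRAMS = 38             # GUI "Programs"
BACKGROUND_SERVICES = 24  # GUI "Background Services"
SHORT_FIXED = 40          # short, fixed timeslices, no foreground boost


@dataclass(frozen=True)
class Recommendation:
    large_system_cache: Optional[int]  # None: leave alone (deprecated on 64-bit)
    win32_priority_separation: int


def recommend(is_64bit: bool, cpu_mgmt_enabled: bool, cpu_contention: bool = False) -> Recommendation:
    """cpu_contention: CPU still a bottleneck or very high context-switch rate observed."""
    # Memory: optimise for Programs (0) on 32-bit; the value is deprecated on 64-bit.
    cache = None if is_64bit else 0
    if not cpu_mgmt_enabled:
        # Assumption: fall back to Programs, the better of the two GUI options.
        cpu = PROGRAMS
    elif cpu_contention:
        # Escalation step; expect event 1591 from CTXCPUUtilMgmt after a service restart.
        cpu = SHORT_FIXED
    else:
        cpu = BACKGROUND_SERVICES  # Citrix best practice when the feature is enabled
    return Recommendation(cache, cpu)


print(recommend(is_64bit=False, cpu_mgmt_enabled=True))
```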