Parallel Processing in SAP

New generations of CPUs do not provide any significant single-thread performance improvements. Instead, the number of logical CPU cores is increasing with each new CPU generation. You can significantly reduce the runtime of a task by running sub-tasks in parallel on many CPU cores. However, splitting a task into sub-tasks consumes additional resources (CPU and memory). This overhead may or may not be significant, dependent on the used algorithm.

Therefore, you can optimize one aspect of IT system performance using parallelism: The response time. If you want to optimize the throughput of an IT system, then parallelism can be counterproductive on high-loaded systems (because of the overhead for coordinating the sub-tasks).

Parallelism on the SAP application server

Many tasks in an ERP system are already fast enough, for example recording of sales orders. Therefore, it makes no sense to parallelize such tasks (except parallelism of users by hiring more employees). In contrast to dialog processing, batch processing can highly benefit from parallelism on the application server. A typical example is the Data Transfer Process (DTP) in an SAP BW system. Parallelism is used here by default, but you can further customize it (for all DTPs or separately per DTP):

  • You can define whether parallelism is used at all
  • You can define the type (DIA or BTC) and the number of work processes
  • You can define the package size, which is the number of rows processed within a sub-task. If the total number of rows of a DTP is smaller than the package size, then only one work process is used. In this case, you could reduce the package size for enabling parallelism.

Historically, the default number of work processes used in a DTP is 3. In the meanwhile, customers have application servers with much more CPU threads than years ago. Therefore, it makes sense to adjust the number of configured work processes for existing DTPs in an SAP BW System. This can be configured in the Batch Manager settings of SAP transaction RSA1 (at Administration->Housekeeping Tasks->Batch Manager->Mass Maintenance)

parallel1

Parallelism in SQL Server

SQL Server can utilize many CPU threads at the same time for running a single SQL query. This is called intra-query parallelism. Creating the sub-tasks of such a query is done automatically by SQL Server. You do not have to define a package size, but you can control the maximum number of CPUs, which are used by an operator.

Before executing a SQL query, SQL Server creates an execution plan. This plan can be either a serial or a parallel plan. An execution plan consists of several operators (iterators). Each of them can be either serial or parallel. An execution plan with at least one parallel operator is called a parallel execution plan. A parallel operator is displayed with two arrows within a yellow circle:

parallel2

The same execution plan without any parallelism looks like this:

parallel3

The maximum allowed degree of parallelism (MAXDOP) is defined in the SQL Server configuration option "max degree of parallelism" (value 0 means, MAXDOP is infinite). You can overwrite MAXDOP for a particular query using an optimizer hint. SQL Server will not create a parallel plan in the following cases:

  • if the database server has only one logical CPU
  • if MAXDOP is 1
  • if the query optimizer does not see a benefit in using parallelism.

A parallel execution plan does not contain the number of logical CPUs used. The actual degree of parallelism (DOP) is decided at runtime of the query. It is not only dependent on MAXDOP, but also on the available system resources at query runtime. You can check the actual DOP of a query in the SQL Server DM view sys.dm_exec_query_stats. SQL Server intra-query parallelism typically decreases the runtime of a query, but it can result in varying execution times of the same query: You can configure MAXDOP, but the actual DOP may be different when running the same query again! Therefore, the actual runtime of a query is not predictable any more.

In SQL Server 2014, there is one important limitation: You can only use batch-mode operators, if the actual DOP is higher than one (This limitation has gone in SQL Server 2016). Batch-mode operators are much faster than row-mode operators. Hence, we want to make sure, that there are always sufficient resources for running the query in parallel (DOP 2 or higher). This can be achieved by restricting MAXDOP to a relative small value, for example MAXDOP 4 on a machine with 32 CPU threads.

Configuring SQL Server parallelism in SAP ERP

The SAP installation changes the value of the SQL Server configuration option "max degree of parallelism" to 1. Therefore, no parallelism is used for running a single database query (As a matter of course, you can still run many database queries in parallel). For years, we did not recommend changing this configuration for an SAP ERP system. We wanted to avoid the overhead of intra-query parallelism and keep a predictable query runtime. However, in the meanwhile, customers often have more logical CPUs available on the database server than concurrently running SQL queries. Not using intra-query parallelism would simply be a waste of CPU resources. Therefore, customers can increase "max degree of parallelism" for an SAP ERP system. See https://blogs.msdn.microsoft.com/saponsqlserver/2015/04/29/max-degree-of-parallelism-maxdop for details.

Configuring SQL Server parallelism in SAP BW

In contrast to SAP ERP, SQL Server is using intra-query parallelism for SAP BW since years. The configuration of SQL Server parallelism is much more sophisticated in an SAP BW system than just configuring a global MAXDOP value. For SAP BW, the SQL Server configuration option "max degree of parallelism" should always be set to 1. Hence, normal Open SQL commands use MAXDOP 1. However, SAP BW queries use MAXDOP 4 by default using optimizer hints. SQL Server index creation is running with MAXDOP 8 by default. The optimizer hints are controlled by SAP BW RSADMIN parameters, documented in

The RSADMIN parameters can be changed in SAP report SAP_RSADMIN_MAINTAIN:

parallel4

Unfortunately, we have seen a few cases with SAP BW queries, where the SQL Server query optimizer decided to create a serial execution plan. However, SAP BW queries are always quite complex. Therefore, we would always benefit from intra-query parallelism. The latest version of SQL Server 2016 provides an optimizer hint, which enforces the creation of parallel plans. We use this hint in the SAP BW statement generator. Therefore, all operators in an execution plan for SAP BW queries are parallel, if possible (be aware, that execution plans with a spool table never contain any parallel operator):

parallel5

To benefit from these forced parallel execution plans, you have to apply the newest SP und CU of SQL Server 2016 and the correction instructions of

The created execution plans are slightly different from normal parallel execution plans, because all operators are parallel (see NonClustered Index Seek in the picture above). However, we did not see a single case in SAP BW, where this slightly difference caused an issue.

Parallelism used in DB pushdown

SAP NetWeaver uses a single-threaded, multi-process application server. Parallelism on the application server has to be explicitly coded. Furthermore, it is often not easy (or even impossible) to divide a task on the application server in sub-tasks of similar size. Using intra-query parallelism on the database server is much easier.

More and more functionality in SAP is being pushed down from the application server to the database server. The main idea behind this is to reduce the network load (between application server and DB server). However, such a DB pushdown has further advantages. You can now benefit from intra-query parallelism without manually generating sub-tasks. An example is the FEMS-Pushdown, described in https://blogs.msdn.microsoft.com/saponsqlserver/2017/03/06/bw-queries-by-factors-faster-using-fems-pushdown. To make it clear: FEMS-Pushdown does not just use parallelism. A new algorithm improves performance beyond the usage of additional CPUs.

Conclusion

Parallelism is the key for reducing response time. However, it could result in a reduced throughput, particularly in high-loaded ERP systems. In SAP BW systems, you can increase intra-query parallelism by setting RSADMIN parameter MSS_MAXDOP_QUERY. Parallelism on the SAP application server has to be manually adjusted, dependent on the available CPU resources and the size of processed data.