Tuesday, March 26, 2024

Mixing Db2 Database Administration with DevOps - Part 5: SQL Performance Testing

Although implementing database schema changes is the most important component when incorporating database structures into your DevOps pipeline, it is not the only thing to consider. It is also important to be able to analyze and optimize SQL performance within your application code.

As anyone who has written SQL knows, it is a very flexible language. There are multiple ways to write SQL queries to achieve the same results. For example, you can combine multiple tables using a join or a subselect and achieve the same results. But each SQL formulation is likely to perform differently, one better than the other. And this is but one example of the various ways you can build SQL statements to perform the same function.

The development mindset is usually to write code that matches the requirements and delivers the expected results, not necessarily to assure the best performance. Therefore, SQL performance testing should be carried out on all programs before they are migrated to a production environment. Failure to do so will likely result in poorly performing applications.

In a DevOps environment, the best approach is to measure, analyze and improve SQL statements at all stages as your code progresses from development to testing to production. The more SQL performance testing that can be accomplished by developers the earlier performance problems will be found and corrected. And that means the cost of delivering high-quality Db2 applications will decline.

However, things are not as simple as just running your program and evaluating its performance metrics. The data that you use in your test environment will not be the same as your production data. Typically, you will have less test data than you do in production. So, if you run the RUNSTATS utility on your test data you will get different statistics than in production, which means you will also get different access paths and performance results.

Setting up the test environment with production statistics and modeling the environment to mimic production is an important aspect of performance testing during development.

With the proper setup and tooling, developers can examine the access paths of their SQL statements to judge their efficiency. Of course, tools that can simplify this process are needed to speed up SQL performance testing. Such tooling should be able to capture Explain information, display it graphically and combine it with pertinent catalog statistics, store a repository of access paths by statement, compare access paths, identify changes, and make recommendations. Ideally, the tool should be integrated into the DevOps toolchain so that information is automatically captured and analyzed each time the program is compiled and bound. 

Considerations should also be made for testing specific use cases for performance. For example, consider skewed data. Db2 assumes that data values are mostly uniformly distributed throughout the data. However, not all data is uniformly distributed. RUNSTATS can be used to capture information about non-uniformly distributed and skewed data.

Another performance testing consideration is to always try multiple SQL variations, especially for queries that access a lot of data or have complex access paths. Do not just find one SQL formulation that works and stick with it. Remember that you can code multiple variations of SQL statements that return the same data, but that perform quite differently.

Tools that can help set up testing for various use cases and SQL variations will be needed for integrating SQL performance testing into the DevOps toolchain. There are a wide variety of vendors and solutions for managing Db2 for z/OS SQL performance, but I am not aware of any that have been fully integrated into the DevOps toolchain.



No comments: