Thursday, May 13, 2010

IDUG NA 2010, Days Two and Three

I’ve been running around kinda busy the past couple of days here at IDUG in Tampa, so I got a bit behind in blogging about the conference. So, today I’m combining two days of thoughts into one blog post.

(For a summary of IDUG Day One, click here.)

I started off day two by attending Brent Gross’ presentation on extracting the most value from .NET and ODBC applications. Brent discussed some of the things to be aware of when developing with .NET, an important “thing” being awareness that .NET is designed to work in a disconnected data architecture. So rather than going through the data a row at a time, the data is sent to the application, which processes it there. As an old mainframe DBA, I heard alarm bells ringing.

I also got the opportunity to hear Dave Beulke discuss Java DB2 developer performance best practices. Dave delivered a lot of quality information, including the importance of developing quality code because Java developers reuse code – and you don’t want bad code being reused everywhere, right?

Dave started out by mentioning that Java programmers are usually very young and do not have a lot of database experience. So DBAs need to get some Java knowledge and work closely with Java developers to ensure proper development. He also emphasized the importance of understanding the object-to-relational mapping method.

From a performance perspective, Dave noted the importance of understanding the distributed calls (how many, where located, and bandwidth issues), controlling commit scope, and making sure your servers have sufficient memory. He also indicated that it is important to be able to track how many times Java programs connect to the database. He suggested using a server connection pool and making sure that threads are always timed out after a certain period of time.
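To make the "controlling commit scope" point a little more concrete, here is a minimal sketch of my own (not something from Dave's slides) that turns off auto-commit and commits every N rows using plain JDBC. The table name DEMO.NAMES and the class and method names are hypothetical; the right commit interval depends on your locking and logging situation.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class CommitScopeSketch {

    /** How many commits a load of 'rows' rows issues at the given interval. */
    static int commitsNeeded(int rows, int commitEvery) {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        return (rows + commitEvery - 1) / commitEvery; // ceiling division
    }

    /** Inserts the names, committing after every 'commitEvery' rows. */
    static int insertInBatches(Connection con, List<String> names, int commitEvery)
            throws SQLException {
        if (commitEvery <= 0) {
            throw new IllegalArgumentException("commitEvery must be positive");
        }
        boolean previous = con.getAutoCommit();
        con.setAutoCommit(false); // take control of the unit of work
        int commits = 0;
        // DEMO.NAMES is a hypothetical table used only for illustration.
        try (PreparedStatement ps =
                 con.prepareStatement("INSERT INTO DEMO.NAMES (NAME) VALUES (?)")) {
            int pending = 0;
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();
                if (++pending == commitEvery) {
                    ps.executeBatch();
                    con.commit(); // release locks at a predictable interval
                    commits++;
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the final partial batch
                ps.executeBatch();
                con.commit();
                commits++;
            }
        } catch (SQLException e) {
            con.rollback(); // undo only the uncommitted tail
            throw e;
        } finally {
            con.setAutoCommit(previous);
        }
        return commits;
    }
}
```

With an interval of 3, loading 10 rows would issue 4 commits instead of 10 (auto-commit) or 1 (one giant unit of work).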

And I’d be remiss if I didn’t note that Dave promoted the use of pureQuery, which can be used to turn dynamic JDBC into static requests. Using pureQuery can improve performance (perhaps by as much as 25 percent) and simplify debugging and maintenance.

Dave also discussed how Hibernate can cause performance problems. Which brings me to the first session I attended on day three, John Mallonee’s session titled Wake Up to Hibernate. Hibernate is a persistence layer that maps Java objects to relational tables. It provides an abstraction layer between DB2 and your program. And it can also be thought of as a code generator. Hibernate plugs into popular IDEs, such as Eclipse and Rational tools. It is open source, and part of JBoss Enterprise Middleware (JBoss is a division of Red Hat).

John walked attendees through Hibernate, discussing the Java API for persistence, its query capabilities (including HQL, or Hibernate Query Language), and configuration issues. Examples of things that are configurable include JDBC driver, connection URL, user name, DataSource, connection pool settings, SQL controls (logging, log formatting), and the mapping file location.

HQL abstracts SQL. It is supposed to simplify query coding, but from what I saw of it in the session, I am dubious. John warned, too, that when HQL is turned into SQL, the SQL won’t necessarily look the way you are used to seeing it. He recommended setting up the configuration file so that it formats the generated SQL; otherwise it won’t be very readable. John noted that one good thing about HQL is that you cannot easily write code with literals in it; it forces you to use parameter markers.
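For readers who have not seen a Hibernate configuration, here is a rough sketch of the kinds of settings John listed, expressed as a plain java.util.Properties object so it runs without Hibernate on the classpath. The property names are the ones I know from the Hibernate 3.x documentation, so check them against your version; the host, port, location, and user values are made up.

```java
import java.util.Properties;

final class HibernateConfigSketch {

    /** Builds illustrative Hibernate settings for a DB2 for z/OS connection. */
    static Properties db2Settings() {
        Properties p = new Properties();
        // JDBC driver and connection URL (host, port, and location are hypothetical)
        p.setProperty("hibernate.connection.driver_class", "com.ibm.db2.jcc.DB2Driver");
        p.setProperty("hibernate.connection.url", "jdbc:db2://db2host.example.com:446/DB2LOC");
        p.setProperty("hibernate.connection.username", "appuser");
        // Dialect tells Hibernate which SQL flavor to generate
        p.setProperty("hibernate.dialect", "org.hibernate.dialect.DB2390Dialect");
        // Connection pool settings (c3p0 pool; the values are arbitrary examples)
        p.setProperty("hibernate.c3p0.max_size", "20");
        p.setProperty("hibernate.c3p0.timeout", "300");
        // SQL controls: log the generated SQL and pretty-print it
        p.setProperty("hibernate.show_sql", "true");
        p.setProperty("hibernate.format_sql", "true");
        // Mapping file locations are not set here; they are normally listed
        // in hibernate.cfg.xml or added programmatically.
        return p;
    }
}
```

The two settings to notice are hibernate.show_sql and hibernate.format_sql, which together address the readability of the generated SQL.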

OK, so why can Hibernate be problematic? John talked about four primary concerns:

  1. SQL is obscured
  2. Performance can be bad with generated code
  3. Hibernate does not immediately support new DB2 features
  4. Learning curve can be high

But he also noted that as you learn more about these problems, and about how Hibernate works, things tend to improve. Finally (at least with regard to Hibernate), John recommended using HQL for simple queries, native SQL for advanced queries, JDBC for special situations, and native DB2 SQL (e.g., a stored procedure) to achieve the highest performance.

I also attended two presentations on the DB2 for z/OS optimizer. Terry Purcell gave his usual standout performance on optimization techniques. I particularly enjoyed his advice on what to say when someone asks why the optimizer chose a particular path: “Because it thinks that is the lowest cost access path.” After all, the DB2 optimizer is a cost-based optimizer. So if it didn’t choose the “best” path then chances are you need to provide the optimizer with better statistics.
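To illustrate why statistics matter so much to a cost-based optimizer, here is a toy model of my own. It is emphatically not DB2's actual cost formula: it assumes a table space scan costs one unit per page, an index probe costs the index levels plus one unit per qualifying row, and an equality predicate qualifies rows divided by column cardinality. All names are hypothetical.

```java
final class CostBasedChoiceSketch {

    /** Rows expected to qualify for COL = ?, assuming uniformly distributed values. */
    static double estimatedRows(long tableRows, long columnCardinality) {
        return (double) tableRows / columnCardinality;
    }

    /** Toy index cost: walk the index levels, then one getpage per qualifying row. */
    static double indexCost(long tableRows, long columnCardinality, int indexLevels) {
        return indexLevels + estimatedRows(tableRows, columnCardinality);
    }

    /** Picks whichever path the toy model thinks is cheaper. */
    static String choosePath(long tableRows, long tablePages,
                             long columnCardinality, int indexLevels) {
        double scanCost = tablePages; // toy scan cost: read every page once
        double idxCost = indexCost(tableRows, columnCardinality, indexLevels);
        return idxCost < scanCost ? "INDEX" : "TABLESPACE_SCAN";
    }
}
```

In this model, a 1,000,000-row, 50,000-page table with a recorded column cardinality of 1,000 gets index access, but if stale statistics say the cardinality is 2, the same query gets a table space scan. The optimizer did pick the lowest-cost path; it was just working from bad numbers.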

And Suresh Sane did a nice job in his presentation, discussing the optimization process and walking through several case studies.

All-in-all, it has been a very productive IDUG conference… but then again, I didn’t expect it to be anything else! Tomorrow morning I deliver my presentation titled “The Return of the DB2 Top Ten Lists.” Many of you have seen my original DB2 top ten lists presentation, but this one is a brand new selection of top ten lists… and I’m looking forward to delivering it for the first time at IDUG…

Wednesday, May 12, 2010

IDUG Tampa 2010, Day One

As usual, the North American IDUG conference is proving to be a hectic, yet enjoyable and informative time. The days are packed from morning til evening with technical sessions, networking, and running from here to there and back again.

Tuesday was the first day for normal IDUG sessions (the day-long seminars were moved to Monday this year), and the day was dominated (for me at least) by DB2 10 sessions. The spotlight session by Jeff Josten was an information-packed, 90-minute overview of DB2 10 that can only be described as drinking from a firehose. About 200 other curious attendees and I sat at attention as Jeff discussed the features that back up the themes of Version 10, which are efficiency, resilience, and growing new workloads on DB2 for z/OS.

Jeff didn’t share a GA date for the new version, nor would anyone else from IBM this week, but it has been strongly hinted that it could be before the end of the year (2010).

The biggest “thing” being touted by IBM about DB2 10 is the performance gains it delivers right out-of-the-box. Jeff discussed IBM’s performance objectives as historically being to deliver less than a 5% performance regression from release to release. But things have perked up recently. For DB2 9, most customers reported no regression, or even a gain, out of the box. And the new goal is no longer containing regression, but delivering gain. For DB2 10, the expectation is that many customers will reduce CPU time 10% to 20% right out-of-the-box.

In IBM’s labs, Jeff indicated that the out-of-the-box CPU reduction numbers for traditional workloads range from 5% to 10%, and for newer workloads (e.g., TCP/IP, stored procedures) the improvement is as much as 20% in lab measurements. And when you start using new functionality, you can reasonably expect to see up to 10% CPU reduction. Of course, Jeff was careful to note that these are pre-GA numbers so things could change, even though there is no expectation that they will change.

Additionally, there is a lot of focus on scalability in DB2 10. Shops can expect to support 5x to 10x more concurrent users, up to 20,000 per subsystem. This is possible due to virtual storage relief: threads have been moved above the bar.

Jeff went on to cover a lot of additional new functionality to be delivered with DB2 10, including parallel index update during INSERT (which should speed up inserts against tables with multiple indexes), DB2’s usage of 1MB page size (z/OS) in buffer pools, multiple SQL access path and performance improvements, efficient caching of dynamic SQL with literals, LOB streaming between DDF and the rest of DB2, workfile spanned records (PBG), INSERT improvements for UTS, solid state disk monitoring and exploitation, temporal data support, timestamp data type improvements, and more.

Hash support is particularly interesting. With hashing you can get direct access to data with a single getpage instead of the multi-getpage approach of b-tree indexing. The targeted use case for hashes is lookup of a row based upon its primary key. The hashing algorithm is stored in the DB2 engine. Never fear, though, because you can still define additional indexes on hashed tables, and the optimizer will understand and prefer hashed access when it is possible. (I hear the IMS DBAs out there laughing. DB2 DBAs are now going to need to understand space calculations for hash space and what collisions and overflow mean.)
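For the DB2 DBAs who have never had to think about collisions and overflow, here is a toy sketch of the idea. It is my own illustration, not DB2's internal algorithm: the hash is a simple modulo, each page holds a fixed number of rows, and I assume an overflowed row costs exactly one extra getpage. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class HashAccessSketch {
    private final List<List<Integer>> pages = new ArrayList<>();
    private final List<Integer> overflow = new ArrayList<>();
    private final int rowsPerPage;

    HashAccessSketch(int pageCount, int rowsPerPage) {
        this.rowsPerPage = rowsPerPage;
        for (int i = 0; i < pageCount; i++) {
            pages.add(new ArrayList<>());
        }
    }

    /** Toy hash: the key alone determines the row's home page. */
    private int homePage(int key) {
        return Math.floorMod(key, pages.size());
    }

    /** Stores the key on its home page, or in overflow if that page is full. */
    void insert(int key) {
        List<Integer> page = pages.get(homePage(key));
        if (page.size() < rowsPerPage) {
            page.add(key);
        } else {
            overflow.add(key); // collision on a full page
        }
    }

    /** Getpages needed to find the key in this toy model, or -1 if it is absent. */
    int getpagesToFind(int key) {
        if (pages.get(homePage(key)).contains(key)) {
            return 1; // direct hit on the home page
        }
        return overflow.contains(key) ? 2 : -1; // assumed: one extra getpage for overflow
    }

    /** Toy b-tree lookup cost: one getpage per index level plus the data page. */
    static int btreeGetpages(int indexLevels) {
        return indexLevels + 1;
    }
}
```

This is why the space calculation matters: size the hash space too small and more rows land in overflow, eroding the single-getpage advantage over a three-level index (four getpages in this model).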

Next up was Roger Miller, who covered DB2 10 from a database administration perspective. He began his session by referencing the extra detail that is available in the DB2 10 webcast presentation that he did last month, which is available on the web.

Roger stated that a lot of what is at the heart of DB2 10 is about making things easier for DBAs. And then to prove his point he talked for an hour about all of those things. Highlights included the reduced need for REORG, monitoring enhancements, hashing, and pureXML enhancements for usability, scalability, and performance.

A particularly interesting point made by Roger is that query parallelism these days is less about decreasing elapsed time and more about the ability to shuttle workload to a zIIP.

Roger also discussed the ability to skip V9 and go directly from V8 to V10. He cautioned that folks who choose this path should not neglect to learn about V9 along the way. For example, RUNSTATS in V9 had key changes, so shops need to be careful to run RUNSTATS when moving to V10.

Roger also spoke about the significant changes to the DB2 Catalog and DB2 Directory in DB2 10. There are about 60 new table spaces, the links have been removed, inline LOBs are used in many places, and row level locking is used. These changes mean that online REORG works for everything in the catalog & the directory.

He also spoke about the various improvements to security administration in DB2 10. There is a new SECADM authority with no access to data and there is also a new option for DBADM without data access. Another nice new option is DBADM authority for every database in the subsystem. And then there is the ability to REVOKE without cascading, something that DB2 security administrators have been looking for for years!

Changing pace, I attended Billy Sundarrajan’s presentation on “De-mystifying JDBC Universal Drivers – for the z/OS DBA.” The reality is that more and more dynamic SQL applications are being implemented, so knowing about JDBC drivers is a necessity, not a luxury for the mainframe DBA.

Billy discussed the types of JDBC drivers and the installation issues involved. You can connect using a Type 2 or Type 4 driver. The Type 2 driver connects directly, without a DB2 Connect gateway; the Type 4 driver connects through a DB2 Connect gateway.

He also discussed the benefits of setting end user variables for monitoring and the different properties that can be used for configuration.
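As a rough illustration of what this looks like from the Java side, here is a small sketch of my own (not taken from Billy's slides). The URL shapes follow the IBM driver's documented forms for Type 4 and Type 2 connectivity, and the client property names are the ones I recall from the IBM driver documentation, so verify them for your driver level. The host, port, location, and identifying values are hypothetical.

```java
import java.util.Properties;

final class JdbcDriverSketch {

    /** Type 4 connectivity: the URL names the server, port, and location. */
    static String type4Url(String host, int port, String location) {
        return "jdbc:db2://" + host + ":" + port + "/" + location;
    }

    /** Type 2 connectivity: the URL names only the location or database alias. */
    static String type2Url(String location) {
        return "jdbc:db2:" + location;
    }

    /** End user variables that can show up in monitoring displays and traces. */
    static Properties endUserInfo(String user, String workstation, String application) {
        Properties p = new Properties();
        // Property names assumed from the IBM driver documentation; verify before use.
        p.setProperty("clientUser", user);
        p.setProperty("clientWorkstation", workstation);
        p.setProperty("clientApplicationInformation", application);
        return p;
    }
}
```

In a real program you would add the user and password to the Properties object and hand both the URL and the properties to DriverManager.getConnection. The payoff for the DBA is being able to tell which end user and application is behind a thread, rather than seeing the same generic connection ID on everything.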

Of course, I attended a few other sessions and spent some time at the exhibit hall and caught up with some old friends and… well, this is long enough of a post for the first day… check back tomorrow for a shorter (I promise) synopsis of day two.

Sunday, May 09, 2010

IDUG in Tampa

It is Sunday, May 9, 2010 and I'm posting a brief blog entry today to remind everyone about IDUG in Tampa this week. I will be attending (arrive Monday morning) and I will update my blog with the highlights of what is happening in Tampa this week... so be sure to check in regularly.