Wednesday, February 17, 2016

The Most Misunderstood Features of DB2 – Part 4: Base Table Views

Welcome to my continuing series focusing on The Most Misunderstood Features of DB2. Today’s topic is an oldie but a goodie: base table views.

What is a Base Table View?
I am going to assume that regular readers of this blog know what a view is (if not, go here). So what is a base table view? It is a view that contains all of the columns and all of the rows of the base table and nothing else. In other words, it is like a SELECT * against the base table with no WHERE clauses.
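To picture it, here is a minimal sketch using a hypothetical three-column table (all names are illustrative only):

CREATE TABLE EMP
 (EMPNO     CHAR(6)      NOT NULL,
  LASTNAME  VARCHAR(20)  NOT NULL,
  SALARY    DECIMAL(9,2));

-- The base table view: every column and every row of EMP, nothing more
CREATE VIEW EMP_V AS
  SELECT EMPNO, LASTNAME, SALARY
  FROM   EMP;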

Now why would somebody create such a beast? This gets to the heart of the misunderstanding here. There is a continuing notion “out there” that it is a good idea to create a base table view for every table and to give programmers access to the base table view only, not the base table itself.

The reason given for doing so is to insulate programs from change. But this is a weak argument at best. I admit that this particular misunderstanding is less prevalent than it was in the earlier years of DB2. But I do still find shops adhering to this archaic, and poorly founded, idea.

Reasons Given for Base Table Views
One reason given by base table view proponents is that when you add a column to a table you do not have to change the program. But this is the case, too, if you code your programs appropriately, avoiding SELECT * and coding only the specific columns you need to access. Most shops do this even when using base table views.

Other reasons given revolve around removing columns or splitting tables. If you are interested in all of the reasons and rebuttals I urge you to read my article (PDF) on this topic from 1991 titled One View Per Base Table? Don’t Do It! (Wow! This is a quarter of a century old now!)

Another View Fallacy
I recall yet another misunderstanding about views from many years ago that I have not heard in a long time. Basically, the argument was that accessing a base table view would out-perform accessing the base table.

This was never proven to me – and I do not believe it was ever true. How could it be? Static SQL against a base table view would use view merge so it would be the same as going against the base table. And dynamic SQL would have the added step of having to resolve the view to the base table. Sooo…

Summary

Base table views are not worth the time and effort because they bring no added value of any type whatsoever. Do not use them.

Wednesday, February 10, 2016

The Most Misunderstood Features of DB2 – Part 3: Nulls

Welcome to Part 3 in my on-going blog series on The Most Misunderstood Features of DB2. You can find the first two parts here: Part 1 (on locking) and Part 2 (OPTIMIZE FOR v. FETCH FIRST). Today’s topic is one that confuses many SQL developers, Nulls.

What is a Null?
A null represents missing or unknown information at the column level. When a column is set to null, it can mean one of two things: the attribute is not applicable for certain occurrences of the entity, or the attribute applies to all entity occurrences, but the information may not always be known. Of course, it could be a combination of these two situations, too.

A null is not the same as 0 or blank. Null means no information has been specified for the column and it implies that the value is either unknown or not applicable.

Because DB2 supports null you can distinguish between a deliberate entry of 0 (for numerical columns) or a blank (for character columns) and an unknown or inapplicable entry (NULL for both numerical and character columns). Null indicates that the user did not explicitly make an entry or has explicitly entered NULL for the column. For example, a null “value” in the Price column of the ITEM table in some database does not mean that the item is being given away for free; instead it means that the price is not known or has not yet been set.

Nulls sometimes are inappropriately referred to as “null values.” Using the term value to describe a null is inaccurate because a null implies the lack of a value. Therefore, it is a good practice to simply use the term null or nulls (without appending the term “value” or “values” to it).

A Few Examples 
When are nulls useful? Well, defining a column as nullable provides a placeholder for data you might not yet know. For example, when a new employee is hired and is inserted into the EMP table, what should the employee termination date column be set to? I don’t know about you, but I wouldn’t want any valid date to be set in that column for my employee record. Instead, null can be used to specify that the termination date is currently unknown.
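For instance, a hypothetical insert for a newly hired employee might simply leave the termination date unknown (the table and column names are made up for illustration):

INSERT INTO EMP
  (EMPNO, LASTNAME, HIRE_DATE, TERM_DATE)
VALUES
  ('000340', 'JONES', CURRENT DATE, NULL);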

Let’s consider another example. Suppose that we also capture each employee’s hair color when they are hired. Consider three potential entity occurrences: a man with black hair, a woman with unknown hair color, and a bald man. The hair color for the woman and for the bald man could both be set to null, but for different reasons. The woman’s hair color would be null, meaning presently unknown; the bald man’s hair color could be null, too, in this case meaning not applicable. But there are probably better methods of dealing with the bald man’s hair color.

How could you handle bald without using nulls? You could create special values for the HairColor column that mean “bald” and “unknown.” This is possible for a CHAR column like HairColor, but what about a DB2 DATE column? Every value stored in a column defined with the DATE data type must be a valid date, so it might not be possible to reserve a special date value to mean “unknown.” This is where using a null is most practical.
DB2 does not differentiate between nulls that signify unknown data and those that signify inapplicable data. This distinction must be made by the program logic of each application.

Keep in mind, though, that using null to indicate “not applicable” can be an indication of improper database design. By properly modeling and normalizing your data structures you can usually eliminate the need to use nulls to indicate that a column is inapplicable for a specific row.

DB2 and Indicator Variables
DB2 represents null in a special “hidden” column known as an indicator variable. An indicator variable is defined to DB2 for each column that can accept nulls. The indicator variable is transparent to the end user, but must be managed when programming in a host language (such as Java, COBOL or PL/I).

The null indicator is used by DB2 to track whether its associated column is null or not. A positive value or a value of 0 means the column is not null and any actual value stored in the column is valid. If a CHAR column is truncated on retrieval because the host variable is not large enough, the indicator value will contain the original length of the truncated column. A negative value indicates that the column is set to null. If the value is -2 then the column was set to null as the result of a data conversion error.

Let’s take a moment to clear up one common misunderstanding right here: nulls NEVER save storage space in DB2 for z/OS. Every nullable column requires one additional byte of storage for the null indicator. So, a CHAR(10) column that is nullable will require 11 bytes of storage per row – 10 for the data and 1 for the null indicator. This is the case regardless of whether the column is set to null or not.

DB2 for Linux, Unix, and Windows has a compression option that allows columns set to null to save space. Using this option causes DB2 to eliminate the unused space from a row where columns are set to null. This option is not available on the mainframe, though.

Every column defined to a DB2 table must be designated as either allowing or disallowing nulls. A column is defined as nullable – meaning it can be set to NULL – in the table creation DDL. Null is the default if nothing is specified after the column name. To prohibit the column from being set to NULL you must explicitly specify NOT NULL after the column name. In the following sample table, COL1 and COL3 can be set to null, but not COL2, COL4, or COL5:

CREATE TABLE SAMPLE1
 (COL1 INTEGER,
  COL2 CHAR(10) NOT NULL,
  COL3 CHAR(5),
  COL4 DATE NOT NULL WITH DEFAULT,
  COL5 TIME NOT NULL);

In SELECT statements, testing for null is accomplished differently than testing for other “values.” You cannot specify WHERE COL = NULL, because this does not make any sense. Remember, null is a lack of a value, so the column does not equal anything. Instead, you must code WHERE COL IS [NOT] NULL. In INSERT statements, NULL can be specified in the VALUES clause to indicate that a column is to be set to NULL; in UPDATE statements, you use the = sign in the SET clause to assign NULL to a column. When inserting data, if the user fails to make an entry in a column that allows nulls, DB2 supplies the NULL as the default (unless another default value exists). If an attempt to insert NULL is made against a column defined as NOT NULL, the statement will fail.
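A few quick sketches of those rules, again using hypothetical table and column names:

-- Testing for null requires IS NULL (a WHERE COL = NULL predicate is invalid)
SELECT EMPNO, LASTNAME
FROM   EMP
WHERE  TERM_DATE IS NULL;

-- Explicitly inserting a null via the VALUES clause
INSERT INTO EMP (EMPNO, LASTNAME, TERM_DATE)
VALUES ('000350', 'SMITH', NULL);

-- Assigning a null in an UPDATE uses the = sign in the SET clause
UPDATE EMP
SET    TERM_DATE = NULL
WHERE  EMPNO = '000340';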

More Confusion
OK, now that we have discussed what nulls are and the basics of using nulls in DB2, let’s look at how the presence of nulls can confuse SQL developers.

The AVG, COUNT(DISTINCT), SUM, MAX, and MIN functions omit column occurrences set to null. The COUNT(*) function, however, does not, because it counts rows rather than column values. Thus, AVG is not necessarily equal to SUM/COUNT(*) when the average is being computed for a column that can contain nulls. You must be aware of which columns can be null, as well as which functions operate on columns versus rows, to make sure you code correctly!
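Here is a quick sketch of the difference, assuming a hypothetical EMP table with a nullable BONUS column:

-- AVG ignores rows where BONUS is null
SELECT AVG(BONUS) AS AVG_BONUS
FROM   EMP;

-- COUNT(*) counts every row, so this can yield a smaller "average"
SELECT SUM(BONUS) / COUNT(*) AS NOT_REALLY_AVG
FROM   EMP;

-- COUNT(BONUS) also ignores nulls, so this conceptually matches AVG(BONUS)
SELECT SUM(BONUS) / COUNT(BONUS) AS SAME_AS_AVG
FROM   EMP;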

It is important to understand, too, that for comparison in a SELECT statement, two null columns are not considered to be equal. When a nullable column participates in a predicate in the WHERE or HAVING clause, the nulls that are encountered cause the comparison to evaluate to UNKNOWN. But ORDER BY, GROUP BY, DISTINCT, and unique indexes consider nulls to be equal and handle them accordingly.

Additionally, when a column set to null participates in a calculation, the result is null. This is true even if the calculation is NULL/0 (which, as every math-savvy person knows, should be an error). But because there is a null in the calculation, the result will be null!

Let’s look at another statement that confuses a lot of people:

SELECT SUM(SALARY)
FROM   EMP
WHERE  JOBCODE > 999;

Assume that SALARY is defined as NOT NULL. Furthermore, assume that the largest JOBCODE is 500 (that is, less than the 999 we are checking for). What is the result of this SQL?

No rows satisfy the predicate. As such, many people say the result should be 0. But it is not. The result is null! Yes, you can get a null result when performing a function on a non-nullable column. The predicate resolves to the empty set, and the sum of the salaries in the empty set is null because there are no employees, and hence no salaries. Therefore, you had better make sure that you code a null indicator in your program when you write code like this.
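If your program is not prepared to handle a null result, one common hedge is to wrap the function in COALESCE so that the empty set comes back as 0 instead:

SELECT COALESCE(SUM(SALARY), 0)
FROM   EMP
WHERE  JOBCODE > 999;

A null indicator in the host program works just as well, of course; COALESCE simply moves the handling into the SQL.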

The Bottom Line
Nulls are confusing, but cannot be completely avoided in DB2. Take the time to learn how nulls work and how they impact your coding and development practices… or you will certainly make mistakes.




Tuesday, February 02, 2016

The Most Misunderstood Features of DB2 – Part 2: Optimize vs. Limited Fetch

Welcome to Part 2 in my on-going blog series on The Most Misunderstood Features of DB2. In Part 1 of the series we tackled the topic of locking, which IMHO is easily the most misunderstood feature of DB2 (probably of most DBMSes).

Today's topic is a brief one, but one that I've found folks to be confused about. Namely, the difference between the OPTIMIZE FOR and FETCH FIRST n ROWS ONLY clauses.

The OPTIMIZE FOR n ROWS and FETCH FIRST n ROWS ONLY clauses are similar when it comes to influencing the optimizer. Both provide a hint to the optimizer about the scale of the query (that is, the number of rows to expect when optimizing the access path to the data).

But, the two clauses have different purposes. If all you want to do is influence the optimizer then you should use OPTIMIZE FOR n ROWS as its usage does not limit the number of rows that can be fetched when the statement is executed. For example, you might specify OPTIMIZE FOR 1 ROW on a SELECT statement. But if 500 rows qualify then you can FETCH all 500 rows at run time. 

(Note: the specific numbers used in the previous paragraph are not important; it is important only that you know the number of rows specified in the OPTIMIZE clause does not inhibit the number of rows that can later be accessed by the query/program.)

The FETCH FIRST n ROWS ONLY clause is different. It actually limits the number of rows that can be fetched when the query is run. If you specify the clause FETCH FIRST 5 ROWS ONLY, then only 5 rows can be fetched - even if 100 or 1000 or more actually qualify. And yes, the number of rows you specify for FETCH FIRST will be provided to the optimizer for access path formulation, but it also will change the way the query behaves when it runs. 
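A quick side-by-side sketch (the table, columns, and predicate are hypothetical):

-- Hints to the optimizer that one row is expected, but the program
-- can still fetch every qualifying row at run time
SELECT EMPNO, LASTNAME
FROM   EMP
WHERE  WORKDEPT = 'D01'
OPTIMIZE FOR 1 ROW;

-- Actually limits the result: no more than 5 rows can be fetched,
-- no matter how many rows qualify
SELECT EMPNO, LASTNAME
FROM   EMP
WHERE  WORKDEPT = 'D01'
FETCH FIRST 5 ROWS ONLY;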

So, when you write a SELECT statement that qualifies a large number of rows, but you only need a few rows at run time, it is a good idea to use FETCH FIRST n ROWS ONLY. On the other hand, if you are simply trying to tweak query performance by getting the optimizer to consider a different number of qualifying rows, then you should consider using OPTIMIZE FOR n ROWS.

The two statements are quite different and it is important that developers not confuse the two!

Tuesday, January 26, 2016

The Most Misunderstood Features of DB2 – Part 1: Locking


Today I am introducing a new series of blog posts here on misunderstood DB2 features and functions. But before I start this blog post I want to emphasize that this is just my opinion. I’m sure many of you have your own ideas of the DB2 features that are most misunderstood. But please, take a moment to consider my thoughts here… and then share your own in the comments section below!

Locking!

One of the most misunderstood features of DB2 is how locking works. This is probably to be expected because database locking is a rather complex topic and it is constantly morphing with each and every new version of DB2.

First of all you have to understand the lock options available and how DB2 locks at each level: table space, table, page and row. You need to understand share and exclusive locks and that the manner in which DB2 takes them depends upon the SQL statements you issue, the program bind options you choose, and even, in some cases, the DDL options you’ve chosen.

And more. What about claims and drains? Do you know the difference between locks and latches? What about the difference between lock suspension, timeouts, and deadlocks?

Then there is the whole issue of lock avoidance and what that means. I visit a lot of shops each year and I still see a lot of ambiguous cursors. Simply adding FOR READ ONLY to all of the cursors in your program where you will not be modifying the data can make a world of performance difference because DB2 can avoid locking.
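For example, a cursor that will never be used to update data can be declared unambiguously, something along these lines (the names are hypothetical):

DECLARE CSR1 CURSOR FOR
  SELECT EMPNO, LASTNAME, SALARY
  FROM   EMP
  WHERE  WORKDEPT = 'D01'
  FOR READ ONLY;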

There is also the frequently-ignored need for a commit strategy. Every batch program should have a commit strategy implemented whereby modifications are committed to the database and locks are released. Failing to do this causes other programs to time out waiting for locks. I wrote about this in the past and I call it Bachelor Programming Syndrome.

Another locking issue is the whole dirty read everywhere problem. Programmers learn that they can improve performance by adding WITH UR to the end of the SQL statements in their programs. UR, or Uncommitted Read (aka dirty read), causes DB2 to not take locks. But that can cause problems because the program is reading uncommitted data: the data it reads may never actually be committed to the database (it could be rolled back) and may not be consistent because the program is not reading only committed units of work. I’ve visited shops that use WITH UR everywhere and they insist that they have not encountered any problems. To which I always add, “…yet.”
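The clause in question is simply tacked onto the end of the statement, as in this hypothetical query:

SELECT COUNT(*)
FROM   EMP
WHERE  WORKDEPT = 'D01'
WITH   UR;

Convenient, yes; but remember that the count may include rows that are never committed.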

Then there are the new features added in the last few releases of DB2, including improved lock efficiencies when accessing the DB2 Catalog, the ability to access currently committed data, optimistic locking, skipping locked data, and more.

Summary

Yes, it is easy to misunderstand locking, and many do so. Taking the time to bolster your knowledge of how DB2 locks, the locking options available, and how they work can do you and your shop a world of good.


And be sure to check back here for future blogs about misunderstood DB2 features!

Monday, January 11, 2016

The Impact of Data Volume on Operational Databases

Operational databases are growing in size for many reasons, not the least of which is the growing importance of big data and analytics projects. There is the overarching trend of more and more data being generated every year. But also, there is the growing need to store more data for longer periods of time due to regulatory and compliance issues. Some organizations and businesses have encountered the need to store certain types of data for 100 years or more (as this video and this storage project point out).

But I doubt that I really need to convince you that your databases are growing in size. Most DBAs experience the reality of data growth every day.

As data volumes expand, it impacts operational databases in two ways:
  1. additional data stresses transaction processing and can cause performance slowdowns, and
  2. database administration tasks are negatively impacted.
In terms of performance, the more data in the operational database, the less efficient transactions running against that database tend to be. Table scans must reference more pages of data to return a result. Indexes grow in size to support larger data volumes, causing access by the index to degrade because there are more levels to traverse to return an answer. Such performance impacts are causing many companies to seek solutions that offload older data to either reference databases or to archive data stores.

The other impact, database administration complexity, causes longer processing time and outages to perform such functions as backups, unloads, reorganizations, recoveries, and disaster recoveries.  The larger the underlying data sets for your tables and table spaces, the longer it takes to run administrative utilities for them. In many cases the lengthened outages can become unacceptable, causing companies to again seek ways to lighten up the operational databases... or perhaps acquire next generation utility technology that understands the reality of large DB2 database objects.

But even though we want to keep all of that additional data, there is no reason that it necessarily has to be stored in operational databases that run the business. For many reasons, you probably want to separate active data from historical data. 

Some companies create purge jobs for all (or many) of their tables to remove data from the production databases as it ages. This can be an acceptable approach to reduce the size of your operational databases. But it also means that the data, which you might want to keep for analytical purposes, is lost. Another approach is to archive the data. Archiving data and purging data are two different processes. When data is purged, it is removed from the operational database and discarded. But archived data is removed from the operational database and maintained in an archive data store. The archive might be a flat file, another relational table, or HDFS (Hadoop).
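For illustration only, a very simple hand-rolled archive step might look something like this sketch (the table names and the seven-year cutoff are hypothetical; a real process would also need a sensible commit scope, restart logic, and probably a purpose-built archiving tool):

-- Copy aged rows to the archive data store...
INSERT INTO ORDER_HIST_ARCHIVE
  SELECT *
  FROM   ORDER_HIST
  WHERE  ORDER_DATE < CURRENT DATE - 7 YEARS;

-- ...then purge them from the operational table
DELETE FROM ORDER_HIST
WHERE  ORDER_DATE < CURRENT DATE - 7 YEARS;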

The bottom line is that it makes sense for us, as DBAs, to keep an eye on the size of our operational databases and take action when production workload is impacted.

Monday, January 04, 2016

A Lot of Extraneous Data Sets?

In a recent blog post here I talked about a quick and dirty method of converting your partitioned table spaces from index-controlled to table-controlled. If you haven't read that post, take a moment to click over and read it here: Easily Convert to Table-Controlled Partitioning.

The reason I bring this up today is that I received an interesting e-mail from a long-time friend and DB2 DBA who read the post and had some information to share with me. He told me about how his organization used one of my tips to drop unused indexes as part of this process.

He told me that during the conversion process they dropped a lot of the clustering indexes because they weren't being used for access paths or for uniqueness. And they were able to release an "astonishing 4,100 data sets" by doing so!

Now I am not suggesting that every shop will be able to experience a similar savings, but if you have indexes that have no purpose other than enforcing index-controlled partitioning, it is time to bite the bullet and drop those indexes as you convert to table-controlled partitioning (and then on to Universal table spaces).

And when you convert, please drop a note here on the blog to let us know how your conversion efforts went!

Tuesday, December 15, 2015

Happy Holidays!

Well, it is that time of year again. The days are shorter and the weather is colder... even if it isn't as cold as normal it is colder than it was in July! And most people are taking the time to celebrate the holiday season. 

Here's wishing each and every one of my readers a happy holiday... regardless of your chosen season to celebrate! Whether you celebrate Chanukah, Christmas, Kwanzaa, the Winter Solstice, Saturnalia, or just the end of another year on Planet Earth, I'm with you, and celebrating my good fortune, great family and friends, and you, my regular blog readers. I appreciate and thank you all...

This will be the final post of the year (2015) for this blog, but be sure to join me again next year - 2016 - as we continue to examine all aspects of everybody's favorite DBMS... IBM's DB2...

Tuesday, December 08, 2015

Easily Convert to Table-Controlled Partitioning

Prior to DB2 V8 for z/OS, the only way to control the partitioning of DB2 table spaces was by using a clustering index that specified the range of key values for each partition. With V8, though, DB2 added the ability to specify the partitioning criteria in the CREATE TABLE statement. This is known as table-controlled partitioning and it is the preferred method for creating (non-Universal) partitioned table spaces. With table-controlled partitioning you can cluster on a different column (or set of columns) than you are partitioning on. Furthermore, you can make changes such as dropping a partitioning index or creating a table in a partitioned table space without defining any indexes at all.
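To make that concrete, with table-controlled partitioning the partitioning scheme is declared right in the CREATE TABLE DDL. A minimal sketch (all names and limit values are hypothetical, and it assumes a partitioned table space already exists) might look like this:

CREATE TABLE ACCT_HIST
 (ACCT_NUM   INTEGER       NOT NULL,
  TRAN_DATE  DATE          NOT NULL,
  AMOUNT     DECIMAL(11,2) NOT NULL)
 PARTITION BY (ACCT_NUM)
  (PARTITION 1 ENDING AT (1000),
   PARTITION 2 ENDING AT (2000),
   PARTITION 3 ENDING AT (9999))
 IN DBNAME.TSNAME;

-- Clustering can then be assigned to a different column than the partitioning column
CREATE INDEX XACCT_DATE ON ACCT_HIST (TRAN_DATE) CLUSTER;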


But given the long history of DB2, many existing partitioned table spaces are index-controlled. 

Fortunately, there is a quick-and-dirty technique that you can use to easily convert from index-controlled to table-controlled partitioning. Simply follow these steps:

  • Identify the index-controlled partitioned table space you wish to convert
  • Convert the clustering index on the table to NOT CLUSTER using ALTER INDEX. (Alternately, you could drop the clustering index, but I wouldn’t recommend that unless you no longer need that index at all.)
  • Convert the index back to CLUSTER, again using ALTER INDEX
Voila! DB2 will have converted your table space to table-controlled partitioning.
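In SQL terms, the second and third steps are just a pair of ALTER INDEX statements (the index name here is hypothetical):

-- Temporarily remove the clustering attribute from the partitioning index
ALTER INDEX XACCT_PART NOT CLUSTER;

-- Then put it back; the table space is now table-controlled partitioned
ALTER INDEX XACCT_PART CLUSTER;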



Note: DB2 will also convert from index-controlled to table-controlled partitioning if you use ALTER TABLE to add a new partition, change a partition boundary, or rotate a partition to last on an index-controlled partitioned table space. But these are more intrusive methods than simply altering the index from clustering to non-clustering and back again.

Wednesday, November 25, 2015

Happy Thanksgiving 2015

Every year this week those of us in the USA take time out to give thanks for all that we have. We do this by taking time off of work, gathering with our families, eating turkey (and a lot of other stuff), and watching football.

It is one of my favorite holidays as it offers most of the joys of Christmas without many of the trappings.

So with this in mind, I'd like to wish all of my readers -- whether you reside here in the USA or anywhere in the world -- a very Happy Thanksgiving. Take some time to reflect on your good fortune... consider what you might be able to do to help others achieve success... and relax a bit and enjoy yourself...



We can talk about DB2 and databases again in December!

Wednesday, November 18, 2015

Midwest DB2 User Group (Dec 4, 2015)

Just a short blog post today to promote my upcoming speaking engagement at the Midwest DB2 User Group, in Chicago. If you are in or around the Chicagoland area on December 4th, 2015 I invite you to stop by and participate in the meeting!

The meeting starts at Noon and a free lunch is provided. I will be giving one of the 3 presentations that day. My presentation is titled Database and DB2 Trends circa 2015 - An overview of an industry in transition.... This is an ever-changing presentation that I have delivered on several occasions in the past, but not in exactly the same way. This pitch provides an overview of the transformation of data management over the course of the past few years. I discuss Big Data, analytics, NoSQL, and their impact on the modern data architecture and DB2 for z/OS in particular. 

But that is not the only highlight of this event. Sheryl M. Larsen, now with BMC Software, will regale the group with the results of BMC's Annual Mainframe Research Survey. BMC started their mainframe survey ten years ago as a way to gain insight into the issues and challenges facing mainframe customers. And it always contains a lot of useful information and details for those of us in the business of mainframe computing.

The third speaker is Tim Lenahan, who I've been told will be presenting something a little bit different this time around. And having heard Tim speak in the past, I'm looking forward to what he has to say now!

So if you are going to be near Chicago in early December, register and attend the MWDUG meeting. I'm sure it will be worth your time!

Hope to see you there!

Friday, November 13, 2015

A Quick and Dirty Guide to DB2 Performance

All performance problems are caused by change. Now as regular readers know, normally I avoid all-encompassing terms like all, always, none and never, but in this case, all is the appropriate term. If your DB2 environment is humming along and things are performing well, then the only thing that will disrupt that is a change of some form.

Change can take many forms, not just a program change. Consider the following ways in which change can be introduced to your DB2 world:
  • Physical changes to the environment, such as a new CPU, additional memory, new disk, or different storage technology (e.g. SSD).
  • Changes to system software, such as a new release of a product (for example, QMF, CICS, or GDDM), the alteration of a product (for example, the addition of more or fewer CICS regions or an IMS SYSGEN), or a new product (for example, implementation of DFHSM). 
  • And let's not forget the installation of a new release or version of DB2, which can result in changes in access paths and the utilization of features new to DB2. 
  • An updated or  new version of the operating system can also cause DB2 performance problems.
  • Changes to the DB2 engine from maintenance releases, which can change the DB2 optimizer and other facets of DB2's operation.
  • Changes in system capacity. More or fewer jobs could be executing concurrently when the performance problem occurs.
  • Environment changes, such as the implementation of dynamic SQL, web-based programs or the adoption of Data Sharing.
  • Database changes. This involves changes to any DB2 object, and ranges from adding a new column or an index, to changing table space types, to dropping and re-creating an object.
  • Changes to the application development methodology, such as usage of check constraints instead of application logic or the use of stored procedures.
  • Changes to application code.

So regardless of how it may seem at times, performance problems are not caused by magic. Something tangible changes, creating a performance problem in the application, database, or subsystem. The challenge of tuning is to find the source of the change (the root cause, if you will), gauge its impact, and formulate a solution.

Although the majority of your problems will be application-oriented, you will frequently need to explore the tuning opportunities outside of SQL and programming changes when application tuning has little effect.

The following is a quick reference of the possible tuning options for each environment:

To tune z/OS
  • Change WLM parameters to enhance throughput for DB2, its allied agent, and/or programs.
  • Modify swappability.
  • Add memory.
  • Upgrade CPU.
  • Add zIIPs and tune SQL/program to take advantage of zIIP processors.
  • Implement data sharing.
  • Use an active performance monitor (enables tuning on the fly).

To tune the teleprocessing environments
  • Change the system generation parameters.
  • Tune the program definition (PSBs and PPT entries).
  • Modify the Attachment Facility parameters.
  • Add or modify resource parameters (e.g. RDO)
  • Use an active performance monitor (enables tuning on the fly).

To tune the DB2 subsystem
  • Modify DSNZPARMs, for example to increase log buffers, increase or decrease the number of concurrent users, change lock escalation, and so on.
  • Issue DISPLAY and ALTER BUFFERPOOL commands to review buffer pool usage characteristics, change buffer pool sizes, and increase or decrease buffer pool thresholds.
  • Tune the DB2 Catalog, including dropping and freeing objects, executing MODIFY, reorganizing DB2 catalog table spaces and indexes, recovering the DB2 catalog indexes, building additional indexes on the DB2 catalog, tuning disk usage, and implementing data set shadowing.
  • Perform DSNDB07 tuning.

To tune the DB2 database design
  • Modify the logical and physical model.
  • Modify and issue DDL.
  • Execute ALTER statements.
  • Ensure that proper parameters are specified.
  • Implement table changes.
  • Partition data into universal PBG table spaces.
  • Spread non-partitioned objects over multiple devices using PIECESIZE.
  • Add indexes.
  • REORG tablespaces.
  • REORG or RECOVER indexes.
  • Consider or reconsider data compression.
  • Denormalize the database design.
  • Consider adding redundant tables.
To tune programs
  • Perform SQL tuning.
  • Tune the high-level language (such as COBOL or 4GL).
  • Use a program restructuring tool.
  • Run RUNSTATS.
  • Execute EXPLAIN, modify your code, and REBIND (see the sketch after this list).
  • Examine the REOPT clause and whether it can help your specific code.
  • Consider using SQL tweaks and/or hints.
  • Use the OPTIMIZE FOR n ROWS clause.
  • Consider activating query parallelism.
  • Change locking strategies.
  • Change the DB2 catalog statistics and REBIND.
  • Use a testing tool to provide what if testing and tuning.
  • Use a tool to sample the application's address space as it executes.
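As one example from the list above, a quick EXPLAIN of a suspect statement might look like this sketch (the query and names are hypothetical, and a PLAN_TABLE must already exist under your authorization ID):

EXPLAIN PLAN SET QUERYNO = 100 FOR
  SELECT EMPNO, LASTNAME
  FROM   EMP
  WHERE  WORKDEPT = 'D01';

-- Review the chosen access path, then modify the SQL and REBIND as needed
SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD, ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY
FROM   PLAN_TABLE
WHERE  QUERYNO = 100
ORDER BY QBLOCKNO, PLANNO;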

Of course, this is not an exhaustive list and should only be used as a guideline for the types of tuning options available to you.

Hopefully this high-level overview of DB2 performance and various tuning options at your disposal has given you some food for thought as you tackle the task of keeping your DB2 environment operating smoothly and efficiently... 

Monday, November 02, 2015

IBM Insight 2015 Wrap-Up

Last week I attended the IBM Insight conference and blogged about the first few days of the conference here at http://db2portal.blogspot.com/2015/10/an-update-from-ibm-insight-2015.html… and I promised to blog about the remainder of the conference, so here is a synopsis of the highlights.


On Wednesday, the focus of the general session was on IBM’s acquisition of The Weather Company’s technology.  The deal calls for IBM to acquire The Weather Company’s B2B, mobile and cloud-based web properties, including WSI, weather.com, Weather Underground and The Weather Company brand. IBM will not be acquiring The Weather Channel television station, which will license weather forecast data and analytics from IBM under a long-term contract. IBM intends to utilize its newly acquired weather data in its Watson platform.

The deal is expected to close in the first quarter of 2016. Terms were not disclosed.

You can read all about the acquisition in this IBM press release.

I spent some of my time at Insight this year learning more about dashDB and it is a very interesting technology. Marketed as data warehousing in the cloud, IBM touts four use cases for dashDB: standalone cloud data warehouse, as a store for data scientists, for those implementing a hybrid data warehouse, and for NoSQL analysis and rapid prototyping.

IBM promotes simplicity, performance, analytics on both traditional and NoSQL, and polyglot language support as the most important highlights of dashDB. And because it has DB2 BLU under the covers, IBM dashDB not only super-compresses data, but it can operate on that data without necessarily decompressing it.

Additionally, a big theme of the conference was in-memory technology, and dashDB sports CPU cache capabilities. In fact, I heard several folks at the conference say some variation of “RAM is too slow”… meaning that CPU cache is faster and IBM is moving in that direction.

The bottom line for dashDB is that it offers built-in high availability and workload management capabilities, along with being in-memory optimized and scalable. Worth a look for folks needing a powerful data warehousing platform.

For you DB2 for z/OS folks, IDAA was a big theme of this year’s Insight conference. The latest version, V5.1, adds advanced analytics capabilities and in-database transformation, making the mainframe queries that can take advantage of the accelerator faster than ever.

Apache Spark was another pervasive topic this year. It was talked about in multiple sessions and I even had the opportunity to play with it in a hands-on lab. The big news for z folks is that IBM is bringing out a version of Spark for the mainframe that will run on z/OS – it is already supported on zLinux.

Of course, I attended a whole slew of DB2 sessions including SQL coding, performance and administration presentations. Some of the highlights include DB2 11 for LUW being announced, several discussions about dark data, and a lot of information about IBM's Big SQL and how it can be used to rapidly and efficiently access Hadoop (and other unstructured) data using SQL.

I live-tweeted a bunch of highlights of those sessions, too; indeed, too many to include here. If you are interested in catching everything I have to say about a conference, keep reading these blog posts, of course, but you should also follow me on Twitter at http://twitter.com/craigmullins

I also had the honor of delivering a presentation at this year's conference on the changes and trends going on in the world of DB2 for z/OS. Thanks to the 70 or so people who attended my session - I hope you all enjoyed it and learned something, too!

As usual, and as you well know if you've ever attended this conference before, there was also a LOT of walking to be done. From the hotel to the conference center to the expo hall to lunch to the conference center. But at least there were some signs making light of the situation this year!

There was a lot of fun to be had at the conference, too. The vendor exhibition hall was stocked with many vendors, big and small, and it seems like they all had candy. I guess that’s what you get when the conference is so close to Halloween! The annual Z party at the House of Blues (for which you need a Z pin to get in – this year’s pin was orange) was a blast and the Maroon 5 concert courtesy of Rocket Software was a lot of fun, too.

If you are looking for a week of database, big data, and analytics knowledge transfer, the opportunity to chat and connect with your peers, as well as some night-time entertainment, be sure to plan to attend next year’s IBM Insight conference (October 23 thru 27, 2016 at the Mandalay Bay in Las Vegas).

Monday, October 26, 2015

An Update from IBM Insight 2015

The annual IBM Insight conference is being held this week in Las Vegas, as usual at the Mandalay Bay Resort and Conference Center. And I have the good fortune to be in attendance.

If you follow me on Twitter (http://www.twitter.com/craigmullins) I am live tweeting many of the sessions I am attending at the conference. But for those of you who are not following along on Twitter, or just prefer a summary, here’s a quick overview of Monday’s highlights.

The day kicked off with the general session keynote which was delivered by a combination of Jake Porway (of DataKind), IBMers and IBM customers. The theme of the keynote was analytics and cognitive computing. In my opinion, the emphasis of the event has kind of shifted from the data to what is being done with the data… in other words, the applications. And that is interesting, because the data is there to support the applications, right? That said, I’m a data bigot from way back, so it was a bit app-heavy for me.

That said, there were some insightful moments delivered during the keynote. Bob Picciano, Senior VP of IBM Analytics, kicked things off by informing the audience that true insight comes from exploiting existing data, dark data, and IoT data with agility driven by the cloud. That’s a lot of buzzwords, but it makes sense! And then executives from Whirlpool, Coca-Cola, and Twitter were brought out to talk about how they derive insight from their data and systems.

Perhaps the most interesting portion of the general session was the Watson discussion, led by Mike Rhodin, Senior VP, IBM Watson. We learned more about cognitive systems and how they get more valuable over time as they learn, which makes them unique in the world of computers and IT. IBM shared that there are more than 50 core technologies used by IBM Watson to deliver its cognitive computing capabilities. It was exciting to learn about systems that reason, never stop learning and drive more value over time.
Additional apps were discussed that let us learn about the various ways to choose wine, that nobody starts thinking about ice cream early in the week, and that when the weather changes women buy warmer clothes while men buy beer and chips. You kind of had to be there, I guess!

Of course, this was only the first session of a long day. Additional highlights of the day included a high-level overview of the recently announced (but not yet released) DB2 12 for z/OS, the features that should be in a next generation database, and Gartner’s take on the state of big data and analytics. Let’s briefly address each of these one by one.
Firstly, DB2 12, which will be available in the ESP (early support program) in March 2016. There are a lot of nice new features that will be available in this new version. We’ll see a lot more in-memory capabilities which will speed up queries and processing. Buffer pools can be up to 16 TBs, even though today’s z systems can support only 10 TBs – IBM is planning for the future with that one!

And we’ll continue to see the blurring of the lines between static and dynamic SQL. How? Well, we’ll get RLF for static SQL and plan stability for dynamic SQL in DB2 12. IBM claims that we’ll be able to achieve up to 360 million txns/hour with DB2 12 for z/OS using a RESTful web API. Pretty impressive.

And there will be no skip-level migration for DB2 12... You have to migrate thru DB2 11 to get to 12.

OK, what about the features that should be in a next generation database? According to IBM a next gen DBMS should:
  • Deliver advanced in-memory technology
  • Be fast for both transactional and analytic workloads
  • Provide scalable performance
  • Be available, reliable, resilient, and secure
  • Be simple, intelligent and agile
  • And be easy to deploy and cloud-ready

Sounds about right to me!                                                                                        

And finally, let’s briefly take a look at some of the Gartner observations on big data and analytics. The Gartner presentation was delivered by Donald Feinberg, long-time Gartner analyst on the topic of data and database systems. First of all, Feinberg rejects the term “big data” saying there is no such thing. It is all just data. He went on to claim that “big data” is perhaps the most ambiguous term out there, but it is also the most searched term at Gartner!

Feinberg also rejected the term data lake, saying “There is no such thing as a data lake, it is just a place to dump data.” He warned that it will  come back to bite organizations in a few years if they do not take the time to manage, transform, secure, etc. the data in the lake, turning it into a data reservoir, instead.

He also made the very interesting observation that BI/analytics was the number 1 priority for CIOs on Gartner’s annual CIO priorities survey and that it has been in that slot for 8 years running. But if analytics was really that much of a priority, why haven't they gotten it done yet?

Of course, a lot more happened at IBM Insight today – and many more things were discussed. But I don’t want this blog post to become too unwieldy, so that is all I’m going to cover for now.

I’ll try to cover more later in the week as the conference progresses.