Monday, April 28, 2008

DB2 Precompile & BIND Process

THE PRECOMPILER
The DB2 Precompiler does not need DB2 to run. It carries out three primary tasks as it reads the program serially, top-to-bottom, looking for the DB2 delimiters, EXEC SQL and END-EXEC (in COBOL).
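
For anyone who hasn't seen embedded SQL, here is a minimal sketch of what the precompiler is scanning for. The paragraph name, table, and host variables (EMPLOYEE, WS-EMP-NAME, and so on) are made up for illustration.

      * Everything between EXEC SQL and END-EXEC belongs to DB2;
      * everything around it is ordinary COBOL.
       1000-GET-EMPLOYEE.
           EXEC SQL
               SELECT  EMP_NAME,
                       EMP_SALARY
               INTO   :WS-EMP-NAME,
                      :WS-EMP-SALARY
               FROM    EMPLOYEE
               WHERE   EMP_ID = :WS-EMP-ID
           END-EXEC.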

First, if the delimiters surround an INCLUDE statement, the Precompiler goes to the INCLUDE library named in the JCL DD statement and pulls the named member into the program. This works just like a COBOL COPY MEMBERNAME; the only difference is timing. COBOL COPYBOOKs get copied in at compile time; SQL INCLUDEs get copied in at precompile time. The most common item INCLUDEd in a program was (and is) a DCLGEN. A DCLGEN is a generated structure that describes a table, and one is usually included for each table the program will access at run time. Each DCLGEN is a two-part structure consisting of a DECLARE TABLE statement, which describes the table in DB2 SQL, and an 01-level COBOL working storage structure that describes the same table in COBOL (much like a typical copybook for a VSAM file).
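
Below is a hedged sketch of what such a DCLGEN member might look like for a made-up EMPLOYEE table; real DCLGEN output varies a bit by DB2 release. The program pulls the whole member in with a single EXEC SQL INCLUDE EMPLOYEE END-EXEC statement.

      ****************************************************************
      * Part 1: the DECLARE TABLE the precompiler uses for checking.
      ****************************************************************
           EXEC SQL DECLARE EMPLOYEE TABLE
           ( EMP_ID                     INTEGER NOT NULL,
             EMP_NAME                   CHAR(30) NOT NULL,
             EMP_SALARY                 DECIMAL(9, 2)
           ) END-EXEC.
      ****************************************************************
      * Part 2: the matching COBOL host-variable structure.
      ****************************************************************
       01  DCLEMPLOYEE.
           10 EMP-ID                    PIC S9(9) USAGE COMP.
           10 EMP-NAME                  PIC X(30).
           10 EMP-SALARY                PIC S9(7)V9(2) USAGE COMP-3.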

Second, if the delimiters surround an SQL statement, the precompiler does a very basic syntax check to make sure the statement is well formed and the column and table names are valid (spelled correctly and matching a declared table). Many DBAs and programmers think this validation is done by reading the DB2 CATALOG, but they're wrong. Remember, the precompiler doesn't need DB2 or its CATALOG; DB2 might not even be installed on the machine. The DB2 Precompiler uses the top part of the DCLGEN, the DECLARE TABLE statement, to validate the SQL.

The third, and most important, task performed by the DB2 Precompiler is to split the program into two parts: a COBOL part and a DB2 part. All of the SQL that the programmer carefully embedded is stripped out of the program and put into its own partitioned data set (PDS) member, called a DBRM (Database Request Module). A single program containing two languages, COBOL and SQL, goes into the DB2 Precompiler, and two pieces come out. Twins, but fraternal twins, much like Arnold Schwarzenegger and Danny DeVito. Arnold looks just like his COBOL mother, and Danny looks just like his DB2 father. COBOL Arnold, with all of the SQL commented out, goes down one path in life. SQL Danny, containing only SQL, goes down a different path in life.
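
To make that concrete, here is a rough sketch of what the precompiler's COBOL output tends to look like for the statement shown earlier. The exact comment style, stub name, and generated parameter areas vary by precompiler release, so treat this as an approximation rather than actual precompiler output.

      * The embedded SQL survives only as comments; a call to the
      * DB2 language interface stub takes its place.
       1000-GET-EMPLOYEE.
      *    EXEC SQL
      *        SELECT EMP_NAME, EMP_SALARY
      *          INTO :WS-EMP-NAME, :WS-EMP-SALARY
      *          FROM EMPLOYEE
      *         WHERE EMP_ID = :WS-EMP-ID
      *    END-EXEC.
           CALL 'DSNHLI' USING SQL-PLIST1.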

The twins, separated at birth, have a tendency to lose each other. To help the twins find each other later in life (in other words, at run time), the precompiler engraves each with identical tattoos. The tattoo is carried forward with COBOL Arnold, through compile and link edit, into the LOAD module in the LOAD library; it is part of the run-time executable code of the LOAD module. The same tattoo is carried forward with SQL Danny through BIND. BIND is to SQL what COMPILE is to COBOL: the purpose of COBOL COMPILE is to come up with run-time executable code for the COBOL, and the purpose of BIND is to come up with run-time executable code for the SQL. Both sets of code bear identical tattoos (timestamps, or consistency tokens). At run time, DB2 checks that the token in the LOAD module matches the token that came through BIND; if they don't, the program fails with the familiar -818 timestamp error (or -805 when packages are involved).

So, the COBOL twin becomes a transportable load module in the COBOL LOADLIB, and the SQL twin becomes a transportable DBRM in the DBRMLIB. Just as the COBOL twin had to be compiled, the DBRM twin has to go through BIND to create the run-time executable code for the DB2 portion of the COBOL program and to put that executable code into the "right" DB2 subsystem.

ALL ABOUT BIND

BIND connects to the DB2 in which the program's LOAD module will run, reads the DBRM serially, and then performs three tasks.
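
A plan-level BIND is typically issued through the TSO DSN command processor, interactively or in a batch IKJEFT01 step. A hedged sketch, with made-up subsystem, plan, member, and library names, might look like this:

    DSN SYSTEM(DB2P)
    BIND PLAN(PAYROLL)            -
         MEMBER(PAYPGM01)         -
         LIBRARY('PROD.DBRMLIB')  -
         ACTION(REPLACE)          -
         VALIDATE(BIND)           -
         ISOLATION(CS)            -
         EXPLAIN(YES)
    END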

The first of the BIND tasks is an authorization check. DB2 must make sure that the binder has the BIND authority and the SQL authority to perform the requested SQL task (for example, updating the payroll master). Under standard authorization procedures, DB2 won't let you BIND a DBRM if you don't have the authority to execute the SQL that's in it. This is why you may have the authorization to BIND in development (accessing development tables) but not in production, where the SQL accesses production tables.

The second BIND task is a bit redundant. BIND, like the precompiler, must check the syntax of the SQL, but the BIND check is more sophisticated. Instead of using the top, DECLARE TABLE portion of the DCLGEN, BIND uses the DB2 CATALOG to make sure that the column names are valid, that comparisons are numeric-to-numeric, and so on. This second check is necessary because the precompiler's check can't be trusted: it used only the DCLGEN, and you can have a DCLGEN without the DB2 table actually existing.
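
A hedged sketch of the kinds of grants involved, using made-up authorization IDs and table names (most shops manage this through secondary authorization IDs or RACF groups rather than individual grants):

    -- System privilege needed to add a brand-new plan at BIND time.
    GRANT BINDADD TO PAYRLDEV;

    -- Plan privilege needed to re-BIND an existing plan.
    GRANT BIND ON PLAN PAYROLL TO PAYRLDEV;

    -- SQL privileges that the statements in the DBRM actually require.
    GRANT SELECT, UPDATE ON PAYROLL.EMPLOYEE TO PAYRLDEV;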

The third, and most important, BIND task is to come up with run-time instructions for the SQL in the DBRM. Each SQL statement is parsed, and all of the possible (realistic) methods for retrieving the desired columns and rows from the table are weighed, measured, and evaluated based on estimated I/O, CPU, and sort overhead. A ton of information is used as input to the BIND process, not just the CATALOG statistics put there by running the RUNSTATS utility. BIND input includes, for example (see the catalog sketch after this list):

Indexes (what columns are in the indexes?)
Columns (how long is this column and how much room will it occupy in a SORT record?)
System resources (how big are the buffer pools and the RID pool?)
Processors (how big are they and how many engines do they have?)
DB2 (what release is running?)
Parameters (what are the values of the BIND parameters?)
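
To give a feel for the statistics side of that input, here is a hedged sketch of the kind of catalog columns RUNSTATS populates and the optimizer reads. The creator name is made up, and the exact column names vary somewhat by DB2 version.

    -- Table-level statistics: row count and page count.
    SELECT NAME, CARDF, NPAGES
      FROM SYSIBM.SYSTABLES
     WHERE CREATOR = 'PAYROLL';

    -- Index-level statistics: key cardinality and cluster ratio.
    SELECT NAME, FIRSTKEYCARDF, FULLKEYCARDF, CLUSTERRATIOF
      FROM SYSIBM.SYSINDEXES
     WHERE TBCREATOR = 'PAYROLL';
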
After all that input (and more) is weighed and compared, the cheapest, most cost-effective access path is chosen, and the run-time instructions for that one path are created. (Interestingly, DB2 BIND sometimes generates instructions for more than one path.) This process is called optimization, and it's repeated for each SQL statement in the DBRM until all access paths are decided and the run-time instructions are created for each. As the optimizer decides on each path, writes are done to DB2. BIND checks to see whether you bound with the parameter EXPLAIN(YES); if so, it writes documentary evidence about the chosen path to the PLAN_TABLE and to the DSN_STATEMNT_TABLE for your edification.
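
If you did bind with EXPLAIN(YES), a query along these lines (the program name is a placeholder) shows what the optimizer decided for each statement:

    -- One row per plan step, per query block, per statement.
    SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD,
           ACCESSTYPE, ACCESSNAME, MATCHCOLS, PREFETCH
      FROM PLAN_TABLE
     WHERE PROGNAME = 'PAYPGM01'
     ORDER BY QUERYNO, QBLOCKNO, PLANNO;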

BIND also writes a lot of information to multiple CATALOG tables, documenting the fact that the BIND did occur. In fact, the tattooed DBRM, which is not used at run time, is copied into the CATALOG. Objects chosen by the optimizer are documented in the CATALOG in cross-reference tables, and the BIND parameters are recorded in the CATALOG as well.
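
A hedged sketch of how you might look at that evidence for a plan-level bind, reusing the made-up names from above (catalog column names differ a little between DB2 versions):

    -- Which DBRMs were bound into the plan, and with what
    -- precompiler timestamp (the "tattoo").
    SELECT NAME, PLNAME, PDSNAME, TIMESTAMP
      FROM SYSIBM.SYSDBRM
     WHERE PLNAME = 'PAYROLL';

    -- Which tables, indexes, and other objects the plan depends on.
    SELECT BCREATOR, BNAME, BTYPE
      FROM SYSIBM.SYSPLANDEP
     WHERE DNAME = 'PAYROLL';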
