Predicates in WHERE clause or JOIN clause? 

By: Alex Podlesny
Inspired by an interview question.
Or what to do when you get iced from all 360 degrees.
Part One - Having Fun 
How easy is it to move a condition from the WHERE clause to the JOIN clause? The answer is: very easy. Some of us might say that it is aesthetically better; some would say that, semantically, a join is for joining, while the WHERE clause was invented to introduce predicates. Others even argue that there is a relational algebra rule that allows interchangeability of the predicates.
All of those points are right in their own way. Let's try to move predicates around and see what surprising features we can uncover. 
 
As a starting point we will have two tables, Product and OrderDetails, and one important stored procedure, GetOrderDetails, that returns order details enriched with the relevant product information.

CREATE TABLE Product (
  ProductID INT,
  Name VARCHAR(50)
);
GO

CREATE TABLE OrderDetails(
  OrderID INT,
  ProductID INT,
  NumberOfItems INT
);
GO

CREATE PROC GetOrderDetails
   @OrderID INT AS
BEGIN
  --??? logic goes here
  RETURN --placeholder statement so the empty procedure compiles
END
GO

We populate our tables with the following data and start our experiment:

INSERT INTO Product (ProductID, Name)
VALUES
  (1,'TV'),
  (2,'Tablet'),
  (3, 'Phone'),
  (4, 'Dress')

INSERT INTO OrderDetails (OrderID, ProductID, NumberOfItems)
VALUES
  (1, 1, 1), --TV
  (2, 2, 1), --Tablet
  (3, 3, 1), --Phone
  (4, 4, 1), --Dress
  (4, 1, 1)  --TV

SELECT 'All Data in Product', * FROM Product
SELECT 'All Data in OrderDetails',* FROM OrderDetails

 
The first four disappointing samples use the inner join.
By default, JOIN stands for INNER JOIN. The "inner" qualifier usually gets omitted by developers; this is normal practice, but a better practice is to spell out the complete join description so that your fellow developer, working at 2:00 AM, will not have to remember all the cool intricacies of T-SQL. Here they are, all four stored procedures:
 

-- 1 ***************************************************
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
  SELECT
    od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
    'INNER - 1 - Product & OrderDetails - WHERE' AS Notes
  FROM Product AS p
    JOIN OrderDetails AS od
     ON p.ProductID = od.ProductID
  WHERE od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO

-- 2 ***************************************************
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
  SELECT
    od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
    'INNER - 1 - Product & OrderDetails - JOIN' AS Notes
  FROM Product AS p
    JOIN OrderDetails AS od
     ON p.ProductID = od.ProductID
     AND od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO

-- 3 ***************************************************
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
  SELECT
    od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
    'INNER - 2 - OrderDetails & Product - WHERE' AS Notes
  FROM OrderDetails AS od
    JOIN Product AS p
     ON p.ProductID = od.ProductID
  WHERE od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO

-- 4 ***************************************************
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
  SELECT
    od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
    'INNER - 2 - OrderDetails & Product - JOIN' AS Notes
  FROM OrderDetails AS od
    JOIN Product AS p
     ON p.ProductID = od.ProductID
     AND od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO

All four procedures return the same expected result; it is even getting boring:
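For reference, here is that result, reconstructed from our sample data since the original screenshot is not reproduced here; only the Notes column differs between the four versions (row order may vary):

OrderID  ProductID  Name   NumberOfItems
-------  ---------  -----  -------------
4        4          Dress  1
4        1          TV     1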

 
Let's sharpen our refactoring skills and use the left outer join.
The "outer" means (as we all know) that all rows from the left table not meeting the join condition are still included in the result set. Use of the "outer" qualifier is optional, so LEFT OUTER JOIN = LEFT JOIN; not going to mention best practices this time...
So what will happen if we use LEFT JOIN? Let's try...

ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
  SELECT
    od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
    'LEFT JOIN - 1 - Product & OrderDetails - WHERE' AS Notes
  FROM Product AS p
    LEFT JOIN OrderDetails AS od
     ON p.ProductID = od.ProductID
  WHERE od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO 

Nothing serious happens at first; we get the same result, thanks to the predicate in the WHERE clause:
 
 
Should we move the predicate? Let's do it:
 
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
  SELECT
    od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
    'LEFT JOIN - 1 - Product & OrderDetails - JOIN' AS Notes
  FROM Product AS p
    LEFT JOIN OrderDetails AS od
     ON p.ProductID = od.ProductID
     AND od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO
 
Surprised? Don't be, because this query returns all 4 Product records, as intended by the LEFT OUTER JOIN, and replaces all the non-matching order details with NULLs.
The result listed below is not what we really want...
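Reconstructed from our sample data (the original screenshot is not reproduced here), all four products survive and the unmatched order details turn into NULLs (row order may vary):

OrderID  ProductID  Name    NumberOfItems
-------  ---------  ------  -------------
4        1          TV      1
NULL     NULL       Tablet  NULL
NULL     NULL       Phone   NULL
4        4          Dress   1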

 
Let's go back to the predicate in the WHERE clause and flip the tables around (the very smart among you can try a RIGHT OUTER JOIN instead), and see how it works:
 
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
  SELECT
    od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
    'LEFT JOIN - 2 - OrderDetails & Product - WHERE' AS Notes
  FROM OrderDetails AS od
    LEFT JOIN Product AS p
     ON p.ProductID = od.ProductID
  WHERE od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO
 
With this change we are back to the expected two (2) records for order number 4.

 
Let's have more fun, let's move it again!
 
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
  SELECT
    od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
    'LEFT JOIN - 2 - OrderDetails & Product - JOIN' AS Notes
  FROM OrderDetails AS od
    LEFT JOIN Product AS p
     ON p.ProductID = od.ProductID
     AND od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO
 
It is not a surprise any more; it is an introduced-unexpected-feature: now we are getting the details for all orders, while the product name is populated only for order #4.
This is a very expensive feature in terms of performance impact and in terms of data exposure; the new code will pull all records from the OrderDetails table. Here is the result:
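Reconstructed from our sample data, all five OrderDetails rows come back, with Name filled in only where od.OrderID = 4 (row order may vary):

OrderID  ProductID  Name   NumberOfItems
-------  ---------  -----  -------------
1        1          NULL   1
2        2          NULL   1
3        3          NULL   1
4        4          Dress  1
4        1          TV     1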
 
 
Part Two - Deal with it

As you can see, moving a given predicate from the WHERE clause to a JOIN clause can dramatically alter the expected result. Visually such a change is very hard to catch, especially when reviewing complex queries. Imagine if the code also included cumbersome comments that might lead a reviewer to believe that the behavior was actually intended. An attempt to run such a query against unreliable data in a staging environment may not expose the error, or may expose it inconsistently, reducing testing efficiency and allowing such ambiguous code to be released to production.

Here is how we can protect the user, help the testing team, remove a burden from the reviewer and improve confidence in the code.

Let's cover the code with a unit test! Here is the code: 

USE DBTestDriven
GO
--************************************************************************************************
EXEC DBTD_DROP_PROC_IF_EXISTS 'UT_OrdersBusinessLogic_ReturnsOrderDetailsForAGivenOrder'
GO
--************************************************************************************************
CREATE PROC UT_OrdersBusinessLogic_ReturnsOrderDetailsForAGivenOrder
   @v_Debug INT = 0 --by default do not return debug info
AS
BEGIN
   EXEC DBTD_UNIT_TEST 'OrdersBusinessLogic'
   EXEC DBTD_USE_TRANSACTION 'We are going to delete data from the Product and
                              OrderDetails tables and need a transaction to restore everything back'

   DELETE FROM Product --this is brutal
   DELETE FROM OrderDetails --this is brutal

   --add predictable set of data
   INSERT INTO Product ( ProductID, Name)
     VALUES (1,'TV'), (2,'Tablet'), (3, 'Phone'), (4, 'Dress')

   INSERT INTO OrderDetails ( OrderID, ProductID, NumberOfItems)
     VALUES 
      (1, 1, 1), --TV
      (2, 2, 1), --Tablet
      (3, 3, 1), --Phone
      (4, 4, 1), --Dress
      (4, 1, 2)  --TV

   CREATE TABLE #ExpectedResult(
     OrderID INT,
     ProductID INT,
     Name VARCHAR(50),
     NumberOfItems INT
   )

   CREATE TABLE #ActualResult(
     OrderID INT,
     ProductID INT,
     Name VARCHAR(50),
     NumberOfItems INT,
     Notes VARCHAR(250)
   )

   --set expectations
   INSERT INTO #ExpectedResult
     (OrderID, ProductID, Name, NumberOfItems)
     VALUES (4, 4, 'Dress', 1),
            (4, 1, 'TV',    2)

   --Run the business logic that we are trying to verify.
   INSERT INTO #ActualResult
     EXEC GetOrderDetails @OrderID = 4

   --Check expectations
   IF @v_Debug != 0
   BEGIN
     --get debug info before running assert
     SELECT 'Product', * FROM Product
     SELECT 'OrderDetails',* FROM OrderDetails
     SELECT '#ActualResult', * FROM #ActualResult
     SELECT '#ExpectedResult',* FROM #ExpectedResult
   END

   EXEC DBTD_ASSERT_ROWSETS_ARE_EQUAL
     '','','#ActualResult',
     '','','#ExpectedResult',
     'USE EXPECTED COLUMNS',
     'something is wrong with GetOrderDetails procedure'
END
GO 

Run the unit test manually:
BEGIN TRAN
EXEC UT_OrdersBusinessLogic_ReturnsOrderDetailsForAGivenOrder @v_Debug = 1
ROLLBACK
 
When running our test against the last version of the procedure, which uses a LEFT OUTER JOIN and the predicate in the join clause, we get the following error: "#ActualResult and #ExpectedResult recordsets should have same number of rows. something is wrong with GetOrderDetails procedure."

 
You would know why the expected and actual results differ after examining the table output listed below.
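Reconstructed from the test data, the debug output makes the mismatch plain (the Notes column of #ActualResult is omitted here; row order may vary):

#ActualResult - 5 rows:
OrderID  ProductID  Name   NumberOfItems
1        1          NULL   1
2        2          NULL   1
3        3          NULL   1
4        4          Dress  1
4        1          TV     2

#ExpectedResult - 2 rows:
OrderID  ProductID  Name   NumberOfItems
4        4          Dress  1
4        1          TV     2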
 
Now let's use one of the earlier versions of the GetOrderDetails procedure:
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
   SELECT
     od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
     'INNER - 1 - Product & OrderDetails - JOIN' AS Notes
   FROM Product AS p
     INNER JOIN OrderDetails AS od 
      ON p.ProductID = od.ProductID
      AND od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO
 
and rerun the unit test:
BEGIN TRAN
EXEC UT_OrdersBusinessLogic_ReturnsOrderDetailsForAGivenOrder @v_Debug = 1
ROLLBACK
 
As you can see below, everything is fine; we got our expected records, and the code is safe.

 
 
 
Conclusion 
Wake up when you are doing a code review or answering an interview question; do not assume - read the code.
Or better - cover your code with unit tests.  
 
 


Thursday, July 16, 2015 2:24:00 PM

Database Continuous Integration - Chapter 2 - Define a Product 

In this chapter: aligning terminology and defining a product. 
 

If you landed on this page from a search engine, we would recommend starting with the first chapter

Let's define what it is that we are trying to integrate continuously. At first glance, it might seem obvious that we are talking about just one thing - the database - and that it is our product of integration. The question is:

Do we really talk about one thing? 

When we say - "database" - here is what people might think: 

  • A DBA would think of physical servers with database software instances (like Oracle or SQL Server) installed on a few of them, a database created with files located on physical and logical drives, and populated with data;
  • A developer who works with a database-centered application might think of a few tables that exist somewhere in a database this application has access to;
  • A database designer might think of relationships, integrity, access control, and the particular database objects that achieve the required functionality;
  • A user would think simply of data. Data loaded from raw files, collected from sensors, manually entered, bought or acquired in any other way. The user needs data.
 

The magical "database" term removes obstacles and allows all stakeholders in their different roles work together, and at the same time it carries very high level of ambiguity. 

Let's try to remove the ambiguity, set expectations and focus on an actual CI implementation for a particular database product.    

Please welcome the “DNA and The Creature” example:  

DNA --- The Terminology Wall --- The Creature
On one side there are developers and designers who create databases, write database code, and create packages and procedures. On the other side of the terminology wall, there are users who use the data in a database and DBAs who support that database.

Let's say, for example, we have designed a database that describes a creature, the "The Creature" database. Two tables are all we need:   

CREATE DATABASE TheCreature 
GO  
 
CREATE TABLE CreatureAttributes( 
       AttributeID int, 
       Name nvarchar(50), 
       Value nvarchar(50) 
);
GO 
 
CREATE TABLE SignificantEvents( 
       EventID int, 
       EventDate datetime, 
       EventDescription nvarchar(250) --wide enough for the sample events below
);
GO 
 

Done - we have a product. So… Coding is done; testing is done; we got approval from the client; we saved our code in the source control system as one small file called A_Creature_DNA.sql.

Then, this product (the SQL file) was distributed to many divisions of the company. Many departments installed this product internally on their production servers and started using their own “The Creature” databases.

Let's say one office accumulated the following data: 

CreatureAttributes table 

AttributeID | Name           | Value
1           | Class          | Mammal
2           | Family         | Rabbit
3           | Name           | Bunny
4           | Diet           | Herbivore
5           | Lifespan Limit | 12 years
6           | Birthday       | 2012-01-01

 SignificantEvents table 

EventID | EventDate          | EventDescription
1       | 2014-01-01 4:00 PM | New creature is born of class Mammal
2       | 2014-01-01 4:01 PM | New creature chooses to be a Rabbit
3       | 2014-01-03 1:00 AM | Creature tried to eat grass - and it likes it
4       | 2014-01-03 2:00 PM | We named it Bunny
5       | 2015-02-03 7:00 PM | Bunny ran away from Fox. Very scary...
6       | 2016-04-04 8:00 AM | Bunny met Dolly, they are friends now
7       | 2019-05-08 9:00 PM | Bunny got to a hospital
8       | 2019-05-09 5:00 AM | Bunny feels better and going home
9       | 2020-08-23 9:00 AM | Bunny goes to school

There might also be other departments who grew their own rabbits, or lions, or horses, or other unknown chimeras.  

From the standpoint of developer Joe, who invented the "The Creature" database, the final database product is the SQL code for those two tables. 

From the standpoint of user Jill, who uses this product daily, the "The Creature" database is the actual database installed on the server and populated with real data.  

Both Joe and Jill have different needs from the database. Both use the term "database" in their own way. In the end, they are working with two different database products - the database code and the database that this code has created.  

 


 

Monday, January 12, 2015 6:26:00 PM

Database Continuous Integration 

By: Alex Podlesny

I want to begin a discussion about Database Continuous Integration (CI) and slowly unwind this concept within the database framework. 

You are welcome to start from any chapter, but the same words might mean different things to people sitting in the same meeting room and working together as the same team. Just try to ask an Oracle DBA and a SQL Server DBA the simple question "what is the database?", and you will understand. 

 
Monday, January 12, 2015 6:25:00 PM

Database Continuous Integration - Chapter 1 - The SDLC 

In this chapter: defining a database lifecycle, a flow of code and environmental differences.    
 

Power of SDLC

 
In small startups and big corporations, the software development process follows an internal lifecycle adopted by the company. It can be a one- or two-step process; it can be manual; it might be evolving. The benefits of an SDLC are hard to overestimate, since it provides standard guidelines, minimizes risk, improves quality, provides consistency and plays a significant role in safeguarding the actual business and its products.   
 

Flow of code  

 
Everything starts in the development camp. Here is where changes are made, queries are written, tables created, and databases designed.
All changes, no matter how big or small, eventually flow to a testing, staging, or other non-production environment. There, those changes are reviewed by someone other than the original inventor, tested, checked for compliance, verified, documented, accepted and scheduled for a production release. 
 
Scheduled releases are double-checked, triple-checked and eventually deployed to a production environment. 
 
In production, the newly introduced changes are confirmed, checked, rolled back when necessary, documented and left to be forgotten. 
 
As database projects mature, development priorities are slowly overtaken by invisible support and operations priorities; changes to the database design might still happen, but they are happening on a very limited scale. Many new changes tend to follow the SDLC process, but that does not happen all the time. 
 

The life of a production database 

 
Life turns into a real adventure in a production environment. Here, new code is released and updates are installed. Hardware is changed. Performance is tuned. Backups are configured. Recovery is exercised to verify the backup process. Availability is ensured. New data is loaded. Hardware, software and network issues are resolved; configuration is changed; security and permissions are updated.  
 
After the database goes live, the flow of requests never stops. 
 
For many organizations, the changes to production databases might follow a different SDLC (formal or informal) and never trickle back to the development environment. After a while, the number of differences between the development and production environments grows.
 

Development environments 

 
Oh yes, those development environments…  
 
The Development environment is where changes are happening. Regardless of its importance, this environment gets very little love. It is not the favorite environment of developers, nor the preferred environment of testers, nor an ideal environment for administrators. So why do we have this absence of love?  
 
Here is why:  
 
  • This environment is refreshed with production data only once in a while; 
  • It is most likely underpowered and useless for query tuning;  
  • Developers can delete each other's code and data at any time - well, mostly unintentionally;  
  • It is unreliable - too many changes in data and logic, too many contentions; 
  • It has bugs and unfinished features all the time; 
  • Testing activities leave too much rubbish behind. 
 
Competition for database resources is one of the primary causes that hold development teams back. Competing needs affect quality, stretch timelines, pose multiple limitations and consume a mountain of the team's time to sort out all the unintentional impact.  
 
The Testing environment gets a little more love. It is updated more often; it is where developers would like to go to make sure that testers do not find their bugs. Continuous impact from testing efforts is anticipated, but it is mitigated by frequent refreshes. 
 
The Pre-Production - or User Acceptance - environment (because that is where users can try all the new changes) is also an environment where developers and testers want to be. The pre-production environment is a favorite of DBAs because it is as close to production as it can get. From a hardware standpoint, it is a sandbox for performance tuning and a final stop before delivering new changes to a live production database. 
 


 

Monday, January 12, 2015 6:25:00 PM

Database Continuous Integration - Chapter 7 - Production CI 

In this chapter: taking continuous integration process to production. 
 
If you landed on this page from a search engine, we would recommend starting with the first chapter. 

Here is how the CI practice can be implemented as the backbone of the production process:  

  1. Maintain source code repository:
    • Follow the same principles outlined in Chapter 4, plus: 
    • Create a separate Version Control System (VCS) folder (or branch) to store all objects scripted from the production system; this folder becomes the "Base Source Code" for your database. 
    • Create a separate VCS folder (or branch) for each release that is currently in the queue to go to production; all development changes represent a "Delta Source Code". The Delta is where all developers contribute their code until the feature is ready to go to the next step in the adopted SDLC process.  
       
  2. Commit changes often, never longer than a day:
    • Follow the same principles outlined in Chapter 4, plus: 
    • Set up an automated process that scripts all production database objects and core metadata in the production database, and then checks all those scripts into the VCS repository on a daily basis (see the sketch after this list). 
    • Yes, every night an automated process should collect and check in production code into VCS. Over time, it becomes a history of all production changes that can be used to monitor the state of the production system.
       
  3. Every commit to the master branch should be built:
    • Follow the same steps outlined in Chapter 4, plus: 
    • All changes from the production environment should be incorporated into the build to help control assumptions. 
    • In addition to the brand-new-fresh-and-clean database artifact that is needed for development, a production-oriented CI process should produce a Delta Release artifact, which is later used to update the production system automatically. 
       
  4. Build Automation:
    • Build automation should become the backbone of the production delivery process, with a multi-layer, multi-environment delivery system.  
       
  5. Make sure that artifacts are available as installation packages
    • Follow the same principles outlined in Chapter 4 
    • The installation package should be easily consumable by existing and home-grown solutions. Keep your options open; tools are changing very rapidly, so make sure you have a way to adopt a new approach.  
       
  6. Build process should be fast:
    • Follow the same principles outlined in Chapter 4, plus: 
    • A production CI process can quickly become multi-dependent and layered; look for ways to improve delivery time for each part of the process and for the process as a whole;  
    • Manage the risk of "never building the final product"; minimize the impact from future releases that are currently in active development. Isolate, separate, branch, minimize and simplify each release. Try to achieve a "one deployment a day" goal; when you are there, improve to many per day; if you are not there, re-think, re-engineer, change… 
       
  7. Automated Unit Testing:
    • Follow the same principles outlined in Chapter 4, plus: 
    • Unit tests should become a paramount requirement. Any change in code, no matter how small, should have a unit test for that change.
    • Test Driven Development, anybody? Why not? 
    • Focus on code coverage; improve it with every commit to VCS. 
    • Introduce code review and sign-off for every pull request. 
    • Please, somebody, check my code! 
    • Introduce integration tests. Developers cannot mock forever. Test the actual code you build. 
    • Create an automated process for any steps that run multiple times; those steps should not be executed manually. 
       
  8. Available Build Results:
    • Follow the same principles outlined in Chapter 4, plus: 
    • Something fails all the time; it is human to make mistakes. 
    • Install TVs in the hallways, install them in the kitchen, hang them on an empty wall right by Steve Jobs' picture to increase visibility. Let your build process inspire your team.
       
  9. Keep the set of pre-production environments. Configure automated deployments to them
    • Create as many environments as needed, as many as you can afford;
    • Use virtualization, use instances;  
    • Automate provisioning;  
    • Reuse existing hardware, move it where it is needed, convert it to build agents; if you do not need it - donate it to another team; 
    • Configure automated deployment to all of your environments 
    • Use the same tools to deploy to production and to pre-production. You want to know what fails before it fails in production; 
    • Do not release anything if the build fails, even if it is one insignificant tiny-tiny unit test; know that this unit test was there for a reason; 
    • If something failed - stop, think, create a unit test to check for failing conditions, change your process to prevent future failures; 

      What to deploy? What artifacts?
      • The primary deployable artifact in production CI is the Delta Source Code that contains scripts to load well-tested code or data that is by now well known. 
         
  10. Tune the process to the point of continuous deployment to production
    • First, define what counts as an "Auto-Deployable Release" that the team can accept. 
    • Create guidelines to identify a Manual Release - a release that should not be deployed automatically. Make the guidelines available to every team member. 
    • Try to get code reviews, documentation, sign-offs, approvals and other paperwork done while the new product is moving through the development pipeline. By the time it reaches the UAT environment, you will know whether the release is auto- or manually-deployable.  
    • Do not push the release to production if anything fails in the build, even if it is one insignificant tiny unit test. 
    • Use the same tools to deploy to production and to pre-production environments. Deployment tools should use the same technology, the same workflow, and the same process to deliver changes. You want to catch failures before they take down the production environment;    
    • Use the same tools for automated and manual releases, with the main difference being a step that requires human interaction - push a button to deploy a Manual Release to prod. 
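
As referenced in step 2 above, here is a minimal sketch of the nightly collection step, assuming SQL Server. Note that sys.sql_modules covers only programmable objects (procedures, views, functions, triggers), so tables and metadata need their own scripting; a real job would write each definition to its own .sql file and check the folder into VCS:

SELECT s.name AS SchemaName,
       o.name AS ObjectName,
       o.type_desc AS ObjectType,
       m.definition AS ObjectDefinition --source code as stored in the database
FROM sys.sql_modules AS m
  INNER JOIN sys.objects AS o
   ON o.object_id = m.object_id
  INNER JOIN sys.schemas AS s
   ON s.schema_id = o.schema_id
ORDER BY s.name, o.name
GO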

 

Be realistic - not everything can (nor should) be automated; just tune the process to the point of continuous deployment to production for all the changes that can be automated. 

 

 

Sunday, January 4, 2015 4:32:00 PM

Database Continuous Integration - Chapter 6 - From Dev to Prod and Back 

By: Alex Podlesny

In this chapter: from development to production and from production to development workflow. 

If you landed on this page from a search engine, we would recommend starting with the first chapter.

Back in Chapter One, where we discussed the SDLC, we introduced the realities of the production world. 

Many businesses are built around one database or, perhaps, multiple databases; they grow together, expand and evolve through well-known development cycles:  

  1. Develop new feature 
  2. Move it to production  
  3. Use a new feature until it becomes old  
  4. Come up with new ideas  
  5. Back to Step 1 

The big difference between CI for product development and CI needed to support a corporate production environment is the product of integration: "The Creature DNA" vs. "The Creature" (check Chapter Two for more information). While the "Creature DNA" can sit on a shelf and be assembled and reassembled at any time, the actual "The Creature" database cannot be turned off, reassembled or changed, especially in high-availability projects. Ironically, over its lifetime - very close to a real-life scenario - the Creature can get new properties that the original DNA did not even have. Maintenance, performance tuning, a constant flow of new data and hotfixes, hardware-software-network issue resolution, upgrades and configuration changes can quickly introduce mountains of differences in how the two behave. 

If technologists want to deliver new functionality to a running, potentially high-availability database, they need to look into the idea of incremental upgrades - upgrades that slowly transform the existing system into an adaptive and always-evolving production environment with new functionality and without impact.  

Once the idea of continuous updates and continuous delivery of changes is accepted, the next step is to set up a backflow of changes, unknown to developers, from the production environment back to development. 

In such a transformed place:

  • the product of CI for a production environment can be a simple set of scripts that introduce new functionality - an upgrade;
  • the goal of CI stays the same - to make sure that by the time those scripts are ready to be deployed to production, there are:
    • no known integration issues 
    • as minimal a risk of impact as possible 

The production lifecycle should include steps to push all production changes continually back to the development environments and close the potential code difference gap between the development and production environments: 

Dev --> Prod:
  1. develop new feature 
  2. move it to production
  3. use the new feature until it becomes old  
  4. come up with new ideas  
  5. back to Step 1 

Prod --> Dev:
  1. recognize a change in production 
  2. script that change 
  3. deliver the change back to the source code repository in the development environment
  4. back to Step 1

 

With this dev2prod and prod2dev vision, let's move to the next chapter and examine production CI. 

 

Sunday, January 4, 2015 4:30:00 PM

Database Continuous Integration - Chapter 5 - Artifacts and Installation Packages 

By: Alex Podlesny

In this chapter: database artifacts. 

If you landed on this page from a search engine, we would recommend starting with the first chapter.

At almost every step of a CI process there will be some sort of Database Artifact generated; they can be different or similar, and all fulfill their individual purpose. Here are a few core samples:  

SQL Scripts - a group of database SQL scripts. They can be separated into the following categories: 

1. The complete set of scripted-out database objects and metadata. If run on an empty database server, it creates a brand new database with all of its objects and an initial set of data;
2. A package that adds new functionality or changes existing functionality in an existing database;
3. A package that inserts new data or manipulates existing data within an existing database. 

An installation, in this case, can be a simple execution of the scripts on the target database server. While script execution might not be a viable approach for databases with many objects and much metadata, it can be quite efficient for delivering hotfixes and incremental updates, especially to production environments. 

Backups or DB Files - different flavors of backup processes are available for the various database engines. Some of them work better for a particular use case than others. They usually divide into the following categories: 

  • Cold backup - done when the database server is down and the actual database files can be copied to another location (the preferred approach). In Oracle, it might require a server shutdown, while in SQL Server it can be achieved by detaching the database in order to gain access to the actual database files;  
  • Hot backup - performed while the database is in use.

An installation, in the case of backups, can be completed by a set of scripts that perform an automated restore on the target database server (as sketched below). While this installation would be an excellent solution for application development and automation of unit tests, it has limited use for deploying changes to running production and test environments. 
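
Here is a minimal sketch of that backup-and-restore flow, assuming SQL Server and the sample TheCreature database from Chapter Two; the file path is hypothetical:

--build side: take a hot backup and publish the .bak file as the artifact
BACKUP DATABASE TheCreature
  TO DISK = N'C:\Artifacts\TheCreature.bak'
  WITH INIT --overwrite any previous artifact in that file
GO

--consuming side: automated restore on the target server
RESTORE DATABASE TheCreature
  FROM DISK = N'C:\Artifacts\TheCreature.bak'
  WITH REPLACE --overwrite the target database if it already exists
GO

--cold alternative (SQL Server): detach to gain access to the raw files
--EXEC sp_detach_db @dbname = N'TheCreature'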

Virtual Servers - a set of fresh virtual clones can be used as a starting point for a CI process to install fresh and clean databases.  

An installation, in this case, consists of on-demand virtual machine provisioning by the consuming system. 

The solution and the type of artifact do not need to be rigid; a mix of strategies can be chosen to satisfy the development and integration tasks at hand.

 

Sunday, January 4, 2015 4:27:00 PM

Database Continuous Integration - Chapter 4 - Looking through development lenses 

By: Alex Podlesny

In this chapter: adopting Continuous Integration as the core of the development process.  

If you landed on this page from a search engine, we would recommend starting with the first chapter.

From an application standpoint, a database is essentially a resource: as long as the application has access to a given database, it will work. Looking back at the second chapter of our discussion, this resource is essentially some "Creature" created based on "Creature DNA".  

So, if we can create a brand-new-fresh-and-clean database, back up this database and keep the files as artifacts, then these files can be used by other processes to restore and reuse that brand-new-fresh-and-clean database in the target environment. 

If this is achievable, then a data-centric application has the luxury of using a clean, non-polluted database with the latest tested code. 

Here is how to implement CI practice as a backbone of the development process:  

  1. Maintain source code repository:
    • As the first step, installing in-house (or using a hosted solution) SVN, Git, Mercurial, TFS or any other system will do.  
    • At the next step, all database code should be scripted. Make scripts that create tables, views, triggers, stored procedures, functions, synonyms, schemas, roles, permissions, data types, constraints, and every other thing that an application might need. Together with the scripts that create database objects, you need scripts to populate application-specific metadata and a few seed records (see the sketch after this list). 
      Each of those scripts should be checked in to the code repository.  
      The goal here is to make a clean database populated with the default set of metadata; 
       
  2. Commit changes often, never longer than a day:
    • All developers should commit code changes to the source code repository as soon as possible, providing comments and explanations of why changes were made, why new objects were added or why they were removed. 
      Changes to default metadata should be checked in and documented as well.
    • There is no big task that cannot be split into small committable sub-parts; developers should not hold features and code hostage; 
       
  3. Every commit to the master branch should be built:
    • Out of the many branches that developers use in the source code repository, the master branch should always be built when new changes are committed;  
    • But what does it mean to build? How can it be built?
      These are good questions, and we will cover the terms in more detail in a future post. For now, let's assume that we prepare a package that has all the scripts necessary to create a brand-new-fresh-and-clean database;
      To get that done, create a set of build scripts that automatically download all committed changes from the source code repository, run those database scripts to produce the brand-new-fresh-and-clean database, and finally back up this database;
       
  4. Build Automation:
    • Many build automation tools exist on the market today, like TeamCity, Jenkins, Hudson, and others; some of them are free. Those tools make automating a build process very easy and enjoyable. 
       
  5. Make sure that built artifacts are available as installation packages
    • Save time for your fellow team members: instead of writing miles of deployment documentation, create installation packages ready for automated deployments and available to team members.   
    • What sort of installation package to create? Are there many different kinds?
    • These are superb questions, and we will cover them in more detail in future posts. For now, let's assume that we prepare an archive with the following files:
            1) a package that has all the scripts necessary to create a brand-new-fresh-and-clean database
            2) a script that calls this package to create the brand-new-fresh-and-clean database on the target server   
    • The installation package should be easily consumable by existing commercial, open source or simple home-grown tools
       
  6. Build process should be fast:
    • The build server should continuously monitor the source code repository for new changes and start the build process as soon as possible. 
      A few minutes to build a product is excellent, half an hour is questionable, an hour is unacceptable.  
    • The system might be configured to check for changes, compile the solution, run all unit tests, prepare and publish artifacts - all in One Giant Process. 
    • Alternatively, it may be configured with many small independent sub-processes that create their own artifacts.   
       
  7. Automated Unit Testing:
    • After the build is done, the product should be checked for conformity to expectations and requirements by running a set of automated unit tests.
      Unit tests should be created by developers, since they are the ones who build a feature and the first to verify the new functionality. However, that might not always be the case; integrators and testers should automate their testing as well. 
    • Adopt Test Driven Development practices.    
    • Unit tests should be stored in the source code repository and be available to the build servers alongside the database scripts used to create database artifacts. 
       
  8. Available Build Results:
    • Everyone in the team should know when a build fails. The earlier the team knows, the faster they can fix it. 
      Preventing downstream environments from getting corrupted code and uncovering integration issues are at the core of the continuous integration practice. 
    • Any CI and deployment server on the market will make build results available to users. They can be configured to share the status in many different ways: from email to text message, from a web page to a flashing red light in the hallway. 
    • Everyone should know when a build failed, what code brought it down, and when and how it was fixed; 
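
To make step 1 concrete, here is a minimal sketch of one such checked-in script; the object and file names are hypothetical - one script per object, plus a seed script for default metadata, both written to be safely re-runnable:

--e.g. file: Tables/dbo.AppSetting.sql
IF OBJECT_ID(N'dbo.AppSetting') IS NULL
BEGIN
  CREATE TABLE dbo.AppSetting (
    SettingName  VARCHAR(50)  NOT NULL PRIMARY KEY,
    SettingValue VARCHAR(250) NOT NULL
  )
END
GO

--e.g. file: Metadata/dbo.AppSetting.seed.sql
IF NOT EXISTS (SELECT 1 FROM dbo.AppSetting WHERE SettingName = 'SchemaVersion')
  INSERT INTO dbo.AppSetting (SettingName, SettingValue)
  VALUES ('SchemaVersion', '1.0.0')
GO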
       

Some might argue that the following two steps are not relevant when the product is just a brand-new-fresh-and-clean database intended to be used by developers. 

For now, here they are, the extra steps: 

  1. Keep the set of pre-production environments. Configure automated deployments to them 
  2. Tune the process to the point of continuous deployment to production  
 
 


 

Sunday, January 4, 2015 4:25:00 PM

Database Continuous Integration - Chapter 3 - What is Continuous Integration? 

By: Alex Podlesny

In this chapter: a general definition of Continuous Integration term.  

If you landed on this page from a search engine, we would recommend starting with the first chapter. 

Continuous Integration - What is it? How can we get there? Can we buy it off the shelf? 

Continuous integration (CI) is the practice of making sure that changes are integrated continuously into your product so that many issues, integration problems, and impact can be discovered and avoided earlier in the SDLC process. 

Here are the main principles of the continuous integration process: 

  • Maintain source code repository; 
  • Commit changes often, never longer than a day;  
  • Every commit to the master branch should be built; 
  • Employ Build Automation; 
  • Make sure that artifacts are available as installation packages;
  • Automated Unit Testing; 
  • Build process should be fast; 
  • Keep the set of pre-production environments. Configure automated deployments to them;
  • Make Build Results Available; 
  • Tune the process to the point of continuous deployment to production.

If you choose a single tool, you may find over time that some of the steps are hard to achieve, or that they become an impediment to your business, slowing it down and increasing cost. If that is where you are: look for other tools, use multiple tools or try different solutions. Be open. 

To get continuous integration right, you need to persist and be a leader! You will have a long path of team building and nurturing a CI culture. You will spend much time selling the idea to your peers, superiors, and the team. Know that not every team member will adopt the plan. Take it one step at a time. 

Getting CI working for your company can be a fun quest; it should not be a bloody conquest. Don't get upset, give everybody their time, persist, and become an example of excellence.

CI sounds cool, but it might not be good for your project - to use it or not is a decision that you need to make.  


Sunday, January 4, 2015 4:19:00 PM

Naming Convention Enforcer  

For instance, suppose you want to enforce a naming convention where objects should not use the "spData" prefix. Here is a simple unit test that searches through object names and definitions in the target database and reports any naming convention issues:

 

EXEC DBTD_DROP_PROC_IF_EXISTS 'UT_NAMINGCONVENTION_ObjectPrefixCheck'
GO

CREATE PROC UT_NAMINGCONVENTION_ObjectPrefixCheck
       @v_Debug BIT = 0
AS
BEGIN
       DECLARE @v_SearchString VARCHAR(128) = '%spData%',
               @v_TargetDB     VARCHAR(128) = 'MyDatabase'

       DECLARE @v_SearchResult NVARCHAR(MAX),
               @v_SQL NVARCHAR(MAX)
                 = ' INSERT INTO #ObjectsList(Id, Name)
                     SELECT id, name FROM sys.sysobjects
                     WHERE Name LIKE ''' + @v_SearchString + ''''

       CREATE TABLE #ObjectsList(
              Id   INT,
              Name NVARCHAR(255)
       )

       CREATE TABLE #DefinitionSearchResult(
              [Object_ID]  INT,
              ObjectName   VARCHAR(500), --object name with database and schema prefix
              [Type]       VARCHAR(50),
              Name         SYSNAME,      --just an object name
              SearchString NVARCHAR(MAX)
       );

       --search through definitions
       EXEC DBTestDriven.dbo.DBTD_SEARCH_ObjectsDefinition
              @v_SearchString = @v_SearchString,
              @v_TargetDB = @v_TargetDB,
              @v_Exclude = NULL,  --no exclusions at this time
              @v_CleanOldDefinitionSearchResult = 0, --don't clean for now
              @v_SearchResult = @v_SearchResult OUTPUT

       --search through names
       EXEC DBTestDriven.dbo.DBTD_SP_EXECUTESQL
              @v_TargetDatabase = @v_TargetDB,
              @v_SQLCode = @v_SQL

       IF @v_Debug = 1
       BEGIN
              SELECT * FROM #DefinitionSearchResult
              SELECT * FROM #ObjectsList
              SELECT @v_SearchResult AS [@v_SearchResult]
       END

       EXEC DBTD_ASSERT_TABLE_HAS_NO_RECORDS
            '#ObjectsList',
            '(A) Naming convention issues'
       EXEC DBTD_ASSERT_TABLE_HAS_NO_RECORDS
            '#DefinitionSearchResult',
            '(B) Naming convention issues'
END
GO

 

And now let's run this unit test:  

       --just run the unit test
       BEGIN TRAN
       EXEC UT_NAMINGCONVENTION_ObjectPrefixCheck @v_Debug = 1
       ROLLBACK

 


 

Friday, October 10, 2014 9:54:00 AM