blue-rocket

SQL Saturdays – Checking for Existence

Welcome back to SQL Saturday’s. Today I wanted to discuss a T-SQL trick I frequently use to search for the existence of records.

SQL Server has a fun function calls EXISTS() that I actually don’t use that often. Why? Because most of my existence checking involves multiple joined tables and verifying if records in one exist in another. Whether I’m doing an INSERT, DELETE, or just a simple SELECT, I need to compare these tables and verify the existence (or non-existence) of a record. Using the AdventureWorks database (2008 R2 version), here’s how I do it.

In this scenario, I have received an audit request. The auditing team is reviewing all our person records, but their files only show 20 records. They know this isn’t right because AdventureWorks has over 200 employees, not to mention the customer records the database should contain. So, as the DBA, I am tasked with finding all the records in our Person table that do not exist in the audit team’s file.

I created the audit team file with the below code.

SELECT TOP 20 BusinessEntityID, PersonType, NameStyle, Title, FirstName, MiddleName, LastName, Suffix
INTO #MyTemp
FROM Person.Person
WHERE PersonType = ‘IN’
ORDER BY Title;
–Get the first 20 people who don’t have a title

Before I start coding, I create a dirty backup in the form of a database snapshot to protect the integrity of my AdventureWorks database. That way, if I accidentally update or delete records, I can restore back to the original db.

The long, difficult way of finding the missing audit team records would be to compare BusinessEntityIDs or names. So I could do a SELECT on the temp table, copy over the IDs (if they exist in the file) or the names and then just do a WHERE clause that says “WHERE FirstName NOT IN (
) AND LastName NOT IN ( ). But not only is that tedious, it’s dangerous too. Records that I should receive back will disappear.

The audit team records I pulled out contain the first names of Mandar, Alma, Warren, Kelvin, Crystal, Francis, and Isabella with Laura and Crystal repeated, so I deleted the repeats. The last names are Samant, Son, Tang, Pal, Xie, Cai, Zeng, Zhang, Wang, Chen, Li, Liu, Huang, Wu, Hernandez, Lin, Zhou, Ye, Zhao, and Cox. BusinessEntityID 10261 has a name of Isabella Adams. Adams is not one of the last names in the above list, so there’s a reasonable expectation that I should see it in the comparison results. (I always locate one or two records that I can use to verify all my queries).

Here’s my query, ordered by LastName and FirstName to make locating Isabella Adams easier:

SELECT *
FROM Person.Person
WHERE FirstName NOT IN (‘Mandar’,’Alma’,’Warren’,’Kelvin’,’Crystal’,
‘Francis’,’Isabella’)
AND LastName NOT IN (‘Samant’,’Son’,’Tang’,’Pal’,’Xie’ ,’Cai’,’Zeng’,
‘Zhang’,’Wang’,’Chen’,’Li’,’Liu’,’Huang’, ‘Wu’,’Hernandez’,’Lin’,’Zhou’,
‘Ye’,’Zhao’,’Cox’)
ORDER BY LastName, FirstName;

And the results are:

No Isabella Adams here.

She’s nowhere in my results list. In fact, if I add the line “AND BusinessEntityID = 10261″ to the WHERE clause, I get a zero record resultset, which means my existence query is poorly written.

My life is made a lot easier since the audit team file contains my table’s primary key, BusinessEntityID. So I upload the file to my SQL Server database. Now that I have that information as a table, I have one of two solutions to find the true non-matching records. I can do a subquery or an OUTER JOIN. The performance of one over the other really depends on individual database design. My personal preference is for JOINs over subqueries, all other things being equal. But here’s how I do both.

The SubQuery Version uses a subquery in the WHERE clause that pulls only the BusinessEntityID from #MyTemp and compares it to the BusinessEntityID in Person.Person. In effect, I am saying “pull the records from my Person table where that key doesn’t exist in the temp table.”

SELECT *
FROM Person.Person
WHERE BusinessEntityID NOT IN
(SELECT BusinessEntityID FROM #MyTemp)
ORDER BY LastName, FirstName;

Result set:

Look! There’s Isabella Adams.

The OUTER JOIN version does basically the same thing as the subquery, but I feel it is a lot cleaner. With this code, I am saying, “Join these two tables on the BusinessEntityID column, but only give me the results from the Person table that don’t match anything in the temp table.

SELECT p1.*
FROM Person.Person p1
LEFT OUTER JOIN #MyTemp p2
ON p1.BusinessEntityID = p2.BusinessEntityID
WHERE p2.BusinessEntityID IS NULL
ORDER BY p1.LastName, p1.FirstName;

Result set:

Oh, and here she is too!

In both instances, I get back Isabella Adams, which means that these queries are more accurate than doing the search the hard way. I love OUTER JOINs and subqueries for these types of checks because it makes my life so much easier. They’re faster to type too.

What type of code do you use for existence checks?

Brandie Tarvin

Brandie Tarvin

Brandie Tarvin is an author and tie-in writer and a copy editor. In addition to her original fiction, she has written SQL Server articles, Shadowrun: The Role Playing Game sourcebook material and fiction as well as a piece for Hasbro’s Transformers. She currently lives in Florida with her family and is owned by two cats.

Latest Releases
Interesting Links
Browse the archives
Skip to content