Blog

SQL Script: Distinct Column Values Across All Tables

Sometimes I wonder if some of the more random scripts I post are useful to anyone but me so if this one is useful to you definitely drop a comment below!

Check it out:  Our team was trying to confirm what our best practice was when creating flag columns in the data warehouse.  As usual someone made an appeal to history: "What have we done in the past?" (Note: this isn't really a valid way to establish a best practice but whatever Heather. Yes! Movie tie in! Bonus points to myself for it being an 80's movie.  Heathers! Pic and link below).

Back to the story:  What values do we typically use for flag columns in the data warehouse? 0 or 1? Y or N? Yes or No? T or F? True or False?  What have we done in the past?  (Another note: It's the data warehouse.  The goal is to make it easy for the users.  This eliminates 0/1. I'd also eliminate Y/N. Personal preference: True/False).

With a little information_schema hackery we can generate a nice little SQL script to get the answer for us.  The basis is something like this:

SELECT  c.TABLE_SCHEMA
      , c.TABLE_NAME
      , c.COLUMN_NAME
      , p.rows
      , 'select ''' + c.TABLE_SCHEMA + ''' as SchemaName, ''' + c.TABLE_NAME
        + ''' as TableName, ''' + c.COLUMN_NAME
        + ''' as ColumnName, count(1) as RowCnt, convert(varchar(200), '
        + c.COLUMN_NAME + ') as ExistingValues from ' + c.TABLE_SCHEMA + '.'
        + c.TABLE_NAME + ' group by ' + c.COLUMN_NAME + ' UNION ALL'
FROM    INFORMATION_SCHEMA.COLUMNS AS c
        JOIN INFORMATION_SCHEMA.tables AS t
            ON c.TABLE_CATALOG = t.TABLE_CATALOG
               AND c.TABLE_SCHEMA = t.TABLE_SCHEMA
               AND c.TABLE_NAME = t.TABLE_NAME
        JOIN sys.partitions AS p
            ON c.TABLE_SCHEMA = OBJECT_SCHEMA_NAME(p.object_id)
               AND c.TABLE_NAME = OBJECT_NAME(p.object_id)
               AND p.index_id < 2
WHERE  t.TABLE_TYPE = 'base table' 
        AND COLUMN_NAME LIKE '%flag%'
ORDER BY p.rows DESC

This script will run fairly quickly.  A few seconds or so.  The last column of the results will contain the SQL script you need to run to get the actual values.

WARNING! Now here's where you need to be cautious!  Review the results of the query above.  Take a look at the row counts and consider whether doing an aggregation against some of your biggest tables is really necessary (or resource efficient).  Customize the script as your needs dictate.

Okay, cut and paste the select statement from the original results into a new window.  It will looks like this (sorry you'll have to format your results yourself):

SELECT  'dbo' AS SchemaName
      , 'TableA' AS TableName
      , 'LikesHeathersFlag' AS ColumnName
      , COUNT(1) AS RowCnt
      , CONVERT(VARCHAR(200), LikesHeathersFlag) AS ExistingValues
FROM    dbo.TableA
GROUP BY LikesHeathersFlag
UNION ALL
SELECT  'dbo' AS SchemaName
      , 'TableB' AS TableName
      , 'CancelFlag' AS ColumnName
      , COUNT(1) AS RowCnt
      , CONVERT(VARCHAR(200), CancelFlag) AS ExistingValues
FROM    dbo.TableB
GROUP BY CancelFlag
UNION ALL
SELECT  'dbo' AS SchemaName
      , 'TableC' AS TableName
      , 'AnotherFlag' AS ColumnName
      , COUNT(1) AS RowCnt
      , CONVERT(VARCHAR(200), AnotherFlag) AS ExistingValues
FROM    dbo.TableC
GROUP BY AnotherFlag
UNION ALL

If you run this as is you'll get a syntax error.  Scroll to the bottom of the query and remove the final "UNION ALL" (as well as any tables/columns you don't want to include in the results).

In my case I also added a temp table so I could do faster analysis of the results.  My final code looked like this:

SELECT  'dbo' AS SchemaName
      , 'TableA' AS TableName
      , 'LikesHeathersFlag' AS ColumnName
      , COUNT(1) AS RowCnt
      , CONVERT(VARCHAR(200), LikesHeathersFlag) AS ExistingValues
INTO #FlagValues
FROM    dbo.TableA
GROUP BY LikesHeathersFlag
UNION ALL
SELECT  'dbo' AS SchemaName
      , 'TableB' AS TableName
      , 'CancelFlag' AS ColumnName
      , COUNT(1) AS RowCnt
      , CONVERT(VARCHAR(200), CancelFlag) AS ExistingValues
FROM    dbo.TableB
GROUP BY CancelFlag
UNION ALL
SELECT  'dbo' AS SchemaName
      , 'TableC' AS TableName
      , 'AnotherFlag' AS ColumnName
      , COUNT(1) AS RowCnt
      , CONVERT(VARCHAR(200), AnotherFlag) AS ExistingValues
FROM    dbo.TableC
GROUP BY AnotherFlag

 

Really? A zero or one in a data warehouse flag column? You must be joking.

Heathers

Tags: , , , , , , , , , , , , , ,

No comments yet.

Leave a Reply