Sometimes I wonder if some of the more random scripts I post are useful to anyone but me so if this one is useful to you definitely drop a comment below!
Check it out: Our team was trying to confirm what our best practice was when creating flag columns in the data warehouse. As usual someone made an appeal to history: “What have we done in the past?” (Note: this isn’t really a valid way to establish a best practice but whatever Heather. Yes! Movie tie in! Bonus points to myself for it being an 80’s movie. Heathers! Pic and link below).
Back to the story: What values do we typically use for flag columns in the data warehouse? 0 or 1? Y or N? Yes or No? T or F? True or False? What have we done in the past? (Another note: It’s the data warehouse. The goal is to make it easy for the users. This eliminates 0/1. I’d also eliminate Y/N. Personal preference: True/False).
With a little information_schema hackery we can generate a nice little SQL script to get the answer for us. The basis is something like this:
SELECT c.TABLE_SCHEMA , c.TABLE_NAME , c.COLUMN_NAME , p.rows , 'select ''' + c.TABLE_SCHEMA + ''' as SchemaName, ''' + c.TABLE_NAME + ''' as TableName, ''' + c.COLUMN_NAME + ''' as ColumnName, count(1) as RowCnt, convert(varchar(200), ' + c.COLUMN_NAME + ') as ExistingValues from ' + c.TABLE_SCHEMA + '.' + c.TABLE_NAME + ' group by ' + c.COLUMN_NAME + ' UNION ALL' FROM INFORMATION_SCHEMA.COLUMNS AS c JOIN INFORMATION_SCHEMA.tables AS t ON c.TABLE_CATALOG = t.TABLE_CATALOG AND c.TABLE_SCHEMA = t.TABLE_SCHEMA AND c.TABLE_NAME = t.TABLE_NAME JOIN sys.partitions AS p ON c.TABLE_SCHEMA = OBJECT_SCHEMA_NAME(p.object_id) AND c.TABLE_NAME = OBJECT_NAME(p.object_id) AND p.index_id < 2 WHERE t.TABLE_TYPE = 'base table' AND COLUMN_NAME LIKE '%flag%' ORDER BY p.rows DESC
This script will run fairly quickly. A few seconds or so. The last column of the results will contain the SQL script you need to run to get the actual values.
WARNING! Now here’s where you need to be cautious! Review the results of the query above. Take a look at the row counts and consider whether doing an aggregation against some of your biggest tables is really necessary (or resource efficient). Customize the script as your needs dictate.
Okay, cut and paste the select statement from the original results into a new window. It will looks like this (sorry you’ll have to format your results yourself):
SELECT 'dbo' AS SchemaName
, 'TableA' AS TableName
, 'LikesHeathersFlag' AS ColumnName
, COUNT(1) AS RowCnt
, CONVERT(VARCHAR(200), LikesHeathersFlag) AS ExistingValues
FROM dbo.TableA
GROUP BY LikesHeathersFlag
UNION ALL
SELECT 'dbo' AS SchemaName
, 'TableB' AS TableName
, 'CancelFlag' AS ColumnName
, COUNT(1) AS RowCnt
, CONVERT(VARCHAR(200), CancelFlag) AS ExistingValues
FROM dbo.TableB
GROUP BY CancelFlag
UNION ALL
SELECT 'dbo' AS SchemaName
, 'TableC' AS TableName
, 'AnotherFlag' AS ColumnName
, COUNT(1) AS RowCnt
, CONVERT(VARCHAR(200), AnotherFlag) AS ExistingValues
FROM dbo.TableC
GROUP BY AnotherFlag
UNION ALL
If you run this as is you’ll get a syntax error. Scroll to the bottom of the query and remove the final “UNION ALL” (as well as any tables/columns you don’t want to include in the results).
In my case I also added a temp table so I could do faster analysis of the results. My final code looked like this:
SELECT 'dbo' AS SchemaName
, 'TableA' AS TableName
, 'LikesHeathersFlag' AS ColumnName
, COUNT(1) AS RowCnt
, CONVERT(VARCHAR(200), LikesHeathersFlag) AS ExistingValues
INTO #FlagValues
FROM dbo.TableA
GROUP BY LikesHeathersFlag
UNION ALL
SELECT 'dbo' AS SchemaName
, 'TableB' AS TableName
, 'CancelFlag' AS ColumnName
, COUNT(1) AS RowCnt
, CONVERT(VARCHAR(200), CancelFlag) AS ExistingValues
FROM dbo.TableB
GROUP BY CancelFlag
UNION ALL
SELECT 'dbo' AS SchemaName
, 'TableC' AS TableName
, 'AnotherFlag' AS ColumnName
, COUNT(1) AS RowCnt
, CONVERT(VARCHAR(200), AnotherFlag) AS ExistingValues
FROM dbo.TableC
GROUP BY AnotherFlag