Recently, I was working on one of the performance tuning assignment with an esteemed client. The size of their databases were growing tremendously. Growth in the database size is directly proportionate to disk space requirement. More the database size will be, more disk space will be required.
When we talk about the disk space for Production, it’s not only specific to one server. There are multiple factors that needs to be considered such as Primary server, Secondary servers, Backup servers and Backups (Full, Differential, Log backups).
Long time ago, I blogged about similar topic. Read the full article here.
While we were discussing the various possibilities to reduce the database size, client brought up a very interesting point in front of me. He said “My database has many huge tables (with size beyond 500 GB’s). Most of them have multiple columns which has null values across the table.”
Although, client was aware that there are many such tables and columns but there was not definite list that client has. This has opened a whole new arena for me and gave a unique insights.
I started thinking to build some logic to fetch the list of all such columns and tables. You will be amazed to hear but it’s true. There were more than 4K such columns belonging to 500+ tables. We estimated that we could save TB’s of disk space by marking all such columns as SPARSE.
More on SPARSE columns can be read here at Microsoft official documentation.
In this article we’ll see how to get the list of columns which has null values across all the rows of the table.
Note: Running the query for all the tables together would take lot of time. It is recommended to run the query for set of tables in batches parallely in multiple instances of SSMS or with the help of jobs.
In the next article, we’ll talk about how to estimate the storage savings by removing such columns or marking them as SPARSE.
IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE name = 'tables_with_null_values_across') BEGIN CREATE TABLE tables_with_null_values_across ( TableName VARCHAR(200) , TotalRows BIGINT , ColumnName VARCHAR(200) ) END /* DROP TABLE IF EXISTS #nullable_columns; DROP TABLE IF EXISTS #tables_with_nullable_columns; */ SET NOCOUNT ON SELECT A.object_id , B.name AS TableName , A.name AS ColumnName , ROW_NUMBER() OVER(PARTITION BY A.object_id ORDER BY A.name ASC) AS RowID INTO #nullable_columns FROM sys.columns A INNER JOIN sys.tables B ON B.object_id = A.object_id AND B.type = 'U' LEFT JOIN tables_with_null_values_across D ON D.TableName = QUOTENAME(B.name) AND D.ColumnName = QUOTENAME(A.Name) WHERE A.is_nullable = 1 AND D.TableName IS NULL -- AND B.name IN ('', '') -- Note: Supply the table names in the filter clause for B.name in order to run the query in batches of tables. SELECT DISTINCT A.object_id , B.name AS TableName , IDENTITY(INT, 1, 1) AS RowID INTO #tables_with_nullable_columns FROM #nullable_columns A INNER JOIN sys.tables B ON B.object_id = A.object_id DECLARE @TableName AS SYSNAME , @ColumnName AS SYSNAME , @Object_ID AS INT , @Table_RowID AS INT , @Column_RowID AS INT , @Column_Distinct_Values_Count AS BIGINT , @Total_Rows AS BIGINT SET @Table_RowID = 1; WHILE EXISTS (SELECT 1 FROM #tables_with_nullable_columns WHERE RowID = @Table_RowID) BEGIN SELECT @Object_ID = object_id , @TableName = TableName FROM #tables_with_nullable_columns WHERE RowID = @Table_RowID; SET @Column_RowID = 1; WHILE EXISTS (SELECT 1 FROM #nullable_columns WHERE object_id = @Object_ID AND RowID = @Column_RowID) BEGIN SELECT @ColumnName = ColumnName FROM #nullable_columns WHERE object_id = @Object_ID AND RowID = @Column_RowID; DECLARE @SQLString NVARCHAR(500); SET @SQLString = N'SELECT @Column_Distinct_Values_Count = COUNT(DISTINCT ' + QUOTENAME(@ColumnName) + ') , @Total_Rows = COUNT(1) FROM ' + QUOTENAME(@TableName) + ' WITH (NOLOCK)'; BEGIN TRY EXECUTE sp_executesql @SQLString , N'@Total_Rows BIGINT OUTPUT, @Column_Distinct_Values_Count BIGINT OUTPUT' , @Total_Rows = @Total_Rows OUTPUT , @Column_Distinct_Values_Count = @Column_Distinct_Values_Count OUTPUT ; END TRY BEGIN CATCH END CATCH IF (@Column_Distinct_Values_Count = 0) BEGIN INSERT INTO tables_with_null_values_across (TableName, TotalRows, ColumnName) VALUES (QUOTENAME(@TableName), @Total_Rows, QUOTENAME(@ColumnName)) END SET @Column_RowID = @Column_RowID + 1; END SET @Table_RowID = @Table_RowID + 1; END DROP TABLE IF EXISTS #nullable_columns; DROP TABLE IF EXISTS #tables_with_nullable_columns;