Clustered Index Key – Natural or Surrogate?

Background

This article will follow a practical approach to choose the right candidate for clustered index. It’ll investigate through the best practices recommended for clustered index, with the help of an experiment.

Primary Key and Clustered Index

Primary Key and Clustered Index are different from each other. Clustered Index should not be confused with Primary Key. Primary Key can be created with / without Clustered / Nonclustered Index.

Description of terminologies used

Natural Clustered Key: Column or set of columns forming the clustered index, naturally part of table having some business meaning associated with it.

Surrogate Clustered Key: Artificial column such as Identity or uniqueidentifier column, forming the clustered index. We’ve used Identity column as Surrogate Clustered Key in our experiment.

Clustered Index best practices

It should be as narrow as possible in terms of the number of bytes it stores.
It should be unique to avoid SQL Server to add Uniquefier to duplicate key values.
It should be static, ideally, never updated like Identity column.
It should be ever-increasing, like Identity column to avoid fragmentation and improve the write performance.

Why should we always have a Clustered Index?

Clustered Index out-performs NonClustered Index, since it stores the data for every row. Data page is stored in the leaf node of the Clustered Index.
Data is sorted based on clustered index key values and then stored.
The only time the data is stored in sorted order when the table contains a clustered index. Sort operation is most costly operator in SQL Server.
Since data is stored in sorted order, hence sorting on the clustered key column avoids the sort operator, and makes it best choice for sorting.
It helps the Database engine to lock a particular row, instead of a Page or a Table. Row level locking can be only achieved if table has Clustered Index. SQL Server applies lock on Row, Page or Table to ensure consistency.

What should be our Clustered Index?

By looking at the best practices, it’s very hard to find a Natural Key that can satisfy all the four recommendations.

Prima facie, it seems Surrogate Key, such as an Identity column seems to be the only best candidate for Primary Key.

Wait, if Surrogate Key is the best candidate then how to deal with following overheads?

How to minimize Sorting overhead?

Clustered Index column should be referred in Sort operation in the same sort order as defined in the index. Surrogate Key for e.g. Identity column is hardly the sorting requirement.

How to minimize Lock escalation – Page or Table lock?

All the DELETE and UPDATE should happen on Clustered Index Key column. There is hardly a requirement to delete / update based on Surrogate Key.

Alternate approach could be – take all the clustered key column values in a temp table and then using join or sub-query use it in delete / update statement. This is an additional overhead.

How to avoid Key Lookup?

Create nonclustered covering index. This is another overhead.

What we’ll achieve through this experiment?

We’ll compare the following parameters. It’ll be then left to us to decide what best suits to us, depending upon the requirement and workload.

Execution Time
- Avg. Read Execution Time
- Avg. Write Execution Time
Index size and read/write count
- Index size (KB)
- num_of_writes
- num_of_reads
Lock/Latch count/wait count/wait time
- row_lock_count
- page_lock_count
- page_lock_wait_count
- page_lock_wait_in_ms
- index_lock_promotion_attempt_count
- page_latch_wait_count
- page_latch_wait_in_ms
- tree_page_latch_wait_count
- tree_page_latch_wait_in_ms
Index usage stats
- user_seeks
- user_scans
- user_updates

Workload considered for this test

IoT (Internet of Things) and IIot (Industrial Internet of Things) domains are leading domains. These domains deals with huge volume of data. Data volume is huge in both read as well write, since it deals with machines.

A machine has multiple components and every components has multiple sensors and every sensor sends multiple messages per seconds/milliseconds.

Timestamp is the key attribute for the calculation of the KPI’s to show the real-time analytics.

We’ll have a very simple example with one table, that holds the Coffee Vending Machine’s Sensor Data. Every second, 1000’s of records of multiple machines gets written into this table. The scenario here pertains to an application which gets refreshed every 10 seconds and reads past few minutes of data to render analytics over the UI.

Following aspects are considered in the workload:

Committed Read to reproduce the locking
Uncommitted Read / Dirty Read
Insert new rows
Update existing rows to simulate the Lock Escalation
Concurrency with the help of 2 jobs each for both read and write

Script of table schema

CREATE TABLE Coffee_Vending_Machine_Data
(
    ID              NUMERIC IDENTITY(1,1) NOT NULL
    , Date_Time         DATETIME
    , MachineID         INT
    , Sensor1_Value     INT
    , Sensor2_Value     INT
    , Sensor3_Value     INT
    , Sensor4_Value     INT
)

Script to load table with lot of data

SET NOCOUNT ON
 
DECLARE @RowID      INT = 1
    , @MachineID        INT
    , @Date_Time        DATETIME
    , @Sensor1_Value    INT
    , @Sensor2_Value    INT
    , @Sensor3_Value    INT
    , @Sensor4_Value    INT
 
WHILE (@RowID <= 1773885)
    BEGIN
        SET @Date_Time  = GETDATE();
 
        SET @MachineID  = ROUND((RAND() * 10), 0)
        SET @Sensor1_Value  = ROUND((RAND() * 100), 0)
        SET @Sensor2_Value  = ROUND((RAND() * 100), 0)
        SET @Sensor3_Value  = ROUND((RAND() * 100), 0)
        SET @Sensor4_Value  = ROUND((RAND() * 100), 0)
 
        INSERT INTO Coffee_Vending_Machine_Data (Date_Time, MachineID, Sensor1_Value, Sensor2_Value, Sensor3_Value, Sensor4_Value)
        VALUES (@Date_Time, @MachineID, @Sensor1_Value, @Sensor2_Value, @Sensor3_Value, @Sensor4_Value)
 
        SET @RowID  =   @RowID + 1;
    END

Other prerequisites

Script of table to log the execution time

CREATE TABLE log_simulation_job_execution
(
	LogID				INT NOT NULL IDENTITY(1,1)
	, JobName			VARCHAR(100)
	, Start_DateTime	DATETIME
	, End_DateTime		DATETIME
	, Duration_MS		AS DATEDIFF(MS, Start_DateTime, End_DateTime)
)

Script of Read Stored Procedure

CREATE PROCEDURE [dbo].[usp_read_sensor_data]
(
	@Start_Date_Time	DATETIME
)
AS
	BEGIN
		SET NOCOUNT ON;
		DECLARE @Job_Start_DateTime	DATETIME
			, @Job_End_DateTime	DATETIME;
		SET @Job_Start_DateTime = GETDATE();
		SELECT AVG(Sensor1_Value)
			, MIN(Sensor2_Value)
			, MAX(Sensor3_Value)
			, AVG(Sensor4_Value)
			, COUNT(1)
		FROM Coffee_Vending_Machine_Data (NOLOCK)
		WHERE MachineID = FLOOR(RAND() * 10)
			AND Date_Time BETWEEN @Start_Date_Time AND DATEADD(MINUTE, FLOOR(RAND() * 10), @Start_Date_Time);
		SET @Job_End_DateTime = GETDATE();
		INSERT INTO log_simulation_job_execution
		(
			JobName
			, Start_DateTime
			, End_DateTime
		) 
		VALUES
		(
			'Simulate_Read_Load'
			, @Job_Start_DateTime
			, @Job_End_DateTime
		);
	END

Script of Write Stored Procedure

CREATE PROCEDURE [dbo].[usp_write_sensor_data]
AS
	BEGIN
		SET NOCOUNT ON;
 
 		DECLARE @Job_Start_DateTime	DATETIME
			, @Job_End_DateTime	DATETIME;
		SET @Job_Start_DateTime = GETDATE();
		DECLARE @RowID				INT = 1
			, @MachineID			INT
			, @Date_Time			DATETIME
			, @Sensor1_Value		INT
			, @Sensor2_Value		INT
			, @Sensor3_Value		INT
			, @Sensor4_Value		INT;
		DECLARE @Sensor1_Value_Avg  INT
			, @Sensor2_Value_Avg    INT
			, @Sensor3_Value_Avg    INT
			, @Sensor4_Value_Avg    INT
			, @Machine_ID_Min		INT
			, @Machine_ID_Max		INT;
		SELECT @Sensor1_Value_Avg	=	AVG(Sensor1_Value)
			, @Sensor2_Value_Avg    =	AVG(Sensor2_Value)
			, @Sensor3_Value_Avg    =	AVG(Sensor3_Value)
			, @Sensor4_Value_Avg    =	AVG(Sensor4_Value)
			, @Machine_ID_Min		=	MIN(MachineID)
			, @Machine_ID_Max		=	MAX(MachineID)
		FROM Coffee_Vending_Machine_Data;
 
		WHILE (@RowID <= 1000)
			BEGIN
				SET @Date_Time  = GETDATE();
 
				SET @MachineID  = ROUND((RAND() * 10), 0)
				SET @Sensor1_Value  = ROUND((RAND() * 100), 0)
				SET @Sensor2_Value  = ROUND((RAND() * 100), 0)
				SET @Sensor3_Value  = ROUND((RAND() * 100), 0)
				SET @Sensor4_Value  = ROUND((RAND() * 100), 0)
 
				INSERT INTO Coffee_Vending_Machine_Data (Date_Time, MachineID, Sensor1_Value, Sensor2_Value, Sensor3_Value, Sensor4_Value)
				VALUES (@Date_Time, @MachineID, @Sensor1_Value, @Sensor2_Value, @Sensor3_Value, @Sensor4_Value)
				SET @RowID  =   @RowID + 1;
			END
		-- This UPDATE statement may not make sense, but still keeping it here, to properly simulate the READ/WRITE overhead.
		UPDATE Coffee_Vending_Machine_Data
		SET Sensor1_Value		=	Sensor1_Value + @Sensor1_Value_Avg
			, Sensor2_Value		=	Sensor2_Value + @Sensor2_Value_Avg
			, Sensor3_Value		=	Sensor3_Value + @Sensor3_Value_Avg
			, Sensor4_Value		=	Sensor4_Value + @Sensor4_Value_Avg
		WHERE MachineID BETWEEN @Machine_ID_Min AND @Machine_ID_Max
			AND @Date_Time BETWEEN @Job_Start_DateTime AND GETDATE();
		SET @Job_End_DateTime = GETDATE();
		INSERT INTO log_simulation_job_execution
		(
			JobName
			, Start_DateTime
			, End_DateTime
		) 
		VALUES
		(
			'Simulate_Write_Load'
			, @Job_Start_DateTime
			, @Job_End_DateTime
		);
	END

Script of Read Job 1

USE [msdb]
GO
/****** Object:  Job [Simulate_Read_Load] ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object:  JobCategory [[Uncategorized (Local)]] ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END
DECLARE @jobId BINARY(16)
EXEC @ReturnCode =  msdb.dbo.sp_add_job @job_name=N'Simulate_Read_Load', 
		@enabled=0, 
		@notify_level_eventlog=0, 
		@notify_level_email=0, 
		@notify_level_netsend=0, 
		@notify_level_page=0, 
		@delete_level=0, 
		@description=N'No description available.', 
		@category_name=N'[Uncategorized (Local)]', 
		@owner_login_name=<your owner login name>, @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [1] ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'1', 
		@step_id=1, 
		@cmdexec_success_code=0, 
		@on_success_action=1, 
		@on_success_step_id=0, 
		@on_fail_action=2, 
		@on_fail_step_id=0, 
		@retry_attempts=0, 
		@retry_interval=0, 
		@os_run_priority=0, @subsystem=N'TSQL', 
		@command=N'DECLARE @Date_Time	DATETIME
SELECT @Date_Time = MIN(Date_Time)
FROM Coffee_Vending_Machine_Data
EXEC [dbo].[usp_read_sensor_data] @Start_Date_Time = @Date_Time;
', 
		@database_name=N'MyDB', 
		@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Every 10 second', 
		@enabled=1, 
		@freq_type=4, 
		@freq_interval=1, 
		@freq_subday_type=2, 
		@freq_subday_interval=10, 
		@freq_relative_interval=0, 
		@freq_recurrence_factor=0, 
		@active_start_date=20200412, 
		@active_end_date=99991231, 
		@active_start_time=0, 
		@active_end_time=235959, 
		@schedule_uid=N'e041a22f-3c9f-4009-8974-5496f4536bf7'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

Script of Read Job 2

USE [msdb]
GO
/****** Object:  Job [Simulate_Read_Load_2] ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object:  JobCategory [[Uncategorized (Local)]]    Script Date: 4/25/2020 5:58:59 PM ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END
DECLARE @jobId BINARY(16)
EXEC @ReturnCode =  msdb.dbo.sp_add_job @job_name=N'Simulate_Read_Load_2', 
		@enabled=0, 
		@notify_level_eventlog=0, 
		@notify_level_email=0, 
		@notify_level_netsend=0, 
		@notify_level_page=0, 
		@delete_level=0, 
		@description=N'No description available.', 
		@category_name=N'[Uncategorized (Local)]', 
		@owner_login_name=<your owner login name>, @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [1]  ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'1', 
		@step_id=1, 
		@cmdexec_success_code=0, 
		@on_success_action=1, 
		@on_success_step_id=0, 
		@on_fail_action=2, 
		@on_fail_step_id=0, 
		@retry_attempts=0, 
		@retry_interval=0, 
		@os_run_priority=0, @subsystem=N'TSQL', 
		@command=N'DECLARE @Date_Time	DATETIME
SELECT @Date_Time = MIN(Date_Time)
FROM Coffee_Vending_Machine_Data
EXEC [dbo].[usp_read_sensor_data] @Start_Date_Time = @Date_Time;
', 
		@database_name=N'MyDB', 
		@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Every 10 second', 
		@enabled=1, 
		@freq_type=4, 
		@freq_interval=1, 
		@freq_subday_type=2, 
		@freq_subday_interval=10, 
		@freq_relative_interval=0, 
		@freq_recurrence_factor=0, 
		@active_start_date=20200412, 
		@active_end_date=99991231, 
		@active_start_time=0, 
		@active_end_time=235959, 
		@schedule_uid=N'e041a22f-3c9f-4009-8974-5496f4536bf7'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

Script of Write Job 1

USE [msdb]
GO
/****** Object:  Job [Simulate_Write_Load] ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object:  JobCategory [[Uncategorized (Local)]] ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END
DECLARE @jobId BINARY(16)
EXEC @ReturnCode =  msdb.dbo.sp_add_job @job_name=N'Simulate_Write_Load', 
		@enabled=0, 
		@notify_level_eventlog=0, 
		@notify_level_email=0, 
		@notify_level_netsend=0, 
		@notify_level_page=0, 
		@delete_level=0, 
		@description=N'No description available.', 
		@category_name=N'[Uncategorized (Local)]', 
		@owner_login_name=<your owner login name>, @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [1] ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'1', 
		@step_id=1, 
		@cmdexec_success_code=0, 
		@on_success_action=1, 
		@on_success_step_id=0, 
		@on_fail_action=2, 
		@on_fail_step_id=0, 
		@retry_attempts=0, 
		@retry_interval=0, 
		@os_run_priority=0, @subsystem=N'TSQL', 
		@command=N'EXEC [dbo].[usp_write_sensor_data]', 
		@database_name=N'MyDB', 
		@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Every 10 seconds', 
		@enabled=1, 
		@freq_type=4, 
		@freq_interval=1, 
		@freq_subday_type=2, 
		@freq_subday_interval=10, 
		@freq_relative_interval=0, 
		@freq_recurrence_factor=0, 
		@active_start_date=20200412, 
		@active_end_date=99991231, 
		@active_start_time=0, 
		@active_end_time=235959, 
		@schedule_uid=N'9a6a8f7f-f2b6-4fb5-8e2d-4058b1cb71b5'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

Script of Write Job 2

USE [msdb]
GO
/****** Object:  Job [Simulate_Write_Load_2]    Script Date: 4/25/2020 5:58:27 PM ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object:  JobCategory [[Uncategorized (Local)]] ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END
DECLARE @jobId BINARY(16)
EXEC @ReturnCode =  msdb.dbo.sp_add_job @job_name=N'Simulate_Write_Load_2', 
		@enabled=0, 
		@notify_level_eventlog=0, 
		@notify_level_email=0, 
		@notify_level_netsend=0, 
		@notify_level_page=0, 
		@delete_level=0, 
		@description=N'No description available.', 
		@category_name=N'[Uncategorized (Local)]', 
		@owner_login_name=<your owner login name>, @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object:  Step [1]  ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'1', 
		@step_id=1, 
		@cmdexec_success_code=0, 
		@on_success_action=1, 
		@on_success_step_id=0, 
		@on_fail_action=2, 
		@on_fail_step_id=0, 
		@retry_attempts=0, 
		@retry_interval=0, 
		@os_run_priority=0, @subsystem=N'TSQL', 
		@command=N'EXEC [dbo].[usp_write_sensor_data]', 
		@database_name=N'MyDB', 
		@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Every 10 seconds', 
		@enabled=1, 
		@freq_type=4, 
		@freq_interval=1, 
		@freq_subday_type=2, 
		@freq_subday_interval=10, 
		@freq_relative_interval=0, 
		@freq_recurrence_factor=0, 
		@active_start_date=20200412, 
		@active_end_date=99991231, 
		@active_start_time=0, 
		@active_end_time=235959, 
		@schedule_uid=N'9a6a8f7f-f2b6-4fb5-8e2d-4058b1cb71b5'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

Stats script 1 – Index size / avg. row size

SELECT tn.[name] AS [Table name], ix.[name] AS [Index name]
	, SUM(sz.[used_page_count]) * 8 AS [Index size (KB)]
	, SUM(sz.row_count) AS [Total Rows]
	, CAST((SUM(sz.[used_page_count]) * 8.00) / SUM(sz.row_count) AS NUMERIC(10,4)) AS [Avg Row Size (KB)]
FROM sys.dm_db_partition_stats AS sz
	INNER JOIN sys.indexes AS ix 
		ON sz.[object_id] = ix.[object_id] 
		AND sz.[index_id] = ix.[index_id]
	INNER JOIN sys.tables tn 
		ON tn.OBJECT_ID = ix.object_id
WHERE tn.[name] = 'Coffee_Vending_Machine_Data'
GROUP BY tn.[name], ix.[name]
ORDER BY tn.[name]

Stats script 2 – Index operational and usage stats

SELECT ix.[name] AS [Index name], stat.row_lock_count, stat.row_lock_wait_count
	, stat.row_lock_wait_in_ms, stat.page_lock_count, stat.page_lock_wait_count, stat.page_lock_wait_in_ms
	, stat.index_lock_promotion_attempt_count, stat.index_lock_promotion_count, stat.page_latch_wait_count
	, stat.page_latch_wait_in_ms, stat.page_io_latch_wait_count, stat.page_io_latch_wait_in_ms
	, stat.tree_page_latch_wait_count, stat.tree_page_latch_wait_in_ms, stat.tree_page_io_latch_wait_count
	, stat.tree_page_io_latch_wait_in_ms
	, ustat.user_seeks, ustat.user_scans, ustat.user_lookups, ustat.user_updates
FROM sys.dm_db_index_operational_stats(DB_ID(),OBJECT_ID('Coffee_Vending_Machine_Data'),-1,0) stat
	LEFT JOIN sys.dm_db_index_usage_stats ustat
		ON ustat.object_id = stat.object_id
		AND ustat.index_id = stat.index_id
	INNER JOIN sys.indexes AS ix 
		ON stat.[object_id] = ix.[object_id] 
		AND stat.[index_id] = ix.[index_id]
	INNER JOIN sys.tables tn 
		ON tn.OBJECT_ID = ix.object_id

Stats script 3 – Virtual file stats

SELECT files.name, stat.num_of_writes, stat.num_of_bytes_written, stat.num_of_reads, stat.num_of_bytes_read 
FROM sys.dm_io_virtual_file_stats(DB_ID(),NULL) stat
	INNER JOIN sys.sysfiles files
		ON files.fileid = stat.file_id

Stats script 4 – Avg. execution time

SELECT JobName, COUNT(1) AS [No of execution], AVG(Duration_MS) AS [Avg Duration (MS)] 
FROM log_simulation_job_execution (NOLOCK)
GROUP BY JobName

Server Configuration used for this test

This test has been performed on the server with following configurations.

OS : Windows Server 2019 Datacenter 10.0 <X64> (Build 17763: ) (Hypervisor)
SQL Server : SQL Server 2019 Developer Edition (64-bit)
RAM : 4 GB
Cores : 2 Virtual Cores
Disk Performance (Estimated)
- IOPS limit : 500
- Throughput limit (MB/s) : 60
On-Premises/Cloud : Azure Cloud

Pre-test stats

Note: Keep all the jobs Disabled.

Execution#1 with Surrogate Clustered Key (without nonclustered index)

Create the index using below query.

CREATE CLUSTERED INDEX IX_Coffee_Vending_Machine_Data_ID 
ON Coffee_Vending_Machine_Data (ID)

Enable all the jobs and disable it after 2 minutes. Once all the jobs are disabled, run the stats select queries.

Execution#2 with Surrogate Clustered Key (with nonclustered non covering index)

Drop the existing index using below query and run the stats select queries.

DROP INDEX IX_Coffee_Vending_Machine_Data_ID ON Coffee_Vending_Machine_Data
TRUNCATE TABLE log_simulation_job_execution

Create the below indexes, enable all the jobs and disable it after 2 minutes. Once all the jobs are disabled run the stats select queries.

CREATE CLUSTERED INDEX IX_Coffee_Vending_Machine_Data_ID ON Coffee_Vending_Machine_Data (ID)
CREATE NONCLUSTERED INDEX IX_Coffee_Vending_Machine_Data_MachineID_Date_Time 
ON Coffee_Vending_Machine_Data (MachineID, Date_Time)

Execution#3 with Surrogate Clustered Key (with nonclustered covering index)

Drop the existing indexes using below query and run the stats select queries.

DROP INDEX IX_Coffee_Vending_Machine_Data_MachineID_Date_Time ON Coffee_Vending_Machine_Data
DROP INDEX IX_Coffee_Vending_Machine_Data_ID ON Coffee_Vending_Machine_Data
TRUNCATE TABLE log_simulation_job_execution

Create the below indexes, enable all the jobs and disable it after 2 minutes. Once all the jobs are disabled run the stats select queries.

CREATE CLUSTERED INDEX IX_Coffee_Vending_Machine_Data_ID ON Coffee_Vending_Machine_Data (ID)
CREATE NONCLUSTERED INDEX IX_Coffee_Vending_Machine_Data_MachineID_Date_Time 
ON Coffee_Vending_Machine_Data (MachineID, Date_Time) 
INCLUDE (Sensor1_Value, Sensor2_Value, Sensor3_Value, Sensor4_Value)

Execution#4 with Natural Clustered Key

Drop the existing indexes using below query and run the stats select queries.

DROP INDEX IX_Coffee_Vending_Machine_Data_MachineID_Date_Time ON Coffee_Vending_Machine_Data
DROP INDEX IX_Coffee_Vending_Machine_Data_ID ON Coffee_Vending_Machine_Data
TRUNCATE TABLE log_simulation_job_execution

Create the below indexes, enable all the jobs and disable it after 2 minutes. Once all the jobs are disabled run the stats select queries.

CREATE CLUSTERED INDEX IX_Coffee_Vending_Machine_Data_MachineID_Date_Time 
ON Coffee_Vending_Machine_Data (MachineID, Date_Time)

Execution#5 without any Index

Drop the existing index using below query and run the stats select queries.

DROP INDEX IX_Coffee_Vending_Machine_Data_MachineID_Date_Time ON Coffee_Vending_Machine_Data
TRUNCATE TABLE log_simulation_job_execution

Enable all the jobs and disable it after 2 minutes. Once all the jobs are disabled run the stats select queries.

Findings

Execution#4 with Natural Clustered Key won with GOLD. It stood 1st in the Ranking in our experiment.
Execution#3 with Surrogate Clustered Key (with nonclustered covering index) was runner up with SILVER. It stood 2nd in the Ranking in our experiment. It took the additional storage space which in-turn is an overhead to Disk IO.
Execution#5 without any Index stood BRONZE. It stood 3rd in the Ranking in our experiment.
Execution#1 and Execution#2 both lost the game in our experiment. Both of them stood 4th in the Ranking in our experiment.

Conclusion

There is no panacea solution. It’s always advisable to evaluate every recommendation / best practice and its trade-offs holistically, before actually applying it.

Sometimes it’s wise decision not to have any index (including clustered index), than having additional overhead of unused / poorly performing index.

SQL Server Carpenter

Clustered Index Key – Natural or Surrogate?

Background

Primary Key and Clustered Index

Description of terminologies used

Clustered Index best practices

Why should we always have a Clustered Index?

What should be our Clustered Index?

What we’ll achieve through this experiment?

Workload considered for this test

Script of table schema

Script to load table with lot of data

Other prerequisites

Script of table to log the execution time

Script of Read Stored Procedure

Script of Write Stored Procedure

Script of Read Job 1

Script of Read Job 2

Script of Write Job 1

Script of Write Job 2

Stats script 1 – Index size / avg. row size

Stats script 2 – Index operational and usage stats

Stats script 3 – Virtual file stats

Stats script 4 – Avg. execution time

Server Configuration used for this test

Pre-test stats

Execution#1 with Surrogate Clustered Key (without nonclustered index)

Execution#2 with Surrogate Clustered Key (with nonclustered non covering index)

Execution#3 with Surrogate Clustered Key (with nonclustered covering index)

Execution#4 with Natural Clustered Key

Execution#5 without any Index

Findings

Conclusion

One thought on “Clustered Index Key – Natural or Surrogate?”

Leave a comment Cancel reply

Background

Primary Key and Clustered Index

Description of terminologies used

Clustered Index best practices

Why should we always have a Clustered Index?

What should be our Clustered Index?

What we’ll achieve through this experiment?

Workload considered for this test

Script of table schema

Script to load table with lot of data

Other prerequisites

Script of table to log the execution time

Script of Read Stored Procedure

Script of Write Stored Procedure

Script of Read Job 1

Script of Read Job 2

Script of Write Job 1

Script of Write Job 2

Stats script 1 – Index size / avg. row size

Stats script 2 – Index operational and usage stats

Stats script 3 – Virtual file stats

Stats script 4 – Avg. execution time

Server Configuration used for this test

Pre-test stats

Execution#1 with Surrogate Clustered Key (without nonclustered index)

Execution#2 with Surrogate Clustered Key (with nonclustered non covering index)

Execution#3 with Surrogate Clustered Key (with nonclustered covering index)

Execution#4 with Natural Clustered Key

Execution#5 without any Index

Findings

Conclusion

Share this:

Related

One thought on “Clustered Index Key – Natural or Surrogate?”

Leave a comment Cancel reply