This article is the part of Does it matter ? series.
Storage Architecture in SQL Server
It refers to and includes following components.
- Primary Data File (.mdf) / Secondary Data File (.ndf)
- Log File (.ldf)
- File Groups
Pages : are lowest storage unit in SQL Server and each page is of size 8 KB.
Extents : Each extent is the collection of 8 Pages. Extents are of two types :
- Mixed Extent : Shared by multiple tables. Holds Index as well as Data pages.
- Uniform Extent : Belong to only one table. Can either hold Data Page or Index page.
1 MB is 128 pages in SQL Server terminology.
You cannot have a row with size more than 8 KB. Columns with variable length Data Types such as Varchar and NVarchar and LOB types such as Image, Text, NText, XML are given exception to 8 KB rule and are not considered in 8 KB. LOB types can hold up to 2 GB of data in a cell.
Columns & Rows : It’s well known to everyone so I’m not giving any brief about it.
Partitioning : can be very useful if you are dealing with high volume of data. Specially in Read Loads.
Consider a scenario of a table with 100 Billion Rows of past 10 years. Reading from single partition of 100 Billion Rows can be very costly. Suppose each year has approx. 10 Billion Rows. We can Partition the table by year using Date column to have 10 partitions. Now when we will query this table then the read will happen from the respective partitions instead of whole set of 100 Billion Rows.
Decision for column to be used for partitioning must be taken at the database design phase.
Data File : Stores the data in the form of Pages logically organized in extents.
Log File : Stores log of all the modification done using DDL or DML statements. This is the main organ of SQL Server that ensures data written is ACID compliant.
File Group : Holds Data Files (Primary & Secondary). Each database has one default File Group but there could be multiple File Groups. File Groups can be mapped to different drives.
Physical structure and heirarchy
A Database can have multiple File Group, a File Group can have multiple Data File, a Data File can have multiple Table, a table can have multiple Partitions, a partition can have multiple Rows and each Row can have multiple Columns. A Page can hold one or more rows.
Involve DBA during Database Design
We have discussed each component of Storage Architecture. Now you must be wondering why I’m discussing these topic which generally falls under DBA bucket.
I must stress that role of DBA is not limited to the Backup, Restore, User and Role Management. They must be involved on the day one of database design. Their feedbacks must be taken into the consideration by the Database Architect.
Decisions on number of File Groups, number of Data Files, which Data File will be mapped to which File Group and which Table / Index will be mapped to which Data File should be taken at the time of working out the database architecture.
H/W Infrastructure Capacity Planning
I’ve observed decision on Storage Architecture is majorly taken post Production, and that too when performance issues and struggles are faced.
Available server infrastructure resources, such as, number of CPU Cores, number of Disk Drives, Disk speed/rpm/IO etc. must be taken into consideration when you are having multiple File Groups and Data Files in Production.
I’ve also seen SIR (System Infrastructure Requirements) document, which talks about the H/W need is made simply because its a requirement by Infra team / Clients. It is majorly made based on hypothetical assumptions, without any rationale.
If Storage Architecture and Capacity Planning can be done during the design phase, then SIR can be made on valid assumptions, with much better authenticity.
In my earlier blog on Part 1 : Does database modelling matters ? I mentioned similarities between Cupboard and Database.
We never order to manufacture the cupboard without provisioning the infrastructure for it. I mean to say we first decide the place where the cupboard shall be placed. This helps us to decide what would be length, breadth, hight, weight and material of the cupboard.
If we manufacture cupboard without provisioning the infrastructure, then there is high probability that, the cupboard shall not fit, and unfortunately become useless. Or if we decide to live with it, then we shall accept the struggles to live with it.
Same concept applies to Databases. Provisioning the infrastructure and physical architecture is very much important, else we should be mentally prepared, for the struggles in the future.
Too much of cupboard discussion. Let us conclude else we may become carpenter. Just kidding !
Now you have sufficient clues to search the hidden treasure of greater insights of SQL Server Storage Architecture. Start exploring it.
Stay tuned for the next article on Does it matter ? series.