Why is it recommended to store SQL server BLOBs in separate tables?

566 Asked by bruce_8968 in Salesforce , Asked on Jul 19, 2021

This highly-upvoted SO answer recommends to put images in separate tables, even if there is only a 1:1 relationship with another table:If you decide to put your pictures into a SQL Server table, I would strongly recommend using a separate table for storing those pictures - do not store the employee photo in the employee table - keep them in a separate table. That way, the Employee table can stay lean and mean and very efficient, assuming you don't always need to select the employee photo, too, as part of your queries. Why? I was under the impression that SQL Server only stores a pointer to some dedicated BLOB data structure in the table, so why bother to manually create another layer of indirection? Does it really improve performance significantly? If yes, why should store SQL server BLOBs in separate tables?

Answered by Bhaanumati shukla

Reason why should store SQL server BLOBs in separate tables:

They allow for BLOBs to be a part of the data model and participate in Transactions while not wasting space in the buffer pool (i.e. memory). The BLOB data is still included in backups, which does make them take up more space and take longer to backup and to restore. How large are these images, and how many do you expect to have? While I mostly agree with @sp_BlitzErik, I think there are some scenarios where it is ok to do this, and so it would help to have a clearer picture of what is actually being requested here. Some options to consider that alleviate most of the negative aspects pointed out by Erik are:

FILESTREAM (starting in SQL Server 2008)
FileTables (starting in SQL Server 2012)

Both of these options are designed to be a middle-ground between storing BLOBs either fully in SQL Server or fully outside (except for a string colun to retain the path). They allow for BLOBs to be a part of the data model and participate in Transactions while not wasting space in the buffer pool (i.e. memory). The BLOB data is still included in backups, which does make them take up more space and take longer to backup and to restore. However, I have a hard time seeing this as a true negative given that if it is part of the app then it needs to be backed up somehow, and having only a string column containing the path is completely disconnected and allows for BLOBs files to get deleted with no indication of that in the DB (i.e. invalid pointers / missing files). It also allows for files to be "deleted" within the DB but still exist on the file system which will need to eventually be cleaned up (i.e. headache). But, if the files are HUGE, then maybe it is best to leave entirely outside of SQL Server except for the path column. That helps with the "inside or outside" question, but does not touch on the single table vs multiple table question. I can say that, beyond this specific question, there are certainly valid cases for splitting tables into groups of columns based on usage patterns. Often when one has 50 or more columns there are some that are accessed frequently and some that are not. Some columns are written to frequently while some are mostly read. Separating frequently access vs infrequently accessed columns into multiple tables having a 1:1 relationship is quite often beneficial because why waste the space in the Buffer Pool for data you probably aren't using (similar to why storing large images in regular VARBINARY(MAX) columns is a problem)? You also increase the performance of the frequently access columns by reducing the row size and hence fitting more rows onto a data page, making reads (both physical and logical) more efficient. Of course, you also introduce some inefficiency by needing to duplicate the PK, and now sometimes you need to join the two tables, which also complicates (even if only slightly) some queries.

So, there are several approaches you could take, and what is best depends on your environment and what you are trying to accomplish.
I was under the impression that SQL Server only stores a pointer to some dedicated BLOB data structure in the table
Not so simple. You can find some good info here, What is the Size of the LOB Pointer for (MAX) Types Like Varchar, Varbinary, Etc?, but the basics are:
TEXT, NTEXT, and IMAGE datatypes (by default): 16 byte pointer
VARCHAR(MAX), NVARCHAR(MAX), VARBINARY(MAX) (by default): If the data can fit in the row, then it will be placed there

If the data is less than approx. 40,000 bytes (the linked blog post shows 40,000 as the upper limit but my testing showed a slightly higher value) AND if there is room on the row for this structure, then there will be between 1 and 5 direct links to LOB pages, starting at 24 bytes for the first link to the first 8000 bytes, and going up by 12 bytes per each additional link for each additional set of 8000 bytes, up to 72 bytes max.If the data is over approx. 40,000 bytes OR there is not enough room to store the appropriate number of direct links (e.g. only 40 bytes left on the row and a 20,000 byte value needs 3 links which is 24 bytes for the first plus 12 for the two additional links for 48 bytes total required in-row space), then there will just be a 24 byte pointer to a text tree page which contains the links to the LOB pages).

Why is it recommended to store SQL server BLOBs in separate tables?

Your Answer