How to resolve "String or binary data would be truncated" error copying nvarchar(10) to char(10)?

Asked by BenPHILLIPS in SQL Server, Apr 16, 2021

I am inserting values from one SQL Server table/column into another. The datatypes differ for these columns for other reasons, but I do not understand why an nvarchar(10) source and a char(10) destination sometimes produce this error in SQL Server 2014: "String or binary data would be truncated." For the failing rows, LEN(sourcecol) = 10 and DATALENGTH(sourcecol) = 20. Could it be because of some invisible spaces/characters stored in the source column, which is of type nvarchar? How do I solve the "String or binary data would be truncated" error?

Answered by Anusha Acharya

The first step is to get a useful error message: patch to SQL Server 2016 SP2, CU6 or newer (including SQL Server 2017), and then turn on trace flag 460. The trace flag does not make the truncation go away, but it replaces the vague "String or binary data would be truncated" message with one that names the table, the column, and the offending value, which makes the root cause easy to find. You can enable it at the query level or at the server level. First, let's see the error happen: let's create a table with small fields, and then try to insert more data than it can hold.
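As a sketch of the two scopes (the table names here are hypothetical placeholders):

  -- Query level: applies only to this statement
  -- (requires SQL Server 2016 SP2 / 2017 CU12 or newer).
  INSERT INTO dbo.Destination ([Col1])
  SELECT [Col1]
  FROM dbo.Source
  OPTION (QUERYTRACEON 460);

  -- Server level: applies to all sessions until the instance restarts
  -- (add -T460 to the startup parameters to make it permanent).
  DBCC TRACEON (460, -1);

With the trace flag on, the failing statement raises Msg 2628 instead of Msg 8152, and the message text includes the truncated value.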


Without having an example of the data as well as the table DDL from the O.P., it is difficult to say for certain what the exact cause of this error is for the O.P. However, this behavior (and hence problem) can happen for others for the following reason: Some Code Pages (which are determined by the Collation of each CHAR / VARCHAR field) allow for double-byte characters in order to map more than the 256 characters that can fit into a single byte. Since the max length of a string field is really a matter of bytes instead of characters, a CHAR / VARCHAR field with a max length of 10 can only hold 10 bytes, which can hold anywhere from 5 - 10 characters. With the source field being NVARCHAR(10), those 10 characters could map to characters that require more than 1 byte in the VARCHAR field. All it takes to get this error is 9 "regular" single-byte characters and 1 double-byte character which would require 11 actual bytes.
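A minimal repro of that scenario, assuming a Japanese (Code Page 932) collation on the destination: 9 ASCII digits plus one Hiragana character is 10 characters, but 11 bytes in CP-932.

  -- 10 characters: 9 single-byte digits + NCHAR(0x3042) (Hiragana A, 2 bytes in CP-932)
  DECLARE @Source NVARCHAR(10) = N'123456789' + NCHAR(0x3042);

  SELECT LEN(@Source)        AS [Characters], -- 10, matching the O.P.'s LEN
         DATALENGTH(@Source) AS [Bytes];      -- 20, matching the O.P.'s DATALENGTH

  DECLARE @T TABLE ([Val] CHAR(10) COLLATE Japanese_CI_AS);

  -- Needs 11 bytes in a 10-byte field:
  -- Msg 8152, String or binary data would be truncated.
  INSERT INTO @T ([Val]) VALUES (@Source);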

So, check the Collation of the CHAR(10) field in the destination table. Once you know the name of the Collation for that field, that will indicate what Code Page is being used, and if it is either 932, 936, 949, or 950, then this is quite likely the source of the problem. And in this case, just alter the destination field to have a max length of 20 (to be safe in case all 10 characters are double-byte). I would recommend VARCHAR(20), but if you really, really like blank padding, then use CHAR(20). A slightly more detailed explanation follows:
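The collation and its code page can be read from the catalog views; the table and column names below are hypothetical, so substitute your own:

  SELECT c.name                                           AS [Column],
         c.collation_name                                 AS [Collation],
         COLLATIONPROPERTY(c.collation_name, 'CodePage')  AS [CodePage]
  FROM   sys.columns AS c
  WHERE  c.[object_id] = OBJECT_ID(N'dbo.DestinationTable')
  AND    c.name = N'DestinationCol';

If [CodePage] comes back as 932, 936, 949, or 950, you are on a Double-Byte Character Set and the byte-length math below applies.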

It is commonly understood that VARCHAR (i.e. 8-bit Extended ASCII) data is single-byte and NVARCHAR (i.e. Unicode) data is double-byte. This understanding of string data (and of encodings) is always true when working with the US-English alphabet (and with quite a few others, at least most "active" languages). And while it is true more often than not when working with most languages, it is certainly not always true. The ways in which this understanding is not true for Unicode data is beyond the scope of this Question, and will not be detailed here.

But when it comes to our old friend Mr. VARCHAR, there are some situations that would allow for characters to take up 2 bytes instead of one. Yes, you read that correctly. But how? Well, prior to the existence of Unicode, some cultures with more than 256 characters in their alphabet still wanted to work with their native alphabet. Since that is not possible within a single byte (with a range of 256 values), they came up with Double-Byte Character Sets (DBCS). These are handled as different Code Pages, and Windows and SQL Server support four of them:

  • 932 = Japanese (Shift-JIS within Microsoft, Windows-31J outside of Microsoft)
  • 936 = Chinese Simplified (GB2312)
  • 949 = Korean
  • 950 = Chinese Traditional (Big5)

Don't let the term "Double-Byte" confuse you into thinking that they are just like NVARCHAR, which only works in 16-bit sequences. These Code Pages are actually variable-length encodings, similar to UTF-8, and will use a single byte (8 bits) for at least the first 128 values (0 - 127) and for some portion of the 128 - 255 range.

How does any of this fit into the truncate error? Well, the max length of both NVARCHAR and VARCHAR datatypes is really expressed in terms of bytes, not characters. Meaning, VARCHAR(10) is 10 bytes max, even if less than 10 characters fit into that 10 bytes. Likewise, NVARCHAR(10) is 20 bytes max, even if less than 10 characters fit into that 20 bytes.
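The bytes-versus-characters distinction is easy to see with LEN (characters) next to DATALENGTH (bytes):

  DECLARE @v VARCHAR(10)  = 'abcdefghij',
          @n NVARCHAR(10) = N'abcdefghij';

  SELECT LEN(@v)        AS [VarcharChars],  -- 10 characters
         DATALENGTH(@v) AS [VarcharBytes],  -- 10 bytes
         LEN(@n)        AS [NvarcharChars], -- 10 characters
         DATALENGTH(@n) AS [NvarcharBytes]; -- 20 bytes (2 bytes per UTF-16 code unit)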

With that in mind, we know that converting from Unicode to a Code Page will attempt to map to the same characters. In most Code Pages, those characters are all 1 byte in size. But in the 4 DBCS Code Pages, there are quite a few characters that exist (and hence can be mapped to) and are 2 bytes (else they wouldn't exist). The problem here is that a DBCS Code Page is being used (for the destination), and at least one of the characters being mapped is taking up 2 bytes within the VARCHAR type. The following is a working example of this behavior.

Test Setup

First, run this code to set up the test. The default Collation of the database in which you run this test does not matter. Here we are creating a temporary table to hold each of the first 65,536 Code Points of the main Unicode character set (characters beyond the initial 65,536 require two Code Points and are hence 4 bytes each, but again, they are out of scope here since they would not change the behavior related to the current issue).

  SET NOCOUNT ON;

  IF (OBJECT_ID(N'tempdb..#Source') IS NOT NULL)
  BEGIN
      DROP TABLE #Source;
  END;

  CREATE TABLE #Source ([Character] NVARCHAR(1));

  ;WITH nums (num) AS
  (
      SELECT TOP (65536) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
      FROM [master].[sys].[columns] sc1
      CROSS JOIN [master].[sys].[columns] sc2
  )
  INSERT INTO #Source ([Character])
  SELECT NCHAR(num - 1)
  FROM nums;

  SELECT * FROM #Source;

Test 1: Latin1 character set (Code Page 1252)

Running the following will produce no errors. It just works. But only the 256 values that exist in this Single-Byte Character Set (SBCS) convert to themselves; everything else maps to a best-fit character or to '?'.

  IF (OBJECT_ID(N'tempdb..#Destination_CP1252') IS NOT NULL)
  BEGIN
      DROP TABLE #Destination_CP1252;
  END;

  CREATE TABLE #Destination_CP1252
  (
      [Character] VARCHAR(1) COLLATE Latin1_General_100_CI_AS
  );

  INSERT INTO #Destination_CP1252 ([Character])
  SELECT [Character]
  FROM #Source;

  SELECT [Character],
         LEN([Character]) AS [NumberOfCharacters],
         DATALENGTH([Character]) AS [NumberOfBytes],
         CONVERT(VARBINARY(2), [Character]) AS [BinaryValue]
  FROM #Destination_CP1252;

Test 2: Japanese (Shift-JIS) character set (Code Page 932)

  IF (OBJECT_ID(N'tempdb..#Destination_CP932') IS NOT NULL)
  BEGIN
      DROP TABLE #Destination_CP932;
  END;

  CREATE TABLE #Destination_CP932
  (
      [Character] VARCHAR(1) COLLATE Japanese_Unicode_CI_AS
  );

  INSERT INTO #Destination_CP932 ([Character])
  SELECT [Character]
  FROM #Source;
Running the code shown above will produce the following error:
Msg 8152, Level 16, State 2, Line 1
String or binary data would be truncated.
The statement has been terminated.
If this is an issue of a Double-Byte Character Set (DBCS) conversion, then increasing the field size by 1 byte should fix it. First, let's just make sure that nothing was actually inserted into the table (so that we will know for certain that the next INSERT statement is what put the data in):
  SELECT * FROM #Destination_CP932; -- no rows
Great. Now run the following:
  ALTER TABLE #Destination_CP932
      ALTER COLUMN [Character] VARCHAR(2) COLLATE Japanese_Unicode_CI_AS;

  INSERT INTO #Destination_CP932 ([Character])
  SELECT [Character]
  FROM #Source;
No errors. Woo hoo! Let's see which characters converted:
  SELECT [Character],
         LEN([Character]) AS [NumberOfCharacters],
         DATALENGTH([Character]) AS [NumberOfBytes],
         CONVERT(VARBINARY(2), [Character]) AS [BinaryValue]
  FROM #Destination_CP932
  WHERE 1 = 1
  --AND DATALENGTH([Character]) > 1
  --AND [Character] <> '?'
If you scroll through the results, pay attention to the [NumberOfBytes] and [BinaryValue] fields.
To see just the double-byte values, uncomment the following line and re-run:
AND DATALENGTH([Character]) > 1
To see the full range of values in Code Page 932, re-comment-out the DATALENGTH where condition and uncomment the following line and re-run:
AND [Character] <> '?'
My count shows 9483 characters. And to be fair, add 1 to account for the actual ? character that was excluded by the filter, and we get a grand total of 9484 characters represented in a VARCHAR field.




