How sql server string concatenation done for large strings?

517    Asked by CameronOliver in AWS , Asked on Apr 30, 2021

I am trying to concatenates strings in T-SQL like 'This is a test ' + CHAR(13) + CHAR(10) + ' string' When the string is long (I do not know the limit), I get an error: Some part of your SQL statement is nested too deeply. Rewrite the query or break it up into smaller queries. How can I work around this? How is the SQL statement "nested"? Should I switch to 0x???? notation if the string length is exceeding some value?

This question was asked 10 years earlier on https://social.msdn.microsoft.com/Forums/sqlserver/en-US/e7244cf6-049e-4b8a-b230-226fdd96a694/large-string-concatenation?forum=transactsql without getting a satisfactory answer edit

  • The total string to insert (with the concatenations and ':s) is 369711 chars long.
  • There are 6301 instances of char(13) and the same of char(10).
  • There are 19050 instances of +, not all necessarily used as concatenation operator.
  • What is SQL Server string concatenation?
Answered by Ankit Chauhan

  • An operator in a string expression that concatenates two or more character or binary strings, columns, or a combination of strings and column names into one expression (a string operator). For example SELECT 'book'+'case'; returns bookcase . Transact-SQL Syntax Conventions.

  It might depend on what is meant by "too long":
  • Length: If you mean strings of many characters, then try converting some of them to a MAX type (i.e. NVARCHAR(MAX) or VARCHAR(MAX) ). For example:
    CONVERT(VARCHAR(MAX), 'This is a test ') + CHAR(13) + CHAR(10) + ' string'
    Concatenations: If you mean operations involving many concatenations, there is definitely a maximum number that can be done in a single statement. I tested this a while ago and was going to blog about it but got distracted  I just reviewed my testing and while there are a few different ways to concatenate, the simplest way (which is what you are doing) has a limit of around 3300 per statement. I say "around" 3300 because it varies based on some other factors related to memory. Sometimes I can get 3310, then I tried 3311 and it works, try again and it fails, try again with 3310 and it fails, try again with 3309 and it works.
    If you have over this amount then you have no choice but to split the operation into multiple steps (i.e. multiple SET / SELECT statements). Although, the error for reaching the max here is:
    Msg 8631, Level 17, State 1, Line XXXX
    Internal error: Server stack limit has been reached. Please look for potentially deep nesting in your query, and try to simplify it.
    And you aren't getting exactly that, so not entirely sure yet until you do more testing, or at least provide the error number. In either case, you can reduce the number of concatenations by combining the "
    " and "
    " into a single variable or VARBINARY literal. The following example shows that both methods produce the same output:
    PRINT 'This is a test ' + CHAR(13) + CHAR(10) + ' string'; PRINT '-----------------------------'; PRINT 'This is a test ' + CONVERT(CHAR(2), 0x0D0A) + ' string'; PRINT '-----------------------------'; DECLARE @CRLF CHAR(2) = CHAR(13) + CHAR(10); PRINT 'This is a test ' + @CRLF + ' string';
     The benefit of the variable is that it is shorter, which helps keep large scripts a little more readable and not as large. But, you can't highlight one or more random lines and execute, unless you also highlight the DECLARE but that might not always work. The benefit of the VARBINARY literal is that you can easily execute random groupings of statements, or move statements to other scripts / section of this script and not worry about forgetting the DECLARE. In either case, this might reduce a few hundred of your concatenations (assuming you already have around 3300 of them), but it doesn't prevent the error if you get a string value that requires hundreds more concatenations.
    It's also quite possible that there's a different and/or simpler approach to what you are ultimately trying to accomplish that doesn't rely on concatenations. So it would help if you could please update the question to specify why you are doing this in the first place There are 6301 instances of char(13) and the same of char(10).
    Given three concatenations per set of "CR" + "LF" (i.e. + CHAR(13) is one, + CHAR(10) is two, and + '...' is the third), that is 18,903 concatenations right there. Yep, that's a bit over 3300  I want the insert into statements to be oneliners.
    Ok. But, why? What do you actually gain from this? Embedded CRLF's are just fine for inserting.
    Also:
    What is the source of this data? Is it coming from SQL Server?
    Are the strings VARCHAR or NVARCHAR? In the example in the question, it is only VARCHAR, yet in a comment on the question, you give an example of converting a VARBINARY literal into NVARCHAR.
    In either case, doing either of the two approaches I noted above to reduce concatenations by one of the two pieces per CRLF won't help here as that will still leave you with 12,602 concatenations. So, you have two options:
    Use a replacement / escape sequence In a comment on the question, Martin Smith suggested using {crlf}. My preference would be one of the following:
    CHAR(31) : This is the non-printable "Unit Separator" character. It's only a single byte, quite unlikely to be in your string data, and won't get converted to something else if you accidentally change encondings. Unfortunately, it's not very readable as it won't even show up as a space, hence the words it separates will appear to have nothing between them.
    PRINT REPLACE('one' + CHAR(31) + 'two', CHAR(31), CHAR(13)+CHAR(10));
    '
    ' : This is the almost universal "newline" escape sequence. This is two bytes and definitely more readable.
    PRINT REPLACE('one
    two', '
    ', CHAR(13)+CHAR(10));
    In either case, you just INSERT INTO Schema.Table (Column) VALUES (REPLACE(....));
    Convert entire string to VARBINARY : this results in a very portable, single-line value. This isn't readable, but there won't be any unexpected character conversions either, and you can use a standard ANSI / code page 1252 file encoding, whereas you would need to use either UTF-8 (with signature / BOM) or Unicode / UTF-16 (with signature / BOM) for the file encoding if the string data is Unicode / NVARCHAR.
    You mentioned in a comment on the question something about having a byte array. If the source of the data is .NET, then the strings data is already UTF-16 LE (i.e. NVARCHAR) so simply doing INSERT INTO Schema.Table (Column) VALUES (0x{hex_from_byte[]}); will work just fine. If the destination is a VARCHAR column, then you can either:
    convert in .NET using something along the lines of: Encoding.GetEncoding(1252).GetBytes(_StringVariable)
    convert in T-SQL using: INSERT INTO Schema.Table (Column) VALUES (CONVERT(NVARCHAR(MAX), 0x{hex_from_byte[]}));. Please note that we are converting to NVARCHAR because the hex representation is UTF-16 LE. The return value from the CONVERT function will then implicitly convert to VARCHAR as it is inserted into the column.


Your Answer

Interviews

Parent Categories