Unicode non-breaking space is not considered white space?

Asked by BenjaminMoore in Salesforce on Jul 3, 2021

Can anyone confirm that the Unicode non-breaking space (\u00A0) is not considered "whitespace" by Apex and is not detected by trim(), deleteWhitespace(), or regex? The regex result surprises me, since I thought \s was supposed to include non-breaking spaces. Of the methods below, only replaceAll with the character code works.

String x = '\u00A0' + 'Test';
String y = x.unescapeUnicode();
// Each line logs the string length after one removal attempt.
System.debug('### y trim length: ' + y.trim().length());
System.debug('### y deleteWhitespace length: ' + y.deleteWhitespace().length());
System.debug('### y replaceAll regex length: ' + y.replaceAll('\\s', '').length());
System.debug('### y replaceAll unicode length: ' + y.replaceAll('\\u00A0', '').length());


Answered by Carl Paige

According to Java, the non-breaking space is not whitespace. Apex uses the same rules as the Java Pattern class, so \u00A0 is handled the same way. The Pattern class defines \s as follows:

\s    A whitespace character: [ \t\n\x0B\f\r]
Where " " is 0x20, \t is 0x09, \n is 0x0A, \x0B is 0x0B, \f is 0x0C, and \r is 0x0D. No other characters are defined as whitespace, despite Unicode having a number of them.
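
Since the rule comes from the Java Pattern class, the behavior can be checked directly in Java. The following is a minimal sketch, not Apex code: the class and method names (NbspWhitespaceDemo, stripDefaultWhitespace, stripWhitespaceAndNbsp) are made up for illustration. It shows that the default \s leaves U+00A0 in place, and that adding \u00A0 to the character class removes it together with ordinary whitespace.

```java
import java.util.regex.Pattern;

public class NbspWhitespaceDemo {
    // A no-break space (U+00A0) followed by "Test", mirroring the string in the question.
    static final String SAMPLE = "\u00A0Test";

    // Default \s covers only space, \t, \n, \x0B, \f and \r, so U+00A0 survives.
    static String stripDefaultWhitespace(String s) {
        return s.replaceAll("\\s", "");
    }

    // Adding U+00A0 to the character class removes it along with ordinary whitespace.
    static String stripWhitespaceAndNbsp(String s) {
        return s.replaceAll("[\\s\\u00A0]", "");
    }

    public static void main(String[] args) {
        System.out.println(SAMPLE.trim().length());                   // 5
        System.out.println(stripDefaultWhitespace(SAMPLE).length());  // 5
        System.out.println(stripWhitespaceAndNbsp(SAMPLE).length());  // 4
        System.out.println(Pattern.matches("\\s", "\u00A0"));         // false
    }
}
```

In Apex the same pattern would be passed to replaceAll as a string literal with doubled backslashes, in the same way as the character-code call in the question that already works.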
