What are the Salesforce String class and the comparison operation doing internally?
I've found that certain string comparisons in Apex consume far more CPU time than would naturally be expected - specifically when at least one of the strings is very long ('long' here means over 100000 characters, but the effect may be significant for shorter strings).
A sample script which illustrates the issue is as follows:
List<Account> acs = [select Id, Name, BillingStreet, BillingCity, BillingPostalCode, BillingState, BillingCountry, BillingLatitude, BillingLongitude from Account
    where BillingLatitude != null
    limit 1000];
final String COMP_STR = 'Test-Test';
final Integer COMP_STR_LEN = COMP_STR.length();
String str = '';
Integer mcount = 0;
CpuTimer t1 = new CpuTimer();
for (Account ac : acs) {
    // Append one CSV-style row per account, so str grows on every iteration
    str += '"' + ac.BillingStreet + '","' +
        ac.BillingCity + '","' +
        ac.BillingPostalCode + '","' +
        ac.BillingState + '","' +
        ac.BillingCountry + '",' +
        ac.BillingLatitude + ',' +
        ac.BillingLongitude + '\n';
    // Compare the ever-growing string against the short constant
    if (str == COMP_STR) {
        ++mcount;
    }
}
t1.showCpuTime('CPU time: ');
public class CpuTimer {
    Integer cpuTime;
    public CpuTimer() {
        cpuTime = Limits.getCpuTime();
    }
    public void showCpuTime(String txt) {
        Integer newCpuTime = Limits.getCpuTime();
        System.debug(txt + (newCpuTime - cpuTime));
        cpuTime = newCpuTime;
    }
}
Running the above script in a Developer Org using 'Execute Anonymous' showed results of 9098, 8833, 9189, 7224, 9335 and 6787 milliseconds, i.e. between 6.5 and 9.5 seconds with an average of 8.4 seconds. With a very minor change, checking the length of the string before making the comparison:
if (str.length() == COMP_STR_LEN && str == COMP_STR) {
The results are very different: 86, 63, 96, 106, 90 and 104 milliseconds, i.e. an average of 0.091 seconds.
Is the String comparison, indeed, the reason for the big difference in consumed CPU time and, if so, is the 'fix' of checking the length first the best way to mitigate it?
I have also found that the results seem to depend on how the long string value is built - using an equally long string built by a single statement (using the Apex repeat() method) results in much faster comparisons than a string built by many concatenations. Does this possibly give clues about what the String class and the comparison operation are doing internally?
Also, is there any truth in the theory I heard: that the String comparison is doing this to prevent 'timing attacks', i.e. to prevent an attacker from determining the length of a String value by sending various length strings and noting which of these gives a significantly better response time?
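To make the 'timing attack' theory concrete, here is a minimal Java sketch of what a timing-resistant comparison usually looks like. It is not taken from Salesforce and I am not claiming Apex does this. The class and method names are hypothetical, and the code is illustrative rather than hardened. The point is only that such a comparison always walks the full length of the longer string instead of stopping at the first difference:

```java
public class TimingSafeCompare {
    // Hypothetical helper: does work proportional to the longer string,
    // regardless of where (or whether) the two strings differ.
    static boolean constantTimeEquals(String a, String b) {
        int n = Math.max(a.length(), b.length());
        // A length mismatch is folded into the result instead of returning early.
        int diff = a.length() ^ b.length();
        for (int i = 0; i < n; i++) {
            char ca = i < a.length() ? a.charAt(i) : 0;
            char cb = i < b.length() ? b.charAt(i) : 0;
            diff |= ca ^ cb; // accumulate differences without branching on them
        }
        return diff == 0;
    }

    public static void main(String[] args) {
        System.out.println(constantTimeEquals("Test-Test", "Test-Test"));
        System.out.println(constantTimeEquals("Test-Test", "Test-Tesx"));
        System.out.println(constantTimeEquals("Test-Test", "Test-Test-and-much-longer"));
    }
}
```

A comparison like this costs the same however the long string was built, so on its own it would not obviously account for the difference I see between concatenated strings and strings built with repeat().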
There's some insight here into why comparing the string lengths first is probably NOT a 100% reliable way of improving these case-insensitive string comparisons. Try the following script:
String s1 = 'Strasse';
String s2 = 'Straße';
System.debug(s1.length());
System.debug(s2.length());
System.debug((s1 == s2));
Here the German 'ß' (Eszett) character is considered 'equal' to the pair of characters 'ss', so even though the lengths of the two strings are different, the strings themselves are deemed 'equal' (or at least 'equalsIgnoreCase').
@sfdcfox, might this explain why the case-insensitive string comparison can't simply compare the first X characters of the strings, where X is the shorter of the two lengths?
Are there any languages other than German where this 'variable-length-equality' works?
Oddly, I find that strings "Straße" and "Strasse" are NOT equal according to Java's equalsIgnoreCase().
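The Java side of this can be checked with a small standalone sketch. The class and method names are made up for illustration. The idea that Apex's == might behave like a comparison of fully upper-cased strings is only a guess on my part, not documented behaviour as far as I know. Java's equalsIgnoreCase() compares character by character, so it can never match strings of different lengths, whereas toUpperCase() applies full case mapping and expands 'ß' to 'SS':

```java
import java.util.Locale;

public class EszettComparison {
    // Per-character case-insensitive comparison, as done by equalsIgnoreCase().
    static boolean perCharIgnoreCase(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    // Hypothetical alternative: compare after full upper-casing.
    // This is an assumption about what a platform like Apex *might* do.
    static boolean upperCasedEquals(String a, String b) {
        return a.toUpperCase(Locale.ROOT).equals(b.toUpperCase(Locale.ROOT));
    }

    // The length-first shortcut from the question, applied to the same comparison.
    static boolean guardedEquals(String a, String b) {
        return a.length() == b.length() && upperCasedEquals(a, b);
    }

    public static void main(String[] args) {
        String s1 = "Strasse";
        String s2 = "Stra\u00DFe"; // "Straße", written with an escape to avoid encoding issues
        System.out.println(s1.length() + " vs " + s2.length());
        System.out.println("equalsIgnoreCase: " + perCharIgnoreCase(s1, s2));
        System.out.println("upper-cased equals: " + upperCasedEquals(s1, s2));
        System.out.println("length guard + upper-cased equals: " + guardedEquals(s1, s2));
    }
}
```

If a comparison works on upper-cased forms, the length guard changes the answer for this pair of strings, which is the risk described above.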