When To Use Set vs. List In SOQL? Which one is better?

854    Asked by ankur_3579 in Salesforce , Asked on Apr 24, 2021

I have always assumed that it is better to use a Set in my query filters instead of a List. For example: Set parentIds = generateParentIds(); List children = [SELECT Id FROM Child__c WHERE Parent__c IN :parentIds];

  • My reasoning was:

    Set can be more easily (and quickly) generated from a Map via keySet(). Set ought to consume less memory than List (no longer sure this is true). null values can more easily be removed from a Set. Set cannot contain any duplicate values.

    What are the actual benefits of using Set instead of List? What are the drawbacks? What factors help decide? For example, I know if the field is nullable, then the ability to remove null is much more important because it helps avoid a table scan.

    Answered by Aashna Saito

    Here we have set vs list comparison below:


    Set vs List: Filter Speed

              If you can use Map.keySet(), you'll get a Set back in about 1/10th of the time versus a loop over those same records and adding all the values. Of course, the Map(List) constructor only works on Id, not on related ID values, which means you'd have to fall back to a loop. However, in cases where you just want to use the ID of a list of records for the purpose of a query, you can just use the list directly:
    Account[] accounts = [SELECT Id FROM Account]; Contact[] contacts = [SELECT AccountId FROM Contact WHERE AccountId IN :accounts];
    This takes linear time that's far smaller than constructing a Map/Set or a List of IDs. Alternatively, you can often use a semi join sub-selects to the same effect:
    Contact[] contacts = [SELECT AccountId FROM Contact]; Account[] accounts = [SELECT Id FROM Account WHERE Id IN (SELECT AccountId FROM Contact WHERE Id IN :contacts)];

    Assuming the semi join is not prohibited (e.g. you cannot use it with an OR filter), this is the fastest method available. If not, then binding to list directly is faster, followed by a Map Set, followed by a plain Set or List. In other words, while generateParentIds() seems to be handy, it's actually slower than simply baking the logic directly into the query, if possible. If you have to choose a List or a Set, there's no clear benefit to either, as they'll take the exact same amount of time to filter records.

    Set vs List: Size

    Given a List and a Set that each contain the same, unique values, both collections will be the same size, byte for byte. Once you start having duplicate values, then the set becomes more useful. However, for the volume of values you'd expect from a typical collection, there's little meaningful difference between a list and a set. For example, in a trigger of 200 records, the maximum size of a list of ID values should be about 3,000 bytes. As you can see, a handful of ID values don't use much storage. You will not experience any benefits at all until you have non-unique values, and even then, the savings don't add up unless there's just a few values repeated many times. Set vs List: Storage Speed

    Here's where things get tricky. At very small sizes object sizes, and very small data sizes, a Set and a List are about equal in terms of speed, assuming you're using native objects. For example, storing values in to a Set of ID values is about the same speed as using a List, assuming there's not many values.

    Consider the following code:

            Integer counter = 0; Set values = new Set(); // Id[] values = new Id[0]; while(counter++

    Using this very simple loop:

            <code>  while(counter++<1000000 xss=removed></pre><pre><code>for(Case record:Trigger.new) { accountIds.add(record.AccountId); } accountIds.remove(null);</code>

    The clear winner is the Set. While it's harder to benchmark this sort of code, I do know that repeatedly checking a condition each iteration is linearly more expensive than having to check just once. The results can be pretty impressive on large lists (~100 ms or more in some extreme cases).

    Set vs List: tl;dr

    At normal volumes that you'd find within a trigger context and most types of page contexts, there's no clear benefit to using either a List or a Set in terms of memory or speed, with the sole exception that a Set can identify and remove bad values in near-constant time (far better than a List can do). So, if you need to avoid a specific value, use a Set, if you need a specific order or possibly duplicate values, use a List, and if neither apply, just use whatever you feel like.

    A query that uses IN, with the same value more than once, does not cause an error, so there's no point in worrying about uniqueness in normal filters. However, if you're truly concerned about query performance, consider using a semi join sub-select, which is far superior in terms of speed than either the List or the Set. Obviously, this doesn't work on records not committed in the database, so often you will need to fall back to using either type of collection. I'm just advocating that you not worry about which you use as long as it gets the correct results.

    Set vs List: Which One is Better

    If the requirement is to have only unique values then Set is your best bet as any implementation of Set maintains unique values only. If there is a need to maintain the insertion order irrespective of the duplicity then List is the best option.What are the actual benefits of using Set instead of List?Set guarantees there is no duplicate object in it. The list doesn't so you can have multiple entries of "equal" objects in a list. There are millions of things that you can use set and it will make your life much easier, for example, a set of countries, a set of usernames, etc.

    <br>

    Your Answer

    Interviews

    Parent Categories