Large Data Volumes (LDV) in Salesforce

Large Data Volumes (LDV) in Salesforce is an imprecise, elastic term. These large data volumes(LDV) can lead to sluggish performance, including slower queries, slower search and list views, and slower sandbox refreshing.

Why is LDV important?

Under the hood

Salesforce Platform Structure

Metadata Table: describe custom objects and fields
Data Table: containing the physical records in your org
Virtual Database: A virtualization layer which combines the data and metadata to display the underlying records to the User
Platform takes the SOQL entered by the user which is converted to native SQL under the hood to perform necessary table joins and fetch the required information.

How does Search Work?

Record takes 15 minutes to be indexed after creation.
Salesforce searches the Index to find records.
Found rows are narrowed down using security predicates to form Result Set.
Once Result-Set reaches a certain size, additional records are discarded.
ResultSet is used to query the main database to fetch records.

Skinny Tables

What is a skinny table?

Condensed table combining a shortlist of standard and custom fields for faster DB record fetches.

Key Facts

Salesforce stores standard and custom field data in separate DB Tables
Skinny table combines standard and custom fields together
Data fetch speed is improved with more rows per fetch.
Do not contain soft-deleted records
Best for > 1M records
Immediately updated when master tables are updated
Usable on standard and custom objects
Created by Salesforce Support

Indexing Principles

What is an index?

Sorted column, or combination of columns which uniquely identify rows of data.
Index contains sorted column and pointers to the rows of data.

Standard vs Custom Index

Best Practices

How should we improve performance under Large Volumes?

Aim to use indexed fields in the WHERE clause of SOQL queries
Avoid using NULLS in queries as index cannot be used
Only use fields present in skinny table
Use query filters which can highlight < 10% of the data
Avoid using wildcards in queries, such as % as this prevents use of an index
Break complex query into simple singular queries to use indexes
Select ONLY required fields in SELECT statement

Ownership Skew

When you have more than 10,000 records for a single object owned by a single owner.

Why does this cause problems?

Share Table Calculations: When you move a user in the Role Hierarchy, sharing calculations need to take place on large volumes of data to grant and revoke access.
Moving users around the hierarchy, causes the sharing rules to be re-calculated for both the user in the hierarchy, and any users above this user in the role hierarchy.

How can we avoid this?

Data Migration: Work with client to divide records up across multiple real end-users
Integration User: Avoid making integration user the record owner
Leverage Lead and Case assignment rules
If unavoidable: assign records to a user is an isolated role at the top of the Role Hierarchy

Ownership Skew

When you have more than 10,000 records for a single object owned by a single owner.

Why does this cause problems?

Share Table Calculations: When you move a user in the Role Hierarchy, sharing calculations need to take place on large volumes of data to grant and revoke access.
Moving users around the hierarchy, causes the sharing rules to be re-calculated for both the user in the hierarchy, and any users above this user in the role hierarchy.

How can we avoid this?

Data Migration: Work with client to divide records up across multiple real end-users
Integration User: Avoid making integration user the record owner
Leverage Lead and Case assignment rules
If unavoidable: assign records to a user is an isolated role at the top of the Role Hierarchy

Sharing Considerations

Org Wide Defaults

What should I do