Best Pagination Strategies for Large Datasets: A Developer's Decision Guide
Choose the right pagination approach based on data size, UX, and system constraints

Introduction

Pagination is one of those deceptively simple problems that becomes increasingly complex as your application scales. What starts as a straightforward LIMIT and OFFSET query can evolve into a performance bottleneck affecting millions of users. The choice of pagination strategy directly impacts database load, response times, user experience, and infrastructure costs. Yet many engineering teams default to offset-based pagination without fully understanding the alternatives or the long-term consequences of that choice.

This guide examines the major pagination strategies available to developers working with large datasets—from traditional offset pagination to cursor-based and keyset approaches. We'll explore the technical mechanics of each strategy, their performance characteristics at scale, and the specific scenarios where each excels. By understanding the trade-offs between consistency, performance, and implementation complexity, you'll be equipped to make informed architectural decisions that align with your application's requirements and growth trajectory.

Understanding the Pagination Problem

At its core, pagination solves the problem of returning large result sets in manageable chunks. Without pagination, a query returning millions of records would consume excessive memory, overwhelm network bandwidth, and render user interfaces unresponsive. The solution seems straightforward: break the results into fixed-size pages that can be fetched on demand. However, this simple concept introduces several technical challenges that become acute at scale.

The first challenge is maintaining consistent state across page requests. When users navigate from page 1 to page 2, they expect continuity—no duplicates, no missing records. But databases are dynamic: new records are inserted, existing ones are updated or deleted, and concurrent users are modifying data simultaneously. Your pagination strategy must account for this fluidity while providing a coherent browsing experience. The degree of consistency required varies by use case: displaying search results can tolerate some inconsistency, while paginating through financial transactions cannot.

The second challenge is performance degradation at depth. Fetching page 1 is typically fast, but what about page 1,000 or page 100,000? Some pagination strategies exhibit linear performance degradation, where accessing later pages becomes progressively slower. Others maintain constant performance regardless of depth. This distinction matters enormously when building features like infinite scroll, bulk data exports, or administrative interfaces that regularly access deep pages. Understanding how your database executes pagination queries under load is critical to avoiding production incidents.

Offset-Based Pagination: The Traditional Approach

Offset-based pagination is the most intuitive and widely implemented strategy. It works by specifying two parameters: a limit (page size) and an offset (number of records to skip). To fetch page 3 with 20 items per page, you query with LIMIT 20 OFFSET 40. This maps naturally to numbered page controls and is supported natively by virtually every relational database and ORM. The simplicity of implementation explains its ubiquity—developers can add pagination to any query with minimal effort.

However, offset-based pagination has a critical performance flaw that emerges with large offsets. When you execute LIMIT 20 OFFSET 100000, the database must scan through 100,000 rows before returning the 20 you requested. It reads, sorts, and then discards those 100,000 rows on every request. This creates O(n) complexity where n is the offset value. As users navigate deeper into result sets, query execution time grows linearly. On a table with millions of records, accessing page 10,000 might take seconds while page 1 returns in milliseconds.

The consistency characteristics of offset pagination also present challenges. Consider a user viewing page 5 of a sorted list. If 10 new records are inserted at the top of the list while they're viewing, clicking "next" will show them 10 records they've already seen on the previous page—duplicates. Conversely, if records are deleted, they'll skip over some records entirely. This "shifting window" problem is inherent to offset pagination because offsets are positional, not anchored to specific records.
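The shifting-window effect is easy to reproduce without a database. The sketch below simulates it with a plain in-memory array; the `offsetPage` helper is hypothetical, standing in for a LIMIT/OFFSET query:

```typescript
// In-memory illustration of the shifting-window problem: the client
// reads page 1, new rows arrive at the top, and page 2 then repeats
// rows the client has already seen.
function offsetPage<T>(rows: T[], page: number, pageSize: number): T[] {
  return rows.slice((page - 1) * pageSize, page * pageSize);
}

let feed = ['e', 'd', 'c', 'b', 'a']; // newest first
const page1 = offsetPage(feed, 1, 2); // ['e', 'd']

// Two new records are inserted at the top before the next request
feed = ['g', 'f', ...feed];
const page2 = offsetPage(feed, 2, 2); // ['e', 'd'] again -- duplicates
```

Because the offset is positional, the two new rows push everything down and page 2 re-serves the rows from page 1.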

Despite these limitations, offset pagination remains appropriate for many scenarios. When dealing with relatively small datasets (under 100,000 records), the performance penalty is negligible. For interfaces that display page numbers and support jumping to arbitrary pages (e.g., "Go to page 57"), offset pagination provides the required functionality. It's also the simplest strategy to implement with existing ORMs and frameworks, making it a reasonable default choice when you're iterating quickly and performance at scale isn't yet a concern.

// Typical offset-based pagination implementation
interface PaginationParams {
  page: number;
  pageSize: number;
}

async function getUsers(params: PaginationParams) {
  const { page, pageSize } = params;
  const offset = (page - 1) * pageSize;
  
  // This query becomes slower as offset increases
  const users = await db.query(
    'SELECT * FROM users ORDER BY created_at DESC LIMIT $1 OFFSET $2',
    [pageSize, offset]
  );
  
  // Total count requires a separate query; extract the numeric value
  // from the result row rather than using the raw query result
  const countResult = await db.query('SELECT COUNT(*) AS count FROM users');
  const totalCount = Number(countResult[0].count);
  
  return {
    data: users,
    pagination: {
      page,
      pageSize,
      totalPages: Math.ceil(totalCount / pageSize),
      totalCount
    }
  };
}

Cursor-Based Pagination: Consistent Performance at Scale

Cursor-based pagination addresses offset pagination's performance problems by using opaque tokens to mark positions in result sets. Instead of specifying numeric offsets, clients provide a cursor—typically an encoded identifier from the last record of the previous page. The server decodes this cursor and uses it to construct a query that starts after that specific record. This eliminates the need to scan and discard rows: instead of work proportional to the offset, query cost stays essentially flat no matter how deep into the dataset you navigate.

The cursor itself is usually a base64-encoded string containing enough information to reconstruct the query position—commonly a primary key or combination of fields. For example, cursor eyJpZCI6MTAwMDAwfQ== might decode to {"id":100000}, allowing the server to query WHERE id > 100000 ORDER BY id LIMIT 20. Because the database can use an index to seek directly to that position, performance remains constant whether you're fetching the first page or the millionth. This makes cursor pagination ideal for infinite scroll interfaces, real-time feeds, and any scenario where users navigate sequentially through data.
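A minimal sketch of that encode/decode round trip, assuming Node's `Buffer` API (the helper names are illustrative):

```typescript
// Minimal cursor helpers: the payload is just the last record's
// primary key, serialized as JSON and base64-encoded.
function encodeCursor(position: { id: number }): string {
  return Buffer.from(JSON.stringify(position)).toString('base64');
}

function decodeCursor(cursor: string): { id: number } {
  return JSON.parse(Buffer.from(cursor, 'base64').toString('utf-8'));
}

const cursor = encodeCursor({ id: 100000 }); // 'eyJpZCI6MTAwMDAwfQ=='
const position = decodeCursor(cursor);       // { id: 100000 }
```

The cursor is opaque to clients; the server is free to change the payload as long as it can still decode cursors it previously issued.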

Cursor pagination also provides better consistency guarantees than offset pagination. Since cursors reference specific records rather than positions, insertions and deletions above your current position don't affect the results. If you're viewing records 40-60 and new records are inserted at position 10, clicking "next" will still correctly show records 61-80 without duplicates or gaps. This stability is crucial for user experiences where data changes frequently, such as social media feeds or notification lists.

// Cursor-based pagination implementation
interface CursorPaginationParams {
  cursor?: string;
  pageSize: number;
}

interface CursorPaginationResult<T> {
  data: T[];
  nextCursor: string | null;
  hasMore: boolean;
}

async function getUsersCursor(
  params: CursorPaginationParams
): Promise<CursorPaginationResult<User>> {
  const { cursor, pageSize } = params;
  
  // Build the query so WHERE precedes ORDER BY and LIMIT in the final SQL
  let query = 'SELECT * FROM users';
  const queryParams: any[] = [pageSize + 1]; // Fetch one extra to check if more exist
  
  if (cursor) {
    const decodedCursor = JSON.parse(
      Buffer.from(cursor, 'base64').toString('utf-8')
    );
    query += ' WHERE id < $2';
    queryParams.push(decodedCursor.id);
  }
  
  query += ' ORDER BY id DESC LIMIT $1';
  
  const users = await db.query(query, queryParams);
  const hasMore = users.length > pageSize;
  const data = hasMore ? users.slice(0, -1) : users;
  
  const nextCursor = hasMore
    ? Buffer.from(JSON.stringify({ id: data[data.length - 1].id })).toString('base64')
    : null;
  
  return { data, nextCursor, hasMore };
}

Keyset Pagination: The High-Performance Alternative

Keyset pagination, also known as "seek method" pagination, represents the optimal approach for sequential navigation through large ordered datasets. Like cursor pagination, it avoids offset scanning by using the values from the last record to determine where to continue. However, instead of opaque cursors, keyset pagination uses explicit column values—typically those in the ORDER BY clause—making the approach more transparent and cacheable. The query structure follows the pattern WHERE (sort_column, id) > (last_value, last_id) ORDER BY sort_column, id LIMIT n.

The performance characteristics of keyset pagination are exceptional. By leveraging composite indexes on the sort columns and tie-breaker (usually the primary key), databases can seek directly to the starting position using index scans. This provides consistent O(log n) complexity for finding the start position and O(page_size) for fetching records—essentially constant time from the application's perspective. A properly indexed keyset query executes in single-digit milliseconds whether you're fetching the first page or the ten-millionth, making it suitable for the largest datasets.

Keyset pagination excels in scenarios with clear, stable sort orders. Time-series data sorted by timestamp, user lists sorted by username, or any dataset with a natural ordering benefits immensely. The approach also provides excellent consistency: since you're always querying relative to specific values rather than positions, concurrent inserts and deletes don't create the duplicate/skip problems of offset pagination. This makes keyset pagination ideal for data synchronization tasks, audit log pagination, and ETL processes that need to reliably iterate through large datasets.

The primary limitation of keyset pagination is its inflexibility. Unlike offset pagination, you cannot jump to arbitrary pages—navigation is strictly sequential (next/previous). This rules out traditional page number interfaces ("Go to page 42"). Additionally, implementing keyset pagination with complex multi-column sorts requires careful attention to SQL construction and index design. The WHERE clause must properly handle the composite comparison, which varies depending on sort directions (ASC vs DESC) and can become intricate with multiple sort columns.

# Keyset pagination implementation in Python
from typing import Optional, List, Tuple
from dataclasses import dataclass

@dataclass
class KeysetParams:
    page_size: int
    last_created_at: Optional[str] = None
    last_id: Optional[int] = None

async def get_posts_keyset(params: KeysetParams) -> List[dict]:
    """
    Fetch posts using keyset pagination on (created_at DESC, id DESC)
    Requires composite index: CREATE INDEX idx_posts_keyset ON posts(created_at DESC, id DESC)
    """
    query = """
        SELECT id, title, created_at, author_id
        FROM posts
    """
    
    query_params = [params.page_size]
    
    if params.last_created_at is not None and params.last_id is not None:
        # Keyset condition for (created_at DESC, id DESC)
        # Needs to handle the composite comparison correctly
        query += """
            WHERE (created_at, id) < ($2, $3)
        """
        query_params.extend([params.last_created_at, params.last_id])
    
    query += """
        ORDER BY created_at DESC, id DESC
        LIMIT $1
    """
    
    results = await db.fetch(query, *query_params)
    
    return [
        {
            'id': row['id'],
            'title': row['title'],
            'created_at': row['created_at'],
            'author_id': row['author_id']
        }
        for row in results
    ]
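The Python example above fixes both sort directions to DESC, which allows the compact row-value comparison. Mixed ASC/DESC sorts need the expanded OR form instead. A sketch of generating that predicate (in TypeScript, with illustrative column names; not tied to any particular query builder):

```typescript
// Builds a keyset WHERE clause for arbitrary sort directions.
// Row-value syntax -- (a, b) < ($1, $2) -- only works when every column
// sorts the same way; mixed directions require the expanded OR form:
// (a > $1) OR (a = $1 AND b < $2) OR ...
interface SortColumn {
  name: string;
  direction: 'ASC' | 'DESC';
}

function keysetPredicate(columns: SortColumn[], startParam = 1): string {
  const clauses: string[] = [];
  for (let i = 0; i < columns.length; i++) {
    const parts: string[] = [];
    // All earlier columns are tied to their last-seen values...
    for (let j = 0; j < i; j++) {
      parts.push(`${columns[j].name} = $${startParam + j}`);
    }
    // ...and column i moves strictly past its last-seen value
    const op = columns[i].direction === 'ASC' ? '>' : '<';
    parts.push(`${columns[i].name} ${op} $${startParam + i}`);
    clauses.push(`(${parts.join(' AND ')})`);
  }
  return clauses.join(' OR ');
}
```

For a `(created_at DESC, id ASC)` sort this produces `(created_at < $1) OR (created_at = $1 AND id > $2)`, which a composite index on those columns can still serve efficiently.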

Time-Based and Hybrid Approaches

Time-based pagination is a specialized form of keyset pagination optimized for temporal data streams. Instead of generic cursors or composite keys, it uses timestamps as the pagination anchor: WHERE created_at < $last_timestamp ORDER BY created_at DESC LIMIT n. This approach is ubiquitous in event logs, audit trails, messaging systems, and activity feeds where data has an inherent chronological ordering. The simplicity of using a single timestamp column makes implementation straightforward and database indexes highly effective.

However, pure time-based pagination has a significant edge case: records with identical timestamps. If 50 records share the same millisecond timestamp and your page size is 20, you risk missing records or creating duplicates. The solution is a hybrid approach combining timestamp and ID: WHERE (created_at, id) < ($last_timestamp, $last_id). This provides the intuitive semantics of time-based pagination while guaranteeing uniqueness through the ID tie-breaker. Many production systems use this hybrid pattern, especially for high-throughput event processing where timestamp collisions are common.
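A sketch of building the hybrid query, returning SQL text plus positional parameters for any parameterized client (the `events` table and its columns are illustrative assumptions):

```typescript
// Hybrid timestamp + id pagination: the id tie-breaker disambiguates
// rows that share the same timestamp.
function buildEventQuery(
  pageSize: number,
  lastTimestamp: string | null = null,
  lastId: number | null = null
): { sql: string; params: any[] } {
  const params: any[] = [pageSize];
  let where = '';
  if (lastTimestamp !== null && lastId !== null) {
    // Strictly before the last-seen (timestamp, id) pair
    where = ' WHERE (created_at, id) < ($2, $3)';
    params.push(lastTimestamp, lastId);
  }
  const sql =
    'SELECT id, payload, created_at FROM events' +
    where +
    ' ORDER BY created_at DESC, id DESC LIMIT $1';
  return { sql, params };
}
```

The first page omits the WHERE clause entirely; every subsequent page passes the `(created_at, id)` pair from the last row it received.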

Choosing the Right Strategy: A Decision Framework

Selecting a pagination strategy requires evaluating multiple dimensions: dataset size, access patterns, consistency requirements, UI constraints, and implementation complexity. No single strategy is universally optimal—each represents trade-offs between these competing concerns. Start by characterizing your specific requirements across these dimensions before committing to an approach.

Dataset size is the most obvious factor. For tables under 10,000 rows, offset pagination's simplicity typically outweighs its performance drawbacks—the difference between 5ms and 8ms won't affect user experience. Between 10,000 and 1,000,000 rows, you enter a gray zone where offset pagination works for shallow pages but degrades noticeably at depth. Beyond 1,000,000 rows, offset pagination becomes problematic for any use case requiring deep pagination, making cursor or keyset approaches necessary for acceptable performance.

Access patterns determine which pagination features you actually need. If users primarily navigate sequentially (next/previous, infinite scroll), cursor or keyset pagination provides superior performance without sacrificing functionality. If users need random access ("jump to page 127"), offset pagination is required—though you might limit random access to a maximum page depth (e.g., first 100 pages) and use keyset for deep pagination. Analytic dashboards often need random access, while social feeds rarely do. Match the strategy to actual user behavior, not hypothetical requirements.

Consistency requirements vary by domain. Financial transactions require strict consistency—users must never skip or duplicate records during pagination. Social media feeds can tolerate inconsistency since content changes constantly anyway. E-commerce product catalogs fall somewhere in between. Cursor and keyset pagination provide better consistency than offset pagination, but achieving true consistency requires additional mechanisms like snapshot isolation or filtering by state timestamps. Understand your domain's tolerance for inconsistency before optimizing for it.

UI constraints often dictate strategy choices. Traditional page number interfaces (1, 2, 3, ... 10) require offset pagination or storing extensive metadata to emulate page jumps with keyset pagination. Infinite scroll maps naturally to cursor pagination. "Load more" buttons work with any strategy. API design matters too: the GraphQL Cursor Connections specification recommends cursor-based pagination, while REST APIs often use offset pagination for its simplicity. Consider whether your pagination strategy needs to support multiple UI patterns or external API consumers with specific expectations.

// Decision framework implemented as a strategy selector
type PaginationStrategy = 'offset' | 'cursor' | 'keyset';

interface DatasetCharacteristics {
  estimatedRows: number;
  insertionFrequency: 'low' | 'medium' | 'high';
  requiresRandomAccess: boolean;
  requiresStrictConsistency: boolean;
  hasStableSortOrder: boolean;
}

function selectPaginationStrategy(
  characteristics: DatasetCharacteristics
): PaginationStrategy {
  const { 
    estimatedRows, 
    requiresRandomAccess, 
    requiresStrictConsistency,
    hasStableSortOrder 
  } = characteristics;
  
  // Small datasets: simplicity wins
  if (estimatedRows < 10000 && !requiresStrictConsistency) {
    return 'offset';
  }
  
  // Random access requirement forces offset (or hybrid with limits)
  if (requiresRandomAccess) {
    return 'offset'; // Consider limiting depth
  }
  
  // Large datasets with stable sort order: keyset is optimal
  if (estimatedRows > 100000 && hasStableSortOrder) {
    return 'keyset';
  }
  
  // Default to cursor for flexibility and consistency
  return 'cursor';
}

Implementation Patterns and Best Practices

Regardless of which pagination strategy you choose, several implementation patterns improve robustness and performance. First, always include a tie-breaker in your sort order—typically the primary key. Without this, records with identical sort values can appear in non-deterministic order between requests, causing duplicates or skips. The pattern ORDER BY created_at DESC, id DESC ensures total ordering even when timestamps collide.

Second, fetch one extra record beyond your page size to determine whether more pages exist. Instead of LIMIT 20, use LIMIT 21 and check if 21 records were returned. This eliminates the need for expensive COUNT(*) queries just to determine if a "next" button should be displayed. For offset pagination where total counts are needed, consider caching the count and refreshing it periodically rather than computing it on every request. Accept that the total might be slightly stale—it usually doesn't matter.
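A minimal sketch of the count-caching idea. Real implementations would be async and back onto the database; here the count source and TTL are assumptions, and a clock is injected so the refresh logic is visible:

```typescript
// TTL-based cache for an expensive COUNT(*) query: reads within the
// TTL return the cached value; older values trigger a refresh.
class CachedCount {
  private value: number | null = null;
  private fetchedAt = 0;
  fetches = 0; // How often the underlying source was actually hit

  constructor(
    private fetchCount: () => number,
    private ttlMs: number,
    private now: () => number = Date.now
  ) {}

  get(): number {
    if (this.value === null || this.now() - this.fetchedAt > this.ttlMs) {
      this.value = this.fetchCount();
      this.fetchedAt = this.now();
      this.fetches++;
    }
    return this.value; // Possibly slightly stale, which is acceptable here
  }
}

// With a fake clock: two reads inside the TTL hit the source only once
let t = 0;
const counter = new CachedCount(() => 1234, 60_000, () => t);
counter.get();
t = 30_000;
counter.get();   // counter.fetches === 1
t = 120_000;
counter.get();   // counter.fetches === 2 (TTL expired, refreshed)
```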

Third, set reasonable limits on pagination parameters. Don't allow clients to request pageSize=1000000 or offset=9999999. Enforce maximum page sizes (typically 50-100) and maximum offset depths. For APIs, document these limits clearly. When limits are exceeded, return meaningful error messages guiding clients toward appropriate values. This prevents both accidental misuse and intentional abuse that could degrade database performance for all users.

// Pagination parameter validation middleware
interface PaginationConfig {
  defaultPageSize: number;
  maxPageSize: number;
  maxOffset?: number; // For offset pagination
}

function validatePaginationParams(
  params: any,
  config: PaginationConfig
): { page: number; pageSize: number } | Error {
  const pageSize = parseInt(params.pageSize, 10) || config.defaultPageSize;
  const page = parseInt(params.page, 10) || 1;
  
  if (pageSize > config.maxPageSize) {
    return new Error(
      `pageSize cannot exceed ${config.maxPageSize}. Consider using cursor-based pagination for large result sets.`
    );
  }
  
  if (pageSize < 1) {
    return new Error('pageSize must be at least 1');
  }
  
  if (page < 1) {
    return new Error('page must be at least 1');
  }
  
  if (config.maxOffset) {
    const offset = (page - 1) * pageSize;
    if (offset > config.maxOffset) {
      return new Error(
        `Offset ${offset} exceeds maximum allowed depth. Cannot access pages beyond ${Math.floor(config.maxOffset / pageSize)}.`
      );
    }
  }
  
  return { page, pageSize };
}

Common Pitfalls and How to Avoid Them

The most common pitfall is missing or inefficient database indexes. Offset pagination requires an index on the ORDER BY columns to avoid full table scans. Keyset pagination requires a composite index matching the sort order and keyset columns. Without proper indexes, even "fast" pagination strategies perform poorly. Use EXPLAIN ANALYZE (PostgreSQL) or EXPLAIN (MySQL) to verify that your pagination queries use index scans, not sequential scans. If you see "Seq Scan" in the query plan for large tables, you're missing an index.

Another frequent mistake is encoding too much information in cursors, making them brittle to schema changes. If your cursor contains {id, username, created_at, status, favoriteColor}, changing any of these fields breaks existing cursors in client caches or bookmarks. Keep cursors minimal—ideally just the fields absolutely required to reconstruct the query position. Version your cursor format ({version: 1, id: 12345}) so you can evolve the structure over time while maintaining backward compatibility.
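A sketch of that versioned format, assuming Node's `Buffer` API. Only the version number and the minimal position field are encoded, so the structure can evolve without breaking cursors clients already hold:

```typescript
// Versioned cursor: the version field tells the decoder which payload
// shape to expect, enabling backward-compatible format changes.
interface CursorV1 {
  version: 1;
  id: number;
}

function encodeVersionedCursor(id: number): string {
  const payload: CursorV1 = { version: 1, id };
  return Buffer.from(JSON.stringify(payload)).toString('base64');
}

function decodeVersionedCursor(cursor: string): CursorV1 {
  const payload = JSON.parse(
    Buffer.from(cursor, 'base64').toString('utf-8')
  );
  if (payload.version !== 1) {
    // A future v2 decoder would branch here instead of failing
    throw new Error(`Unsupported cursor version: ${payload.version}`);
  }
  return payload;
}
```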

Many implementations fail to handle edge cases properly. What happens when the record referenced by a cursor has been deleted? The query might return no results or throw an error. Better implementations detect this condition and fall back gracefully—perhaps returning the page that would have followed that record. Similarly, when paginating in reverse (previous page), the query structure must change to maintain correct ordering. The naive approach of reversing ORDER BY direction often produces incorrect results; you need to both reverse the ORDER BY and invert the comparison operator in the WHERE clause.
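A sketch of the reverse-pagination rule for a feed sorted by `id DESC`; table and helper names are illustrative. Note that both the comparison operator and the ORDER BY direction flip together:

```typescript
// Previous-page handling: reverse both the ORDER BY and the comparison
// operator, then re-reverse the fetched rows in application code so the
// page still reads newest-first.
function buildUserPageQuery(
  direction: 'next' | 'previous',
  boundaryId: number,
  pageSize: number
): { sql: string; params: any[] } {
  const sql =
    direction === 'next'
      ? // Continue downward through ids below the last one on this page
        'SELECT * FROM users WHERE id < $2 ORDER BY id DESC LIMIT $1'
      : // Walk upward past the first id on this page, ascending
        'SELECT * FROM users WHERE id > $2 ORDER BY id ASC LIMIT $1';
  return { sql, params: [pageSize, boundaryId] };
}

// For 'previous', the rows come back ascending and must be reversed
// before display, e.g. rows.reverse()
```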

State management in cursor-based pagination can also cause issues. Clients must track the current cursor and include it in the next request. Single-page applications should store this in component state or URL query parameters. For URL storage, ensure cursors are URL-safe—base64url encoding handles this well. Be cautious with extremely long cursors in URLs; some proxies and browsers impose URL length limits. If cursors exceed a few hundred characters, consider storing them server-side and returning short opaque tokens instead.
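A minimal sketch of that server-side approach: long cursors live in a store (an in-memory map here; something like Redis with an expiry in practice) and clients carry only a short token. The token scheme is an illustrative assumption:

```typescript
// Server-side cursor store: clients receive a short opaque token and
// the full cursor never appears in URLs.
class CursorStore {
  private store = new Map<string, string>();

  save(cursor: string): string {
    // Illustrative token generation; use a cryptographically random
    // token (e.g. crypto.randomUUID) in production
    const token = Math.random().toString(36).slice(2, 10);
    this.store.set(token, cursor);
    return token;
  }

  resolve(token: string): string | undefined {
    return this.store.get(token);
  }
}

const store = new CursorStore();
const token = store.save('eyJpZCI6MTAwMDAwfQ==');
// token is short and URL-safe; the full cursor stays server-side
```

The trade-off is added state: stored cursors need an eviction policy, and a token that has expired must be handled as gracefully as a deleted cursor record.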

Key Takeaways

  1. Match strategy to scale: Use offset pagination for datasets under 10,000 rows, cursor/keyset for anything larger or with high-frequency updates. The performance difference becomes dramatic above 100,000 rows.

  2. Always include a tie-breaker: Add the primary key as the final sort column in every pagination query to ensure deterministic ordering and prevent duplicates or missing records.

  3. Index for your strategy: Offset pagination needs indexes on ORDER BY columns; keyset pagination requires composite indexes matching your keyset fields. Verify index usage with EXPLAIN plans before deploying.

  4. Limit parameter ranges: Enforce maximum page sizes (50-100) and consider limiting offset depth to prevent abuse and maintain predictable performance under load.

  5. Design for the UI pattern: Sequential navigation (infinite scroll, next/previous) maps to cursor/keyset pagination. Random access (page numbers) requires offset pagination or hybrid approaches with depth limits.

Conclusion

Pagination strategies exist on a spectrum from simple to optimal, and the right choice depends on your specific constraints. Offset pagination offers implementation simplicity at the cost of performance degradation—acceptable for small datasets and rapid prototyping, but problematic at scale. Cursor and keyset pagination provide consistent performance regardless of depth, making them essential for large datasets, but they require more careful implementation and forgo random access capabilities.

The decision isn't always permanent. Many successful systems start with offset pagination and migrate to cursor or keyset approaches as they scale. The key is understanding the trade-offs before you encounter performance problems in production. By matching your pagination strategy to your dataset size, access patterns, and consistency requirements, you ensure a responsive user experience while maintaining system performance as your data grows. Invest time in proper indexing, handle edge cases gracefully, and validate your approach with realistic load testing—your future self and your users will appreciate the foresight.

References

  1. PostgreSQL Documentation: "LIMIT and OFFSET" - https://www.postgresql.org/docs/current/queries-limit.html
  2. MySQL Documentation: "LIMIT Query Optimization" - https://dev.mysql.com/doc/refman/8.0/en/limit-optimization.html
  3. GraphQL Cursor Connections Specification - https://relay.dev/graphql/connections.htm
  4. Markus Winand: "SQL Performance Explained" - Chapter on pagination and keyset pagination (seek method)
  5. Use The Index, Luke: "Paging Through Results" - https://use-the-index-luke.com/sql/partial-results/fetch-next-page
  6. Slack Engineering Blog: "Scaling to Billions: How Slack Handles Pagination" (cursor-based approach for message history)
  7. Twitter API Documentation: Cursor-based pagination implementation in REST APIs - https://developer.twitter.com/en/docs/twitter-api/pagination
  8. Percona Database Performance Blog: "Efficient Pagination in MySQL" discussing keyset vs offset performance
  9. Django REST Framework Documentation: Pagination styles and implementation patterns - https://www.django-rest-framework.org/api-guide/pagination/
  10. Joe Celko: "SQL for Smarties" - Advanced SQL Programming (coverage of pagination techniques)