When the data volume reaches the scale of millions or tens of millions, how can a leaderboard be implemented by Java?

Introduction

Over the years, many development teams stumbled when trying to implement leaderboard functionality. Today, I'd like to share several different leaderboard Java implementation strategies — ranging from simple to complex, from single-machine solutions to distributed architectures — in the hope of helping you make better decisions in real-world projects.

Some java developers might think, "Isn't it just a leaderboard? Just sort the data in the database!" But in reality, it's far more complicated than that. When the data volume reaches millions or even tens of millions, a simple database query can easily become the system's bottleneck.

In the following sections, I’ll walk you through several approaches in detail. Hopefully, this will be useful to you.

Direct Database Sorting

Use Case: Small data volume (less than 100,000 records), low real-time requirements

This is the simplest and most straightforward solution — it's usually the first method that comes to every developer’s mind.

Here’s a sample code snippet:

public List<UserScore> getRankingList() {
    String sql = "SELECT user_id, total_score FROM user_scores ORDER BY total_score DESC LIMIT 100";
    return jdbcTemplate.query(sql, new UserScoreRowMapper());
}

Advantages:

Simple to implement
Low code maintenance cost
Suitable for scenarios with small data volume

Disadvantages:

Performance degrades sharply with large data volumes
Each query requires a full table scan
High database pressure under heavy concurrency

Cache Plus Scheduled Task

Use Case: Medium data volume (hundreds of thousands of records), acceptable delay of a few minutes

This approach builds on the first solution by introducing a caching mechanism.

Example code:

@Scheduled(fixedRate = 10000) // Executes every 10 seconds
public void updateRankingCache() {
    List<UserScore> rankings = userScoreDao.getTop100Scores();
    redisTemplate.opsForValue().set("ranking_top_100_list", rankings);
}

public List<UserScore> getRankingList() {
    return (List<UserScore>) redisTemplate.opsForValue().get("ranking_top_100_list");
}

Advantages:

Reduces database load
Fast query speed (O(1))
Relatively simple to implement

Disadvantages:

Data is delayed (depends on task frequency)
Higher memory usage
Rankings are not updated in real time

The architecture diagram is as follows:

Redis Sorted Set

Use Case: Large data volume (millions), requiring real-time updates

Redis’s Sorted Set is a powerful tool for implementing leaderboards.

Example code:

public void addUserScore(String userId, double score) {
    redisTemplate.opsForZSet().add("top_ranking", userId, score);
}

public List<String> getTopUsers(int topN) {
    return redisTemplate.opsForZSet().reverseRange("top_ranking", 0, topN - 1);
}

public Long getUserRank(String userId) {
    return redisTemplate.opsForZSet().reverseRank("top_ranking", userId) + 1;
}

Advantages:

High performance (O(log(N)) time complexity)
Supports real-time updates
Naturally supports pagination
Can retrieve user ranking

Disadvantages:

Limited memory on a single Redis instance
Need to consider Redis persistence
Additional handling required in distributed environments

Sharding + Redis Cluster

Use Case: Ultra-large scale data (tens of millions or more), high concurrency scenarios

When a single Redis instance cannot meet the requirements, a sharding solution can be adopted.

Example code:

public void addUserScore(String userId, double score) {
    RScoredSortedSet<String> set = redisson.getScoredSortedSet("ranking:" + getShard(userId));
    set.add(score, userId);
}

private String getShard(String userId) {
    // Simple hash-based sharding
    int shard = Math.abs(userId.hashCode()) % 16;
    return "shard_" + shard;
}

Here we take Redisson client as an example.

Advantages:

Strong horizontal scalability
Supports ultra-large scale data
Stable performance under high concurrency

Disadvantages:

High architectural complexity
Cross-shard queries are difficult
Need to maintain sharding strategy

The architecture diagram is as follows:

Precomputation + Layered Caching

Use Case: Leaderboard updates infrequently, but with very high traffic

This approach combines pre-computation with multi-level caching.

Example code:

@Scheduled(cron = "0 0 * * * ?") // Runs once every hour
public void precomputeRanking() {
    Map<String, Integer> rankings = calculateRankings();
    redisTemplate.opsForHash().putAll("ranking:hourly", rankings);
    
    // Synchronize to local cache
    localCache.putAll(rankings);
}

public Integer getUserRank(String userId) {
    // 1. Check local cache first
    Integer rank = localCache.get(userId);
    if (rank != null) return rank;
    
    // 2. Then check Redis
    rank = (Integer) redisTemplate.opsForHash().get("ranking:hourly", userId);
    if (rank != null) {
        localCache.put(userId, rank); // Backfill local cache
        return rank;
    }
    
    // 3. Finally check the database
    return userScoreDao.getUserRank(userId);
}

Advantages:

Extremely high access performance (local cache O(1))
Reduces pressure on Redis
Suitable for read-heavy, write-light scenarios

Disadvantages:

Poor data real-time accuracy
High resource consumption for precomputation
High implementation complexity

The architecture diagram is as follows:

Real-Time Computation + Stream Processing

Use Case: Social platforms requiring real-time updates with massive data volume

This solution uses stream processing technologies to implement a real-time leaderboard. Below is an example using Apache Flink:

DataStream<UserAction> actions = env.addSource(new UserActionSource());

DataStream<Tuple2<String, Double>> scores = actions
    .keyBy(UserAction::getUserId)
    .process(new ProcessFunction<UserAction, Tuple2<String, Double>>() {
        private MapState<String, Double> userScores;

        public void open(Configuration parameters) {
            MapStateDescriptor<String, Double> descriptor =
                new MapStateDescriptor<>("userScores", String.class, Double.class);
            userScores = getRuntimeContext().getMapState(descriptor);
        }

        public void processElement(UserAction action, Context ctx, Collector<Tuple2<String, Double>> out) {
            double newScore = userScores.getOrDefault(action.getUserId(), 0.0) + calculateScore(action);
            userScores.put(action.getUserId(), newScore);
            out.collect(new Tuple2<>(action.getUserId(), newScore));
        }
    });

scores.keyBy(0)
      .process(new RankProcessFunction())
      .addSink(new RankingSink());

Advantages:

True real-time updates
Handles extremely high concurrency
Supports complex computation logic

Disadvantages:

High architectural complexity
High operations and maintenance cost
Requires a specialized engineering team

The architecture diagram is as follows:

Comparison and Selection of Solutions

Solution	Data Volume	Real-Time	Complexity	Suitable Scenarios
Database Sorting	Small	Low	Low	Personal projects, small-scale applications
Cache + Scheduled Tasks	Medium	Medium	Medium	Small to medium applications, delay is acceptable
Redis Sorted Set	Large	High	Medium	Large applications requiring real-time updates
Sharding + Redis Cluster	Extra Large	High	High	Ultra-large applications with extremely high traffic
Precomputation + Layered Cache	Large	Medium-High	High	Read-heavy, write-light scenarios with massive traffic
Real-Time Computation + Stream Processing	Extra Large	Real-Time	Very High	Social platforms requiring real-time ranking

Summary

When choosing a leaderboard implementation strategy, we need to consider the following key factors:

Data Volume: The size of the dataset directly influences the choice of solution.

Real-Time Requirements: Is second-level update necessary, or is minute- or even hour-level delay acceptable?

Concurrency: What is the expected level of system access or traffic?

Development Resources: Does the team have sufficient technical capability to maintain a complex solution?

Business Requirements: Is the leaderboard computation logic simple or complex?

For most small to medium-sized applications, Cache plus Scheduled Tasks) or Redis Sorted Set is usually sufficient.

If the business grows rapidly, it can gradually evolve to Sharding + Redis Cluster.

For real-time update scenarios, such as social media platforms, Pre-computation + Layered Cache or Real-Time Computation + Stream Processing should be considered—though they require solid technical preparation and robust architectural design.

Finally, regardless of which solution is chosen, it’s crucial to implement proper monitoring and performance testing.

As a high-frequency feature, leaderboard performance directly impacts user experience.

It is strongly recommended to conduct stress testing in real-world environments and adjust the solution based on the test results.

Search This Blog

Blog related to Java springboot development tips, tutorials, and real-world experience etc.

When the data volume reaches the scale of millions or tens of millions, how can a leaderboard be implemented by Java?

Comments

Post a Comment

Popular posts from this blog

Why Do Remote Java Transmission Objects Need to Be Serialized?

Usage of MD5 Encryption and Decryption Technology in Java Application Development

For storing mobile phone numbers of 3 billion global users, should the data type be int, string, varchar, or char? And why?