[Article sharing] How Does Reddit Do View Counting?

View Counting at Reddit

This is a good article about how Reddit does view counting.



The problem has the following requirements:

  • Counts must be real time or near-real time. No daily or hourly aggregates.
  • Each user must only be counted once within a short time window.
  • The displayed count must be within a few percentage points of the actual tally.
  • The system must be able to run at production scale and process events within a few seconds of their occurrence. (Remark: Reddit ranks No. 8 globally by visit count.)

This kind of problem is known as the “cardinality estimation problem”, also called the count-distinct problem. (https://en.wikipedia.org/wiki/Count-distinct_problem)



A naive solution would be to keep the set of unique users for each post in an in-memory hash table, keyed by post ID.
However, this is not practical: several popular posts have over one million unique viewers, so the memory and CPU cost of such a solution would be too high.
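
To make the naive approach concrete, here is a minimal sketch in JavaScript. The names `viewersByPost`, `recordView` and `uniqueViews` are made up for this illustration, and the "short time window" requirement is ignored for brevity. The count is exact, but every distinct viewer ID stays in memory, which is what becomes too costly for popular posts.

```javascript
// Exact unique-view counting: one Set of user IDs per post.
// Memory grows linearly with the number of distinct viewers.
const viewersByPost = new Map();

function recordView(postId, userId) {
  let viewers = viewersByPost.get(postId);
  if (!viewers) {
    viewers = new Set();
    viewersByPost.set(postId, viewers);
  }
  viewers.add(userId); // a repeat view by the same user changes nothing
}

function uniqueViews(postId) {
  const viewers = viewersByPost.get(postId);
  return viewers ? viewers.size : 0;
}

recordView('post-1', 'alice');
recordView('post-1', 'bob');
recordView('post-1', 'alice'); // duplicate view
console.log(uniqueViews('post-1')); // 2
```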

It turns out that the engineers at Reddit solved it by combining two algorithms, each suited to a different scale:
1) Linear probabilistic counting
2) HyperLogLog (HLL)-based counting

Both algorithms rely on probabilistic tricks that let them count with extremely little memory (e.g. counting 1M IDs in just 12 KB of space).
I will not go through the details here, but if you are interested, see this article: Big Data Counting: How to count a billion distinct objects using only 1.5KB of Memory.
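
As a rough illustration of the first technique, here is a sketch of linear probabilistic counting in JavaScript. It is not Reddit's implementation. The hash function (an FNV-1a-style loop followed by a common 32-bit mixing step), the class name `LinearCounter` and the bitmap size are all assumptions chosen for the example. The idea is to hash each item to one bit of a fixed bitmap, then estimate the distinct count from the fraction of bits that are still zero: n ≈ -m · ln(zeros / m).

```javascript
// 32-bit string hash: FNV-1a-style loop plus a final mixing step.
// (Chosen for this sketch only; any well-distributed hash would do.)
function hash32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // force unsigned
}

class LinearCounter {
  constructor(bits) {
    this.bits = bits;
    this.bitmap = new Uint8Array(Math.ceil(bits / 8));
  }

  add(item) {
    const idx = hash32(String(item)) % this.bits;
    this.bitmap[idx >> 3] |= 1 << (idx & 7); // set one bit per item
  }

  estimate() {
    let zeros = 0;
    for (let i = 0; i < this.bits; i++) {
      if ((this.bitmap[i >> 3] & (1 << (i & 7))) === 0) zeros++;
    }
    if (zeros === 0) return Infinity; // bitmap saturated, estimate unusable
    return Math.round(-this.bits * Math.log(zeros / this.bits));
  }
}

// 8192 bits = 1 KB of state, regardless of how long the IDs are.
const counter = new LinearCounter(8192);
for (let i = 0; i < 1000; i++) {
  counter.add(`user-${i}`);
  counter.add(`user-${i}`); // duplicates hit the same bit
}
console.log(counter.estimate()); // close to 1000
```

The estimate degrades as the bitmap fills up, which is why this technique suits small counts and a different algorithm is needed at larger scale.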


Someone has demonstrated these techniques by counting the number of distinct words in all of Shakespeare’s works with three different counters.

The results are below:

Counter                          Bytes Used    Count    Error
HashSet                            10447016    67801    0%
Linear probabilistic counting          3384    67080    1%
HyperLogLog                             512    70002    3%
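
For comparison, here is a sketch of a basic HyperLogLog counter in JavaScript. It is a simplified textbook variant, not the code used in the demo above or at Reddit. The hash function, the class name `HyperLogLog` and the choice of 512 one-byte registers (picked only to echo the 512-byte row in the table) are assumptions. Each item's hash selects a register, and the register remembers the longest run of leading zero bits seen so far. The estimate is derived from a harmonic mean over all registers.

```javascript
// Same style of 32-bit hash as a typical sketch would use (an assumption).
function hash32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

class HyperLogLog {
  // p = number of index bits; m = 2^p registers. Use p >= 7 so the
  // bias constant below applies.
  constructor(p) {
    this.p = p;
    this.m = 1 << p;
    this.registers = new Uint8Array(this.m);
  }

  add(item) {
    const h = hash32(String(item));
    const idx = h >>> (32 - this.p);   // top p bits pick the register
    const rest = (h << this.p) >>> 0;  // remaining bits, left-aligned
    // rank = position of the first 1 bit in the remaining bits
    const rank = rest === 0 ? 32 - this.p + 1 : Math.clz32(rest) + 1;
    if (rank > this.registers[idx]) this.registers[idx] = rank;
  }

  estimate() {
    const m = this.m;
    let sum = 0;
    let zeros = 0;
    for (const r of this.registers) {
      sum += Math.pow(2, -r);
      if (r === 0) zeros++;
    }
    const alpha = 0.7213 / (1 + 1.079 / m);
    let e = (alpha * m * m) / sum;
    // Small-range correction: fall back to linear counting.
    if (e <= 2.5 * m && zeros > 0) e = m * Math.log(m / zeros);
    return Math.round(e);
  }
}

const hll = new HyperLogLog(9); // 512 registers
for (let i = 0; i < 20000; i++) hll.add(`user-${i}`);
console.log(hll.estimate()); // roughly 20000, typical error around 1.04 / sqrt(512)
```

The sketch omits the large-range correction, so with a 32-bit hash it is only meaningful for counts far below 2^32.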



The next time you encounter a problem that involves estimating the number of unique items, consider the “Linear probabilistic counting” and “HyperLogLog” techniques.


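In theory you could of course teach everyone to think independently, but in practice that idea is not very practical to carry out: the modeling transform efficiency is too low.
So breaking free of the ideological entanglement of “loving the country and its people” and of being “Chinese”, and shaping Hong Kong people from a Hong Kong-centred perspective, can teach people to think independently just as well. Why not?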




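Every year there is one round of mourning, with its dots of candlelight, so-called to show the Chinese Communist Party a united “determination” to have June 4th vindicated.
But how much bargaining power can a yearly “June 4th festival” in Victoria Park really have to make the CCP feel that you people carry any influence?

By the way, I am challenging the Hong Kong Alliance (支聯會) and the Victoria Park June 4th festival itself, but that does not mean I have to prove that some other form is better than yours.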







rest-in-contract – nodejs module for REST API Contract server


Project Page

Project Status

Currently, the project is in beta version (v0.x.x).

The basic Contract Server module is done and supports basic usage of API contract stubbing & testing. But some built-in features are not done yet (e.g. supporting more middleware functions in the contract script).

Since it is still a beta version, we have not finalized the v1.0 in-the-box features yet.


  • Add Unit tests
  • Update documents
  • Database Storage
  • Authentication
  • Support Plugins
  • Java/nodejs test integration client
  • Study on integration with Swagger

What is rest-in-contract

rest-in-contract is a product that lets you embrace Consumer-driven contracts. It is REST in nature, so it integrates with all kinds of programming languages. For more detail, have a look at the Project rest-in-contract homepage for a detailed introduction.

Slideshare: Basic Concepts & Flows



Hello world

Starting server:

Project rest-in-contract


Project Page

Related Projects


What is rest-in-contract

Consumer-driven contracts

rest-in-contract is a product that lets you embrace Consumer-driven contracts. It is REST in nature, so it integrates with all kinds of programming languages.

Story for REST API providers/consumers in Consumer-driven contracts

For REST API providers:

REST API providers can write API contracts to describe their REST API request/response formats. They can then use the contracts to do contract testing against their API implementations.

For REST API consumers:

REST API consumers can use the API contracts to set up stubs for local testing or to draft API contracts.


Slideshare: Basic Concepts & Flows


How does rest-in-contract differ from other Consumer-driven contracts solutions?

REST in nature, cross language, easy integration

Many Consumer-driven contracts solutions exist, but many of them are SDK libraries or embedded solutions for stubbing or testing, which may be tied to a particular language. rest-in-contract is designed from the perspective that we do not want a language-locked solution.

rest-in-contract is a node module that sets up a lightweight agent server which API providers/consumers can easily start in their own environment. Whether you are doing contract testing (as a provider) or API stubbing (as a consumer), you always do it by calling the rest-in-contract agent server’s REST API. That is why rest-in-contract is a cross-language solution for Consumer-driven contracts.

Because it is REST in nature, integration with DevOps is very easy. No Maven or Gradle build is needed. You just need node.js (v7+) installed in your environment to start the server. All later interactions are REST API calls, which you can make with curl or any other HTTP client tool.

Contract as file

Some Consumer-driven contracts solutions let you wire stub servers through SDK methods. The contract is therefore written as embedded code, which makes it difficult to support different programming languages.

Instead, we think a contract should be defined in a less coupled way, separate from your business logic code.

A contract in rest-in-contract is described as a JS script that exports a Contract object. We support some middleware function calls in the contract. It also supports regular expressions, jsonpath, etc.

An example of contract file would be like this:


Although the contract file is written in JavaScript syntax, you can treat it as a general file stored in your project, because your code/application never needs to interact with the contracts directly. You can always pass them to the Contract Server and let it do its job.


What are the possible architecture configurations of rest-in-contract?

Architecture Components

Contract Server is a server instance that supports contract testing and stubbing through a REST API. It typically stores and reads contracts in local storage.

Contract Agent Server is actually a Contract Server. The difference is that it reads contracts from a remote contract repository instead of local storage.

Architecture Configurations

We imagine two possible architecture configurations for using rest-in-contract.

  1. Centralized Contract Server architecture

In this mode, there is a centralized Contract Server which serves all API providers and consumers. It is the central storage server that persists all contracts in a database.

API providers & consumers set up their own Contract Agent Server in their own environment, which fetches the contracts remotely from the centralized Contract Server through its REST API. They then use their local Contract Agent Server to do contract testing or stubbing.

  2. Decentralized Contract Server architecture

In this mode, API providers keep contracts in their own way. For example, if their application is on GitHub, they may put the contracts in a folder in their source code. API providers can start a Contract Server in their local environment to do contract testing.

On the other side, API consumers can check out the project from GitHub to get the API contracts. They can then start a Contract Server in their local environment to do stubbing.



Story walkthrough

Beginning of the Story: John (API Provider) wrote an application Foo which provides a REST API.

John wrote an application “Foo” whose base application URL is “http://example.com/foo”. It has a version path “/v1.0”, and the API endpoint URL is “/hello”.

The API has the following format:



The request body has an attribute name with a string value.


Next Story: Mary (API Consumer) is writing an application “Bar” which wants to consume John’s API.

Mary wants John to enhance his API to include a new integer attribute “age” in the request and output it in the response, like this:



Hence, Mary and John had a discussion and drafted an API contract like this:


This single contract is used by both John and Mary. John uses it to do contract testing against his implementation. Mary uses it to generate a stub for local testing.

To support both use cases, contract testing and stubbing, they need to define value(stub(...), test(...)) in the contract. value(stub(...), test(...)) means that the values used for generating the stub and the values used for contract testing are different.

Why different values for stubbing and testing?

Mary wants a stub. The stub accepts any “name” attribute that matches the pattern [a-zA-Z ]*. This means Mary can send the stub a request with “name” set to “Susan”, “Sam” or any other valid name, and the response should echo the same name from the request. The “name” attribute should accept a flexible value so that Mary can test more dynamically instead of with an always hardcoded value. Hence, it is represented by the regular expression pattern regex("[a-zA-Z ]*").

John wants an API test. The API test just needs to define one value to test with. (He could actually use a regular expression pattern as well; a fixed value is used here just for the demo.) Hence, a hardcoded test value “John” is used, and the response value is also a hardcoded test value.
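
The stub/test split can be modelled in a few lines of JavaScript. This is only a toy model of the concept, not rest-in-contract's actual API or implementation: the `dsl` object and the helpers `stubAccepts` and `testValue` are hypothetical names defined here for illustration.

```javascript
// Toy model of value(stub(...), test(...)). All names are hypothetical.
const dsl = {
  // Anchor the pattern so the whole value must match.
  regex: (pattern) => ({ kind: 'regex', re: new RegExp(`^(?:${pattern})$`) }),
  stub: (v) => ({ side: 'stub', v }),
  test: (v) => ({ side: 'test', v }),
  value: (stubSide, testSide) => ({ stub: stubSide.v, test: testSide.v }),
};

// The "name" attribute from the story: flexible for the stub,
// one fixed value for the contract test.
const nameField = dsl.value(dsl.stub(dsl.regex('[a-zA-Z ]*')), dsl.test('John'));

// Stub side (Mary): would an incoming request value be accepted?
function stubAccepts(field, incoming) {
  const s = field.stub;
  if (s && s.kind === 'regex') return s.re.test(incoming);
  return s === incoming;
}

// Test side (John): the concrete value placed in the generated request.
function testValue(field) {
  return field.test;
}

console.log(stubAccepts(nameField, 'Susan')); // true
console.log(stubAccepts(nameField, 'R2D2'));  // false, digits are not allowed
console.log(testValue(nameField));            // John
```

One field definition thus serves both parties: the consumer gets a permissive matcher, the provider gets a deterministic test input.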


Next Story: John uses the contract to generate a testing endpoint and test against the API contract

First, John starts a local Contract Server on port 8000.

Then he creates (registers) an App in the Contract Server with this REST API call:

Assume the app ID is “80a69a44-3f3b-48c1-a7d1-b34b89117e75”.

For this app, John creates (registers) an App version “2.0” in the Contract Server with this REST API call:

Then John creates (registers) the API contract in the Contract Server with this REST API call:

(Assume the contract ID is “94923fbd-9092-4a46-ad65-0d8a2e2f551e”)

Then John can update “foo”’s API version 1 to include this API contract. He can do it by calling this REST API:

Then John can use the Contract Server’s REST API to trigger contract testing against app “Foo” in his local environment (port 8001). Once triggered, the Contract Server builds mock requests according to the API contract and sends them to the target API endpoint. The test result and the full request/response context are returned.

The Contract Test REST API call would look like this:


The Response may look like this:

Next Story: Mary uses the contract to kick-start a stub server

This time, Mary starts a local Contract Server on port 8000. Then she creates the App (John’s app “Foo”), Version and Contracts just like in the last story. Again, this is done by REST API calls.

But at the last step, this time she does not call the API contract testing operation ( /api/v1/apps/{{appId}}/wiretests ). Instead, she wants to wire a stub server, so she calls the API wire stub operation ( /api/v1/apps/{{appId}}/wirestubs ).
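
The two operations differ only in their last path segment. The small sketch below builds both URLs for the example app ID, assuming the local Contract Server from the story is reachable at http://localhost:8000. The helper name `operationUrl` is made up, and the sketch deliberately stops at building the URL, because the HTTP method and request body of these operations are not shown here.

```javascript
// Assumption: the local Contract Server from the story listens here.
const CONTRACT_SERVER = 'http://localhost:8000';

// Build the URL of the contract-testing or stub-wiring operation for an app.
function operationUrl(appId, operation) {
  if (operation !== 'wiretests' && operation !== 'wirestubs') {
    throw new Error(`unknown operation: ${operation}`);
  }
  return `${CONTRACT_SERVER}/api/v1/apps/${encodeURIComponent(appId)}/${operation}`;
}

const appId = '80a69a44-3f3b-48c1-a7d1-b34b89117e75';
console.log(operationUrl(appId, 'wiretests')); // John: contract testing
console.log(operationUrl(appId, 'wirestubs')); // Mary: stub server
```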

After that, a stub server that responds according to the API contracts is started on local port 8001.

Then Mary can test the hello API against the stub server. She can send a request like this:

And she would get this response:

After her testing, Mary can shut down the stub server with this API:

Or she can shut down the whole local Contract Server instead.


Credit to Spring-Cloud-Contract & WireMock

We have to give credit to Spring-Cloud-Contract & WireMock because this project is inspired by them. We like Spring-Cloud-Contract’s design and its usage of value(stub(...), test(...)), so we brought it into rest-in-contract. We also appreciate WireMock, which showed us a well-made product for case study and feature analysis that helped us make our own new product.