[Article sharing] How Does Reddit Do View Counting?

View Counting at Reddit

This is a good article about how Reddit does view counting.

The problem has the following requirements:

  • Counts must be real-time or near-real-time. No daily or hourly aggregates.
  • Each user must be counted only once within a short time window.
  • The displayed count must be within a few percentage points of the actual tally.
  • The system must be able to run at production scale and process events within a few seconds of their occurrence. (Note: Reddit ranks No. 8 in the world by visits.)

This kind of problem is known as the “cardinality estimation problem”, also called the count-distinct problem (https://en.wikipedia.org/wiki/Count-distinct_problem).

A naive solution would be to keep, for each post, the set of unique viewer IDs in an in-memory hash table keyed by post ID.
However, this is impractical: several popular posts have over one million unique viewers, so the memory and CPU cost of such a solution would be too high.
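Here is a minimal JavaScript sketch of that naive approach, using one Set of viewer IDs per post. The class and method names are hypothetical and this is not Reddit's code. The count is exact, but memory grows with every unique viewer, and that growth is what breaks down for a post with a million viewers.

```javascript
// Naive exact counter: one in-memory Set of viewer IDs per post.
// Exact, but memory grows linearly with the number of unique viewers.
class NaiveViewCounter {
  constructor() {
    this.viewersByPost = new Map(); // postId -> Set of userIds
  }

  recordView(postId, userId) {
    let viewers = this.viewersByPost.get(postId);
    if (!viewers) {
      viewers = new Set();
      this.viewersByPost.set(postId, viewers);
    }
    viewers.add(userId); // a Set ignores repeat views by the same user
  }

  uniqueViews(postId) {
    const viewers = this.viewersByPost.get(postId);
    return viewers ? viewers.size : 0;
  }
}

const counter = new NaiveViewCounter();
counter.recordView("post1", "alice");
counter.recordView("post1", "bob");
counter.recordView("post1", "alice"); // duplicate, not counted again
console.log(counter.uniqueViews("post1")); // 2
```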

It turns out that Reddit's engineers solved it by combining two algorithms, each suited to a different scale:
1) Linear probabilistic counting
2) HyperLogLog (HLL)-based counting
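To give a feel for the first algorithm, here is a minimal sketch of linear probabilistic counting. It assumes a simple 32-bit string hash: FNV-1a plus the MurmurHash3 finalizer. That hash is my choice for illustration, not necessarily what Reddit uses.

How it works:
  • Each ID sets one bit of an m-bit bitmap.
  • The count is estimated as n ≈ m · ln(m / z), where z is the number of bits still 0.

Memory is fixed at m bits no matter how often users re-view a post. However, m must be sized for the expected count: once every bit is set, the estimate breaks down.

```javascript
// 32-bit string hash: FNV-1a followed by the MurmurHash3 fmix32 finalizer (an assumption for this sketch).
function hash32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// Linear probabilistic counting: each item sets one bit in an m-bit bitmap.
class LinearCounter {
  constructor(numBits) {
    this.m = numBits;
    this.bits = new Uint8Array(Math.ceil(numBits / 8));
  }

  add(item) {
    const i = hash32(String(item)) % this.m;
    this.bits[i >>> 3] |= 1 << (i & 7);
  }

  // n ≈ m * ln(m / zeros), i.e. -m * ln(V) where V is the fraction of zero bits.
  estimate() {
    let zeros = 0;
    for (let i = 0; i < this.m; i++) {
      if ((this.bits[i >>> 3] & (1 << (i & 7))) === 0) zeros++;
    }
    if (zeros === 0) return Infinity; // bitmap is full: m was too small for this many items
    return Math.round(this.m * Math.log(this.m / zeros));
  }
}

// Usage: 65536 bits = 8 KB of memory, regardless of how long the user IDs are.
const lc = new LinearCounter(65536);
for (let i = 0; i < 10000; i++) lc.add(`user-${i}`);
for (let i = 0; i < 10000; i++) lc.add(`user-${i}`); // repeat views do not change the estimate
console.log(lc.estimate()); // close to 10000 (an estimate, not an exact count)
```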

Both algorithms rely on some clever tricks that let them count with extremely little memory (e.g. counting 1M IDs in just 12 KB).
I won't go through the details of the magic box here, but if you are interested, see this article: “Big Data Counting: How to count a billion distinct objects using only 1.5KB of Memory”.
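For a quick peek inside the box, here is a minimal JavaScript sketch of textbook HyperLogLog (Flajolet et al.). It reuses the same assumed 32-bit hash as above.

How it works:
  • The top p bits of each hash pick one of m = 2^p registers.
  • Each register keeps the largest "rank" seen, i.e. the position of the first 1-bit in the remaining bits.
  • For small counts, the estimator falls back to the linear counting formula. That is one way the two algorithms can be combined; it is not necessarily how Reddit or Redis does it.

The 12 KB figure matches a layout like Redis's: 2^14 = 16384 registers × 6 bits = 98,304 bits = 12,288 bytes. The standard error is about 1.04/√16384 ≈ 0.81%. For simplicity, this sketch stores one register per byte (16 KB).

```javascript
// Same assumed 32-bit hash: FNV-1a + MurmurHash3 fmix32 finalizer.
function hash32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

class HyperLogLog {
  // p = 14 gives m = 16384 registers (standard error ≈ 1.04 / sqrt(m) ≈ 0.81%).
  constructor(p = 14) {
    this.p = p;
    this.m = 1 << p;
    this.registers = new Uint8Array(this.m); // 1 byte per register for simplicity
    this.alpha = 0.7213 / (1 + 1.079 / this.m); // bias-correction constant, valid for m >= 128
  }

  add(item) {
    const h = hash32(String(item));
    const idx = h >>> (32 - this.p); // top p bits choose a register
    const rest = (h << this.p) >>> 0; // remaining 32 - p bits, left-aligned
    // Rank = position of the first 1-bit in the remaining bits (capped if they are all 0).
    const rank = Math.min(Math.clz32(rest) + 1, 32 - this.p + 1);
    if (rank > this.registers[idx]) this.registers[idx] = rank;
  }

  estimate() {
    let sum = 0;
    let zeros = 0;
    for (const r of this.registers) {
      sum += Math.pow(2, -r);
      if (r === 0) zeros++;
    }
    let e = (this.alpha * this.m * this.m) / sum; // harmonic-mean estimate
    const two32 = Math.pow(2, 32);
    if (e <= 2.5 * this.m && zeros > 0) {
      e = this.m * Math.log(this.m / zeros); // small range: switch to linear counting
    } else if (e > two32 / 30) {
      e = -two32 * Math.log(1 - e / two32); // large range: correct for 32-bit hash collisions
    }
    return Math.round(e);
  }
}

const postViews = new HyperLogLog(14);
for (let i = 0; i < 100000; i++) postViews.add(`user-${i}`);
console.log(postViews.estimate()); // close to 100000
```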

Someone put together a demo of these techniques: it counts the number of distinct words in all of Shakespeare’s works using three different counters.

The results are below:

Counter                         Bytes used   Count   Error
HashSet                         10447016     67801   0%
Linear probabilistic counting   3384         67080   1%
HyperLogLog                     512          70002   3%
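As a quick sanity check on the Error column, the relative errors and memory savings can be recomputed from the table's own numbers. This takes the HashSet count as the exact tally. The rounded 1% and 3% come out as about 1.06% and 3.25%. The estimators use roughly 3,000× and 20,000× less memory than the HashSet.

```javascript
// Numbers copied from the Shakespeare table; the HashSet count is taken as the exact answer.
const rows = [
  { counter: "HashSet", bytes: 10447016, count: 67801 },
  { counter: "Linear probabilistic counting", bytes: 3384, count: 67080 },
  { counter: "HyperLogLog", bytes: 512, count: 70002 },
];

function summarize(table, exactCount, baselineBytes) {
  return table.map((r) => ({
    counter: r.counter,
    errorPct: (Math.abs(r.count - exactCount) / exactCount) * 100, // relative error in %
    memoryRatio: baselineBytes / r.bytes, // how many times smaller than the HashSet
  }));
}

// Compare the two estimators against the exact HashSet row.
const summary = summarize(rows.slice(1), rows[0].count, rows[0].bytes);
for (const s of summary) {
  console.log(`${s.counter}: error ${s.errorPct.toFixed(2)}%, ${s.memoryRatio.toFixed(0)}x less memory`);
}
// Linear probabilistic counting: error 1.06%, 3087x less memory
// HyperLogLog: error 3.25%, 20404x less memory
```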

Next time you run into a unique-item counting (cardinality estimation) problem, consider using the “Linear probabilistic counting” and “HyperLogLog” techniques to help you solve it.

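More “Love the Country, Love the People” Will Only Hurt Hongkongers

I strongly oppose the Hong Kong Alliance (支聯會) going on about “love the country, love the people”.

In the current situation, citizens who don't think things through can very easily confuse the concepts of Party and nation.
If it is purely a matter of your own independent thinking, you can of course love the country without loving the Party. The deeper the love, the sharper the criticism, and loving the country and the people is then no problem.

But if you are a political organization whose goal is to advance the democratic movement and process and to resist the CCP, then you must think about the campaign as a whole and how its narrative is modeled.
You need an ideal vision and an end-state narrative model, and at the same time a bridging narrative model.

Political narrative is not merely a moral and ethical model for a particular situation; it is an ongoing process of shaping people while the situation itself is being transformed.
A political organization is, in effect, continuously shaping and shepherding Hongkongers.

“Love the country, love the people” actually leaves more people entangled in Greater China ideology and more likely to fall into the trap of loving the Party along with the country.
In theory you could teach everyone to think independently, but in practice that idea is not very practical, and it is too inefficient at transforming how people think.

Conversely, suppose you claim you can teach people to “love the country, love the people” and to think independently at the same time.
Then you could just as well break free of the tangle of “love the country, love the people” and “Chinese” identity, and shape Hongkongers from a Hong Kong-centered perspective. That would still teach people to think independently, so why not??

Declaration of interest: I am a Hongkonger.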

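I Will Definitely Not Go to Victoria Park on June 4th

I will definitely not go to Victoria Park on June 4th.

That does not mean I have forgotten how the CCP used tanks and troops to massacre students and civilians back then.
It does not mean I don't care.
I care about the dictatorships of the CCP and the Hong Kong communists, I care about Hongkongers' future, and I care about a Hong Kong in dire straits.

Every year there is a round of wailing and a sea of flickering candles. This is so-called solidarity, meant to show the CCP our “determination” to have June 4th vindicated.
But how much bargaining power can a yearly “June 4th festival” in Victoria Park really have to make the CCP think you people have any influence?
You call it “determination”; the CCP just thinks that letting you people blow off steam once a year is stability maintenance done well.

BTW, I am challenging the Hong Kong Alliance and the Victoria Park June 4th festival themselves, but that does not mean I have to prove that some other form is better than yours.
If something has little value, we can stop doing it and think again. The lack of an alternative does not automatically justify the Hong Kong Alliance, the Victoria Park June 4th festival, and so on.

A few years ago I said this on the Facebook page of a “leftard” (左膠) friend. One of his friends, someone or other in the Hong Kong Alliance, asked why, if I had opinions, I didn't join the Alliance and suggest improvements.
Well, you Alliance people demand that the CCP vindicate June 4th and end one-party rule... so why don't you join the Communist Party and suggest improvements??

BTW, a reminder to those going to Victoria Park: if you go, “remember to bring money”. (from Lee Cheuk-yan's appeal)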

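June 4th and Hong Kong

Year after year, Hongkongers have interpreted June 4th by commemorating the event itself and passing on its memory.
As Hong Kong's political situation grows darker, some people started asking a few years ago whether Hongkongers should have their own interpretation of June 4th, one that better fits the present situation.
At last year's June 4th evening event, Mr. Lee Yee said that past June 4th commemorations had always been conducted with a Chinese identity, never from the perspective of a Hongkonger identity.

At the time of June 4th, Hongkongers distrusted the CCP and worried about Hong Kong's future.
Hongkongers once hoped for a democratic future for both Hong Kong and China, but the June 4th massacre ruthlessly crushed that hope.
There are similarities today: many Hongkongers are deeply concerned and worried about Hong Kong's future. What they face instead are repeated empty words and lies, and hardline totalitarian suppression.

For Hongkongers, June 4th is not only a massacre that took place at Tiananmen in Beijing; it is also a historical event on Hongkongers' own road toward their future.
While commemorating it year after year, Hongkongers should not forget that we are still struggling on the road to Hong Kong's future, fighting for freedom.
At this critical, life-or-death moment for Hong Kong, Hongkongers should transform their June 4th emotions. They should reflect more from the standpoint of a Hongkonger identity, and put their energy into the fight for freedom over Hong Kong's future, which is the more pressing issue for Hongkongers.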

rest-in-contract – nodejs module for REST API Contract server

rest-in-contract

Project Page

Project Status

Currently, the project is in beta (v0.x.x).

The basic Contract Server module is done and supports basic API contract stubbing and testing, but some built-in features are not done yet (e.g. supporting more middleware functions in the contract script).

Since it is still in beta, we have not finalized the out-of-the-box features for v1.0 yet.

Roadmap

  • Add unit tests
  • Update documentation
  • Database storage
  • Authentication
  • Support plugins
  • Java/Node.js test integration client
  • Study integration with Swagger

What is rest-in-contract

rest-in-contract is a product that lets you embrace consumer-driven contracts. It is REST in nature, so it can be integrated with any programming language. For a detailed introduction, have a look at the Project rest-in-contract homepage.

Slideshare: Basic Concepts & Flows

Samples

Hello world

Starting server:

Project rest-in-contract

Project Page

Related Projects

What is rest-in-contract

Consumer-driven contracts

rest-in-contract is a product that lets you embrace consumer-driven contracts. It is REST in nature, so it can be integrated with any programming language.

Story for REST API providers/consumers in Consumer-driven contracts

For REST API providers:

REST API providers can write API contracts to describe their REST API request/response formats. They can then use the contracts to do contract testing against their API implementations.

For REST API consumers:

REST API consumers can use the API contracts to set up stubs for local testing, or while drafting API contracts.

Slideshare: Basic Concepts & Flows

How is rest-in-contract different from other consumer-driven contracts solutions?

REST in nature, cross language, easy integration

Many consumer-driven contracts solutions exist, but many of them are SDK libraries or embedded solutions for stubbing or testing that are tied to a particular language. rest-in-contract is designed around the idea that a solution should not be locked into one language.

rest-in-contract is a Node module that sets up a lightweight agent server, which API providers and consumers can easily start in their own environments. Whether you are doing contract testing (for providers) or API stubbing (for consumers), you always do it by calling the rest-in-contract agent server’s REST API. That’s why rest-in-contract is a cross-language solution for consumer-driven contracts.

Because it is REST in nature, it is very easy to integrate with DevOps pipelines. No Maven or Gradle build is needed; you just need Node.js (v7+) installed in your environment to start the server. All later interactions are REST API calls, which you can make with curl or any other HTTP client tool.
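Since every interaction is a plain REST call, any HTTP client works. Here is a minimal sketch that uses only Node's built-in http module.

Assumptions:
  • The helper names (contractServerRequest, callContractServer, wireTestsPath, wireStubsPath) are hypothetical.
  • The default host and port (localhost:8000) mirror the walkthrough below.
  • The HTTP methods and request bodies are guesses.

Only the /api/v1/apps/{{appId}}/wiretests and /api/v1/apps/{{appId}}/wirestubs paths come from this post.

```javascript
const http = require("http");

// Request options for a locally running Contract Server (host/port defaults are assumptions).
function contractServerRequest(method, path, { host = "localhost", port = 8000 } = {}) {
  return {
    hostname: host,
    port,
    path,
    method,
    headers: { "Content-Type": "application/json", Accept: "application/json" },
  };
}

// Send an optional JSON body and resolve with { status, body }.
function callContractServer(method, path, body, opts) {
  return new Promise((resolve, reject) => {
    const req = http.request(contractServerRequest(method, path, opts), (res) => {
      let data = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => {
        data += chunk;
      });
      res.on("end", () => {
        let parsed = data;
        try {
          parsed = data ? JSON.parse(data) : null;
        } catch (e) {
          // not JSON: keep the raw text
        }
        resolve({ status: res.statusCode, body: parsed });
      });
    });
    req.on("error", reject);
    if (body !== undefined) req.write(JSON.stringify(body));
    req.end();
  });
}

// Operation paths named in this post; appId is the ID returned when the app was registered.
const wireTestsPath = (appId) => `/api/v1/apps/${encodeURIComponent(appId)}/wiretests`;
const wireStubsPath = (appId) => `/api/v1/apps/${encodeURIComponent(appId)}/wirestubs`;

// Example (method and body are assumptions; the post does not show them):
// callContractServer("POST", wireTestsPath("80a69a44-3f3b-48c1-a7d1-b34b89117e75"), {})
//   .then(({ status, body }) => console.log(status, body));
```

The same calls could just as well be made with curl.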

Contract as file

Some consumer-driven contracts solutions let you wire up stub servers through SDK methods, so the contract is written as embedded code. This approach makes it hard to support different programming languages.

Instead, we think a contract should be defined in a loosely coupled way, separate from your business logic code.

A contract in rest-in-contract is a JS script that exports a Contract object. The contract can call some middleware functions, and it also supports regular expressions, JSONPath, etc.

An example contract file would look like this:

Although a contract file uses JavaScript syntax, you can treat it as an ordinary file and store it in your project. Your code or application never needs to interact with the contracts directly; you simply pass them to the Contract Server and let it do its job.

What are the possible architecture configurations of rest-in-contract?

Architecture Components

A Contract Server is a server instance that supports contract testing and stubbing through a REST API. It typically stores and reads contracts in local storage.

A Contract Agent Server is actually a Contract Server. The difference is that it reads contracts from a remote contract repository instead of local storage.

Architecture Configurations

We imagine two possible architecture configurations for using rest-in-contract.

  1. Centralized Contract Server architecture

In this mode, a centralized Contract Server serves all API providers and consumers. It is the central storage server that persists all contracts in a database.

API providers and consumers set up their own Contract Agent Servers in their own environments. These agents fetch the contracts remotely from the centralized Contract Server through its REST API. The providers and consumers then use their local Contract Agent Server to do contract testing or stubbing.

  2. Decentralized Contract Server architecture

In this mode, API providers keep contracts in their own way. For example, if their application is on GitHub, they may put the contracts in a folder in their source code. API providers can start a Contract Server in their local environment to do contract testing.

On the other side, API consumers can check out the project from GitHub to get the API contracts, then start a Contract Server in their local environment to do stubbing.

Story walkthrough

Beginning of the Story: John (API Provider) wrote an application Foo which provides a REST API.

John wrote an application “Foo” whose base application URL is “http://example.com/foo”. It has a version path “/v1.0”, and the API endpoint URL is “/hello”.

The API has this format:

Request:

Response:

The request body has an attribute “name” with a string value.

Next Story: Mary (API Consumer) is writing an application Bar which wants to consume John’s API.

Mary wants John to enhance his API with a new integer attribute “age” in the request and to output it in the response, like this:

Request:

Response:

Mary and John discuss it and draft an API contract like this:

This single contract is used by both John and Mary. John uses it to run contract tests against his implementation; Mary uses it to generate a stub for local testing.

To support both contract testing and stubbing, they define value(stub(...), test(...)) in the contract. value(stub(...), test(...)) means that the value used to generate the stub and the value used for contract testing are different.

Why different values for stubbing and testing?

Mary wants a stub. The stub should accept any “name” attribute that matches the pattern [a-zA-Z ]*. That means Mary can send the stub a request with “name” set to “Susan”, “Sam”, or any other valid name, and the response should echo back the same name. The “name” attribute needs a flexible value so that Mary can test more dynamically instead of always using one hard-coded value. Hence it is represented by the regular expression pattern regex("[a-zA-Z ]*").

John wants an API test. The test only needs a concrete test value. (He could use a regular expression pattern as well, but this is just a demo.) Hence the hard-coded test value “John” is used, and the expected response value is hard-coded too.
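To make the idea concrete, here is a toy model of a two-sided contract value. The functions value, stub, test, and regex only mirror the notation above. They are hypothetical stand-ins, not rest-in-contract's actual implementation.

Also assumed:
  • The regex must match the whole value.
  • The echoed response has the shape { name }, which is invented for illustration.

The stub side accepts any incoming value that matches the pattern, while the test side supplies the concrete value “John”.

```javascript
// Toy stand-ins that mirror the contract notation (hypothetical, not the library's API).
const regex = (pattern) => new RegExp(`^(?:${pattern})$`); // assume the pattern must match the whole value
const stub = (matcher) => ({ matcher });
const test = (v) => ({ v });
const value = (stubPart, testPart) => ({ stubMatcher: stubPart.matcher, testValue: testPart.v });

// Stub side: does a value sent by the consumer satisfy the contract?
function stubAccepts(field, incoming) {
  if (field.stubMatcher instanceof RegExp) {
    return typeof incoming === "string" && field.stubMatcher.test(incoming);
  }
  return incoming === field.stubMatcher; // literal value: must match exactly
}

// Test side: the concrete value a contract test sends to the provider.
function testValueOf(field) {
  return field.testValue;
}

// The "name" field from John and Mary's contract.
const nameField = value(stub(regex("[a-zA-Z ]*")), test("John"));

// Stub handler for /hello: accept any matching name and echo it back.
function stubHello(requestBody) {
  if (!stubAccepts(nameField, requestBody.name)) {
    return { status: 400, body: { error: "name does not match the contract" } };
  }
  return { status: 200, body: { name: requestBody.name } };
}

console.log(stubHello({ name: "Susan" }).body.name); // prints Susan
console.log(testValueOf(nameField)); // prints John
```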

Next Story: John uses the contract to test his API against the API contract

First, John starts a local Contract Server on port 8000.

Then he creates (registers) an app in the Contract Server with this REST API call:

Assume the app ID is “80a69a44-3f3b-48c1-a7d1-b34b89117e75”.

For this app, John creates (registers) an app version “2.0” in the Contract Server with this REST API call:

Then John creates (registers) the API contract in the Contract Server with this REST API call:

(Assume the contract ID is “94923fbd-9092-4a46-ad65-0d8a2e2f551e”.)

Then John can update “Foo”’s API version 1 to include this API contract by calling this REST API:

Then John can use the Contract Server’s REST API to trigger contract testing against app “Foo” running in his local environment (port 8001). Once triggered, the Contract Server builds mock requests according to the API contract and sends them to the target API endpoint. The test result is returned together with the whole request/response context.
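The contract testing flow described above can be sketched as a small function. It takes four steps:
  1. Build a request from the contract's test-side values.
  2. Send it.
  3. Compare the actual response with the expected one.
  4. Return the verdict with the full request/response context.

This is a hypothetical illustration, not rest-in-contract's code. The following are all assumptions:
  • the contract object's shape
  • the URL
  • the POST method
  • the synchronous send function, which stands in for a real HTTP call

```javascript
// Hypothetical contract test runner (not rest-in-contract's code).
function runContractTest(contract, send) {
  // 1) Build the mock request from the contract's test-side values.
  const request = {
    method: contract.request.method,
    url: contract.request.url,
    body: contract.request.body,
  };
  // 2) Send it to the provider's endpoint.
  const response = send(request);
  // 3) Compare the actual response with what the contract expects.
  const failures = [];
  if (response.status !== contract.response.status) {
    failures.push(`status: expected ${contract.response.status}, got ${response.status}`);
  }
  for (const [key, expected] of Object.entries(contract.response.body)) {
    const actual = response.body ? response.body[key] : undefined;
    if (actual !== expected) {
      failures.push(`body.${key}: expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
    }
  }
  // 4) Return the verdict together with the full request/response context.
  return { passed: failures.length === 0, failures, request, response };
}

// Hypothetical contract with test-side values already resolved.
const helloContract = {
  request: { method: "POST", url: "http://localhost:8001/foo/v1.0/hello", body: { name: "John" } },
  response: { status: 200, body: { name: "John" } },
};

// Fake provider standing in for John's app "Foo": it echoes the name back.
const fakeFoo = (req) => ({ status: 200, body: { name: req.body.name } });

const result = runContractTest(helloContract, fakeFoo);
console.log(result.passed); // true
```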

The contract test REST API call looks like this:

Request:

The response may look like this:

Next Story: Mary uses the contract to start a stub server

This time, Mary starts a local Contract Server on port 8000. Then she creates the app (John’s app “Foo”), its version, and its contracts just as in the last story, again through REST API calls.

At the last step, however, she does not call the API contract testing operation (/api/v1/apps/{{appId}}/wiretests). Instead, she wants to wire up a stub server, so she calls the API wire stub operation (/api/v1/apps/{{appId}}/wirestubs).

After that, a stub server that responds according to the API contracts is started on local port 8001.

Mary can then test the hello API against the stub server by sending a request like this:

And she gets this response:

After testing, Mary can shut down the stub server with this API:

Or she can shut down the whole local Contract Server instead.

Credit to Spring-Cloud-Contract & WireMock

We have to give credit to Spring-Cloud-Contract & WireMock, which inspired this project. We like Spring-Cloud-Contract’s design and its use of value(stub(...), test(...)), so we brought it into rest-in-contract. We also appreciate WireMock, a well-made product that served as a case study for our feature analysis and helped us build our own new product.