NIP-45
Event Counts
draft optional
Relays may support the verb COUNT, which provides a mechanism for obtaining event counts.
Motivation
Some queries a client may want to execute against connected relays are prohibitively expensive. For example, to retrieve the follower count for a given pubkey, a client must query all kind-3 events referring to that pubkey only to count them. The result may be cached, either by a client or by a separate indexing server, but both options erode the decentralization of the network by creating a second-layer protocol on top of Nostr.
Filters and return values
This NIP defines the verb COUNT, which accepts a query id and filters as specified in NIP-01 for the verb REQ. Multiple filters are OR'd together and aggregated into a single count result.
["COUNT", <query_id>, <filters JSON>...]
Counts are returned using a COUNT response in the form {"count": <integer>}. Relays may use probabilistic counts to reduce compute requirements.
If a relay uses probabilistic counts, it MAY indicate this in the response with the approximate key, i.e. {"count": <integer>, "approximate": <true|false>}.
["COUNT", <query_id>, {"count": <integer>}]
Whenever the relay decides to refuse to fulfill the COUNT request, it MUST return a CLOSED message.
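A minimal client-side sketch of the message shapes above, assuming JSON text frames as in NIP-01. The function names and the CountRefused exception are hypothetical and not part of this NIP.

```python
import json


class CountRefused(Exception):
    """Hypothetical exception raised when the relay answers with CLOSED."""


def build_count_request(query_id, *filters):
    """Serialize ["COUNT", <query_id>, <filter>, ...]."""
    if not filters:
        raise ValueError("at least one filter is required")
    return json.dumps(["COUNT", query_id, *filters])


def parse_count_reply(raw, query_id):
    """Return (count, approximate) from a COUNT reply for query_id."""
    msg = json.loads(raw)
    if not isinstance(msg, list) or len(msg) < 3 or msg[1] != query_id:
        raise ValueError("message does not belong to this query")
    if msg[0] == "CLOSED":
        # The relay refused to fulfill the COUNT request.
        raise CountRefused(msg[2])
    if msg[0] != "COUNT":
        raise ValueError(f"unexpected verb {msg[0]!r}")
    body = msg[2]
    # "approximate" is optional; treat a missing key as an exact count.
    return body["count"], bool(body.get("approximate", False))
```

Treating a missing approximate key as false is a choice made by this sketch; the text only says the relay MAY include it.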
HyperLogLog
Relays may return a hex-encoded HyperLogLog value together with the count.
["COUNT", <query_id>, {"count": <integer>, "hll": "<hex>"}]
This enables merging results from multiple relays to yield a reasonable estimate of reaction counts, comment counts and follower counts, while saving many millions of bytes of bandwidth for everybody.
Algorithm
This section describes the steps a relay should take in order to return HLL values to clients.
- Upon receiving a filter, if it is eligible (see below) for HyperLogLog, compute the deterministic offset for that filter (see below);
- Initialize 256 registers to 0 for the HLL value;
- For all the events that are to be counted according to the filter, do this:
  - Read the byte at position offset of the event pubkey, its value will be the register index ri;
  - Count the number of leading zero bits starting at position offset+1 of the event pubkey and add 1;
  - Compare that with the value stored at register ri, if the new number is bigger, store it.
That is all that has to be done on the relay side, and therefore the only part needed for interoperability.
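The relay-side steps can be sketched as follows. This is an illustration only: events are assumed to be dicts with a hex "pubkey" field, offset is assumed to be already computed for the filter, and the function names are hypothetical. The text does not say where the zero-bit count stops, so this sketch scans to the end of the pubkey; an implementation that caps the scan earlier would differ only for pubkeys with extremely long zero runs.

```python
def leading_zero_bits(data: bytes) -> int:
    """Number of zero bits before the first 1 bit (all bits if data is zero)."""
    count = 0
    for byte in data:
        if byte == 0:
            count += 8
            continue
        count += 8 - byte.bit_length()
        break
    return count


def add_pubkey(registers: bytearray, pubkey_hex: str, offset: int) -> None:
    """Apply one event pubkey to the 256 registers in place."""
    pubkey = bytes.fromhex(pubkey_hex)
    ri = pubkey[offset]                                # register index
    rank = leading_zero_bits(pubkey[offset + 1:]) + 1  # zero bits, plus 1
    if rank > registers[ri]:
        registers[ri] = rank


def hll_for_events(events, offset: int) -> bytearray:
    """Build the HLL registers for all events matched by a filter."""
    registers = bytearray(256)  # 256 registers initialized to 0
    for event in events:
        add_pubkey(registers, event["pubkey"], offset)
    return registers
```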
On the client side, the HLL values received from different relays can be merged by going through the registers of each relay's HLL value and keeping the highest value for each register, regardless of which relay it came from.
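A short sketch of that merge, assuming each HLL value has already been decoded into a sequence of 256 integers. The name merge_hll is hypothetical.

```python
def merge_hll(*hlls) -> bytearray:
    """Register-wise maximum over any number of 256-register HLL values."""
    merged = bytearray(256)
    for registers in hlls:
        if len(registers) != 256:
            raise ValueError("an HLL value must have exactly 256 registers")
        for i, value in enumerate(registers):
            if value > merged[i]:
                merged[i] = value
    return merged
```

Because the merge is a per-register maximum, the order in which relay responses arrive does not matter, and merging the same response twice changes nothing.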
Finally, the absolute count can be estimated from the merged registers. The estimation methods are not described here; it is better to check an implementation's source code. There are also different ways of performing the estimation, with different quirks applied on top of the raw registers.
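For orientation only, here is one possible estimator: the textbook HyperLogLog formula with the usual small-range (linear counting) correction. This is an assumption of this sketch, not something this NIP mandates, and existing implementations may apply different corrections and therefore return slightly different numbers.

```python
import math


def estimate_count(registers) -> int:
    """Textbook HyperLogLog estimate from raw registers (m = 256 here)."""
    m = len(registers)
    alpha = 0.7213 / (1 + 1.079 / m)  # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = sum(1 for r in registers if r == 0)
    if raw <= 2.5 * m and zeros > 0:
        # Small-range correction: fall back to linear counting.
        return round(m * math.log(m / zeros))
    return round(raw)
```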
Filter eligibility and offset computation
This NIP defines (for now) four filters eligible for HyperLogLog:
- {"#e": ["<id>"], "kinds": [7]}, i.e. a filter for kind:7 events with a single "e" tag, which means the client is interested in knowing how many people have reacted to the target event <id>. In this case the offset will be given by reading the character at position 32 of the hex <id> value as a base-16 number, then adding 8 to it.
- {"#e": ["<id>"], "kinds": [6]}, the same as above, but for kind:6 reposts.
- {"#p": ["<pubkey>"], "kinds": [3]}, i.e. a filter for kind:3 events with a single "p" tag, which means the client is interested in knowing how many people "follow" the target <pubkey>. In this case the offset will be given by reading the character at position 32 of the hex <pubkey> value as a base-16 number, then adding 8 to it.
- {"#E": ["<id>"], "kinds": [1111]}, i.e. a filter for the total number of comments any specific root event has received. In this case the offset will be given by reading the character at position 32 of the hex <id> value as a base-16 number, then adding 8 to it.
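The eligibility check and offset computation might look like the sketch below. Two assumptions are made here that the text does not state: position 32 is taken as a zero-based index into the 64-character hex string, and a filter carrying any extra key (such as since or limit) is treated as ineligible. The name hll_offset is hypothetical.

```python
# (tag key, kind) pairs listed as eligible in this section.
ELIGIBLE = {("#e", 7), ("#e", 6), ("#p", 3), ("#E", 1111)}


def hll_offset(filter_: dict):
    """Return the offset (8..23) for an eligible filter, else None."""
    if len(filter_) != 2 or "kinds" not in filter_:
        return None
    (tag_key,) = [k for k in filter_ if k != "kinds"]
    kinds, values = filter_["kinds"], filter_[tag_key]
    if not isinstance(kinds, list) or not isinstance(values, list):
        return None
    if len(kinds) != 1 or len(values) != 1 or not isinstance(kinds[0], int):
        return None
    if (tag_key, kinds[0]) not in ELIGIBLE:
        return None
    value = values[0]
    if not isinstance(value, str) or len(value) != 64:
        return None
    try:
        # One hex character (0..15) plus 8.
        return int(value[32], 16) + 8
    except ValueError:
        return None
```

Since a single hex character is at most 15, the offset never exceeds 23, so the register byte and the bytes after it always fall inside the 32-byte pubkey.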
Attack vectors
One could mine a pubkey with a certain number of zero bits in the exact place where the HLL algorithm described above would look for them, in order to artificially make its reaction or follow "count more" than others. For this to work, a different pubkey would have to be created for each different target (event id, followed profile, etc.). This approach is not very different from creating tons of new pubkeys and using them all to send likes or follow someone in order to inflate their number of followers. The solution is the same in both cases: clients should not fetch these reaction counts from open relays that accept everything. Instead, they should base their counts on relays that perform some form of filtering that makes it more likely that only real humans, and not bots or artificially generated pubkeys, are able to publish there.
hll encoding
The hll value must be the concatenation of the 256 registers, each being a uint8 value (i.e. a byte). Therefore hll will be a 512-character hex string.
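As a sketch of this encoding, assuming the registers are held as 256 integers in the range 0 to 255 (the function names are hypothetical):

```python
def encode_hll(registers) -> str:
    """Concatenate 256 uint8 registers into a 512-character hex string."""
    if len(registers) != 256:
        raise ValueError("expected exactly 256 registers")
    return bytes(registers).hex()


def decode_hll(hll_hex: str) -> bytearray:
    """Parse a 512-character hex string back into 256 registers."""
    if len(hll_hex) != 512:
        raise ValueError("expected a 512-character hex string")
    return bytearray.fromhex(hll_hex)
```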
Client-side usage
This algorithm also allows clients to combine HLL responses received from relays with HLL counts computed locally from raw events. It's recommended that clients keep track of HLL values locally and add to these on each message received from relays. For example:
- a client wants to keep track of the number of reactions an event Z has received over time;
- the client has decided it will read reactions from relays A, B and C (the NIP-65 "read" relays of Z's author);
- of these, only B and C support HLL responses, so the client fetches both and merges them locally;
- then the client fetches all reaction events from A then manually applies each event to the HLL from the previous step, using the same algorithm described above;
- finally, the client reads the estimate count from the HLL and displays that to the user;
- optionally the client may store that HLL value (together with some "last-read-date" for relay A) and repeat the process later:
  - this time it only needs to fetch the new reactions from A and add those to the HLL
  - and redownload the HLL values from B and C and just reapply them to the local value.
This procedure allows the client to download much less data.
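The procedure above could be held in a small piece of client state, sketched here. The class name, its methods and the relay URL in the usage note are hypothetical; events are assumed to be dicts with hex "pubkey" and integer "created_at" fields, and the zero-bit count is assumed to scan to the end of the pubkey.

```python
class ReactionTracker:
    """Hypothetical local HLL state for one target event."""

    def __init__(self, offset: int):
        self.offset = offset
        self.registers = bytearray(256)
        self.last_read = {}  # relay URL -> newest created_at applied

    def merge_relay_hll(self, hll_hex: str) -> None:
        """Fold in an hll value from a relay that supports HLL responses."""
        incoming = bytes.fromhex(hll_hex)
        if len(incoming) != 256:
            raise ValueError("expected 256 registers")
        for i, value in enumerate(incoming):
            self.registers[i] = max(self.registers[i], value)

    def apply_event(self, relay: str, event: dict) -> None:
        """Fold in one raw event fetched from a relay without HLL support."""
        pubkey = bytes.fromhex(event["pubkey"])
        ri = pubkey[self.offset]
        tail = pubkey[self.offset + 1:]
        # Leading zero bits of the tail, plus 1.
        rank = 8 * len(tail) - int.from_bytes(tail, "big").bit_length() + 1
        self.registers[ri] = max(self.registers[ri], rank)
        # Remember how far this relay has been read, for the next fetch.
        self.last_read[relay] = max(self.last_read.get(relay, 0),
                                    event["created_at"])
```

On a later run, the client would pass last_read["wss://a.example"] as the since value when fetching from A, apply only the new events, and call merge_relay_hll again with fresh values from B and C.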
Examples
Count posts and reactions
["COUNT", <query_id>, {"kinds": [1, 7], "authors": [<pubkey>]}]
["COUNT", <query_id>, {"count": 5}]
Count posts approximately
["COUNT", <query_id>, {"kinds": [1]}]
["COUNT", <query_id>, {"count": 93412452, "approximate": true}]
Followers count with HyperLogLog
["COUNT", <subscription_id>, {"kinds": [3], "#p": [<pubkey>]}]
["COUNT", <subscription_id>, {"count": 16578, "hll": "0607070505060806050508060707070706090d080b0605090607070b07090606060b0705070709050807080805080407060906080707080507070805060509040a0b06060704060405070706080607050907070b08060808080b080607090a06060805060604070908050607060805050d05060906090809080807050e0705070507060907060606070708080b0807070708080706060609080705060604060409070a0808050a0506050b0810060a0908070709080b0a07050806060508060607080606080707050806080c0a0707070a080808050608080f070506070706070a0908090c080708080806090508060606090906060d07050708080405070708"}]
Relay refuses to count
["COUNT", <query_id>, {"kinds": [1059], "#p": [<pubkey>]}]
["CLOSED", <query_id>, "auth-required: cannot count other people's DMs"]