mirror of https://github.com/nostr-protocol/nips.git, synced 2026-02-05 16:04:32 +00:00
45.md
@@ -29,119 +29,29 @@ In case a relay uses probabilistic counts, it MAY indicate it in the response wi
Whenever the relay decides to refuse to fulfill the `COUNT` request, it MUST return a `CLOSED` message.

## HyperLogLog

Relays may return a HyperLogLog value together with the count, hex-encoded.

```
["COUNT", <subscription_id>, {"count": <integer>, "hll": "<hex>"}]
```
This enables merging results from multiple relays to yield a reasonable estimate of reaction counts, comment counts and follower counts, while saving many millions of bytes of bandwidth for everybody.

### Algorithm

This section describes the steps a relay should take in order to return HLL values to clients.
1. Upon receiving a filter, if it is eligible for HyperLogLog (see below), compute the deterministic `offset` for that filter (see below);
2. Initialize 256 registers to `0` for the HLL value;
3. For each event that is to be counted according to the filter:
   1. Read the byte at position `offset` of the event `pubkey`; its value is the register index `ri`;
   2. Count the number of leading zero bits starting at position `offset+1` of the event `pubkey` and add `1`;
   3. Compare that with the value stored at register `ri`; if the new number is bigger, store it.

That is all that has to be done on the relay side, and it is therefore the only part needed for interoperability.
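The relay-side steps above can be sketched as follows (a minimal sketch; the `offset` value is assumed to have already been derived from the filter as described further down, and pubkeys are 32-byte values decoded from hex):

```python
def leading_zero_bits(data: bytes, start: int) -> int:
    """Count the number of leading zero bits in `data`, starting at byte `start`."""
    count = 0
    for byte in data[start:]:
        if byte == 0:
            count += 8
        else:
            count += 8 - byte.bit_length()  # zero bits before the first set bit
            break
    return count


def hll_add(registers: list, pubkey_hex: str, offset: int) -> None:
    """Apply one counted event's pubkey to the 256 HLL registers."""
    pubkey = bytes.fromhex(pubkey_hex)
    ri = pubkey[offset]                                # register index
    value = leading_zero_bits(pubkey, offset + 1) + 1  # leading zero bits + 1
    if value > registers[ri]:                          # keep the maximum seen
        registers[ri] = value
```

For example, with `offset = 8`, a pubkey whose byte at index 8 is `0x05` updates register 5.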
On the client side, HLL values received from different relays can be merged by simply going through all the registers of each relay's HLL value and keeping the highest value seen for each register, regardless of which relay it came from.
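The client-side merge can be sketched as follows (`hll` here is the hex string from the `COUNT` response):

```python
def hll_from_hex(hll: str) -> list:
    """Decode a 512-character `hll` hex string into 256 uint8 registers."""
    return list(bytes.fromhex(hll))


def hll_merge(a: list, b: list) -> list:
    """Merge two HLL values by taking the per-register maximum."""
    return [max(x, y) for x, y in zip(a, b)]
```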
And finally the absolute count can be estimated from the registers. The estimation methods are better learned from implementation source code than from an English description here; also, there can be different ways of performing the estimation, with different quirks applied on top of the raw registers.
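As a rough illustration of what such an estimation looks like, here is the classic raw HyperLogLog estimator with only the small-range linear-counting correction applied (real implementations typically layer further bias corrections on top of this):

```python
import math


def hll_estimate(registers: list) -> float:
    """Classic HyperLogLog estimate (Flajolet et al.) from the registers,
    with only the small-range linear-counting correction applied."""
    m = len(registers)                          # 256 in this NIP
    alpha = 0.7213 / (1 + 1.079 / m)            # standard constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)          # linear counting for small counts
    return raw
```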
### `offset` computation

The `offset` (used in the HLL computation above) is derived deterministically from the filter sent by the client to the relay. The formula for obtaining the `offset` value is as follows:

1. Take the first tag attribute in the filter (the one with the `#` prefix);
2. From that, take its first item (it will be a string);
3. Obtain a 32-byte hex string from it:
   - if the string is an event id or pubkey hex, use it as is;
   - if the string is an address (`<kind>:<pubkey>:<d-tag>`), use the `<pubkey>` part;
   - if the string is anything else, hash it with `sha256()` and take the result as a hex string;
4. From the 64-character hex string obtained above, take the character at position `32`;
5. Read that character as a base-16 number;
6. Add 8 to it: the result is the `offset`.
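The steps above can be sketched as follows (a sketch only; it assumes the filter is a dict whose key iteration order matches the original JSON order):

```python
import hashlib
import re


def compute_offset(filter_: dict) -> int:
    """Derive the deterministic HLL `offset` from a COUNT filter."""
    # steps 1-2: first tag attribute ("#" prefix), first item
    # (filters without any tag attribute are not covered by this NIP)
    item = next(v[0] for k, v in filter_.items() if k.startswith("#"))
    # step 3: obtain a 32-byte hex string
    if re.fullmatch(r"[0-9a-f]{64}", item):
        hex32 = item                                # event id or pubkey hex
    elif re.fullmatch(r"\d+:[0-9a-f]{64}:.*", item):
        hex32 = item.split(":")[1]                  # address: use <pubkey>
    else:
        hex32 = hashlib.sha256(item.encode()).hexdigest()
    # steps 4-6: character at position 32, read as base 16, plus 8
    return int(hex32[32], 16) + 8
```

Note that the result is always in the range 8 to 23, i.e. it always points into the second half of the 32-byte pubkey.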
For cases not covered above (filters without a tag attribute, for example), behavior isn't yet defined. This NIP may be modified later to specify those cases, but for now there isn't a use case that justifies using HLL in those circumstances.

### Rationale

The value of `offset` must be deterministic because that is the only way to allow relays to cache HLL values, so they don't have to count thousands of events from the database on every query. It also allows relays to precompute HLL values for any given target `<id>` or `<pubkey>` without having to store the events themselves directly, which can be handy in the case of reactions, for example.
### Common filters

Some relays may decide to cache or precompute HLL values for some common canonical queries, and also to refrain from counting events that do not match these specs. These are such queries (this NIP can be modified later if more common useful queries are discovered and start being used):

- **reaction count**: `{"#e": ["<id>"], "kinds": [7]}`
- **repost count**: `{"#e": ["<id>"], "kinds": [6]}`
- **quote count**: `{"#q": ["<id>"], "kinds": [1, 1111]}`
- **reply count**: `{"#e": ["<id>"], "kinds": [1]}`
- **comment count**: `{"#E": ["<id>"], "kinds": [1111]}`
- **follower count**: `{"#p": ["<pubkey>"], "kinds": [3]}`

Notice that these queries only include a single tag attribute, always with a single item in it, which means implementors don't have to check the order in which these attributes show up in the filter.
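A relay that wants to recognize these canonical shapes might do something like this (illustrative only; the function and cache-key names are made up):

```python
def canonical_count(filter_: dict):
    """Map a COUNT filter to a (count-name, target) cache key, or None."""
    tags = [(k[1:], v) for k, v in filter_.items() if k.startswith("#")]
    if len(tags) != 1 or len(tags[0][1]) != 1:
        return None                 # must be exactly one tag with one item
    tag, (target,) = tags[0]
    kinds = tuple(sorted(filter_.get("kinds", [])))
    known = {
        ("e", (7,)): "reaction",
        ("e", (6,)): "repost",
        ("q", (1, 1111)): "quote",
        ("e", (1,)): "reply",
        ("E", (1111,)): "comment",
        ("p", (3,)): "follower",
    }
    name = known.get((tag, kinds))
    return (name, target) if name else None
```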
### Attack vectors

One could mine a pubkey with a certain number of zero bits in the exact place where the HLL algorithm described above would look for them, in order to artificially make its reaction or follow "count more" than others. For this to work, a different pubkey would have to be created for each different target (event id, followed profile etc). This approach is not very different from creating tons of new pubkeys and using them all to send likes or follow someone in order to inflate their number of followers. The solution is the same in both cases: clients should not fetch these reaction counts from open relays that accept everything; they should base their counts on relays that perform some form of filtering that makes it more likely that only real humans, and not bots or artificially-generated pubkeys, are able to publish there.
### `hll` encoding

The `hll` value must be the concatenation of the 256 registers, each being a uint8 value (i.e. a byte). Therefore `hll` will be a 512-character hex string.
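In other words (sketch):

```python
def hll_to_hex(registers: list) -> str:
    """Encode 256 uint8 registers as the 512-character `hll` hex string."""
    assert len(registers) == 256
    return bytes(registers).hex()
```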
### Client-side usage

This algorithm also allows clients to combine HLL responses received from relays with HLL counts computed locally from raw events. It is recommended that clients keep track of HLL values locally and add to them on each message received from relays. For example:

- a client wants to keep track of the number of reactions an event Z has received over time;
- the client has decided it will read reactions from relays A, B and C (the NIP-65 "read" relays of Z's author);
- of these, only B and C support HLL responses, so the client fetches both and merges them locally;
- then the client fetches all reaction events from A and manually applies each event to the HLL from the previous step, using the same algorithm described above;
- finally, the client reads the estimated count from the HLL and displays that to the user;
- optionally, the client may store that HLL value (together with some "last-read-date" for relay A) and repeat the process again later:
  - this time it only needs to fetch the new reactions from A and add those to the HLL;
  - and redownload the HLL values from B and C and just reapply them to the local value.

This procedure allows the client to download much less data.
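The flow above can be sketched as a single hypothetical client routine (names are illustrative; `relay_hlls` are the hex strings received from relays B and C, and `local_pubkeys` are the pubkeys of raw reaction events fetched from relay A):

```python
def combine(relay_hlls: list, local_pubkeys: list, offset: int) -> list:
    """Merge relay-provided HLLs, then fold in locally-fetched raw events."""
    registers = [0] * 256
    for hll_hex in relay_hlls:                 # merge: per-register maximum
        for ri, v in enumerate(bytes.fromhex(hll_hex)):
            registers[ri] = max(registers[ri], v)
    for pk in local_pubkeys:                   # apply raw events locally
        b = bytes.fromhex(pk)
        ri = b[offset]
        rest = int.from_bytes(b[offset + 1:], "big")
        zeros = 8 * len(b[offset + 1:]) - rest.bit_length()
        if zeros + 1 > registers[ri]:
            registers[ri] = zeros + 1
    return registers
```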
## Examples

### Followers count
```
["COUNT", <query_id>, {"kinds": [3], "#p": [<pubkey>]}]
["COUNT", <query_id>, {"count": 238}]
```
### Count posts and reactions

```
["COUNT", <query_id>, {"kinds": [1, 7], "authors": [<pubkey>]}]
["COUNT", <query_id>, {"count": 5}]
```
### Count posts approximately

```
["COUNT", <query_id>, {"kinds": [1]}]
["COUNT", <query_id>, {"count": 93412452, "approximate": true}]
```
### Followers count with HyperLogLog

```
["COUNT", <subscription_id>, {"kinds": [3], "#p": [<pubkey>]}]
["COUNT", <subscription_id>, {"count": 16578, "hll": "0607070505060806050508060707070706090d080b0605090607070b07090606060b0705070709050807080805080407060906080707080507070805060509040a0b06060704060405070706080607050907070b08060808080b080607090a06060805060604070908050607060805050d05060906090809080807050e0705070507060907060606070708080b0807070708080706060609080705060604060409070a0808050a0506050b0810060a0908070709080b0a07050806060508060607080606080707050806080c0a0707070a080808050608080f070506070706070a0908090c080708080806090508060606090906060d07050708080405070708"}]
```
### Reaction counts with HyperLogLog

```
["COUNT", <subscription_id>, {"kinds": [7], "#e": [<id>]}]
["COUNT", <subscription_id>, {"count": 2044, "hll": "01ef070505060806050508060707070706090d080b0605090607070b07090606060b0705070709050807080805080407060906080707080507070805060509040a0b06060704060405070706080607050907070b08060808080b080607090a06060805060604070908050607060805050d05060906090809080807050e0705070507060907060606070708080b0807070708080706060609080705060604060409070a0808050a0506050b0810060a0908070709080b0a07050806060508060607080606080707050806080c0a0707070a080808050608080f070506070706070a0908090c080708080806090508060606090906060d07050708080405070708"}]
```
### Relay refuses to count

```
["COUNT", <subscription_id>, {"kinds": [4], "authors": [<pubkey>], "#p": [<pubkey>]}]
["CLOSED", <subscription_id>, "auth-required: cannot count other people's DMs"]
```
73.md
@@ -18,7 +18,6 @@ There are certain established global content identifiers such as [Book ISBNs](ht
| URLs | "`<URL, normalized, no fragment>`" | "web" |
| Books | "isbn:`<id, without hyphens>`" | "isbn" |
| Geohashes | "geo:`<geohash, lowercase>`" | "geo" |
| Countries | "iso3166:`<code, uppercase>`" | "iso3166" |
| Movies | "isan:`<id, without version part>`" | "isan" |
| Papers | "doi:`<id, lowercase>`" | "doi" |
| Hashtags | "#`<topic, lowercase>`" | "#" |
@@ -44,21 +43,6 @@ For the webpage "https://myblog.example.com/post/2012-03-27/hello-world" the "i"
]
```
### Geohashes:

- Geohash: `["i", "geo:ezs42e44yx96"]` - https://www.movable-type.co.uk/scripts/geohash.html

Geohashes are a geocoding system that encodes geographic locations into short strings of letters and digits. They MUST be lowercase.
### Countries:

ISO 3166 codes can reference countries (ISO 3166-1 alpha-2) or subdivisions like states/provinces (ISO 3166-2).

- Country (Venezuela): `["i", "iso3166:VE"]`
- Subdivision (California, USA): `["i", "iso3166:US-CA"]`

ISO 3166 codes MUST be uppercase. More info: https://en.wikipedia.org/wiki/ISO_3166
### Books:

- Book ISBN: `["i", "isbn:9780765382030"]` - https://isbnsearch.org/isbn/9780765382030