diff --git a/45.md b/45.md
index 970409dc..794dbe81 100644
--- a/45.md
+++ b/45.md
@@ -56,14 +56,38 @@ On the client side, these HLL values received from different relays can be merge
 
 And finally the absolute count can be estimated by running some methods I don't dare to describe here in English, it's better to check some implementation source code (also, there can be different ways of performing the estimation, with different quirks applied on top of the raw registers).
 
-### Filter eligibility and `offset` computation
+### `offset` computation
 
-This NIP defines (for now) two filters eligible for HyperLogLog:
+The `offset` (used in the HLL computation above) is derived deterministically from the filter sent by the client to the relay. The formula for obtaining the `offset` value is as follows:
 
-- `{"#e": ["<id>"], "kinds": [7]}`, i.e. a filter for `kind:7` events with a single `"e"` tag, which means the client is interested in knowing how many people have reacted to the target event `<id>`. In this case the `offset` will be given by reading the character at the position `32` of the hex `<id>` value as a base-16 number then adding `8` to it.
-- `{"#e": ["<id>"], "kinds": [6]}`, the same as above, but for `kind:6` reposts.
-- `{"#p": ["<pubkey>"], "kinds": [3]}`, i.e. a filter for `kind:3` events with a single `"p"` tag, which means the client is interested in knowing how many people "follow" the target `<pubkey>`. In this case the `offset` will be given by reading the character at the position `32` of the hex `<pubkey>` value as a base-16 number then adding `8` to it.
-- `{"#E": ["<id>"], "kinds": [1111]}`, i.e. a filter for the total number of comments any specific root event has received. In this case the `offset` will be given by reading the character at the position `32` of the hex `<id>` value as a base-16 number then adding `8` to it.
+  1. Take the first tag attribute in the filter (with the `#` prefix);
+  2. From that, take its first item (it will be a string);
+  3. Obtain a 32-byte hex string from it:
+     - if the string is an event id or pubkey hex, use it as it is;
+     - if the string is an address (`<kind>:<pubkey>:<d-tag>`), use the `<pubkey>` part;
+     - if the string is anything else, hash it with a `sha256()` and take the result as a hex string;
+  4. From the 64-character hex string obtained before, take the character at position `32`;
+  5. Read that character as a base-16 number;
+  6. Add 8 to it: the result is the `offset`.
+
+For cases not covered above (filters without a tag attribute, for example), behavior isn't yet defined. This NIP may be modified later to specify those, but for now there isn't a use case that justifies using HLL in those circumstances.
+
+### Rationale
+
+The value of `offset` must be deterministic because that's the only way to allow relays to cache the HLL values so they don't have to count thousands of events from the database on every query. It also allows relays to precompute HLL values for any given target `<id>` or `<pubkey>` without having to store the events themselves directly, which can be handy in case of reactions, for example.
+
+### Common filters
+
+Some relays may decide to cache or precompute HLL values for some common canonical queries, and also to refrain from counting events that do not match these specs. These are such queries (this NIP can be modified later if more common useful queries are discovered and start being used):
+
+- **reaction count**: `{"#e": ["<id>"], "kinds": [7]}`
+- **repost count**: `{"#e": ["<id>"], "kinds": [6]}`
+- **quote count**: `{"#q": ["<id>"], "kinds": [1, 1111]}`
+- **reply count**: `{"#e": ["<id>"], "kinds": [1]}`
+- **comment count**: `{"#E": ["<id>"], "kinds": [1111]}`
+- **follower count**: `{"#p": ["<pubkey>"], "kinds": [3]}`
+
+Notice that these queries only include 1 tag attribute with always a single item in it, which means that implementors don't have to check the order in which these attributes show up in the filter.
 
 ### Attack vectors
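
The six `offset` steps introduced by this patch can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a reference implementation: the name `hll_offset`, the two regular expressions, the choice of UTF-8 when hashing, the reading of "position `32`" as a zero-based index, and the decision to raise `ValueError` for filters the text leaves undefined are all hypothetical choices made for the example.

```python
import hashlib
import re

# Assumption: ids and pubkeys are 64 lowercase hex characters.
HEX64 = re.compile(r"[0-9a-f]{64}")
# Assumption: an address is "kind:pubkey:d-tag", with a numeric kind.
ADDRESS = re.compile(r"(\d+):([0-9a-f]{64}):(.*)", re.DOTALL)


def hll_offset(nostr_filter: dict) -> int:
    """Derive the HLL offset from a filter, following steps 1 to 6."""
    # Step 1: first tag attribute, i.e. the first key starting with "#".
    # Relies on the dict preserving the key order of the parsed JSON.
    tag_key = next((k for k in nostr_filter if k.startswith("#")), None)
    if tag_key is None or not nostr_filter[tag_key]:
        # The text leaves this case undefined; raising is our own choice.
        raise ValueError("filter has no tag attribute with at least one item")

    # Step 2: first item of that tag attribute.
    item = nostr_filter[tag_key][0]

    # Step 3: obtain a 64-character (32-byte) hex string.
    if HEX64.fullmatch(item):
        hex64 = item  # event id or pubkey: use as is
    else:
        address = ADDRESS.fullmatch(item)
        if address:
            hex64 = address.group(2)  # the pubkey part of the address
        else:
            # Anything else: sha256 of the string (UTF-8 is an assumption).
            hex64 = hashlib.sha256(item.encode("utf-8")).hexdigest()

    # Steps 4 to 6: character at index 32, read as base 16, plus 8.
    return int(hex64[32], 16) + 8


if __name__ == "__main__":
    target = "0" * 32 + "a" + "0" * 31  # made-up id whose index-32 char is "a"
    print(hll_offset({"#e": [target], "kinds": [7]}))
```

Because a single hex digit lies between 0 and 15, the result of this sketch always falls between 8 and 23, whichever branch of step 3 is taken.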