Request for Feedback on Governance: list_neurons pagination API

Context

The IC Protocol limits message sizes to 2MB, including responses.

Some users need to retrieve over a thousand neurons (typically empty) for various reporting purposes. They are currently unable to list them with this API because of this message size limit, as the responses would exceed 2MB.

While there are workarounds, it was determined that adding paging to this API would allow it to scale better into the future, and prevent wasted computation that results in undeliverable messages.

Current Behavior

If a user sends a request with the parameter include_neurons_readable_by_caller set to true, the NNS Governance canister will attempt to send all of the neurons readable by the caller back in the response.

In cases where the message size limit is exceeded, the user cannot see any entries at all, because the response cannot be delivered through the protocol.

New Behavior

To make it easier for users to retrieve all of their neurons, we propose to serve neurons as a sequence of pages. Each page is a structure with a bounded number of neurons; specifically, at most 500 neurons would be served at a time.

The new API fields will be:

// Candid Request payload

ListNeurons {
  ... existing fields ...

  page_number : opt nat64;
  page_size : opt nat64;
}

// Candid Response payload

ListNeuronsResponse {
  ... existing fields ...

  total_pages_available : opt nat64;
}

The new field in the response payload will tell users whether more requests are needed.

Most users, who only need active neurons (i.e. neurons with stake or maturity), will get all of the data with one request.

In cases where historical data is required, it will now be accessible, whereas before it was not.

This design should be friendly to users with cold-storage wallets (i.e. air-gapped keys), since they can sign all of the paged requests at the same time and send them concurrently, as sketched below.
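To illustrate, here is a minimal Rust sketch of that flow, assuming a ListNeurons struct mirroring the Candid above (the names and types here are illustrative assumptions, not canonical bindings). Because page numbers are deterministic, every request can be built, and then signed offline, before any response is seen.

// Illustrative stub mirroring the Candid request above; real bindings
// would include the existing fields as well.
#[derive(Clone, Default)]
struct ListNeurons {
    // ... existing fields elided ...
    page_number: Option<u64>,
    page_size: Option<u64>,
}

// Build every paged request upfront so an air-gapped key can sign them
// all in one session. `expected_pages` is an upper bound chosen by the
// operator, e.g. from a previously observed total_pages_available.
fn build_paged_requests(base: &ListNeurons, expected_pages: u64) -> Vec<ListNeurons> {
    (0..expected_pages)
        .map(|page| ListNeurons { page_number: Some(page), ..base.clone() })
        .collect()
}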

Impact on Existing Clients / Timeline

Currently, fewer than 100 principals have more than 500 neurons, and of those, none have more than 200 active neurons.

Due to the behavior change mentioned in ListNeurons API Change - Empty Neurons, which will omit inactive neurons from the response by default, adding paging to this API should not impact users who need to retrieve their neurons, and will positively impact users who need to retrieve a large number of inactive neurons.

We therefore believe that this change will have no negative impact and can be rolled out quickly.

We plan to roll out this change in the NNS Governance canister as early as January 27, 2025.

Feedback Requested

Please tell us your thoughts.

If you believe there will be a problem with this plan, please comment below. All feedback is welcome.

Action Required

Users who anticipate needing to retrieve more than 500 neurons from list_neurons will need to make their clients aware of the paging API.

This means reading the total_pages_available field and, if its value is greater than 1, making the additional requests needed by reusing the original request and setting a 0-indexed page_number (i.e. the first page is page_number 0, the second page is page_number 1, etc.).
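As a rough sketch of that loop, reusing the illustrative ListNeurons stub from the cold-storage sketch above: Governance, Neuron, and Error stand in for whatever client types you use, and the full_neurons response field is an assumption modeled on the existing response type.

// Illustrative response stub; real bindings include the existing fields.
struct ListNeuronsResponse {
    // ... existing fields elided ...
    full_neurons: Vec<Neuron>,
    total_pages_available: Option<u64>,
}

// Fetch page 0 first, then request any remaining pages. Pages are
// 0-indexed, so the loop continues from page 1.
async fn list_all_neurons(
    gov: &Governance,
    mut request: ListNeurons,
) -> Result<Vec<Neuron>, Error> {
    request.page_number = Some(0);
    request.page_size = Some(500); // the proposed per-page maximum
    let first = gov.list_neurons(&request).await?;
    let total_pages = first.total_pages_available.unwrap_or(1);
    let mut neurons = first.full_neurons;
    for page in 1..total_pages {
        request.page_number = Some(page);
        neurons.extend(gov.list_neurons(&request).await?.full_neurons);
    }
    Ok(neurons)
}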


How does this impact existing clients if the new fields are not used? What will be the default behaviour if clients do not use the new fields?

Where does this max number come from? Can we use a higher upper bound number here?

Will the default behaviour for ListNeurons be to try and send 2MB worth of neurons? Or will it be capped at 500 neurons, with clients forced to paginate? (If so, I would consider this a breaking change.)


Again, I’d like to push for list_neurons to support searching by subaccounts. As I did here:

This would also be beneficial for clients that have many neurons: they could do their own pagination by grabbing nonces, computing the subaccounts, and popping them into the neuron_ids_or_subaccounts array.

In this new pagination design, let's say I have over 500 neurons and need to list certain neurons from different pages: I would need to perform multiple async calls to get that information and list all my neurons. If I could search by subaccount, I could compute those myself and search for specifically what I want in one call.

These things will become more relevant as more and more canisters start to offer staking to many thousands of users. Just using the current neuron numbers is not a good justification, in my opinion.


FYI, this would be a breaking change for one of my canisters that relies on ListNeurons potentially returning much more than 500 neurons in one call - I would need to restructure some things.

Although I’m not against this feature. At this stage I would be voting no on this when it gets proposed (in a few days?) unless some clarity can be provided on how this would affect existing clients or if searching by subaccount can be added alongside this.


How many neurons are you requesting in a single call? Can you explain your use case?

It's a round number based on maximum neuron sizes. It's actually possible to have fewer than 500 neurons and still hit the limit, but only if they have a lot of following configuration, which is not that common.

How many neurons is it getting in a single request?

Also, @dfxjesse - how are you requesting all of the neurons? Are you specifying IDs, or is it all the neurons visible to the caller?

We’d like to avoid breaking existing uses if possible.

If you have time you can check out the ICP Neuron Vector. I'll try to give an overview of how this change affects it:

The neurons in the canister are created with unique nonces derived from the Vector ID (but easily computable) - the spawning neurons are derived this way too. There can be many vectors in the one canister, and thus many neurons per vector, but by creating known nonces, one Vector ID can specifically own many neurons inside the one canister.

Currently there are only a few neurons (maybe 30) in there, but I tested ListNeurons with thousands of neurons (and although they were empty, it was returning them fine). So it was built with the assumption that thousands of neurons are okay with ListNeurons.

It has to list all the neurons the canister owns to find just one vector's owned neurons (because ListNeurons only supports providing Neuron IDs - which are just random numbers and can't be known beforehand). However, if ListNeurons allowed me to provide subaccounts (which I can compute myself), the canister could search each vector's specific neurons in one call (so no need to retrieve all the neurons).
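For context, here is a hedged Rust sketch of the nonce-to-subaccount derivation being described, modeled on how NNS Governance computes a neuron's staking subaccount (SHA-256 over a length-prefixed "neuron-stake" domain separator, the controller principal's bytes, and the big-endian nonce); verify the exact constants against the governance source before relying on it.

use sha2::{Digest, Sha256};

// Derive the staking subaccount for (controller, nonce). With known
// nonces, a client can compute these offline and look neurons up by
// subaccount instead of retrieving everything.
fn neuron_staking_subaccount(controller_bytes: &[u8], nonce: u64) -> [u8; 32] {
    let domain = b"neuron-stake";
    let mut hasher = Sha256::new();
    hasher.update([domain.len() as u8]); // length prefix (0x0c)
    hasher.update(domain);
    hasher.update(controller_bytes); // raw principal bytes
    hasher.update(nonce.to_be_bytes());
    hasher.finalize().into()
}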

Hope this use case is clear enough, but feel free to ask questions.

EDIT: If the change goes through without allowing subaccount searching in ListNeurons, then instead of just one call it would require x calls (expensive) to find a Vector's owned neurons, in the event the canister needs to scale to hundreds of neurons.

If all of the neurons are empty, then you can currently list a lot of neurons in a single call. But as the neurons pick up a voting history, for example, that assumption would eventually no longer be true, and you would have to specify neuron IDs or some other mechanism to limit the result set. It would also break somewhat unexpectedly, even if no new neurons were added, and then if you didn’t have the neuron_ids, you’d have no way to get any data back from that endpoint, and would have to query list_neuron_ids and implement your own manual paging mechanism.
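For illustration, a rough sketch of that fallback under stated assumptions: an ID-listing query (named list_neuron_ids here, per the post above) plus chunked list_neurons calls using the existing explicit-ID request field. The binding names are hypothetical, and neuron_ids is assumed to be among the "existing fields" elided from the earlier stub.

// Fetch the caller's neuron IDs, then request full neurons in bounded
// chunks so each response stays well under the 2MB limit.
async fn list_neurons_by_id_chunks(gov: &Governance) -> Result<Vec<Neuron>, Error> {
    let ids: Vec<u64> = gov.list_neuron_ids().await?;
    let mut neurons = Vec::new();
    for chunk in ids.chunks(100) { // conservative chunk size
        let request = ListNeurons {
            neuron_ids: chunk.to_vec(), // existing explicit-ID field
            ..ListNeurons::default()
        };
        neurons.extend(gov.list_neurons(&request).await?.full_neurons);
    }
    Ok(neurons)
}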

A couple other questions:

Are these neurons managed on behalf of other users? So your canister would interact with NNS on behalf of lots and lots of users who collectively own thousands of neurons?

I'm also not understanding why a vector of neurons (i.e. lots of small neurons) instead of one larger neuron. Neurons can be split quite easily if need be, so what's the advantage of creating a vector of neurons ahead of time?

The majority would be spawning neurons which contain no voting history, and are nearly equivalent to empty neurons in size.

The canister is governed by the Neutrinite DAO. And yes it could possibly scale to lots of users.

Take a look at the README to understand the use case. Essentially one vector has one main neuron (user retains VP) and up to a max of 8 Spawning Neurons (which is where the majority of the neurons come from).

We are hoping, in the near future, to get rid of spawning neurons and replace them with disburse maturity (like SNSes do). This would no longer create all of these unnecessary intermediate neurons, but would instead just mint ICP directly to a specified account.

Additionally, the spawning neurons, after their ICP is distributed, will soon not be returned by default from the API (and they can already be excluded from the response).

Does that change things for you?

I did look at the README, but I still don’t quite understand what problem is being solved.

In any case, I’ll discuss this with the team.

That’s fine, I don’t return empty neurons by default. It won’t change anything.

What about backwards compatibility there too? How will that work? I don't mind this change but would need time to migrate.

Anyway, what I need is for ListNeurons to support neuron_ids_or_subaccounts, and I've been trying to push for it for a while. You can already get neurons by their subaccounts with the get_full_neuron_by_id_or_subaccount and get_neuron_info_by_id_or_subaccount methods, so ListNeurons is an outlier here, and I don't understand why it can't be added with all the other work that is happening.

It will be helpful for a bunch of use cases in the future and make canister staking much more scalable.

I think this explains it nicely: x.com

Thank you!

So you don’t actually need the empty neurons in the response from the API? Because that means that you will need paging at a little under 1000 neurons based on recent_ballots alone.

It sounds like, other than the additional feature you are requesting, this will help your service run at scale.

As far as supporting subaccounts in list_neurons goes, we are opening that discussion up again. I haven’t seen many requests for it, but it does seem like a reasonable feature and would be symmetrical with other API methods. I’ll ask for a security review of the idea, and see if we can fit it in somewhere.


Ye that’s right, I don’t need empty neurons.

I do need to find specific neurons (with ICP in them) though - they could be on any one of the pages (if this new feature is added). The current way I do it is by listing everything in one call, so up to 1000 neurons (with ICP in them) is plenty for a while, but the most scalable approach would be adding the neuron-subaccounts-in-list_neurons feature.

Hi @dfxjesse, after taking a look at the implementation, I was wondering what the reason is not to use the neuron id returned in the spawn response (currently discarded here). We thought you might be trying to avoid storage cost, but it seems you are storing neuron information anyway.

Hi @jasonzhu, thanks for looking at it. Mostly because of this:

In the event something goes wrong, if I relied solely on the response from spawn neuron, I would have no way of finding whose spawned neuron that is - only by creating known subaccounts can the canister reliably find the user's spawned neuron from list_neurons - and even if I just hoped all the calls were successful, I would still need to do this as a backup measure.

Just wanted to make sure we are on the same page: currently the IC is still using the guaranteed response messaging model - when a canister makes an inter-canister call, it will wait for the response forever, until it hears back from the callee. If the NNS subnet stalls, the caller canister will keep waiting without a timeout (and therefore it cannot be stopped or upgraded safely). Until the best-effort messaging model is delivered (which is coming) and the caller canister starts to use it, there doesn't seem to be any benefit to not using the neuron id returned by NNS Governance.

That being said, you are right that when the best-effort messaging model comes and this canister starts to use it for spawning neurons, the canister will function in a better way if the NNS Governance goes down (e.g. it can be upgraded). I’m not entirely sure if that’s a good trade-off though (I might be biased since I work on the NNS :sweat_smile:)

Just to clarify (sorry if it’s already obvious to you): in the post you were linking to regarding ingress messages, it doesn’t apply to inter-canister messages, where the caller is another canister.

Right, I was more talking about being able to query the result of the response after the call has been made.

I know we are guaranteed responses, but as far as I understand it, and due to the nature of these systems, it is possible for a neuron to be spawned and the neuron ID never sent back to the canister in the response (i.e. something went wrong, and the canister gets a network error or some other failure in transit). So it becomes necessary to query the result after the call has been made.

This is true for the upcoming best-effort messaging feature (if you set a timeout), and it’s also true for ingress messages currently, but it’s not true if you send messages from a canister, at this time.

More concretely, if the canister calls NNS Governance to spawn and the neuron is indeed spawned and the response is produced, the response cannot be lost. There is indeed no guarantee WHEN the response is delivered. In that sense, it’s possible that the response message sent from NNS Governance to the caller canister is NEVER delivered (extremely unlikely, but possible), but if that happens, I don’t think another call between the exact same 2 canisters will have the responses delivered.

cc @free to keep me honest, especially since this post was referenced. In addition, I think I’m assuming the messages are ordered, which sounds reasonable, but not 100% sure.


We only have ordering guarantees for requests (so that if two requests R1 and R2 from a canister A to a canister B are both delivered to B, then they are delivered in the order in which they were sent by A).

But we do not have ordering guarantees for responses. Among other issues, this would mean that if request R1 ends up doing an ungodly amount of work, then we wouldn't be allowed to deliver the response to R2 before R1 completes. But also more generally, if the response for R1 is produced before the response to R2, it's still possible that, e.g. if a subnet split happens, they could be delivered in reverse order. Which is why the protocol makes no such guarantee for responses.

Thanks for the further clarification on how that works. Based on what you and @free are saying, there is an (albeit small) chance that response delivery can take an arbitrarily long time (in the current response system). For the canister I worked on, that chance is not really worth the risk. If a user spawned a neuron in there, it could have X amount of ICP in it that could be lost, so having the backup of using list_neurons to find that neuron by its subaccount seems like a safe bet. And from reading Free's reply, it seems it's perfectly fine for another call to go through in this case, even if the first call is waiting for a response.

Aside from that, I also need to know when to spawn maturity and when that neuron is ready to be claimed, which is why I don’t solely rely on the responses for things (but that is an unrelated design decision).

I appreciate the information here though, it’s very informative and I will be thinking about it further when engineering things.