Ballot Type Proposal - Consensus Pause

#1

I’d like to begin a discussion about adding a new ballot type that would allow a validator to “pause” their node.

This ballot would allow a validator to remove themselves from consensus for a set period of time and then be restored automatically once that period ends.

Validators may find themselves away from their computer(s) due to work-related travel, illness, or a need for personal leave.

This would keep block creation time from taking a hit while they are away.
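
To make the idea a bit more concrete, here is a minimal sketch of the behaviour I have in mind. It is only an illustration: the `Pause` record, the `active_validators` function, and the use of Unix timestamps are all hypothetical and do not correspond to any existing contract or client code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pause:
    """Hypothetical record created when a pause ballot is accepted."""
    validator: str
    start: int  # Unix time at which the pause begins
    end: int    # Unix time at which the validator is restored automatically


def active_validators(validators, pauses, now):
    """Return the validators eligible to produce blocks at time `now`."""
    # A validator is skipped only while `now` falls inside its pause window.
    paused = {p.validator for p in pauses if p.start <= now < p.end}
    return [v for v in validators if v not in paused]


validators = ["alice", "bob", "carol"]
pauses = [Pause("bob", start=1_000, end=2_000)]

print(active_validators(validators, pauses, now=1_500))  # during the pause
print(active_validators(validators, pauses, now=2_000))  # pause has expired
```

The point of the sketch is that restoration needs no second ballot: once the window ends, the validator is simply part of the set again.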

Thoughts?

1 Like

#2

Removing a mining key is the designed way to pause participation in consensus. If you decide that 48h is too long for a pause, you can decrease the threshold for that type of ballot.

0 Likes

#3

True, however - I suspect that without such a tool, validators who expect to be unable to act on their node will instead choose to ‘risk it’.

Perhaps think of these like sick days…

Again, if someone knows that they are going on holiday or perhaps going under the knife (surgery) - wouldn’t we want these folks to feel comfortable removing themselves, on the off chance that an emergency action is needed?

I suspect that 99% of the time, people will be away and unable to update - and nothing is needed. I’m always thinking in terms of expectation values (not averages) as this is a better tool. And so it is best to avoid a Black Swan.
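
A quick worked example of what I mean by weighing the outcome rather than just its likelihood. The figures are hypothetical and chosen only for illustration: a 1% chance that action is needed while a validator is away, and an arbitrary loss of 1000 units if nobody can act.

```python
def expected_loss(p_event, loss_if_unprepared):
    """Probability-weighted loss: chance of the event times what it would cost."""
    return p_event * loss_if_unprepared


# Hypothetical figures, not measurements from the network.
p_event = 0.01   # chance that urgent action is needed while away
loss = 1000      # cost (arbitrary units) if the validator cannot respond

# The most likely outcome is "nothing happens", yet the weighted loss is not zero.
print("expected loss:", expected_loss(p_event, loss))
```

Even though nothing happens in the vast majority of cases, a large enough loss keeps the weighted figure well above zero, which is why the rare case deserves a mechanism.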

As for shortening up the time window. I think I’ve made my personal beliefs around this known to all. 48 hrs for a set of humans to coordinate and act is probably the shortest window possible (and I’m of the opinion we need to increase this for a variety of reasons).

The other mechanism, of course, is to increase the number of validators to 100+, so that a few nodes becoming unreachable doesn’t have as big an effect.

0 Likes

#4

“Again, if someone knows that they are going on holiday or perhaps going under the knife (surgery) - wouldn’t we want these folks to feel comfortable removing themselves, on the off chance that an emergency action is needed?”

Consensus can tolerate a worst-case scenario for any individual validator, e.g. death.

As for shortening up the time window. I think I’ve made my personal beliefs around this known to all. 48 hrs for a set of humans to coordinate and act is probably the shortest window possible (and I’m of the opinion we need to increase this for a variety of reasons).

48h is enough in my opinion too. Meanwhile, only 3 validators are required to remove a validator. Thus, validators overrate the need for each individual validator to be personally involved in the voting process.

0 Likes

#5

True. I was more focusing on the concern that some have expressed wrt the block creation time increasing (ever so slightly) from 5 sec should a node go down.

Again, the other way is to increase the number of validators… make the denominator big…

So the time step for any one validating node is rather small.
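
Here is a rough sketch of that arithmetic. It assumes a round-robin schedule with a fixed 5 sec step in which an offline validator's slot simply produces no block; that is my simplification, not a statement about how the client actually behaves. The function name and the validator count of 12 are hypothetical; 25 and 100 are the figures mentioned in this thread.

```python
def average_block_time(step_seconds, validators, offline):
    """Average seconds between blocks when `offline` validators miss their slots."""
    if not 0 <= offline < validators:
        raise ValueError("need at least one online validator")
    # Over one full round there are `validators` slots but only
    # `validators - offline` blocks, so the same time is spread over fewer blocks.
    return step_seconds * validators / (validators - offline)


for n in (12, 25, 100):
    print(n, "validators, 1 offline:", round(average_block_time(5, n, 1), 3), "sec")
```

Under these assumptions, the larger the denominator, the closer the average stays to 5 sec when a single node drops out.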

0 Likes

#6

The vote out ballot is more of an emergency measure than a regular process. Therefore it shouldn’t be treated the same as a vote in. It might make sense to decrease the time to 24h, but increase the threshold to 50% as a fail-safe measure.

0 Likes

#7

At the moment, the vote out ballot is designed as a general purpose ballot. If you’d like to have another type of ballot for an emergency vote out by a majority of validators, please create a corresponding RFC here: https://github.com/poanetwork/rfc

1 Like

#8

One beauty of Bitcoin was in its handling of the Byzantine Generals (specifically, the Two Generals) problem - i.e. how do you coordinate between nodes. The brilliance, I think, was to slow down the network. This was a crazy idea - telling computers on a modern network to wait 10 minutes or so to let everyone figure themselves out. Most were thinking that computers should go faster and not slower…

So I wonder if asking humans to coordinate within 24 hours (or even 48 hours, in my opinion) is realistic. When people are pressed hard, there is real potential for non-ideal solutions to be implemented. Also, there is the matter of how to get in touch with everyone. Recently, some members simply did not know how to reach out to other members. So there’s my two bits regarding shortening the time window.

0 Likes

#9

I’m wondering what constitutes an “emergency”.

Some may think me ‘slow’. I actually am slow on purpose, as I like to verify, study, think, and then document every action. Documentation takes a great deal of time to do correctly - and I do this as a means for backtracking/troubleshooting.

Nonetheless, some might think my slowness may be cause for an ‘emergency’.

Or what if one were not able to reach a validator? This was an issue. I’d rather we not discuss ‘emergency’ ballots - as these open an unnecessary can of worms.

0 Likes

#10

We designed Core to have 25 validators. The onboarding process is what is really slow in POA at the moment.

0 Likes

#11

To clarify this statement (and it is a bit out of scope for this forum posting - but important):

Three validators are needed for a quorum but not to perform an action.

For example, if there was a vote in which three members voted (all 3 voted for), no action is taken, as there wasn’t a valid quorum.

0 Likes

#12

To remove a validator, only 3 unique voting keys are needed. That means only 3 validators are required to remove a validator from the consensus. Also, one of the voters can be the ballot proposer.

0 Likes

#13

Am I understanding correctly: if 3 vote to remove a validator and 22 vote to keep the validator - the validator is removed?

0 Likes

#14

No, in that case the validator will not be removed.
If 3 voted to remove, or 2 voted to remove and 1 voted against, and the other 22 didn’t vote - the validator will be removed.

0 Likes

#15

Ok. So 3 is necessary for a quorum - not for an action. And we are using only simple majority rules.
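
To check my understanding, here is a minimal sketch of the rule as described in this thread: at least 3 votes must be cast for the ballot to count, and then a simple majority of the votes cast decides. The function name, the return strings, and the handling of ties (treated as rejected) are my own assumptions, not the actual contract logic.

```python
MIN_VOTES = 3  # quorum as described in this thread: 3 unique voting keys


def ballot_outcome(votes_for, votes_against, min_votes=MIN_VOTES):
    """Classify a ballot under an assumed quorum-plus-simple-majority rule."""
    total = votes_for + votes_against
    if total < min_votes:
        return "no quorum"
    # Assumption: a tie does not pass.
    return "accepted" if votes_for > votes_against else "rejected"


print(ballot_outcome(3, 0))    # 3 remove, 22 didn't vote
print(ballot_outcome(2, 1))    # 2 remove, 1 against, 22 didn't vote
print(ballot_outcome(3, 22))   # 3 remove, 22 keep
print(ballot_outcome(2, 0))    # only 2 votes cast
```

Validators who do not vote are simply not counted, which is why the first two cases pass with so few participants.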

0 Likes

#16

James said: The main concern I would like to address is response time to a similar situation. How do we improve upon it in the future, using the tools that we have in place as independent validators?

Honestly, I think that the speed with which the network updated is not sustainable. It was too fast. Human coordination is the rate-limiting step. I can tell you a couple of issues that arose: 1) People were literally asking whose handle is XXX. 2) I think we need to be OK with Y% of nodes partitioning. What that percentage is, I don’t know.

Having been raised watching Cosmos, I’m reminded of a quote by Sagan:

“Every time a scientific paper presents a bit of data, it’s accompanied by an error bar – a quiet but insistent reminder that no knowledge is complete or perfect. It’s a calibration of how much we trust what we think we know. If the error bars are small, the accuracy of our empirical knowledge is high; if the error bars are large, then so is the uncertainty in our knowledge.”

We mustn’t be afraid of error bars, just better understand what is acceptable.

UPDATE: Having more tools to communicate is great, but the rate determining step (I believe) is in how fast the humans can perform the work (for a variety of reasons: illness, travel, comfort, etc…). We should just be mindful (and accepting) of this reality.

0 Likes

#17

Perhaps it is best to provide some examples, ones that I’m hoping never happen mind you:

Suppose Validator Bob is going on vacation, specifically a cruise. Internet is spotty at best on the open seas. Now, 99% of the time, zero action is necessary from Bob to update his node’s software.

But he doesn’t think in terms of probabilities; rather, he understands the concept of expected values. Should a black swan event happen - he will most likely be out, for having neglected his duties.

Hum… What is Bob to do?

He could vote himself out…. (just let that one sink in for a bit)

Would he? Firstly there is a question of loss of reward. I’m going to ignore this - he was never in it for the money (that is a big assumption). Bob’s really concerned that his removal would be permanent.

So Bob gives his acquaintance Alice his password* to his node. Notice that I said ‘acquaintance’ and not ‘great friend’. Bob actually doesn’t have to particularly like Alice and we can actually compute the boundaries of the amount of trust Bob must have for Alice.

Low Side:

Scenario 1: Alice stops Bob’s node, Bob is available (he isn’t on vacation yet) and brings his node back online. He gives the reason of “hackers” and that’s that. Neither Alice nor Bob would share the reality of the activity with the larger validator set, as both would be out.

Scenario 2: Alice stops Bob’s node, Bob’s cruising. Well, Bob’s SOL… But remember he was facing this scenario at the very beginning.

Scenario 3: A network issue arises, Bob’s cruising and Alice restarts his node. :)

Only Scenario 2 is of concern - and this was going to happen anyways… So on the low side the level is a normalized ZERO.

High Side:

Bob only needs to ascribe a value to his trust of Alice that is below the compensation that he would be out. The cost of being removed permanently from participation is rather high (at the moment). He just needs to trust Alice less than this BIG number.

Solution (potential - love to discuss others / expand upon…):

Having a mechanism in place that would allow a validator to gracefully exit from validation duties (perhaps keeping voting duties in place - since it is easier to vote) may make validators more willing to self-disclose. There will be times when they might not be able to perform their duty, and we must ask ourselves if it is wise to ignore this possibility.

Again - I would much rather avoid the possibility of ad hoc systems being arrived at between validators. One of the core constructs underpinning many blockchain projects is unpacking/understanding incentivization schemas (and being totally honest about them).

*Please note, I’m grossly oversimplifying - as all good examples ought to for readability.

0 Likes