Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The ACS and BA algorithms #11

Merged
merged 14 commits into from
May 10, 2018
Merged

The ACS and BA algorithms #11

merged 14 commits into from
May 10, 2018

Conversation

vkomenda
Copy link
Contributor

@vkomenda vkomenda commented May 7, 2018

I've defined the ACS algorithm and an interface to it consisting of 3 functions: new, send_proposed_value and on_input. The latter is a message handler for RB and BA messages. These two types of message are therefore joined in the types of message inputs and outputs.

Note about BA: A common coin implementation is not provided in this pull request. The coin value is constant 1 at

let coin: u64 = 1;
.

Copy link
Collaborator

@afck afck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks, good, apart from a few details. (Most of my comments are just nit-picks.)

bool bval = 1;
bool aux = 2;
bool bval = 2;
bool aux = 3;

This comment was marked as outdated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rustfmt doesn't work with anything but Rust sources:

$ rustfmt proto/message.proto 
error: expected one of `!` or `::`, found `=`
 --> /home/vk/src/poanetwork/hbbft/proto/message.proto:1:8
  |
1 | syntax = "proto3";
  |        ^ expected one of `!` or `::` here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry! You're right, of course, and indentation 2 is fine in a proto file!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed indentation to 4 spaces and committed.

src/agreement.rs Outdated

#[derive(Default)]
pub struct Agreement {
pub struct Agreement<NodeUid> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a TODO about amiller/HoneyBadgerBFT#59, so we don't forget to add the CONF message round later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

src/agreement.rs Outdated
/// Values received in AUX messages. Reset on every epoch update.
received_aux: HashMap<NodeUid, BTreeSet<bool>>,
/// All the output values in all epochs.
outputs: BTreeMap<u32, bool>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be called estimates instead? An output should only be produced once, I think.

Since the epochs start from 0 and are then incremented by 1, we might as well use a Vec here.

Or could we even do away with keeping the previous rounds' estimates around, and just make this a bool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreement can output multiple times. That's why the termination criterion is more complex than whether the instance has output a value. I have to correct the code to reflect that. At the moment only one value is stored per agreement instance.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that's a bit ambiguous in the paper. I think the reason for the complex termination criterion is just that a node needs to keep sending messages to help the other nodes output, too. The formulation "then output b" should really be read as "then output b, unless you already have done so". For the user of the module it will certainly be more convenient if it only outputs once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on the uniqueness of output. The paper Signature-Free Asynchronous Byzantine Consensus with t < n/3 and O(n^2) Messages from which this version of BA originates has better presentation and does say "decide(v) if not yet done" in Figure 2.

src/agreement.rs Outdated
} else {
count
}
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe slightly simpler:

self.received_bval.values().filter(|values| values.contains(&b)).count()

src/agreement.rs Outdated
if count_bval == 2 * self.num_faulty_nodes + 1 {
self.bin_values.insert(b);

// wait until bin_values_r /= 0, then multicast AUX_r(w)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like Haskell. 😁 Let's use != or .

} else {
// Send the message to the agreement instance.
agreement_instance
.on_input(uid.clone(), &amessage)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That shouldn't be the same uid as above, should it? If I understand correctly:

  • self.agreement_instances.get_mut(&uid) is the instance that decides whether the element proposed by node uid should be part of the common subset in the end.
  • The parameter of Agreement::on_input should be the uid of the node that sent this particular message.


if let Ok((output, mut outgoing)) = result {
if let Some(b) = output {
outgoing.append(&mut self.on_agreement_result(uid, b));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that &mut a typo?

let results1: usize =
self.agreement_results
.iter()
.fold(0, |count, (_, v)| if *v { count + 1 } else { count });
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could also be simplified with filter(|v| **v).count(), I think.

if instance_uids == completed_uids {
// All instances of Agreement that delivered `true`.
let delivered_1: HashSet<NodeUid> = self.agreement_results
.all(|(_, instance)| instance.terminated())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If only the values are needed, values() should be preferred over iter():

self.agreement_instances.values().all(Agreement::terminated)

src/proto/mod.rs Outdated
/// BVAL message with an epoch.
BVal((u32, bool)),
/// AUX message with an epoch.
Aux((u32, bool)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should move the AgreementMessage type into the agreement module. I think the individual algorithms' modules could be pretty self-contained.

@vkomenda
Copy link
Contributor Author

vkomenda commented May 8, 2018

@afck, I've changed on_input_agreement considerably while working on your passing remark on uid. Please check whether you find any issues with the new version. The rationale behind the new version is that any incoming BA message should be delivered to all local BA instances since all BA messages are broadcasts, hence there is no explicit destination of a BA message.

Copy link
Collaborator

@afck afck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still a few details I'm not sure about. We should add extensive tests for this soon. (In a later PR, if you prefer.)

oneof payload {
BroadcastProto broadcast = 1;
AgreementProto agreement = 2;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about my remark! You were right, of course, and we should leave the indentation at 2 spaces instead of 4 here, as is the standard for proto files.

src/agreement.rs Outdated
self.received_aux
.values()
.filter(|values| values.is_subset(&self.bin_values))
.map(|values| vals.union(values))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTreeSet::union doesn't modify the receiver, so vals will remain empty. This should probably say:

vals.extend(values)

src/agreement.rs Outdated
/// Sent BVAL values. Reset on every epoch update.
sent_bval: BTreeSet<bool>,
/// Values received in AUX messages. Reset on every epoch update.
received_aux: HashMap<NodeUid, BTreeSet<bool>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make this a HashMap<NodeUid, bool>, since multiple Aux messages from the same node can be safely ignored. (I think… can they?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an agreement instance can output different AUX values in different epochs. Note that the values in AUX messages depend on the current state of bin_values which is subject to change.

Copy link
Collaborator

@afck afck May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant multiple Aux messages in a single epoch can be safely ignored. This container should be cleared in each epoch change, shouldn't it?

src/agreement.rs Outdated
let (count_aux, vals) = self.count_aux();
if count_aux < self.num_nodes - self.num_faulty_nodes {
// Continue waiting for the (N - f) AUX messages.
(None, None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could make this an early return (None, None), then the rest of the method wouldn't have to be indented.

src/agreement.rs Outdated
} else {
// FIXME: Implement the Common Coin algorithm. At the moment the
// coin value is random and local to each instance of Agreement.
let coin2 = random::<bool>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using random means the coin won't be "common" at all: each node will have a different value. I think that breaks the guarantee that it terminates, even without any malicious nodes. The instances take the coin value as a hint when to terminate, so an unlucky node may outlive all its remote counterparts, and never be able to output anything.

src/agreement.rs Outdated
let output: Vec<Option<bool>> = vals.into_iter()
.take(1)
.map(|b| {
message = Some(self.set_input(b));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does self.epoch need to be incremented before this call? I imagine the BVal messages this sends should have the next epoch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. It makes sense to increment the epoch earlier.

/// Message from a remote node `message_sender_id` concerning the common
/// subset element proposed by the node `element_proposer_id`.
Agreement {
message_sender_id: NodeUid,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message sender's ID doesn't need to be part of the message itself, I think, since it's passed into on_input anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are confusing CommonSubset::on_input which does not receive the ID as argument with Agreement::on_input which does. The latter gets the sender ID from the message received by CommonSubset::on_input.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't CommonSubset::on_input just pass on the sender ID to Agreement::on_input? It looks dangerous to me to have it in the message, since it would basically allow the sender to lie about their identity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no, the sender ID is in common_subset::Input which is not a message but rather a type of input to the Common Subset algorithm. As we discussed before, AgreementMessage does not have any IDs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I see! But then what will actually be transmitted over the network, i.e. what's the "common subset" message type? Because that will need to contain element_proposer_id, but not message_sender_id, I think. So wouldn't it be more natural to replace Input with Message, and change the signature of CommonSubset::on_input so that it takes the sender ID separately from the message?

) -> Result<Option<AgreementMessage>, Error> {
if let Some(agreement_instance) = self.agreement_instances.get_mut(uid) {
fn on_broadcast_result(&mut self, uid: &NodeUid) -> Result<Option<AgreementMessage>, Error> {
if let Some(agreement_instance) = self.agreement_instances.get_mut(&uid) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether it's dangerous to change an agreement instance's estimate value later than round 0. Could this prevent termination? (Not sure about that.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think the Agreement interface needs to change instead, and just ignore inputs after the first round.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented this in a stricter way. There is a check has_input which returns true iff the first estimated value has been written. set_input will return an error if the input has already been provided.

let mut result = Err(Error::NoSuchAgreementInstance);

// Send the message to the local instance of Agreement
if let Some(mut agreement_instance) = self.agreement_instances.get_mut(&element_proposer_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the mutable result, you could just return here on error:

let agreement_instance = self.agreement_instances
    .get_mut(&element_proposer_id)
    .ok_or(Error::NoSuchAgreementInstance)?;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the mutable result helps to avoid a multiple mutable borrow conflict.

let instances = &mut self.agreement_instances;
for (_uid0, instance) in instances.iter_mut() {
if results1 >= self.num_nodes - self.num_faulty_nodes {
for instance in self.agreement_instances.values_mut() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above: I'm worried that this could break the agreement if the instance has already progressed beyond the first round. (But it's not quite clear in the paper and I haven't thought much about it yet.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the paper, BA instances can set the estimated value only for the next epoch. So, if an instance hasn't been provided with input true from the ACS instance, the estimated value of the epoch 0 will remain empty until the ACS instance fills it in using Agreement::set_input(false). Then that BA can terminate as soon as the coin value is false due to the termination criterion.

src/agreement.rs Outdated
if self.bin_values.contains(b) {
vals.insert(b.clone());
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But now vals.len() will always be <= 2 (number of values instead of number of entries). Maybe it should be:

let aux_count = self.received_aux.values()
    .filter(|b| self.bin_values.contains(b))
    .count();
(aux_count, vals)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you define vals?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same way you already do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so then aux_count == vals.len(). So, aux_count is redundant.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't, in general: vals is a set of bools, so it has 0, 1 or 2 elements. The above aux_count would be the number of entries in the received_aux map that have one of those values, which can be anything between 0 and num_nodes. (The values() iterator doesn't deduplicate, so it will yield &true and &false as often as they appear in the map.)

src/agreement.rs Outdated
let vals: BTreeSet<bool> = self.received_aux
.values()
.filter(|b| {
count += 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it will count the total number of entries in received_aux, instead of just the ones whose values are in bin_values.

src/agreement.rs Outdated
// Check the termination condition: "continue looping until both a
// value b is output in some round r, and the value Coin_r' = b for
// some round r' > r."
self.terminated = self.terminated || self.estimated.values().any(|b| *b == coin2);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not sufficient that any previous estimate agrees with the coin. The output has to agree with it.

src/agreement.rs Outdated
/// Estimates of the decision value in all epochs. The first estimated value
/// is provided as input by Common Subset using the `set_input` function
/// which triggers the algorithm to start.
estimated: BTreeMap<u32, bool>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we actually need to remember the previous estimates: Once we move on to epoch i + 1, the value from epoch i can safely be forgotten.

I think we could even just make estimate a simple bool and initialize it with false (without sending messages). Agreement::on_input would then do nothing if epoch > 0 or input has already been provided before (i.e. we'd need another has_input: bool).

We'd also need a has_output: bool. After that's true, the estimate can't change again (we don't even need to cater for that, as it's guaranteed by the algorithm itself, I think), and we can terminate as soon as in any future round has_output && coin == estimate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the termination criterion, I mostly agree but I think the paper asks for self.output == Some(coin). In that case do we need estimated at all? It is never read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, estimated is used for broadcast.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think you're right: It is used for broadcasting BVal, but only at the point where we transition to a new epoch (or receive user input in epoch 0). After that we can forget it, so we don't actually have to store it in the struct at all. (But then we'd have to make output an Option<bool>, as you say. Either way is fine with me.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please have a look at my last commit. I added a BVal message at the start of every epoch r, r > 0. I think the paper clearly requires those.

Copy link
Collaborator

@afck afck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great now! 👍

Thanks for your patience and sorry for the long list of nit-picks! I've just got one more, but we can also resolve this in another PR.

src/agreement.rs Outdated
self.try_coin();
Ok((self.output, VecDeque::new()))
outgoing.extend(self.try_coin());
Ok((self.output, outgoing))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to make sure we output only once.

@vkomenda
Copy link
Contributor Author

Yes, I fixed the once-per-instance property of output too.

@afck afck merged commit d9febca into master May 10, 2018
@afck afck deleted the vk-agreement branch May 10, 2018 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants