    [FAB-5165] Optimize block verification · 6d56e6eb
    yacovm authored
    
    
    In gossip, when block messages are gossiped among peers the
    signature of the ordering service on them is validated.
    
    This causes a message to be validated in several places:
    
    1) When it is received from the ordering service
    2) When it is received from a peer via forwarding or pull
    3) When it is received from a peer via state transfer
    
    The problem with (2) is that it is done in an inefficient way:
    - When the block is received from the communication layer it is verified
      and then forwarded to the "channel" module that handles it.
    - The channel module verifies blocks in 2 cases:
      - If the block is part of a "data update" (gossip pull response) message
        the message is opened and all blocks are verified
      - If the block is a block message itself, it is verified again,
        although it was already verified before being passed into the
        channel module. This is redundant.
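The redundant re-verification described above can be avoided by recording, on the received message itself, that its signature has already been checked. A minimal sketch (hypothetical names, not the actual gossip types):

```go
package main

import "fmt"

// receivedBlock carries a block sequence together with a flag recording
// whether its ordering-service signature has already been validated.
// (Hypothetical simplification; the real gossip message types differ.)
type receivedBlock struct {
	seq      uint64
	verified bool
}

// verifyOnce performs the expensive signature check only if it has not
// been done yet, so the channel module does not pay for it a second time.
func verifyOnce(b *receivedBlock) bool {
	if b.verified {
		return true // already validated at the communication layer
	}
	fmt.Printf("verifying block %d\n", b.seq)
	b.verified = true
	return true
}

func main() {
	b := &receivedBlock{seq: 42}
	verifyOnce(b) // first call pays the signature check
	verifyOnce(b) // second call is a cheap no-op
}
```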
    
    But the biggest inefficiency is in the handling in the channel module:
    When a block is verified, it is then decided whether it should be
    propagated to the state transfer layer (the final stop before it is
    passed to the committer module). It is decided by asking the in-memory
    message store if the block has been already received before, or
    if it is too old.
    
    The problem is that this is done *AFTER* the verification and not *BEFORE*
    and therefore - since in gossip you may get the same block several times
    (from other peers) - we end up verifying the block and then discarding
    it anyway.
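The fix is to consult the in-memory message store before paying for signature verification, so duplicate or stale blocks are discarded cheaply. A minimal sketch of that reordering (hypothetical names, not the actual gossip API):

```go
package main

import "fmt"

// msgStore mimics gossip's in-memory message store: it remembers which
// block sequences were already received and which are too old.
// (Hypothetical simplification of the real message store interface.)
type msgStore struct {
	seen   map[uint64]bool
	minSeq uint64 // blocks below this sequence are considered too old
}

// shouldProcess reports whether a block is new and recent enough
// to be worth handling at all.
func (s *msgStore) shouldProcess(seq uint64) bool {
	return seq >= s.minSeq && !s.seen[seq]
}

// verifyBlock stands in for the expensive signature check
// (700 microseconds to 2 milliseconds per 100KB block, per the
// measurements above).
func verifyBlock(seq uint64) bool {
	fmt.Printf("verifying block %d\n", seq)
	return true
}

// handleBlock applies the optimization: the store is checked *before*
// verification, so duplicates and stale blocks never reach verifyBlock.
func handleBlock(s *msgStore, seq uint64) bool {
	if !s.shouldProcess(seq) {
		return false // duplicate or too old: skip verification entirely
	}
	if !verifyBlock(seq) {
		return false
	}
	s.seen[seq] = true
	return true
}

func main() {
	s := &msgStore{seen: map[uint64]bool{}, minSeq: 10}
	fmt.Println(handleBlock(s, 12)) // new block: verified and accepted
	fmt.Println(handleBlock(s, 12)) // duplicate: dropped without verifying
	fmt.Println(handleBlock(s, 5))  // too old: dropped without verifying
}
```

Since gossip delivers the same block from several peers, this one reordering removes most of the wasted verification work.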
    
    Empirical performance tests I have conducted show that for blocks
    of 100KB, the time spent verifying a block is between 700 microseconds
    and 2 milliseconds.
    
    When testing a benchmark scenario of 1000 blocks with a single leader
    disseminating to 7 non-leader peers, with propagation factor of 4,
    a block entry rate (to the leader peer) of bursts of 20 blocks every 100ms,
    the gossip network is over-committed, and starting from block 500
    most blocks were dropped because the gossip internal buffers were full
    (we drop blocks so that the network does not deadlock).
    
    With this change applied, no block is dropped.
    
    Change-Id: I02ef1a203f469d324509a2fdbd1c8b449a9bcf8f
    Signed-off-by: yacovm <yacovm@il.ibm.com>