    [FAB-5165] Optimize block verification · 6d56e6eb
    yacovm authored
    
    
    In gossip, when block messages are gossiped among peers the
    signature of the ordering service on them is validated.
    
    This causes a message to be validated in several places:
    
    1) When it is received from the ordering service
    2) When it is received from a peer via forwarding or pull
    3) When it is received from a peer via state transfer
    
    The problem with (2) is that it is done in an inefficient way:
    - When the block is received from the communication layer it is verified
      and then forwarded to the "channel" module that handles it.
    - The channel module verifies blocks in 2 cases:
      - If the block is part of a "data update" (gossip pull response) message
        the message is opened and all blocks are verified
      - If the block is a block message itself, it is verified again,
        although it was already verified before being passed into the
        channel module. This is redundant.
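The redundant re-verification described above can be avoided by recording, on the received message itself, that its signature has already been checked. A minimal sketch (hypothetical names, not the actual gossip types):

```go
package main

import "fmt"

// receivedBlock carries a block sequence together with a flag recording
// whether its ordering-service signature has already been validated.
// (Hypothetical simplification; the real gossip message types differ.)
type receivedBlock struct {
	seq      uint64
	verified bool
}

// verifyOnce performs the expensive signature check only if it has not
// been done yet, so the channel module does not pay for it a second time.
func verifyOnce(b *receivedBlock) bool {
	if b.verified {
		return true // already validated at the communication layer
	}
	fmt.Printf("verifying block %d\n", b.seq)
	b.verified = true
	return true
}

func main() {
	b := &receivedBlock{seq: 42}
	verifyOnce(b) // first call pays the signature check
	verifyOnce(b) // second call is a cheap no-op
}
```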
    
    But the biggest inefficiency is in the handling in the channel module:
    When a block is verified, it is then decided whether it should be
    propagated to the state transfer layer (the final stop before it is
    passed to the committer module). It is decided by asking the in-memory
    message store if the block has been already received before, or
    if it is too old.
    
    The problem is that this is done *AFTER* the verification and not *BEFORE*
    and therefore - since in gossip you may get the same block several times
    (from other peers) - we end up verifying the block and then discarding
    it anyway.
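The fix is to consult the in-memory message store before paying for signature verification, so duplicate or stale blocks are discarded cheaply. A minimal sketch of that reordering (hypothetical names, not the actual gossip API):

```go
package main

import "fmt"

// msgStore mimics gossip's in-memory message store: it remembers which
// block sequences were already received and which are too old.
// (Hypothetical simplification of the real message store interface.)
type msgStore struct {
	seen   map[uint64]bool
	minSeq uint64 // blocks below this sequence are considered too old
}

// shouldProcess reports whether a block is new and recent enough
// to be worth handling at all.
func (s *msgStore) shouldProcess(seq uint64) bool {
	return seq >= s.minSeq && !s.seen[seq]
}

// verifyBlock stands in for the expensive signature check
// (700 microseconds to 2 milliseconds per 100KB block, per the
// measurements above).
func verifyBlock(seq uint64) bool {
	fmt.Printf("verifying block %d\n", seq)
	return true
}

// handleBlock applies the optimization: the store is checked *before*
// verification, so duplicates and stale blocks never reach verifyBlock.
func handleBlock(s *msgStore, seq uint64) bool {
	if !s.shouldProcess(seq) {
		return false // duplicate or too old: skip verification entirely
	}
	if !verifyBlock(seq) {
		return false
	}
	s.seen[seq] = true
	return true
}

func main() {
	s := &msgStore{seen: map[uint64]bool{}, minSeq: 10}
	fmt.Println(handleBlock(s, 12)) // new block: verified and accepted
	fmt.Println(handleBlock(s, 12)) // duplicate: dropped without verifying
	fmt.Println(handleBlock(s, 5))  // too old: dropped without verifying
}
```

Since gossip delivers the same block from several peers, this one reordering removes most of the wasted verification work.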
    
    Empirical performance tests I have conducted show that for blocks
    of 100KB, the time spent verifying a block is between 700 microseconds
    and 2 milliseconds.
    
    When testing a benchmark scenario of 1000 blocks with a single leader
    disseminating to 7 non-leader peers, with propagation factor of 4,
    a block entry rate (to the leader peer) of bursts of 20 blocks every 100ms,
    the gossip network is over-committed, and starting from block 500
    most blocks were dropped because the gossip internal buffers were full
    (we drop blocks so that the network does not deadlock).
    
    With this change applied, no block is dropped.
    
    Change-Id: I02ef1a203f469d324509a2fdbd1c8b449a9bcf8f
    Signed-off-by: yacovm <yacovm@il.ibm.com>