1. 27 Mar, 2019 1 commit
    • FAB-11863 Assorted Raft serviceability fixes · 5fb1be95
      Jason Yellick authored
      
      
      This CR bundles four small serviceability fixes for Raft.
      
      1) It removes the newlines from a log message. They made the message
      difficult to consume and made the node list appear truncated, like this:
      
       INFO 17155f Entering, channel: testorgschannel1, nodes: [ID: 3
      
      2) It adds periods to all of the metric definitions in the cluster
      metrics.
      
      3) It converts the message send time in the cluster package to be
      reported in seconds and adds the unit 'seconds' to its description.
      
      4) It clarifies that the number of leader changes is what this process
      has observed since start, and not the total number of leader changes for
      the network.
      
      Change-Id: Ic4ad6551af57497f174518188022bf4dfd04fc19
      Signed-off-by: Jason Yellick <jyellick@us.ibm.com>
  2. 15 Mar, 2019 1 commit
    • [FAB-14682] Add stream ID to err msg · 42420364
      yacovm authored
      
      
      This change set:
      
      - Adds the stream ID to the aborted error message for better
        readability and troubleshooting.
      - Reverses the order between:
        - The log message that says that the stream was terminated.
        - The actual termination of the stream.
      
      Change-Id: If3537b4770eb00d5c67b032bdb49303c9007e794
      Signed-off-by: yacovm <yacovm@il.ibm.com>
  3. 05 Mar, 2019 1 commit
    • [FAB-14077] cluster comm metrics · 61abf567
      yacovm authored
      
      
      This change set adds cluster communication metrics:
      
      	EgressQueueSize          metrics.Gauge
      	EgressWorkerCount        metrics.Gauge
      	IngressStreamsCount      metrics.Gauge
      	EgressStreamsCount       metrics.Gauge
      	EgressTLSConnectionCount metrics.Gauge
      	MessageSendTime          metrics.Histogram
      	MessagesDroppedCount     metrics.Counter
      
      And adds corresponding unit tests for them.
      
      Change-Id: I769d83b0721cc8dbf35b07b51b1d024539064009
      Signed-off-by: yacovm <yacovm@il.ibm.com>
      (cherry picked from commit 37fc516d6465be36c4e3daabd612155b9c2c18d3)
  4. 03 Mar, 2019 2 commits
    • [FAB-14045] Send messages asynchronously in clusters · 5c2e2122
      yacovm authored
      
      
      This change set makes consensus messages be sent asynchronously, over
      a buffered channel with a size of 10.
      
      Consensus messages are dropped when the buffer overflows,
      while Submit messages block until the buffer has room.
      
      Without this change set, sending a large message takes milliseconds,
      while with the change set it takes microseconds.
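
The queueing behaviour described above can be sketched as follows. This is an illustration only, not the code from the change set: the `stream` type, its `sendBuff` field, `errOverflow`, and the two send methods are hypothetical names, and messages are plain strings to keep the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// errOverflow is a hypothetical error returned when a consensus
// message is dropped because the buffer is full.
var errOverflow = errors.New("send queue overflown")

// stream is a hypothetical stand-in for an egress stream to one node.
type stream struct {
	sendBuff chan string // buffered queue, drained by a sender goroutine
}

// newStream creates a stream whose queue holds up to size messages.
func newStream(size int) *stream {
	return &stream{sendBuff: make(chan string, size)}
}

// sendConsensus never blocks: if the buffer is full, the message is dropped.
func (s *stream) sendConsensus(msg string) error {
	select {
	case s.sendBuff <- msg:
		return nil
	default:
		return errOverflow
	}
}

// sendSubmit blocks until the buffer has room for the message.
func (s *stream) sendSubmit(msg string) {
	s.sendBuff <- msg
}

func main() {
	s := newStream(10) // buffer size of 10, as in the commit message
	dropped := 0
	for i := 0; i < 12; i++ {
		if err := s.sendConsensus(fmt.Sprintf("consensus-%d", i)); err != nil {
			dropped++
		}
	}
	// Nothing drains the queue here, so 10 messages fit and 2 are dropped.
	fmt.Println("queued:", len(s.sendBuff), "dropped:", dropped)
}
```

The `select` with a `default` branch is what makes the consensus path non-blocking. The plain channel send in `sendSubmit` is what makes the Submit path wait for room.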
      
      Change-Id: Id60b05b96eed6d9d04f89b8967945b18ddfbef94
      Signed-off-by: yacovm <yacovm@il.ibm.com>
    • [FAB-13805] Unify Step and Submit into a stream · 38c1515c
      yacovm authored
      
      
      This change set removes Step RPC from the cluster protobuf,
      and renames Submit stream to a Step stream, and makes both
      transaction forwarding and consensus messages use the
      new Step stream.
      
      It also gives both egress Send() and Recv() a maximum
      timeout (the RPC timeout in the config).
      A Send or Recv that sends a consensus message, or sends (receives)
      a transaction (status), now aborts when the timeout expires,
      in order to protect against any liveness issue on the remote node
      and to return an answer to clients in a timely manner.
      
      Change-Id: Id942b248212f5c324e12af34fce48f96fdbb6aea
      Signed-off-by: yacovm <yacovm@il.ibm.com>
  5. 27 Feb, 2019 1 commit
    • [FAB-13633] Make Step RPC failures non blocking · fba0b4e2
      yacovm authored
      
      
      Per the gRPC documentation:
      
      If an RPC is issued but the channel is in TRANSIENT_FAILURE or
      SHUTDOWN states, the RPC is unable to be transmitted promptly.
      By default, gRPC implementations SHOULD fail such RPCs immediately.
      This is known as "fail fast," but usage of the term is historical.
      
      However...
      
      RPCs SHOULD NOT fail as a result of the channel being in other states
      (CONNECTING, READY, or IDLE).
      
      Therefore, if gRPC takes too long to move from the CONNECTING state
      to TRANSIENT_FAILURE (e.g. because of packet drops or a DNS lookup
      failure), it slows down the entire Raft FSM.
      
      This change set makes Step RPCs inspect the underlying
      gRPC connection state prior to being invoked.
      If the connection is in the CONNECTING state, the RPC
      fails fast.
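
The fail-fast check can be sketched as below. The example deliberately avoids the real gRPC package so that it stays self-contained: `connState`, `stateReporter`, `step`, `fakeConn`, and `errNotReady` are hypothetical names introduced for illustration. In grpc-go the state would instead be read from the client connection, for example through `ClientConn.GetState`.

```go
package main

import (
	"errors"
	"fmt"
)

// connState is a local, hypothetical mirror of the gRPC channel states.
type connState int

const (
	idle connState = iota
	connecting
	ready
	transientFailure
	shutdown
)

// errNotReady is a hypothetical error for an RPC rejected before being sent.
var errNotReady = errors.New("connection is still connecting")

// stateReporter abstracts a connection that can report its current state.
type stateReporter interface {
	State() connState
}

// step inspects the connection state before invoking the RPC and
// fails fast if the connection is still being established.
func step(conn stateReporter, invoke func() error) error {
	if conn.State() == connecting {
		return errNotReady
	}
	return invoke()
}

// fakeConn is a test double with a fixed state.
type fakeConn struct{ state connState }

func (c fakeConn) State() connState { return c.state }

func main() {
	invoke := func() error { return nil }
	fmt.Println(step(fakeConn{state: connecting}, invoke)) // rejected without invoking
	fmt.Println(step(fakeConn{state: ready}, invoke))      // invoked normally
}
```

Only the connecting state is rejected here, because the quoted gRPC documentation says RPCs issued in TRANSIENT_FAILURE or SHUTDOWN should already fail immediately by default.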
      
      Change-Id: I50df1f758a00fc99bed54ff1a2056f83f53efdf7
      Signed-off-by: yacovm <yacovm@il.ibm.com>
  6. 08 Oct, 2018 1 commit
    • [FAB-11833] Say hello to Raft OSN · 96735d2f
      yacovm authored
      
      
      This change set:
      
      1) Wires the consenter into the registrar
      2) Wires the communication to the consenter and
         to the registrar, in preparation for multi-node Raft.
         This makes a new chain configure the communication layer
         of the cluster according to the TLS certificates
         of the channel.
      3) Adds a unit test that spawns a Raft OSN
      
      Change-Id: I3923f6f5e0fae6428c2be2e056b82355332f3324
      Signed-off-by: yacovm <yacovm@il.ibm.com>
  7. 07 Oct, 2018 1 commit
  8. 28 Sep, 2018 1 commit
  9. 19 Sep, 2018 1 commit
  10. 26 Aug, 2018 1 commit
    • [FAB-11586] Raft communication layer, part 2 · 0e8eedda
      yacovm authored
      
      
      This change set implements the higher-level part of the Raft
      communication layer, which is to be used by the consumers
      of the communication layer.
      
      It uses the base communication layer from FAB-11585 and
      exposes a higher level API over it for sending messages,
      via an RPC object, and also implements a gRPC service
      that implements the Cluster protobuf API and passes
      the messages to the base communication layer.
      
      Change-Id: Id5f5f60179b4cac5ef7ab88bc4c9f5b7425391e6
      Signed-off-by: yacovm <yacovm@il.ibm.com>
  11. 20 Aug, 2018 1 commit
    • [FAB-11585] Raft communication layer, part 1 · bb311c5b
      yacovm authored
      
      
      This change set implements a base communication layer for
      the Raft orderer, but it can actually be used for any
      consensus protocol that:
      - Has a specified number of members at any given point in time
      - Uses TLS pinning for identification
      - Represents members as integers
      
      The communication layer has 3 basic capabilities:
      
      1) Obtain a remote context given a destination node ID and a channel
      2) Configure the communication in the context of a given channel
      3) Get notified of Step() and Submit() messages from other remote nodes
      
      Change-Id: I87f193b7723f95f359cc4071e28b362be703ab74
      Signed-off-by: yacovm <yacovm@il.ibm.com>