Commit 4f8fee12 authored by Zsolt István

updated the documentation to Multes

parent ae8eda53
# Caribou
# Multes
Caribou [1] is **smart distributed storage** built with FPGAs. Each node stores key-value pairs in main memory and exposes a simple interface over TCP/IP [2] that software clients can connect to.
Multes [1] is the multi-tenant incarnation of Caribou [2]. It implements **smart distributed storage** built with FPGAs that can efficiently be shared by a large number of tenants. Each node stores key-value pairs in main memory and exposes a simple interface over TCP/IP [3] that software clients can connect to.
It is **smart** because it is possible to offload filtering into the storage nodes. The nodes can also perform scans on the data. In this design filtering is a combination of regular expression matching and predicate evaluation. Different types of processing can, however, easily be added to the processing pipeline.
It is **distributed** because it runs on multiple FPGAs that replicate the data using a leader-based consensus protocol [3] that is both low latency and high throughput.
It is **distributed** because it runs on multiple FPGAs that replicate the data using a leader-based consensus protocol [4] that is both low latency and high throughput.
It is **storage** because it stores key-value pairs in a Cuckoo hash table and implements slab-based memory allocation. The current design uses DRAM to store data, as an exploration for solutions that will work well with the emerging non-volatile memory technologies.
#### Referenced articles:
[1] Caribou: Intelligent Distributed Storage. Zs. Istvan, D. Sidler, G. Alonso. To appear in VLDB 2017, Munich, Germany. https://people.inf.ethz.ch/zistvan/doc/vldb17-caribou.pdf
[1] Providing Multi-tenant Services with FPGAs: Case Study on a Key-Value Store. Zs. István, G. Alonso, A. Singla. 28th International Conference on Field Programmable Logic and Applications (FPL'18), Dublin, Ireland. https://zistvan.github.io/doc/multes-fpl18.pdf
[2] Low-Latency TCP/IP Stack for Data Center Applications. D. Sidler, Zs. Istvan, G. Alonso. 26th International Conference on Field Programmable Logic and Applications (FPL'16), Lausanne, Switzerland, September 2016. http://davidsidler.ch/files/fpl16-lowlatencytcpip.pdf
[2] Caribou: Intelligent Distributed Storage. Zs. Istvan, D. Sidler, G. Alonso. In VLDB 2017, Munich, Germany. https://people.inf.ethz.ch/zistvan/doc/vldb17-caribou.pdf
[3] Consensus in a Box: Inexpensive Coordination in Hardware. Zs. Istvan, D. Sidler, G. Alonso, M. Vukolic. 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI '16), March 2016. https://people.inf.ethz.ch/zistvan/doc/nsdi16-istvan-rev1.pdf
\ No newline at end of file
[3] Low-Latency TCP/IP Stack for Data Center Applications. D. Sidler, Zs. Istvan, G. Alonso. 26th International Conference on Field Programmable Logic and Applications (FPL'16), Lausanne, Switzerland, September 2016. http://davidsidler.ch/files/fpl16-lowlatencytcpip.pdf
[4] Consensus in a Box: Inexpensive Coordination in Hardware. Zs. Istvan, D. Sidler, G. Alonso, M. Vukolic. 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI '16), March 2016. https://people.inf.ethz.ch/zistvan/doc/nsdi16-istvan-rev1.pdf
\ No newline at end of file
Booting
=======
(1) Starting the Nodes
======================
### Flushing
### Initializing data structures, flushing memory contents
Caribou can be used in two different modes (or a mix of these): replicated or node-local.
Regardless of the use, after programming the FPGA it needs to be reset (essentially a flush command to each FPGA):
Multes can be used in two different modes (or a mix of these): replicated or node-local.
Regardless of the use, after programming the FPGA it needs to be reset (essentially a flush command to each FPGA). Furthermore, since by default each FPGA supports eight tenants, this has to be done for each tenant.
echo -n 'FFFF000001000108F00BA20000000000f00f00f00f00f00f' | xxd -r -p | nc $FPGAIP_0 2888 -q 2
echo -n 'FFFF000001000108F00BA20000000000f00f00f00f00f00f' | xxd -r -p | nc $FPGAIP_1 2888 -q 2
echo -n 'FFFF000001000108F00BA20000000000f00f00f00f00f00f' | xxd -r -p | nc $FPGAIP_2 2888 -q 2
...
for IP in $FPGAIP_0 $FPGAIP_1 $FPGAIP_2
do
    for TEN in `seq 0 7`; do
        echo -n 'FFFF00FF01000000F00BA20000000000f00f00f00f00f00f' | xxd -r -p | nc $IP 288$TEN -q 2
    done
done
The expected answer to this request is (with the last 8B being the same as the middle 8B of the flush command above):
FFFF 0100 0000 0000 F00B A200 0000 0000
Once this has been done, the FPGA is ready to serve get/put requests or to have the Zookeeper Atomic Broadcast subsystem configured.
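The per-tenant flush and the check of its answer can be wrapped in a few shell helpers. This is only a sketch: the function names (`flush_request_hex`, `tenant_port`, `is_flush_ack`, `flush_tenant`) are hypothetical and not part of the repository, and the port scheme is taken from the `288$TEN` pattern in the loop above.

```shell
# Hypothetical helpers, not part of the Multes repository.

# Hex encoding of the flush request used in the loop above.
flush_request_hex() {
  printf '%s' 'FFFF00FF01000000F00BA20000000000f00f00f00f00f00f'
}

# Port of a tenant, following the 288$TEN pattern (tenants 0..7).
tenant_port() {
  printf '288%d' "$1"
}

# Succeeds if a hex-encoded reply equals the expected flush answer.
# Spaces, newlines and letter case are ignored.
is_flush_ack() {
  local got
  got=$(printf '%s' "$1" | tr -d ' \n' | tr 'a-f' 'A-F')
  [ "$got" = 'FFFF010000000000F00BA20000000000' ]
}

# Flush one tenant on one FPGA and verify the answer.
# Needs xxd and nc, exactly like the commands above.
flush_tenant() {
  local ip="$1" tenant="$2" reply
  reply=$(flush_request_hex | xxd -r -p | nc "$ip" "$(tenant_port "$tenant")" -q 2 | xxd -p)
  is_flush_ack "$reply"
}
```

With these in place, `flush_tenant $FPGAIP_0 3 || echo "flush failed"` would report a tenant whose reply does not match the expected answer.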
### Initial ZAB config
### Initial replication config
Nodes need to be told that they will participate in the replication group and who the first leader is. This can be done either "manually" using a script, or with the code in the /src/ClusterManagement project running the CommandLineInterface class:
If there is more than one FPGA, nodes need to be told that they will participate in the replication group and who the first leader is. This can be done with the code in the /src/ClusterManagement project running the CommandLineInterface class:
CommandLineInterface $FPGAIP_0:2888;$FPGAIP_1:2888;$FPGAIP_2:2888
@@ -23,11 +29,31 @@ To add nodes later, run the same class with the original group as first argument
CommandLineInterface $FPGAIP_0:2888;$FPGAIP_1:2888;$FPGAIP_2:2888 $FPGAIP_new:2888
For an overview of the commands that can be sent to the replication control logic (initialize peer, add or remove peers, and set leader), please use the Java code as a starting point.
(2) Sending Commands to the Nodes
=================================
Multes has two types of commands: node-local and replicated ones.
Replicated commands are:
* Set
* [Delete]
Node local commands are:
* Flush
* Get
* [ConditionalGet]
* [Scan]
* SetLocal
* [DeleteLocal]
* ConfigTenantLimits
Note: the operations in square brackets have buggy/incomplete behavior and are not to be used until further code updates.
Sending requests
================
## Request format definitions
In the current setup there are two ways to execute commands on Caribou.
All requests sent to Multes start with the magic number 0xFFFF and have to be zero-padded to a multiple of 8 bytes.
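When requests are assembled as hex strings (as in the `xxd -r -p` commands used for flushing), the padding rule can be applied with a small helper. This is a sketch; `pad_hex_to_words` is a hypothetical name, and it assumes the padding zeros go at the end of the request.

```shell
# Hypothetical helper: zero-pad a hex-encoded request at its end
# to a multiple of 8 bytes (16 hex characters).
pad_hex_to_words() {
  local hex="$1"
  # A hex string must describe whole bytes before it can be padded.
  [ $(( ${#hex} % 2 )) -eq 0 ] || return 1
  # Append one zero byte at a time until the length is a multiple of 8 bytes.
  while [ $(( ${#hex} % 16 )) -ne 0 ]; do
    hex="${hex}00"
  done
  printf '%s\n' "$hex"
}
```

A request that is already aligned is returned unchanged, so the helper can be applied to every request before it is piped into `xxd -r -p`.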
### Replicated
@@ -45,10 +71,10 @@ These operations are formatted as follows:
Legend:
* x [1B] = reserved to encode node id
* C [1B] = opcode of the operation
* C [1B] = opcode of the operation: 0x01 for SET
* P [2B] = payload (key + value) size in 64bit words. E.g. 4=4*64bit
* E [8B] = reserved to encode epoch, zxid
* K [64B] = key
* K [8B] = key (can be only 64 bits long)
* L [2B] = length of value (including these two bytes) in bytes
* V [variable] = value (if no value is needed for the operation, stop at K)
@@ -57,7 +83,7 @@ Legend:
To perform operations that are local to the node, we use a similar format of the packets as above, but with extra information in bytes 4-7 (see nukv_ht_write_v2.v for opcodes):
FFFF0000PPPPkkQQ
FFFF00CCPPPPkk00
0000000000000000
KKKKKKKKKKKKKKKK
LLLLVVVVVVVVVVVV
@@ -65,22 +91,39 @@ To perform operations that are local to the node, we use a similar format of the
VVVVVVVVVVVVVVVV
* P [2B] = payload (key + value) size in 64bit words. E.g. 4=4*64bit
* k [1B] = length of key in 64 bit words (can be 01 or 02).
* Q [1B] = node-local command code
* K [64B/128B] = key
* L [2B] = length of value (including these two bytes) in bytes
* CC [1B] = node-local command code: 0x00 for GET, 0x1F for SET-LOCAL, 0xFF for FLUSH
* k [1B] = length of key in 64 bit words (can be only 01).
* K [8B] = key (one 64-bit word)
* L [2B] = length of value (including these two bytes) in bytes -- maximum is 1KB
* V [variable] = value (if no value is needed for the operation, stop at K)
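As an illustration of the node-local format, the following sketch assembles a GET request as a hex string. The helper name `local_get_hex` is hypothetical. Writing the payload size `P` low byte first (`0100` for one word) is an assumption inferred from the flush request, where the same field reads `0100`; please check it against the Go client before relying on it.

```shell
# Hypothetical helper: hex encoding of a node-local GET for one 64-bit key.
# Layout: FFFF 00 CC PPPP kk 00 | 8 zero bytes | key
local_get_hex() {
  local key="$1"
  local magic='FFFF'
  local opcode='00'            # CC: 0x00 for GET
  local payload_words='0100'   # P = 1 word (key only); low byte first is an assumption
  local key_words='01'         # k: key length in 64-bit words

  # The key must be exactly 16 hex characters (one 64-bit word).
  [ "${#key}" -eq 16 ] || return 1
  case "$key" in
    *[!0-9A-Fa-f]*) return 1 ;;
  esac

  printf '%s00%s%s%s00%s%s\n' \
    "$magic" "$opcode" "$payload_words" "$key_words" \
    '0000000000000000' "$key"
}
```

The result is already a multiple of 8 bytes and could be sent in the same way as the flush command, e.g. `local_get_hex 0102030405060708 | xxd -r -p | nc $FPGAIP_0 2880 -q 2`.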
In-code examples of these operations can be found in the Go client in /src/
### Go Client
### Tenant shares config
The token buckets can be configured separately for each tenant by piggybacking the information on a regular packet (e.g. on a GET). The tenant is chosen implicitly by the port to which the request is sent.
The information is to be encoded in bytes 8-16 of the request, in the following way:
Bytes 8-9: 0xBBBB -- magic number
Byte 10: 0x00 or 0x01 -- choice of the first or second token bucket
Byte 11: 0x01 to 0xFF -- how many tokens to add on each "tick"
Byte 12: 0x01 to 0xFF -- how many clock cycles @156MHz for one "tick"
Bytes 13-14: 2 bytes for the maximum burst size, as a number of 8B words (beware of the reverse byte order on the network link)
Bytes 15-16: 0x0000 padding
For more information on where each token bucket is located, please see the Multes paper [1], which describes the architecture.
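The byte layout listed above can be produced with a short helper. This is a sketch under assumptions: `bucket_config_hex` is a hypothetical name, and the burst size is emitted low byte first because of the note about reverse byte order. Swap the last two `printf` arguments if your setup expects the opposite order.

```shell
# Hypothetical helper: hex encoding of the token bucket config field.
# Arguments: bucket (0 or 1), tokens per tick (1..255),
#            clock cycles per tick (1..255), max burst in 8B words (0..65535).
bucket_config_hex() {
  local bucket="$1" tokens="$2" cycles="$3" burst="$4"

  # Reject values outside the ranges given in the field description.
  [ "$bucket" -eq 0 ] || [ "$bucket" -eq 1 ] || return 1
  [ "$tokens" -ge 1 ] && [ "$tokens" -le 255 ] || return 1
  [ "$cycles" -ge 1 ] && [ "$cycles" -le 255 ] || return 1
  [ "$burst" -ge 0 ] && [ "$burst" -le 65535 ] || return 1

  # Magic number, bucket choice, tokens, cycles, burst (low byte first, assumed), padding.
  printf 'BBBB%02X%02X%02X%02X%02X0000\n' \
    "$bucket" "$tokens" "$cycles" $(( burst & 255 )) $(( burst >> 8 ))
}
```

For example, `bucket_config_hex 1 4 16 512` configures the second bucket to receive 4 tokens every 16 clock cycles. Taking the 156 MHz clock at face value, that is 4 × 156,000,000 / 16 = 39,000,000 tokens per second.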
(3) Go Client
=============
We provide a Go client for testing purposes (code needs cleanup).
To populate the store, run:
./caribou -host "$LEADER_IP:2888" -populate -time 120 (-replicate) (-flush)
./client -host "$LEADER_IP:2880" -populate -time 120 (-replicate) (-flush)
To run mixed operations (50% writes) for 10 seconds:
./caribou -host "$LEADER_IP:2888" -setp 0.5 -time 10
./client -host "$LEADER_IP:2880" -setp 0.5 -time 10
...