We use a Ceph Storage Cluster for our storage system. http://docs.ceph.com/docs/master/
Ceph has an underlying object store called RADOS which stores all data. On top of that, three different mechanisms are provided to access the underlying RADOS layer: RADOS Block Devices (RBD), the RADOS Gateway (object storage) and CephFS; applications can also talk to RADOS directly via librados.
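For a quick sanity check, objects can be written to and read from RADOS directly with the rados CLI. A minimal sketch, assuming the default rbd pool and a working admin keyring (object and file names are made up):

echo "hello rados" > /tmp/testobj.txt
rados -p rbd put testobj /tmp/testobj.txt    #store the file as an object
rados -p rbd ls                              #list objects in the pool
rados -p rbd get testobj /tmp/testobj.out    #read the object back
rados -p rbd rm testobj                      #clean up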
The cluster itself consists of basically two daemon types: Monitors and OSD Daemons (plus MDS daemons when CephFS is used).
Some important concepts to understand are Pools, the Crush Map and the Placement groups.
Pools: A pool is a logical partition for storing objects; it is basically a slice of the overall storage cluster. Each pool has its own number of Placement Groups and its own CRUSH rule (allowing different numbers of replicas, failure domains, etc.).
Placement Groups: Each pool has a certain number of placement groups. When an object is added, it is hashed into one placement group, and each placement group (based on the number of replicas) places its objects on a certain set of OSDs.
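This mapping can be inspected directly; for example (the object name is made up, the one pool is from the pool table below):

ceph osd map one testobject    #prints the PG the object hashes to and the up/acting set of OSDs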
Here is a Placement group calculator for choosing the number of PGs for a pool. https://ceph.com/pgcalc/
More on Placement Groups: http://docs.ceph.com/docs/master/rados/operations/placement-groups/
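As a rough rule of thumb (the calculator refines this per pool), the total number of PGs should be about (number of OSDs x 100) / replica size, rounded to a power of two. For this cluster with 30 OSDs and a replica size of 2 that gives 30 x 100 / 2 = 1500, rounded to 2048 PGs in total, which is then split across the pools weighted by their expected share of the data; this is roughly in line with the 256 PGs per pool used in the deployment below.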
Placement Groups can be in different states (e.g. active, clean, peering, degraded, recovering, backfilling); the link above lists them all.
Crush Map: The CRUSH map finally maps the objects of a placement group to one or several OSDs (depending on the number of replicas). The CRUSH algorithm can be tweaked to take different failure domains into account.
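The hierarchy and weights CRUSH currently works with can be viewed at any time:

ceph osd tree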
Server | Purpose | Data Disks | Raw Space | Journal Disks | Journal Space |
---|---|---|---|---|---|
mon01-cm | Monitor + (CephFS Metadata) | - | - | 1 x 800GB | 800GB |
sto01-cm | OSD Daemon + Monitor | 15 x 9.9TB | 148.5TB | 2 x 120GB | 240GB |
sto02-cm | OSD Daemon + Monitor | 15 x 9.9TB | 148.5TB | 2 x 120GB | 240GB |
Pool Name | Purpose | Replica Size | Note |
---|---|---|---|
rbd | Default Pool | repl: 2 | Used for Rados Block devices |
one | Opennebula pool | repl: 2 | VM disks for Opennebula |
fs | Ceph FS Data Pool | repl: 2 | - |
fs_meta | Ceph FS Meta Data Pool | repl: 3 | - |
ec | Erasure Coded Pool | erasure: k=8 m=4 | - |
To export CephFS namespaces, nfs-ganesha needs to be used. An example EXPORT block (typically in /etc/ganesha/ganesha.conf):
EXPORT {
    # Export Id (mandatory, each EXPORT must have a unique Export_Id)
    Export_Id = 2;
    # Exported path (mandatory)
    Path = /I11/sto/student;
    # Pseudo Path (required for NFS v4)
    Pseudo = /mnt/public;
    # Exporting FSAL
    FSAL {
        Name = CEPH;
        User_Id = "I11.fs.sto.student";
    }
    # Export to clients
    CLIENT {
        Clients = 131.159.24.0/23, 172.24.24.0/23;
        Squash = None;
        Access_Type = RW;
    }
}
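After changing the export, nfs-ganesha has to be restarted and the pseudo path can then be mounted from a client. A sketch (the service name and mount options may differ per distribution):

sudo systemctl restart nfs-ganesha
#on the client: NFSv4 mounts use the pseudo path
sudo mount -t nfs4 <ganesha-server>:/mnt/public /mnt/student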
ceph osd pool get <pool-name> pg_num
ceph osd pool set <pool-name> pg_num <pg-number>
ceph osd pool set <pool-name> pgp_num <pg-number>
#the first command splits the data, the second makes the new number available to the crush algorithm
#both values should be equal
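For example, to raise the one pool to 512 PGs (the number is purely illustrative; check the PG calculator first):

ceph osd pool get one pg_num
ceph osd pool set one pg_num 512
ceph osd pool set one pgp_num 512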
ceph tell mon.* version
ceph daemon mds.sto01 config show
ceph daemon osd.0 config show
ceph health
ceph status
ceph df
ceph auth list
ceph tell osd.<osd-id> bench
ceph mon dump
ceph osd dump
ceph pg dump
ceph pg map <pg-id>
ceph osd getcrushmap -o <output-file>
#decompile map
crushtool -d <crushmap> -o <decompiled-crushmap>
#view with vim or cat
ceph fs dump
ceph tell mds.0 session ls
ceph-deploy osd prepare <node>:<path-to-device/directory>
ceph-deploy osd activate <node>:<path-to-device/directory>
ceph tell mds.* session ls
#gradually drain the OSD first (wait for rebalancing to finish before removing it)
ceph osd crush reweight osd.<ID> 0.0
#take the osd out of the cluster
ceph osd out <ID>
#stop the osd daemon for that drive on the host it is running on
sudo systemctl stop ceph-osd@<ID>
#remove osd from crush map
ceph osd crush remove osd.<ID>
#remove authentication key
ceph auth del osd.<ID>
#remove the osd
ceph osd rm <ID>
#(optional) delete partition table on the node
sudo umount /dev/<drive>
sudo wipefs -a /dev/<drive>
ceph osd set nodown
ceph osd set noout
sudo umount -lf /mnt/cephfs
sudo service ceph stop mds
ceph osd set noscrub
ceph osd set nodeep-scrub
sudo service ceph start osd.<ID>
sudo service ceph start mds
sudo mount /mnt/cephfs
ceph osd unset noscrub
ceph osd unset nodeep-scrub
ceph osd unset noout
ceph osd unset nodown
ceph osd getcrushmap -o crush-com
crushtool -d crush-com -o crush-dec
vim crush-dec
crushtool -c crush-dec -o crush-com
ceph osd setcrushmap -i crush-com
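Before injecting the recompiled map it can be sanity-checked offline with crushtool; a sketch (rule id, replica count and input range are examples):

crushtool -i crush-com --test --show-statistics --rule 0 --num-rep 2 --min-x 0 --max-x 100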
The actual map consists of four components:
- Devices: one device for each OSD daemon/disk
- Bucket Types: define the types of buckets used in the CRUSH hierarchy (e.g. host, rack, room, root)
- Bucket Instances: arrange the buckets relative to each other and thereby place the devices into failure domains
- Rules: determine how data is placed in pools
More information on the Crush Map can be found in the Ceph documentation: http://docs.ceph.com/docs/master/rados/operations/crush-map/
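For orientation, a heavily shortened sketch of what a decompiled map looks like (IDs, weights and names here are made up):

# devices
device 0 osd.0
device 1 osd.1

# types
type 0 osd
type 1 host
type 10 root

# buckets
host sto01 {
    id -2
    alg straw
    hash 0  # rjenkins1
    item osd.0 weight 9.016
    item osd.1 weight 9.016
}
root default {
    id -1
    alg straw
    hash 0  # rjenkins1
    item sto01 weight 18.032
}

# rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}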
ceph osd crush set <ID> <weight-TB> root=<root-of-tree> ...
ceph osd crush set 0 9.01598 root=default host=sto01-1
ceph osd crush remove osd.<ID>
ceph osd crush add-bucket <name> <type>
ceph osd crush add-bucket sto01-1 host
ceph osd crush move <name> <crush-location>
ceph osd crush move sto01-1 root=ssd room=room1
ceph osd crush remove <name>
ceph osd pool set <name> size <repl-size>
ceph osd dump | grep 'replicated size'
ceph osd pool create <name> <pg_num> <pgp_num> replicated
ceph osd pool ls
ceph osd pool delete <name> <name> --yes-i-really-really-mean-it
ceph osd pool rename <name> <new-name>
sudo systemctl status ceph\*.service ceph\*.target
sudo systemctl stop ceph.target
sudo systemctl start ceph.target
sudo systemctl stop ceph-mon\*.service ceph-mon.target
sudo systemctl stop ceph-osd\*.service ceph-osd.target
sudo systemctl stop ceph-mds\*.service ceph-mds.target
#start services
sudo systemctl start ceph-osd.target
sudo systemctl start ceph-mon.target
sudo systemctl start ceph-mds.target
sudo systemctl start ceph-osd@{id}
sudo systemctl start ceph-mon@{hostname}
sudo systemctl start ceph-mds@{hostname}
rbd create --size 4096 <pool>/<name>
sudo rbd feature disable <pool>/<name> exclusive-lock object-map fast-diff deep-flatten
sudo rbd map <pool>/<name> --name client.admin
sudo mkfs.ext4 -m0 /dev/rbd1
sudo mount /dev/rbd1 /mnt
rbd ls
#list in specific pool
rbd ls <pool>
rbd info <pool>/<image>
rbd resize --size 2048 <pool>/<name>                  #increase
rbd resize --size 2048 <pool>/<name> --allow-shrink   #decrease
sudo umount /mnt
sudo rbd unmap /dev/rbd1
rbd rm <pool>/<name>
#if there are still watchers, check where the image is mapped
rbd showmapped
sudo service rbdmap stop
rbd rm <pool>/<name>
#or find the watchers directly
rbd info <pool>/<name>
rados -p rbd listwatchers rbd_header.<end-of-block-prefix-number>
ceph auth list
ceph auth add client.fs_user mon 'allow r' osd 'allow rwx pool=fs, allow rwx pool=fs_meta' mds 'allow r'
ceph auth get-key client.fs_user | tee client.fs_user.key
ceph auth get client.fs_user -o ceph.client.fs_user.keyring
ceph-deploy install 10.0.60.4
ceph-deploy config push 10.0.60.4
sudo mv ~/ceph.client.fs_user.keyring /etc/ceph/
sudo mv ~/client.fs_user.key /etc/ceph/
ceph --id=fs_user health
sudo apt install ceph-fs-common
sudo mount -t ceph 10.0.10.1:6789:/ /mnt/cephfs -o name=fs_user,secretfile=/etc/ceph/client.fs_user.key
Change user permissions
ceph auth caps client.fs_user mds 'allow rw' mon 'allow r' osd 'allow rwx pool=fs'
Path restriction
mds 'allow r' -> read-only access to the whole fs
mds 'allow rw' -> read and write access to the whole fs
mds 'allow r, allow rw path=/data_tonetto' -> read access to the whole fs, write access only to the path /data_tonetto
mds 'allow rw path=/datasets' -> read and write access only to the path /datasets
mon 'allow r' -> read access to the cluster maps
osd 'allow rwx pool=fs' -> read and write access to the fs pool
ceph auth caps client.fs_user mds 'allow rw path=/datasets, allow rw path=/data_tonetto' mon 'allow r' osd 'allow rw pool=fs'
#on client
#install fuse
sudo apt install ceph-fuse
#mount directories
sudo ceph-fuse -n client.fs_user --keyring=/etc/ceph/ceph.client.fs_user.keyring -r /data_tonetto /data
sudo ceph-fuse -n client.fs_user --keyring=/etc/ceph/ceph.client.fs_user.keyring -r /datasets /datasets
The installation is done from a single server with ceph-deploy.
Following this guide: http://docs.ceph.com/docs/master/rados/deployment/
ssh mon01-cm
mkdir sto_cluster
cd sto_cluster
wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
echo deb https://download.ceph.com/debian-kraken/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
sudo apt-get update && sudo apt-get install ceph-deploy
ceph-deploy new mon01-cm
vim ceph.conf
------------------------------------------------------------
[global]
fsid = b2fe6c5c-10d5-4eb3-af02-121d6493d6bf
mon_initial_members = mon01
mon_host = 10.0.10.1
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = 10.0.0.0/16
osd_journal_size = 14336
#reasonable number of replicas and placement groups
osd_pool_default_size = 3        #Write an object 3 times
osd_pool_default_min_size = 1    #Allow writing 1 copy in degraded state
osd_pool_default_pg_num = 256
osd_pool_default_pgp_num = 256
------------------------------------------------------------
#install python on nodes sto01, sto02
sudo apt install python python-apt
#enable nat on gateway
ceph-deploy install mon01-cm sto01 sto02
ceph-deploy disk list sto01
ceph-deploy mon create-initial
#sometimes two monitors are running: mon01-cm + mon01
#stop mon01: sudo systemctl stop ceph-mon@mon01
#then run the create-initial command again
sudo chmod +r /etc/ceph/ceph.client.admin.keyring
ceph-deploy admin sto01 sto02
#run on both nodes: sudo chmod +r /etc/ceph/ceph.client.admin.keyring
ceph status
ceph-deploy osd prepare sto01:/dev/sdb sto01:/dev/sdc sto01:/dev/sdd sto01:/dev/sde sto01:/dev/sdf sto01:/dev/sdg sto01:/dev/sdh sto01:/dev/sdi sto01:/dev/sdj sto01:/dev/sdk sto01:/dev/sdl sto01:/dev/sdo sto01:/dev/sdp sto01:/dev/sdq sto01:/dev/sdr sto02:/dev/sdb sto02:/dev/sdc sto02:/dev/sdd sto02:/dev/sde sto02:/dev/sdf sto02:/dev/sdg sto02:/dev/sdh sto02:/dev/sdi sto02:/dev/sdj sto02:/dev/sdk sto02:/dev/sdl sto02:/dev/sdo sto02:/dev/sdp sto02:/dev/sdq sto02:/dev/sdr
ceph-deploy osd activate sto01:/dev/sdb1 sto01:/dev/sdc1 sto01:/dev/sdd1 sto01:/dev/sde1 sto01:/dev/sdf1 sto01:/dev/sdg1 sto01:/dev/sdh1 sto01:/dev/sdi1 sto01:/dev/sdj1 sto01:/dev/sdk1 sto01:/dev/sdl1 sto01:/dev/sdo1 sto01:/dev/sdp1 sto01:/dev/sdq1 sto01:/dev/sdr1 sto02:/dev/sdb1 sto02:/dev/sdc1 sto02:/dev/sdd1 sto02:/dev/sde1 sto02:/dev/sdf1 sto02:/dev/sdg1 sto02:/dev/sdh1 sto02:/dev/sdi1 sto02:/dev/sdj1 sto02:/dev/sdk1 sto02:/dev/sdl1 sto02:/dev/sdo1 sto02:/dev/sdp1 sto02:/dev/sdq1 sto02:/dev/sdr1
#for the cluster to reach HEALTH_OK
ceph osd pool set rbd size 2
An erasure code profile must be set when creating a new pool and cannot be changed later! To switch profiles, a new pool has to be created and all the data moved from the old pool to the new one.
#show the default profile
ceph osd erasure-code-profile get default
#create a custom profile
ceph osd erasure-code-profile set fs1 k=8 m=4 ruleset-failure-domain=osd
#optional: place the pool under another CRUSH root (set together with the other profile options)
ceph osd erasure-code-profile set fs1 k=8 m=4 ruleset-failure-domain=osd ruleset-root=ssd
#create pool
ceph osd pool create ec 12 12 erasure fs1
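Since an existing pool cannot be switched to a new profile, the data has to be copied over. A sketch of one possible approach (the profile name is a placeholder; rados cppool has limitations, e.g. it does not preserve snapshots and may not support every pool type, so check the documentation for the running release first):

ceph osd pool create ec_new 12 12 erasure <new-profile>
rados cppool ec ec_new
ceph osd pool delete ec ec --yes-i-really-really-mean-it
ceph osd pool rename ec_new ec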
#create FS pools - recommended to use a higher replication level for the metadata pool
#any data loss there can render the whole filesystem inaccessible
ceph osd pool create fs 256 256 replicated
ceph osd pool create fs_meta 256 256 replicated
ceph osd pool set fs size 2
ceph osd pool set fs_meta size 3
#create filesystem
ceph fs new ceph_fs fs_meta fs
#mount filesystem on client
ceph-deploy install emu11
#on client:
sudo apt install ceph-fs-common ceph-fuse
cat /etc/ceph/ceph.client.admin.keyring
#copy only the key into a new file /etc/ceph/admin.secret
sudo mount -t ceph 10.0.10.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
#unmount ceph-fuse / ceph kernel client
sudo fusermount -u /mnt/cephfs
sudo umount /mnt/cephfs
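To check that the filesystem and its MDS came up (names as created above):

ceph fs ls
ceph mds stat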
#network benchmark
sudo apt install iperf
sto01: iperf -s
emu12: iperf -c sto01
#disk benchmark
sudo hdparm -tT /dev/sdc
sudo hdparm -tT --direct /dev/sdc
sudo mount /dev/sdc /mnt/tmp
cd /mnt/tmp
sudo dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc
sudo dd if=/dev/zero of=tempfile2 bs=1G count=5 conv=fdatasync,notrunc
sudo dd if=tempfile of=/dev/null bs=1M count=1024
sudo dd if=tempfile2 of=/dev/null bs=1G count=5
#bench rados cluster
#normal cluster
rados bench -p rbd 60 write --no-cleanup
#read random for 60 seconds
rados bench -p rbd 60 rand
#cleanup mess on every pool
rados -p <pool> cleanup
#network benchmark
sudo apt install iperf
sto01: iperf -s
emu12: iperf -c sto01
9.40 Gbits/sec - sto01 -> sto02
9.41 Gbits/sec - sto01 -> sto02
9.39 Gbits/sec - emu12 -> sto02
9.35 Gbits/sec - emu12 -> sto01

#disk benchmark
sudo hdparm -tT /dev/sdc
Timing cached reads: 20318 MB in 2.00 seconds = 10168.52 MB/sec
Timing buffered disk reads: 700 MB in 3.01 seconds = 232.81 MB/sec
sudo hdparm -tT --direct /dev/sdc
Timing O_DIRECT cached reads: 886 MB in 2.00 seconds = 442.57 MB/sec
Timing O_DIRECT disk reads: 358 MB in 3.00 seconds = 119.31 MB/sec
sudo mkdir /mnt/tmp
sudo mount /dev/sdc /mnt/tmp
cd /mnt/tmp
sudo dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.10234 s, 210 MB/s
sudo dd if=/dev/zero of=tempfile2 bs=1G count=5 conv=fdatasync,notrunc
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 25.7669 s, 208 MB/s
sudo dd if=tempfile of=/dev/null bs=1M count=1024
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.204041 s, 5.9 GB/s
sudo dd if=tempfile2 of=/dev/null bs=1G count=5
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.280824 s, 4.8 GB/s

#disk benchmark - ssd
sudo hdparm -tT /dev/sdn4
Timing cached reads: 20768 MB in 2.00 seconds = 10394.23 MB/sec
Timing buffered disk reads: 1458 MB in 3.00 seconds = 485.62 MB/sec
sudo hdparm -tT --direct /dev/sdn4
Timing O_DIRECT cached reads: 646 MB in 2.00 seconds = 322.43 MB/sec
Timing O_DIRECT disk reads: 1494 MB in 3.00 seconds = 497.91 MB/sec
sudo mkfs.xfs -f -i size=2048 /dev/sdn4
sudo dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 8.14751 s, 135 MB/s
sudo dd if=/dev/zero of=tempfile2 bs=1G count=5 conv=fdatasync,notrunc
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 40.4716 s, 133 MB/s
sudo dd if=tempfile of=/dev/null bs=1M count=1024
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.204041 s, 5.8 GB/s
sudo dd if=tempfile2 of=/dev/null bs=1G count=5
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.280824 s, 5.2 GB/s

#bench rados cluster
#normal cluster
rados bench -p rbd 60 write --no-cleanup
2017-03-30 16:58:01.468876 min lat: 0.0414839 max lat: 1.96491 avg lat: 0.378728
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
60 16 2529 2513 167.516 96 0.265697 0.378728
Total time run: 60.665624
Total writes made: 2530
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 166.816
Stddev Bandwidth: 57.3734
Max bandwidth (MB/sec): 336
Min bandwidth (MB/sec): 84
Average IOPS: 41
Stddev IOPS: 14
Max IOPS: 84
Min IOPS: 21
Average Latency(s): 0.383575
Stddev Latency(s): 0.259392
Max latency(s): 1.96491
Min latency(s): 0.0414839

#two concurrent on both nodes
Total time run: 30.422575
Total writes made: 744
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 97.8221
Stddev Bandwidth: 73.2887
Max bandwidth (MB/sec): 352
Min bandwidth (MB/sec): 16
Average IOPS: 24
Stddev IOPS: 18
Max IOPS: 88
Min IOPS: 4
Average Latency(s): 0.653725
Stddev Latency(s): 0.609964
Max latency(s): 3.22897
Min latency(s): 0.0601762

#read random for 60 seconds
rados bench -p rbd 60 rand
Total time run: 60.054755
Total reads made: 39269
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 2615.55
Average IOPS: 653
Stddev IOPS: 26
Max IOPS: 725
Min IOPS: 604
Average Latency(s): 0.0237641
Max latency(s): 0.179591
Min latency(s): 0.00335397

#ssd journal cluster
rados bench -p data-ssd 60 write --no-cleanup
Total time run: 60.250558
Total writes made: 2516
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 167.036
Stddev Bandwidth: 11.7987
Max bandwidth (MB/sec): 196
Min bandwidth (MB/sec): 144
Average IOPS: 41
Stddev IOPS: 3
Max IOPS: 49
Min IOPS: 36
Average Latency(s): 0.383063
Stddev Latency(s): 0.126893
Max latency(s): 1.07908
Min latency(s): 0.0446201

#two concurrent on both nodes
Total time run: 30.744405
Total writes made: 682
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 88.7316
Stddev Bandwidth: 14.6719
Max bandwidth (MB/sec): 140
Min bandwidth (MB/sec): 60
Average IOPS: 22
Stddev IOPS: 3
Max IOPS: 35
Min IOPS: 15
Average Latency(s): 0.713941
Stddev Latency(s): 0.306251
Max latency(s): 1.82924
Min latency(s): 0.0950345

#reads
Total time run: 60.049589
Total reads made: 37142
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 2474.09
Average IOPS: 618
Stddev IOPS: 24
Max IOPS: 669
Min IOPS: 561
Average Latency(s): 0.0251675
Max latency(s): 0.208302
Min latency(s): 0.00336694

#---------------------------------------
#rados Block Device
sudo rbd create image02 --size 4096 --pool data-ssd
sudo rbd feature disable image02 exclusive-lock object-map fast-diff deep-flatten --pool data-ssd
sudo rbd map image02 --pool data-ssd --name client.admin
sudo mkfs.ext4 -m0 /dev/rbd1
sudo mount /dev/rbd1 /mnt/device-bl2
rbd bench-write image02 --pool data-ssd
#rbd pool
elapsed: 12 ops: 262144 ops/sec: 21747.94 bytes/sec: 89079547.35
#data-ssd pool
elapsed: 11 ops: 262144 ops/sec: 23236.06 bytes/sec: 95174905.63
sudo hdparm -tT /dev/rbd0   #normal
Timing cached reads: 20254 MB in 2.00 seconds = 10139.08 MB/sec
Timing buffered disk reads: 2722 MB in 3.00 seconds = 907.20 MB/sec
sudo hdparm --direct -tT /dev/rbd0
Timing O_DIRECT cached reads: 3720 MB in 2.00 seconds = 1860.49 MB/sec
Timing O_DIRECT disk reads: 4096 MB in 2.96 seconds = 1384.69 MB/sec
sudo hdparm -tT /dev/rbd1   #ssd
Timing cached reads: 20072 MB in 2.00 seconds = 10047.58 MB/sec
Timing buffered disk reads: 2766 MB in 3.00 seconds = 921.83 MB/sec
sudo hdparm --direct -tT /dev/rbd1
Timing O_DIRECT cached reads: 3872 MB in 2.00 seconds = 1936.37 MB/sec
Timing O_DIRECT disk reads: 4096 MB in 1.77 seconds = 2311.77 MB/sec
#normal
sudo dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.1852 s, 257 MB/s
sudo dd if=tempfile of=/dev/null bs=1M count=1024
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.184213 s, 5.8 GB/s
#ssd
sudo dd if=/dev/zero of=tempfile2 bs=1M count=1024 conv=fdatasync,notrunc
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.91692 s, 155 MB/s
sudo dd if=tempfile of=/dev/null bs=1M count=1024
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.184213 s, 5.8 GB/s
Raw Network Speed
iperf -s
iperf -c <address>
9.40 Gbits/sec - sto01 -> sto02
9.41 Gbits/sec - sto01 -> sto02
9.39 Gbits/sec - emu12 -> sto02
9.35 Gbits/sec - emu12 -> sto01
-> ca. 1170 MB/sec

Raw Disk Speed
sudo hdparm -tT /dev/<drive>
ssd
Timing cached reads: 10220.44 / 10014.83 / 10406.89 / 10233.53 MB/sec
Timing buffered reads: 321.97 / 372.85 / 324.69 / 371.13 MB/sec
normal
Timing cached reads: 9862.59 / 10251.11 / 10005.25 / 10581.66 MB/sec
Timing buffered reads: 237.98 / 240.19 / 233.71 / 208.83 MB/sec
ssd-DIRECT
Timing cached reads: 345.82 / 372.97 / 377.45 / 370.21 MB/sec
Timing buffered reads: 404.91 / 436.78 / 436.65 / 411.83 MB/sec
normal-DIRECT
Timing cached reads: 879.61 / 853.61 / 437.60 / 867.01 MB/sec
Timing buffered reads: 119.51 / 121.01 / 175.96 / 112.51 MB/sec

Rados Bench Write
rados bench -p <pool> 10/20/30/60 write --no-cleanup
ssd-primary: Bandwidth (MB/sec): 246.749 / 216.592 / 232.526 / 186.368
ssd-only: Bandwidth (MB/sec): 118.422 / 121.656 / 116.717 / 108.17
rbd-normal: Bandwidth (MB/sec): 260.393 / 282.246 / 268.01 / 271.671
erasure-coded: Bandwidth (MB/sec): 244.602 / 231.217 / 242.125 / 229.342

Rados Read Seq
rados bench -p <pool> 10/20/30 seq
ssd-primary: Bandwidth (MB/sec): 975.046 / 1025.61 / 1030.69
ssd-only: Bandwidth (MB/sec): 1071.45 / 1065.46 / 1045.66
rbd-normal: Bandwidth (MB/sec): 1009.87 / 1022.06 / 1054.07
erasure-coded: Bandwidth (MB/sec): 962.095 / 956.349 / 974.866

Rados Read Rand
rados bench -p <pool> 10/20/30 rand
ssd-primary: Bandwidth (MB/sec): 993.921 / 1034.7 / 1062.02
ssd-only: Bandwidth (MB/sec): 1055.95 / 1039.78 / 1063.78
rbd-normal: Bandwidth (MB/sec): 1037.47 / 1057.92 / 1049.02
erasure-coded: Bandwidth (MB/sec): 964.135 / 952.147 / 950.543

--> Network bottleneck: 1170 MB/sec
--> Single Hard Drive Read: 230 MB/sec normal, 130 MB/sec direct
--> Single SSD Read: 330 MB/sec normal, 430 MB/sec direct
--> All cluster read operations are capped by the network: 1050 MB/sec
--> Erasure Coded Pool slightly less: 955 MB/sec
--> Single Hard Drive Write: 210 MB/sec
--> Single SSD Write: 130 MB/sec
--> Cluster Write Operations: 265 MB/sec normal, 120 MB/sec ssd, 235 MB/sec erasure-coded