RBD Block Storage
Block storage is a type of data storage used in storage area networks (SANs). Data is stored as blocks inside volumes, and the volumes are attached to nodes. It gives applications large storage capacity with high reliability and performance.
RBD is the Ceph Block Device. Besides reliability and performance, RBD supports full and incremental snapshots, thin provisioning, copy-on-write clones, and in-memory caching.
Ceph RBD currently supports images up to 16 EB. An image can be mapped directly as a disk to bare-metal servers, virtual machines, or other hosts. KVM and Xen fully support RBD, and platforms such as VMware also support it.
Creating a Pool
ceph osd pool create sunday 64 64
# sunday is the pool name
# pg_num is 64 (pg_num and pgp_num must be kept consistent)
# pgp_num is 64
# replica count not specified, so it defaults to 3
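Since pg_num and pgp_num must stay consistent, it is worth verifying them right after creation. A minimal check, using the sunday pool from above:
# Verify that pg_num and pgp_num match
ceph osd pool get sunday pg_num
ceph osd pool get sunday pgp_num
# If they ever diverge, align pgp_num with pg_num
ceph osd pool set sunday pgp_num 64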
List pools
[root@ceph01 ceph]# ceph osd lspools
1 sunday
Check the replica count of the sunday pool.
The default is 3 replicas, which ensures high availability.
[root@ceph01 ceph]# ceph osd pool get sunday size
size: 3
The replica count can also be changed as needed.
[root@ceph01 ceph]# ceph osd pool set sunday size 2
set pool 1 size to 2
[root@ceph01 ceph]# ceph osd pool get sunday size
size: 2
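One related setting not covered above is min_size, the minimum number of replicas that must be up for the pool to accept I/O; when lowering size it is worth checking it too. A quick sketch:
# Minimum number of replicas required to serve I/O
ceph osd pool get sunday min_size
# With size 2, a min_size of 1 keeps I/O available when one replica is down
# (this trades some safety for availability, so use with care)
ceph osd pool set sunday min_size 1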
Creating and Mapping an RBD Image
Before creating an image we need to adjust the feature flags.
On the CentOS 7 kernel many RBD features are not supported; the stock 3.10 kernel only supports layering, so the other features must be disabled.
- layering: layering support (required for clones)
- striping: striping v2 support
- exclusive-lock: exclusive lock support
- object-map: object map support (requires exclusive-lock)
- fast-diff: fast diff calculation (requires object-map)
- deep-flatten: snapshot flattening support
- journaling: journals I/O operations (requires exclusive-lock)
There are two ways to disable the unsupported features: either change them with the rbd command, or add rbd_default_features = 1 to ceph.conf to set the default features (the value is simply the integer formed by the bit codes of the desired features; 1 corresponds to layering only).
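For reference, rbd_default_features is a bitmask built from the standard RBD feature bits; the value is the sum of the bits you want enabled (these bit values come from the RBD feature definitions, not from the original text):
# layering        = 1
# striping        = 2
# exclusive-lock  = 4
# object-map      = 8
# fast-diff       = 16
# deep-flatten    = 32
# journaling      = 64
# e.g. 1 enables only layering; 61 (1+4+8+16+32) is the usual full default
# on clusters whose clients support all of those features.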
cd /etc/ceph/
echo "rbd_default_features = 1" >>ceph.conf
ceph-deploy --overwrite-conf config push ceph01 ceph02 ceph03
# The features can also be disabled manually after the image is created. This only applies to that image; once the image is deleted or a new image is created, the defaults apply again.
rbd feature disable sunday/sunday-rbd.img deep-flatten
rbd feature disable sunday/sunday-rbd.img fast-diff
rbd feature disable sunday/sunday-rbd.img object-map
rbd feature disable sunday/sunday-rbd.img exclusive-lock
RBD creation example
rbd create -p sunday --image sunday-rbd.img --size 15G
#rbd create sunday/sunday-rbd.img --size 15G # shorthand form
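Instead of changing the cluster-wide default, the feature set can also be limited per image at creation time; a sketch using the --image-feature option (the image name demo.img is only for illustration):
# Create an image with only the layering feature enabled
rbd create sunday/demo.img --size 15G --image-feature layering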
List RBD images
[root@ceph01 ceph]# rbd -p sunday ls
sunday-rbd.img
Delete an RBD image
[root@ceph01 ceph]# rbd rm sunday/sunday-rbd.img
View image details
[root@ceph01 ceph]# rbd info -p sunday --image sunday-rbd.img
[root@ceph01 ceph]# rbd info sunday/sunday-rbd.img
rbd image 'sunday-rbd.img':
size 15 GiB in 3840 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 126f1be88a69
block_name_prefix: rbd_data.126f1be88a69
format: 2
features: layering
op_features:
flags:
create_timestamp: Tue Apr 16 12:34:10 2024
access_timestamp: Tue Apr 16 12:34:10 2024
modify_timestamp: Tue Apr 16 12:34:10 2024
Mounting the Block Device
Next we map and mount the RBD image. (Partitioning the device is not recommended: partitions make later expansion awkward and risk data loss. If more space is needed, use additional RBD images instead.)
[root@ceph01 ceph]# rbd map sunday/sunday-rbd.img
/dev/rbd0
[root@ceph01 ceph]# rbd device list
id pool namespace image snap device
0 sunday sunday-rbd.img - /dev/rbd0
fdisk now shows /dev/rbd0 as a 16.1 GB (15 GiB) disk.
[root@ceph01 ceph]# fdisk -l
Disk /dev/sda: 36.5 GB, 36507222016 bytes, 71303168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000eb3ad
Device Boot Start End Blocks Id System
/dev/sda1 * 2048 1050623 524288 83 Linux
/dev/sda2 1050624 71303167 35126272 83 Linux
Disk /dev/sdb: 107.4 GB, 107374182400 bytes, 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mapper/ceph--207f99ae--521e--421d--89ae--9dc47dede9a3-osd--block--89a69449--9021--428f--a70c--a8840a61d401: 107.4 GB, 107369988096 bytes, 209707008 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/rbd0: 16.1 GB, 16106127360 bytes, 31457280 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4194304 bytes / 4194304 bytes
Again, partitioning is not recommended here.
Format the device directly and mount it.
[root@ceph01 ceph]# mkfs.ext4 /dev/rbd0
[root@ceph01 ceph]# mkdir /mnt/rbd0
[root@ceph01 ceph]# mount /dev/rbd0 /mnt/rbd0
[root@ceph01 ceph]# echo "123" > /mnt/rbd0/sunday.txt
[root@ceph01 ceph]# ls -l /mnt/rbd0
total 20
drwx------ 2 root root 16384 Apr 16 12:41 lost+found
-rw-r--r-- 1 root root 4 Apr 16 12:42 sunday.txt
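The walkthrough stops at mounting; for completeness, here is a sketch of how to unmap the device and, optionally, how to re-map it automatically at boot with the rbdmap service. The id=admin keyring path is an assumption based on a default ceph-deploy setup:
# Unmount and unmap when the device is no longer needed
umount /mnt/rbd0
rbd unmap sunday/sunday-rbd.img

# Optional: map the image automatically at boot via the rbdmap service
echo "sunday/sunday-rbd.img id=admin,keyring=/etc/ceph/ceph.client.admin.keyring" >> /etc/ceph/rbdmap
systemctl enable rbdmap
# The mapped device then appears as /dev/rbd/sunday/sunday-rbd.img and can be
# added to /etc/fstab with the noauto and _netdev options.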
Expanding an RBD Image
Here we expand sunday-rbd.img from 15G to 20G.
[root@ceph01 ceph]# rbd info sunday/sunday-rbd.img
rbd image 'sunday-rbd.img':
size 15 GiB in 3840 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 126f1be88a69
block_name_prefix: rbd_data.126f1be88a69
format: 2
features: layering
op_features:
flags:
create_timestamp: Tue Apr 16 12:34:10 2024
access_timestamp: Tue Apr 16 12:34:10 2024
modify_timestamp: Tue Apr 16 12:34:10 2024
[root@ceph01 ceph]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 1.9G 12M 1.9G 1% /run
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/sda2 34G 2.0G 32G 6% /
/dev/sda1 488M 113M 340M 25% /boot
tmpfs 1.9G 52K 1.9G 1% /var/lib/ceph/osd/ceph-0
tmpfs 378M 0 378M 0% /run/user/0
/dev/rbd0 15G 41M 14G 1% /mnt
RBD resize command
[root@ceph01 ceph]# rbd resize sunday/sunday-rbd.img --size 20G
Resizing image: 100% complete...done.
The RBD image has been resized, but the filesystem does not yet see the new size.
[root@ceph01 ceph]# rbd info sunday/sunday-rbd.img
rbd image 'sunday-rbd.img':
size 20 GiB in 5120 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 126f1be88a69
block_name_prefix: rbd_data.126f1be88a69
format: 2
features: layering
op_features:
flags:
create_timestamp: Tue Apr 16 12:34:10 2024
access_timestamp: Tue Apr 16 12:34:10 2024
modify_timestamp: Tue Apr 16 12:34:10 2024
Use resize2fs so that the filesystem picks up the new capacity.
[root@ceph01 ceph]# df -h | grep "/dev/rbd0"
/dev/rbd0 15G 41M 14G 1% /mnt
[root@ceph01 ceph]# resize2fs /dev/rbd0
resize2fs 1.42.9 (28-Dec-2013)
Filesystem at /dev/rbd0 is mounted on /mnt; on-line resizing required
old_desc_blocks = 2, new_desc_blocks = 3
The filesystem on /dev/rbd0 is now 5242880 blocks long.
[root@ceph01 ceph]# df -h | grep "/dev/rbd0"
/dev/rbd0 20G 44M 19G 1% /mnt
Expansion generally involves three layers:
1. The underlying storage (rbd resize)
2. The disk partition (e.g. an MBR partition)
3. The Linux filesystem
This is why partitioning an RBD block device is not recommended: without a partition, the middle step disappears.
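Note that resize2fs only works for ext2/3/4. If the image had been formatted with XFS instead, the filesystem step would use xfs_growfs; a sketch:
# Grow an XFS filesystem online by pointing at its mount point
xfs_growfs /mnt/rbd0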
Handling Ceph Warnings
Once data has been written to the OSDs and we check the cluster again, we find that the cluster is reporting a warning, which we now need to handle.
[root@ceph01 ceph]# ceph -s
cluster:
id: ed040fb0-fa20-456a-a9f0-c9a96cdf089e
health: HEALTH_WARN
application not enabled on 1 pool(s)
services:
mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 25h)
mgr: ceph01(active, since 24h), standbys: ceph02, ceph03
osd: 3 osds: 3 up (since 24h), 3 in (since 24h)
data:
pools: 1 pools, 64 pgs
objects: 83 objects, 221 MiB
usage: 3.7 GiB used, 296 GiB / 300 GiB avail
pgs: 64 active+clean
After creating a pool, we must specify which Ceph application type will use it (Ceph Block Device, Ceph Object Gateway, or Ceph File System).
If no type is specified, the cluster health shows HEALTH_WARN.
Check the detailed health information:
[root@ceph01 ceph]# ceph health detail
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
application not enabled on pool 'sunday'
use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications.
Next we classify the pool by tagging the sunday pool with the rbd application type.
[root@ceph01 ceph]# ceph osd pool application enable sunday rbd
enabled application 'rbd' on pool 'sunday'
[root@ceph01 ceph]# ceph osd pool application get sunday
{
"rbd": {}
}
[root@ceph01 ceph]# ceph health detail
HEALTH_OK
Once the pool is initialized for rbd, it is tagged with the rbd application type and the health status no longer complains.
With the setting in place, ceph osd pool application get (above) shows that the pool now belongs to the rbd type.
With the application warning handled, check the cluster health again.
[root@ceph01 ceph]# ceph -s
cluster:
id: ed040fb0-fa20-456a-a9f0-c9a96cdf089e
health: HEALTH_WARN
2 daemons have recently crashed
[root@ceph01 ceph]# ceph health detail
HEALTH_WARN 2 daemons have recently crashed; too many PGs per OSD (672 > max 250)
RECENT_CRASH 2 daemons have recently crashed
mon.ceph03 crashed on host ceph03 at 2024-04-24 05:52:10.099517Z
mon.ceph02 crashed on host ceph02 at 2024-04-24 05:52:14.566149Z
Here we see a crash warning: two monitor daemons, mon.ceph02 and mon.ceph03, have recently crashed.
The official explanation is as follows:
One or more Ceph daemons has crashed recently, and the crash has not yet been archived (acknowledged) by the administrator. This may indicate a software bug, a hardware problem (e.g., a failing disk), or some other problem.
This warning does not affect us here; the crashed daemons can be listed with the command below. (As long as the other components are healthy, this entry is most likely harmless. In production, investigate the crash according to the actual situation rather than blindly archiving the alert.)
[root@ceph01 ceph]# ceph crash ls-new
ID ENTITY NEW
2024-04-24_05:52:10.099517Z_b19c3b1a-0f27-4424-90dd-d78a3a460d36 mon.ceph03 *
2024-04-24_05:52:14.566149Z_6eac1e20-e4e2-4bc8-b342-5e1c891a32c2 mon.ceph02 *
You can also run ceph crash info <ID> to view detailed information about a crash.
So how do we clear this warning?
Method 1 (suitable for archiving a single crash report)
[root@ceph01 ~]# ceph crash archive <ID>
Method 2 (archive all crash reports at once)
[root@ceph01 ~]# ceph crash archive-all
[root@ceph01 ~]# ceph crash ls-new
Checking the status again shows that it has recovered.
[root@ceph01 ~]# ceph -s
Troubleshooting
[root@ceph01 ceph]# ceph -s
cluster:
id: ed040fb0-fa20-456a-a9f0-c9a96cdf089e
health: HEALTH_WARN
too many PGs per OSD (672 > max 250)
1/3 mons down, quorum ceph01,ceph03
[root@ceph01 ceph]# vim /etc/ceph/ceph.conf
[global]
...
# add the following line
mon_max_pg_per_osd = 0
[root@ceph01 ceph]# ceph-deploy --overwrite-conf config push ceph01 ceph02 ceph03
[root@ceph01 ceph]# export HOSTS="ceph01 ceph02 ceph03"
[root@ceph01 ceph]# for i in $HOSTS; do ssh $i "systemctl restart ceph-mgr.target";done
Since Ceph Luminous (v12.2.x), the parameter mon_pg_warn_max_per_osd was renamed to mon_max_pg_per_osd and its default changed from 300 to 200; after changing this parameter, you now restart ceph-mgr.target instead of ceph-mon.target.
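On Nautilus and later, the same override can also be applied at runtime through the centralized config database instead of pushing ceph.conf and restarting daemons; a sketch (0 disables the per-OSD PG limit check, mirroring the config-file approach above):
# Set the option in the monitors' config database
ceph config set global mon_max_pg_per_osd 0
# Check the effective value
ceph config get mon mon_max_pg_per_osd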
Reference: https://www.jianshu.com/p/f2b20a175702