OpenStack Swift Source Code Analysis (4): swift-ring-builder Source Walkthrough, Part 1
Last edited by pig2 on 2014-11-21 15:12
Guiding questions
1. What role does the ring component play in Swift?
2. Which ring operations does swift-ring-builder provide?
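The ring is a core component of Swift: it describes and determines how data is distributed across the cluster. The consistent hashing algorithm behind it is one of the most important parts.
swift-ring-builder contains the operations on the ring: create, default, search, list_parts, add, set_weight, set_info, remove, rebalance, validate, write_ring, pretend_min_part_hours_passed, set_min_part_hours and set_replicas. I will go through the implementation of these methods one by one.
First, the main block of swift-ring-builder: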
if __name__ == '__main__':
    if len(argv) < 2:
        print "swift-ring-builder %(MAJOR_VERSION)s.%(MINOR_VERSION)s\n" % globals()
        print Commands.default.__doc__.strip()
        cmds = [c for c, f in Commands.__dict__.iteritems()
                if f.__doc__ and c[0] != '_' and c != 'default']
        cmds.sort()
        for cmd in cmds:
            print Commands.__dict__[cmd].__doc__.strip()
        print RingBuilder.search_devs.__doc__.strip()
        for line in wrap(' '.join(cmds), 79, initial_indent='Quick list: ',
                         subsequent_indent='            '):
            print line
        print ('Exit codes: 0 = operation successful\n'
               '            1 = operation completed with warnings\n'
               '            2 = error')
        exit(EXIT_SUCCESS)
    # Check whether the path given in argv[1] exists.
    if exists(argv[1]):
        # Load the builder's initial data from the file named by argv[1],
        # then instantiate the RingBuilder class;
        # load decides, case by case, whether the initial data must be changed.
        builder = RingBuilder.load(argv[1])
    # If argv[1] does not exist and the command is not 'create', exit.
    elif len(argv) < 3 or argv[2] != 'create':
        print 'Ring Builder file does not exist: %s' % argv[1]
        exit(EXIT_ERROR)
    # Build the full path of the 'backups' directory.
    backup_dir = pathjoin(dirname(argv[1]), 'backups')
    # Create the 'backups' directory; an already existing one is fine.
    try:
        mkdir(backup_dir)
    except OSError, err:
        if err.errno != EEXIST:
            raise
    # Take argv[1] as the basis for the ring_file name.
    ring_file = argv[1]
    if ring_file.endswith('.builder'):
        ring_file = ring_file[:-len('.builder')]
    ring_file += '.ring.gz'
    if len(argv) == 2:
        command = "default"
    else:
        command = argv[2]
    if argv[0].endswith('-safe'):
        try:
            with lock_parent_directory(abspath(argv[1]), 15):
                Commands.__dict__.get(command, Commands.unknown.im_func)()
        except exceptions.LockTimeout:
            print "Ring/builder dir currently locked."
            exit(2)
    else:
        Commands.__dict__.get(command, Commands.unknown.im_func)()
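This main block does the following:
(1) checks that the command line is valid;
(2) if the file named by argv[1] exists, calls RingBuilder.load to load it and gets back a RingBuilder instance;
(3) creates the backups directory;
(4) derives the ring_file name from argv[1];
(5) runs the ring operation named by command.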
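Steps (4) and (5) can be tried in isolation. The following is only a minimal Python 3 sketch, not Swift's code: the `Commands` class here is a hypothetical stand-in with three toy handlers, and `ring_file_for` and `dispatch` are names invented for this illustration.

```python
class Commands:
    """Hypothetical stand-in for the real Commands class (illustration only)."""

    @staticmethod
    def default(argv):
        return 'default on %s' % argv[1]

    @staticmethod
    def create(argv):
        return 'create %s' % argv[1]

    @staticmethod
    def unknown(argv):
        return 'unknown command: %s' % argv[2]


def ring_file_for(builder_path):
    """Derive the ring file name from the builder file name, as main does."""
    ring_file = builder_path
    if ring_file.endswith('.builder'):
        ring_file = ring_file[:-len('.builder')]
    return ring_file + '.ring.gz'


def dispatch(argv):
    """Run the handler named by argv[2]; no argv[2] means 'default'."""
    command = 'default' if len(argv) == 2 else argv[2]
    if command.startswith('_'):
        # Never expose private or dunder attributes as commands.
        return Commands.unknown(argv)
    handler = getattr(Commands, command, Commands.unknown)
    return handler(argv)
```

Looking the handler up by name with a fallback keeps the dispatch code the same size no matter how many commands are added, which is the same idea as the `Commands.__dict__.get(command, ...)` call in the listing.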
The Commands class contains the ring operations themselves: create, default, search, list_parts, add, set_weight, set_info, remove, rebalance, validate, write_ring, pretend_min_part_hours_passed, set_min_part_hours, set_replicas and so on. Let's go through them one by one:
1. The create method:
def create():
    if len(argv) < 6:
        print Commands.create.__doc__.strip()
        exit(EXIT_ERROR)
    builder = RingBuilder(int(argv[3]), float(argv[4]), int(argv[5]))
    backup_dir = pathjoin(dirname(argv[1]), 'backups')
    try:
        mkdir(backup_dir)
    except OSError, err:
        if err.errno != EEXIST:
            raise
    # Python's pickle module can serialize an object to a file on disk and
    # restore it later when it is needed.
    # Here the builder, converted to a dict, is written to the file
    # pathjoin(backup_dir, '%d.' % time() + basename(argv[1])).
    # builder.to_dict(): returns the freshly initialized RingBuilder object as a dict.
    pickle.dump(builder.to_dict(), open(pathjoin(backup_dir, '%d.' % time() + basename(argv[1])), 'wb'), protocol=2)
    # Then the same dict is written to the file named by argv[1].
    pickle.dump(builder.to_dict(), open(argv[1], 'wb'), protocol=2)
    exit(EXIT_SUCCESS)
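This method does the following:
(1) initializes RingBuilder from the command-line arguments and gets a RingBuilder instance;
(2) creates the 'backups' directory used for backups;
(3) uses the pickle module to save the object to disk so that it can be read back and restored later. Concretely, the builder is converted to a dict and saved twice: once to a timestamped file inside the 'backups' directory, and once to the file named by argv[1].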
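The double write in step (3) can be reproduced with the standard library alone. This is a minimal Python 3 sketch under stated assumptions: `save_builder_state` and `load_builder_state` are hypothetical helper names, and the `state` dict is a made-up stand-in for what `builder.to_dict()` returns (the real dict has more keys).

```python
import os
import pickle
import tempfile
import time


def save_builder_state(state, builder_path):
    """Write state to a timestamped backup file and to builder_path."""
    backup_dir = os.path.join(os.path.dirname(builder_path), 'backups')
    os.makedirs(backup_dir, exist_ok=True)  # an existing directory is fine
    backup_path = os.path.join(
        backup_dir, '%d.' % time.time() + os.path.basename(builder_path))
    for path in (backup_path, builder_path):
        with open(path, 'wb') as f:
            pickle.dump(state, f, protocol=2)
    return backup_path


def load_builder_state(path):
    """Read a pickled state dict back from disk."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Illustrative state only; not the full builder dict.
state = {'part_power': 18, 'replicas': 3.0, 'min_part_hours': 1}
with tempfile.TemporaryDirectory() as tmp:
    builder_path = os.path.join(tmp, 'account.builder')
    backup_path = save_builder_state(state, builder_path)
    restored = load_builder_state(builder_path)
    restored_backup = load_builder_state(backup_path)
```

Unlike the listing, the sketch closes its files with `with` blocks; the listing passes `open(...)` straight to `pickle.dump` and leaves closing to the interpreter.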
2. The default method:
def default():
    print '%s, build version %d' % (argv[1], builder.version)
    regions = 0
    zones = 0
    balance = 0
    dev_count = 0
    if builder.devs:
        regions = len(set(d['region'] for d in builder.devs
                          if d is not None))
        zones = len(set((d['region'], d['zone']) for d in builder.devs
                        if d is not None))
        dev_count = len([d for d in builder.devs
                         if d is not None])
        balance = builder.get_balance()
    print '%d partitions, %.6f replicas, %d regions, %d zones, ' \
          '%d devices, %.02f balance' % (builder.parts, builder.replicas,
                                         regions, zones, dev_count,
                                         balance)
    print 'The minimum number of hours before a partition can be ' \
          'reassigned is %s' % builder.min_part_hours
    if builder.devs:
        print 'Devices:    id  region  zone      ip address  port' \
              '      name weight partitions balance meta'
        weighted_parts = builder.parts * builder.replicas / sum(d['weight'] for d in builder.devs if d is not None)
        for dev in builder.devs:
            if dev is None:
                continue
            if not dev['weight']:
                if dev['parts']:
                    balance = 999.99
                else:
                    balance = 0
            else:
                balance = 100.0 * dev['parts'] / (dev['weight'] * weighted_parts) - 100.0
            print('         %5d %5d %5d %15s %5d %9s %6.02f %10s'
                  '%7.02f %s' %
                  (dev['id'], dev['region'], dev['zone'], dev['ip'],
                   dev['port'], dev['device'], dev['weight'], dev['parts'],
                   balance, dev['meta']))
    exit(EXIT_SUCCESS)
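This method prints information about the ring and about each device in it, including each device's balance, i.e. the percentage by which its partition count deviates from the share its weight entitles it to.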
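The per-device balance column is easier to read with numbers plugged in. Below is a small Python 3 sketch of the same formula; `device_balance` is a name chosen for this illustration, and the three devices are invented data, not output from a real ring.

```python
def device_balance(dev, parts, replicas, devs):
    """Percent by which dev's partition count deviates from its weighted share."""
    if not dev['weight']:
        # A zero-weight device should hold nothing; 999.99 flags leftovers.
        return 999.99 if dev['parts'] else 0
    # Partitions that one unit of weight is entitled to.
    weighted_parts = parts * replicas / sum(d['weight'] for d in devs)
    return 100.0 * dev['parts'] / (dev['weight'] * weighted_parts) - 100.0


# Invented ring: 1024 partitions x 3 replicas = 3072 partition replicas,
# spread over three devices of equal weight (fair share: 1024 each).
devs = [
    {'id': 0, 'weight': 1.0, 'parts': 1100},
    {'id': 1, 'weight': 1.0, 'parts': 1024},
    {'id': 2, 'weight': 1.0, 'parts': 948},
]
balances = [round(device_balance(d, 1024, 3, devs), 2) for d in devs]
# balances == [7.42, 0.0, -7.42]
```

A positive value means the device holds more partitions than its weight calls for, a negative value means fewer, and 0 means it holds exactly its share.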
3. The search method:
def search():
    if len(argv) < 4:
        print Commands.search.__doc__.strip()
        print builder.search_devs.__doc__.strip()
        exit(EXIT_ERROR)
    # search_devs: <search-value> takes the form
    # d<device_id>r<region>z<zone>-<ip>:<port>/<device_name>_<meta>; any part may be omitted.
    devs = builder.search_devs(argv[3])
    if not devs:
        print 'No matching devices found'
        exit(EXIT_ERROR)
    print 'Devices:    id  region  zone      ip address  port      name ' \
          'weight partitions balance meta'
    weighted_parts = builder.parts * builder.replicas / sum(d['weight'] for d in builder.devs if d is not None)
    for dev in devs:
        if not dev['weight']:
            if dev['parts']:
                balance = 999.99
            else:
                balance = 0
        else:
            balance = 100.0 * dev['parts'] / \
                (dev['weight'] * weighted_parts) - 100.0
        print('         %5d %5d %5d %15s %5d %9s %6.02f %10s %7.02f %s' %
              (dev['id'], dev['region'], dev['zone'], dev['ip'],
               dev['port'], dev['device'], dev['weight'], dev['parts'],
               balance, dev['meta']))
    exit(EXIT_SUCCESS)
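This method prints information about the devices that match a search value.
It does the following:
(1) checks that the command line is valid;
(2) calls search_devs to look up the matching devices;
(3) iterates over the matching devices, formats a line for each and prints it.
Let's look at search_devs: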
def search_devs(self, search_value):
    """
    The <search-value> takes the form
    d<device_id>r<region>z<zone>-<ip>:<port>/<device_name>_<meta>;
    any part may be omitted.
    Examples::
        d74              Matches the device id 74
        r4               Matches devices in region 4
        z1               Matches devices in zone 1
        z1-1.2.3.4       Matches devices in zone 1 with the ip 1.2.3.4
        1.2.3.4          Matches devices in any zone with the ip 1.2.3.4
        z1:5678          Matches devices in zone 1 using port 5678
        :5678            Matches devices that use port 5678
        /sdb1            Matches devices with the device name sdb1
        _shiny           Matches devices with shiny in the meta data
        _"snet: 5.6.7.8" Matches devices with snet: 5.6.7.8 in the meta data
        [::1]            Matches devices in any zone with the ip ::1
        z1-[::1]:5678    Matches devices in zone 1 with ip ::1 and port 5678
    Most specific example::
        d74r4z1-1.2.3.4:5678/sdb1_"snet: 5.6.7.8"
    """
    orig_search_value = search_value
    match = []
    if search_value.startswith('d'):
        ......
    if search_value.startswith('r'):
        ......
    if search_value.startswith('z'):
        ......
    if search_value.startswith('-'):
        ......
    if len(search_value) and search_value[0].isdigit():
        ......
    elif len(search_value) and search_value[0] == '[':
        ......
    if search_value.startswith(':'):
        ......
    if search_value.startswith('/'):
        ......
    if search_value.startswith('_'):
        ......
    if search_value:
        ......
    matched_devs = []
    for dev in self.devs:
        if not dev:
            continue
        matched = True
        for key, value in match:
            if key == 'meta':
                if value not in dev.get(key):
                    matched = False
            elif dev.get(key) != value:
                matched = False
        if matched:
            matched_devs.append(dev)
    return matched_devs
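The thing to pay attention to in this method is the format of the search value: each prefix (d, r, z, -, :, /, _) adds one (key, value) condition to match, and a device is returned only if it satisfies every condition. The meta condition is a substring test; all the others are exact comparisons.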
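The final filtering loop of search_devs can be exercised on its own. The sketch below is Python 3 and takes the `match` list as an argument, because the listing elides the parsing branches that build it; `filter_devs` and the sample devices are assumptions made for this illustration.

```python
def filter_devs(devs, match):
    """Return the devices that satisfy every (key, value) pair in match."""
    matched_devs = []
    for dev in devs:
        if not dev:  # removed devices leave None holes in the list
            continue
        matched = True
        for key, value in match:
            if key == 'meta':
                if value not in dev.get(key, ''):  # substring test
                    matched = False
            elif dev.get(key) != value:  # exact comparison
                matched = False
        if matched:
            matched_devs.append(dev)
    return matched_devs


# Invented device list with one None hole.
devs = [
    {'id': 0, 'zone': 1, 'ip': '1.2.3.4', 'port': 5678, 'meta': 'shiny new'},
    None,
    {'id': 2, 'zone': 1, 'ip': '1.2.3.5', 'port': 5678, 'meta': ''},
    {'id': 3, 'zone': 2, 'ip': '1.2.3.4', 'port': 5678, 'meta': 'shiny'},
]
zone1 = filter_devs(devs, [('zone', 1)])
zone1_shiny = filter_devs(devs, [('zone', 1), ('meta', 'shiny')])
```

An empty `match` list matches every device that is not None, which is why a search value has to contribute at least one condition to be useful.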
4. The add method:
def add():
    """
    swift-ring-builder <builder_file> add
        z<zone>-<ip>:<port>/<device_name>_<meta> <weight>
        [z<zone>-<ip>:<port>/<device_name>_<meta> <weight>] ...
    Adds new devices to the ring with the given information.
    add does not assign partitions to the new devices; partitions are assigned only after 'rebalance' is run.
    This lets us add several devices at once and run 'rebalance' a single time to assign partitions to all of them.
    The pickle module is used to save the object to disk so that it can be read back and restored later;
    concretely, the builder is converted to a dict and written to the file named by argv[1].
    """
    if len(argv) < 5 or len(argv) % 2 != 1:
        print Commands.add.__doc__.strip()
        exit(EXIT_ERROR)
    # itertools.islice(iterable, start, stop[, step])
    # islice('ABCDEFG', 2) --> A B
    # islice('ABCDEFG', 2, 4) --> C D
    # islice('ABCDEFG', 2, None) --> C D E F G
    # islice('ABCDEFG', 0, None, 2) --> A C E G
    # Pair up each devstr with its weightstr from the command line.
    devs_and_weights = izip(islice(argv, 3, len(argv), 2), islice(argv, 4, len(argv), 2))
    for devstr, weightstr in devs_and_weights:
        region = 1
        rest = devstr
        if devstr.startswith('r'):
            i = 1
            while i < len(devstr) and devstr[i].isdigit():
                i += 1
            region = int(devstr[1:i])
            rest = devstr[i:]
        else:
            stderr.write("WARNING: No region specified for %s. "
                         "Defaulting to region 1.\n" % devstr)
        if not rest.startswith('z'):
            print 'Invalid add value: %s' % devstr
            exit(EXIT_ERROR)
        i = 1
        while i < len(rest) and rest[i].isdigit():
            i += 1
        zone = int(rest[1:i])
        rest = rest[i:]
        if not rest.startswith('-'):
            print 'Invalid add value: %s' % devstr
            print "The on-disk ring builder is unchanged.\n"
            exit(EXIT_ERROR)
        i = 1
        if rest[i] == '[':
            i += 1
            while i < len(rest) and rest[i] != ']':
                i += 1
            i += 1
            ip = rest[1:i].lstrip('[').rstrip(']')
            rest = rest[i:]
        else:
            while i < len(rest) and rest[i] in '0123456789.':
                i += 1
            ip = rest[1:i]
            rest = rest[i:]
        if not rest.startswith(':'):
            print 'Invalid add value: %s' % devstr
            print "The on-disk ring builder is unchanged.\n"
            exit(EXIT_ERROR)
        i = 1
        while i < len(rest) and rest[i].isdigit():
            i += 1
        port = int(rest[1:i])
        rest = rest[i:]
        if not rest.startswith('/'):
            print 'Invalid add value: %s' % devstr
            print "The on-disk ring builder is unchanged.\n"
            exit(EXIT_ERROR)
        i = 1
        while i < len(rest) and rest[i] != '_':
            i += 1
        device_name = rest[1:i]
        rest = rest[i:]
        meta = ''
        if rest.startswith('_'):
            meta = rest[1:]
        try:
            weight = float(weightstr)
        except ValueError:
            print 'Invalid weight value: %s' % weightstr
            print "The on-disk ring builder is unchanged.\n"
            exit(EXIT_ERROR)
        if weight < 0:
            print 'Invalid weight value (must be positive): %s' % weightstr
            print "The on-disk ring builder is unchanged.\n"
            exit(EXIT_ERROR)
        for dev in builder.devs:
            if dev is None:
                continue
            if dev['ip'] == ip and dev['port'] == port and \
                    dev['device'] == device_name:
                print 'Device %d already uses %s:%d/%s.' % \
                      (dev['id'], dev['ip'], dev['port'], dev['device'])
                print "The on-disk ring builder is unchanged.\n"
                exit(EXIT_ERROR)
        # Add one device to the ring.
        # add_dev does not rebalance the ring right away, because several
        # changes may need to be made before a rebalance;
        # batching them this way is also more efficient.
        builder.add_dev({'region': region, 'zone': zone, 'ip': ip,
                         'port': port, 'device': device_name,
                         'weight': weight, 'meta': meta})
        new_dev = builder.search_devs('r%dz%d-%s:%s/%s' % (region, zone, ip, port, device_name))[0]['id']
        if ':' in ip:
            print(
                'Device r%dz%d-[%s]:%s/%s_"%s" with %s weight got id %s' %
                (region, zone, ip, port,
                 device_name, meta, weight, new_dev))
        else:
            print('Device r%dz%d-%s:%s/%s_"%s" with %s weight got id %s' %
                  (region, zone, ip, port,
                   device_name, meta, weight, new_dev))
    # Use the pickle module to save the object to disk so that it can be restored later;
    # concretely, the builder is converted to a dict and written to the file named by argv[1].
    pickle.dump(builder.to_dict(), open(argv[1], 'wb'), protocol=2)
    exit(EXIT_SUCCESS)
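From the docstring we get the command-line format (the code additionally accepts an optional r<region> prefix before z<zone> and defaults to region 1 with a warning when it is missing):
swift-ring-builder <builder_file> add
z<zone>-<ip>:<port>/<device_name>_<meta> <weight>
[z<zone>-<ip>:<port>/<device_name>_<meta> <weight>] ...
This method does the following:
(1) checks that the command line is valid;
(2) pairs up each devstr with its weightstr from the command line;
Debugging example:
argv = ['/usr/bin/swift-ring-builder', 'account.builder', 'add', 'z1-127.0.0.1:6012/sda3', '1']
list(izip(islice(argv, 3, len(argv), 2), islice(argv, 4, len(argv), 2))) = [('z1-127.0.0.1:6012/sda3', '1')]
(3) iterates over the (devstr, weightstr) pairs and extracts region, zone, ip, port, device_name, weight and meta from each pair. It then calls add_dev to add the device to the ring. add_dev does not rebalance the ring right away, because several changes may be needed before a rebalance: this is a loop, so adding several devices in one command calls add_dev several times. Deferring the rebalance is also more efficient.
(4) finally uses the pickle module to save the object to disk so that it can be restored later. Concretely, the builder is converted to a dict and written to the file named by argv[1].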
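Steps (2) and (3) can be sketched compactly in Python 3. Note the substitution: the listing scans the device string character by character, whereas this sketch uses a regular expression as a stand-in, so edge cases may differ. `DEV_RE`, `pair_devs_and_weights` and `parse_dev` are hypothetical names, and the second device on the sample command line is invented.

```python
import re
from itertools import islice

# Regular expression standing in for the hand-written scanner in add().
DEV_RE = re.compile(
    r'^(?:r(?P<region>\d+))?'          # optional region, defaults to 1
    r'z(?P<zone>\d+)'
    r'-(?P<ip>\[[^\]]+\]|[0-9.]+)'     # bracketed IPv6 or dotted IPv4
    r':(?P<port>\d+)'
    r'/(?P<device>[^_]*)'
    r'(?:_(?P<meta>.*))?$')


def pair_devs_and_weights(argv):
    """Pair argv[3], argv[5], ... with argv[4], argv[6], ..."""
    return list(zip(islice(argv, 3, len(argv), 2),
                    islice(argv, 4, len(argv), 2)))


def parse_dev(devstr):
    """Split a device string into the fields that add_dev expects."""
    m = DEV_RE.match(devstr)
    if m is None:
        raise ValueError('Invalid add value: %s' % devstr)
    return {
        'region': int(m.group('region') or 1),
        'zone': int(m.group('zone')),
        'ip': m.group('ip').lstrip('[').rstrip(']'),
        'port': int(m.group('port')),
        'device': m.group('device'),
        'meta': m.group('meta') or '',
    }


argv = ['/usr/bin/swift-ring-builder', 'account.builder', 'add',
        'z1-127.0.0.1:6012/sda3', '1',
        'r2z3-[::1]:6010/sdb1_fast', '2.5']
pairs = pair_devs_and_weights(argv)
parsed = [parse_dev(devstr) for devstr, _ in pairs]
```

Here `pairs` holds two (devstr, weightstr) tuples, and `parsed[1]` carries region 2, zone 3, ip '::1', port 6010, device 'sdb1' and meta 'fast'.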
5. The set_weight method:
def set_weight():
    """
    swift-ring-builder <builder_file> set_weight <search-value> <weight>
        [<search-value> <weight>] ...
    """
    if len(argv) < 5 or len(argv) % 2 != 1:
        print Commands.set_weight.__doc__.strip()
        print builder.search_devs.__doc__.strip()
        exit(EXIT_ERROR)
    devs_and_weights = izip(islice(argv, 3, len(argv), 2),
                            islice(argv, 4, len(argv), 2))
    for devstr, weightstr in devs_and_weights:
        devs = builder.search_devs(devstr)
        weight = float(weightstr)
        if not devs:
            print("Search value \"%s\" matched 0 devices.\n"
                  "The on-disk ring builder is unchanged.\n"
                  % devstr)
            exit(EXIT_ERROR)
        if len(devs) > 1:
            print 'Matched more than one device:'
            for dev in devs:
                print '    d%(id)sz%(zone)s-%(ip)s:%(port)s/%(device)s_' \
                      '"%(meta)s"' % dev
            if raw_input('Are you sure you want to update the weight for '
                         'these %s devices? (y/N) ' % len(devs)) != 'y':
                print 'Aborting device modifications'
                exit(EXIT_ERROR)
        for dev in devs:
            # Set the device's weight.
            # This does more than change the weight in the device dict:
            # the builder also has to reset some internal state to reflect the new weight.
            builder.set_dev_weight(dev['id'], weight)
            print 'd%(id)sz%(zone)s-%(ip)s:%(port)s/%(device)s_' \
                  '"%(meta)s" weight set to %(weight)s' % dev
    pickle.dump(builder.to_dict(), open(argv[1], 'wb'), protocol=2)
    exit(EXIT_SUCCESS)
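This method resets the weight of devices. After set_weight, the partitions on the devices are not reassigned; that happens only when 'rebalance' is run. This lets you change the weight of several devices and then run 'rebalance' once to redistribute partitions for all of them.
The method does the following:
(1) checks that the command line is valid;
(2) pairs up each devstr with its weightstr from the command line;
(3) iterates over the (devstr, weightstr) pairs. For each pair it calls search_devs with devstr to find the devices whose weight is to be changed, asks for confirmation if more than one device matched, and then calls set_dev_weight on every matched device with float(weightstr). All devices matched by one search value therefore end up with the same weight.
(4) finally uses the pickle module to save the object to disk so that it can be restored later. Concretely, the builder is converted to a dict and written to the file named by argv[1].
Now look at set_dev_weight. It is easy to follow; the comments explain it: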
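def set_dev_weight(self, dev_id, weight):
    """
    Set the weight of a device.
    This does more than change the weight in the device dict:
    the builder also has to reset some internal state to reflect the new weight.
    """
    self.devs[dev_id]['weight'] = weight
    # _set_parts_wanted: from each device's weight, computes how many partitions
    # the device still wants beyond those already assigned to it.
    self._set_parts_wanted()
    # Set devs_changed to True to record that the devices have changed.
    self.devs_changed = True
    self.version += 1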
Next, _set_parts_wanted. This method is called from many places, so let's go through it now; the comments explain it:
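def _set_parts_wanted(self):
    """
    From each device's weight, computes how many partitions the device still
    wants beyond those already assigned to it.
    The calculation:
    1. Compute weight_of_one_part: multiply the partition count by the replica
       count to get the total number of partition replicas, then divide by the
       sum of the weights of all current devices.
       Despite the name, the result is the number of partitions per unit of weight.
    2. Each device derives the number of partitions it should still receive:
       dev['weight'] * weight_of_one_part - partitions already assigned.
       That is, partitions per unit of weight times the device's weight gives the
       total number of partitions the device is entitled to; subtracting the
       partitions it already holds gives the number it still wants.
    So every add changes the desired partition allocation, but partitions are
    actually moved only when rebalance runs.
    """
    # weight_of_one_part: computed from the total weight of all devices as
    # (partition count * replica count) / (sum of device weights),
    # i.e. the number of partitions that one unit of weight stands for;
    # every unit of weight stands for the same number of partitions.
    weight_of_one_part = self.weight_of_one_part()
    for dev in self._iter_devs():
        if not dev['weight']:
            dev['parts_wanted'] = -self.parts * self.replicas
        else:
            dev['parts_wanted'] = int(weight_of_one_part * dev['weight']) - dev['parts']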
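A worked example makes the sign of parts_wanted concrete. This is a standalone Python 3 sketch of the same arithmetic, not the RingBuilder method itself; the function name `parts_wanted` and the four devices are assumptions made for the illustration.

```python
def parts_wanted(devs, parts, replicas):
    """Return {device id: partitions still wanted} using the rule above."""
    total_weight = sum(d['weight'] for d in devs)
    # Partitions per unit of weight (weight_of_one_part in the listing).
    per_weight = parts * replicas / total_weight
    wanted = {}
    for dev in devs:
        if not dev['weight']:
            # A zero-weight device wants to give up everything it could hold.
            wanted[dev['id']] = -parts * replicas
        else:
            wanted[dev['id']] = int(per_weight * dev['weight']) - dev['parts']
    return wanted


# Invented ring: 1024 partitions x 3 replicas = 3072 partition replicas.
# Devices 0 and 1 held everything until device 2 was added; device 3 has weight 0.
devs = [
    {'id': 0, 'weight': 1.0, 'parts': 1536},
    {'id': 1, 'weight': 1.0, 'parts': 1536},
    {'id': 2, 'weight': 1.0, 'parts': 0},
    {'id': 3, 'weight': 0.0, 'parts': 0},
]
result = parts_wanted(devs, parts=1024, replicas=3)
# result == {0: -512, 1: -512, 2: 1024, 3: -3072}
```

A negative value means the device holds more partitions than it should and will give some up at the next rebalance; a positive value means it is waiting to receive that many.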
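That is all for this post; the next one continues the walkthrough of the swift-ring-builder file.
There are bound to be mistakes in this post. Corrections and criticism are welcome. Thank you!
Next post:
OpenStack Swift Source Code Analysis (5): swift-ring-builder Source Walkthrough, Part 2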