本帖最后由 pig2 于 2014-11-21 15:12 编辑
问题导读
1.ring组件在swift中的作用是什么?
2.swift-ring-builder中包含了对ring的哪些操作方法?
ring是swift中的核心组件,它描述和决定了数据如何在集群系统中分布。其中的一致性哈希算法更是核心内容之一。
swift-ring-builder中包含了对ring的各种操作方法,包括create、default、search、list_parts、add、set_weight、set_info、remove、rebalance、validate、write_ring、pretend_min_part_hours_passed、set_min_part_hours和set_replicas等方法,我将会逐个解析这写方法的实现过程;
首先来看swift-ring-builder中的main方法:
- if __name__ == '__main__':
- if len(argv) < 2:
- print "swift-ring-builder %(MAJOR_VERSION)s.%(MINOR_VERSION)s\n" % globals()
- print Commands.default.__doc__.strip()
- print
- cmds = [c for c, f in Commands.__dict__.iteritems()
- if f.__doc__ and c[0] != '_' and c != 'default']
- cmds.sort()
- for cmd in cmds:
- print Commands.__dict__[cmd].__doc__.strip()
- print
- print RingBuilder.search_devs.__doc__.strip()
- print
- for line in wrap(' '.join(cmds), 79, initial_indent='Quick list: ',
- subsequent_indent=' '):
- print line
- print ('Exit codes: 0 = operation successful\n'
- ' 1 = operation completed with warnings\n'
- ' 2 = error')
- exit(EXIT_SUCCESS)
-
- # 验证argv[1]的路径是否存在;
- if exists(argv[1]):
- # builder从argv[1]指定文件获取初始化数据;
- # 再对RingBuilder类进行实例化;
- # 根据具体情况决定是否改变builder的初始化数据;
- builder = RingBuilder.load(argv[1])
- # 如果argv[1]不存在,而且调用的还不是'create'命令,则退出;
- elif len(argv) < 3 or argv[2] != 'create':
- print 'Ring Builder file does not exist: %s' % argv[1]
- exit(EXIT_ERROR)
-
- # 获取文件夹'backups'的整体路径;
- backup_dir = pathjoin(dirname(argv[1]), 'backups')
- # 新建文件夹'backups';
- try:
- mkdir(backup_dir)
- except OSError, err:
- if err.errno != EEXIST:
- raise
-
- # 获取argv[1],作为ring_file;
- ring_file = argv[1]
- if ring_file.endswith('.builder'):
- ring_file = ring_file[:-len('.builder')]
- ring_file += '.ring.gz'
-
- if len(argv) == 2:
- command = "default"
- else:
- command = argv[2]
- if argv[0].endswith('-safe'):
- try:
- with lock_parent_directory(abspath(argv[1]), 15):
- Commands.__dict__.get(command, Commands.unknown.im_func)()
- except exceptions.LockTimeout:
- print "Ring/builder dir currently locked."
- exit(2)
- else:
- Commands.__dict__.get(command, Commands.unknown.im_func)()
复制代码
这个main方法主要做了以下几件事:
(1)检测命令行的正确性;
(2)如果argv[1]文件存在,则调用类RingBuilder中的方法load加载argv[1]文件,并返回类RingBuilder的实例化对象;
(3)建立backups文件夹;
(4)初始化ring_file文件;
(5)调用运行command中指定的处理ring的方法;
在类Commands中有若干处理ring的方法,如create、default、search、list_parts、add、set_weight、set_info、remove、rebalance、validate、write_ring、pretend_min_part_hours_passed、set_min_part_hours和set_replicas等等。下面我们来逐一解析这些方法:
1.来看方法create:
- def create():
- if len(argv) < 6:
- print Commands.create.__doc__.strip()
- exit(EXIT_ERROR)
-
- builder = RingBuilder(int(argv[3]), float(argv[4]), int(argv[5]))
- backup_dir = pathjoin(dirname(argv[1]), 'backups')
- try:
- mkdir(backup_dir)
- except OSError, err:
- if err.errno != EEXIST:
- raise
-
- # Python中可以使用 pickle 模块将对象转化为文件保存在磁盘上,在需要的时候再读取并还原。
- # 这里即是把转化为字典格式的builder写入到文件pathjoin(backup_dir,'%d.' % time() + basename(argv[1]))文件中;
- # builder.to_dict():以字典的形式返回初始化的RingBuilder类的对象;
- pickle.dump(builder.to_dict(), open(pathjoin(backup_dir,'%d.' % time() + basename(argv[1])), 'wb'), protocol=2)
-
- # 再把转化为字典格式的builder写入到argv[1]指明的文件中;
- pickle.dump(builder.to_dict(), open(argv[1], 'wb'), protocol=2)
-
- exit(EXIT_SUCCESS)
复制代码
这个方法主要实现的功能是:
(1)根据命令初始化类RingBuilder,获取类RingBuilder的实例化对象;
(2)创建用于备份的文件夹'backups';
(3)使用 pickle 模块将对象转化为文件保存在磁盘上,以便在需要的时候再读取还原;这里具体是把转化为字典格式的builder保存两份,一份写入到建立的文件夹'backups'中的指定文件中,一份写入到argv[1]指明的文件中;
2.来看方法default:
- def default():
- print '%s, build version %d' % (argv[1], builder.version)
- regions = 0
- zones = 0
- balance = 0
- dev_count = 0
- if builder.devs:
- regions = len(set(d['region'] for d in builder.devs
- if d is not None))
- zones = len(set((d['region'], d['zone']) for d in builder.devs
- if d is not None))
- dev_count = len([d for d in builder.devs
- if d is not None])
- balance = builder.get_balance()
- print '%d partitions, %.6f replicas, %d regions, %d zones, ' \
- '%d devices, %.02f balance' % (builder.parts, builder.replicas,
- regions, zones, dev_count,
- balance)
- print 'The minimum number of hours before a partition can be ' \
- 'reassigned is %s' % builder.min_part_hours
- if builder.devs:
- print 'Devices: id region zone ip address port' \
- ' name weight partitions balance meta'
- weighted_parts = builder.parts * builder.replicas / sum(d['weight'] for d in builder.devs if d is not None)
- for dev in builder.devs:
- if dev is None:
- continue
- if not dev['weight']:
- if dev['parts']:
- balance = 999.99
- else:
- balance = 0
- else:
- balance = 100.0 * dev['parts'] / (dev['weight'] * weighted_parts) - 100.0
- print(' %5d %5d %5d %15s %5d %9s %6.02f %10s'
- '%7.02f %s' %
- (dev['id'], dev['region'], dev['zone'], dev['ip'],
- dev['port'], dev['device'], dev['weight'], dev['parts'],
- balance, dev['meta']))
- exit(EXIT_SUCCESS)
复制代码
这个方法主要实现的功能是显示ring和设备内部的信息;
3.来看方法search:
- def search():
- if len(argv) < 4:
- print Commands.search.__doc__.strip()
- print
- print builder.search_devs.__doc__.strip()
- exit(EXIT_ERROR)
-
- # search_devs:<search-value>可以通过<device_id>/<region>/<zone>-<ip>:<port>/<device_name>_<meta>几种格式来进行查询;
- devs = builder.search_devs(argv[3])
-
- if not devs:
- print 'No matching devices found'
- exit(EXIT_ERROR)
- print 'Devices: id region zone ip address port name ' \
- 'weight partitions balance meta'
-
- weighted_parts = builder.parts * builder.replicas / sum(d['weight'] for d in builder.devs if d is not None)
-
- for dev in devs:
- if not dev['weight']:
- if dev['parts']:
- balance = 999.99
- else:
- balance = 0
- else:
- balance = 100.0 * dev['parts'] / \
- (dev['weight'] * weighted_parts) - 100.0
- print(' %5d %5d %5d %15s %5d %9s %6.02f %10s %7.02f %s' %
- (dev['id'], dev['region'], dev['zone'], dev['ip'],
- dev['port'], dev['device'], dev['weight'], dev['parts'],
- balance, dev['meta']))
- exit(EXIT_SUCCESS)
复制代码
这个方法主要实现了显示匹配的设备信息的功能;
主要做了以下几件事:
(1)验证命令行正确性;
(2)调用方法search_devs实现对设备信息的搜索功能;
(3)遍历得到的匹配设备信息,组成输出信息并进行打印输出;
我们来看看方法search_devs:
- def search_devs(self, search_value):
- """
- 可以通过//-:/_几种格式来进行查询;
-
- Examples::
- d74 Matches the device id 74
- r4 Matches devices in region 4
- z1 Matches devices in zone 1
- z1-1.2.3.4 Matches devices in zone 1 with the ip 1.2.3.4
- 1.2.3.4 Matches devices in any zone with the ip 1.2.3.4
- z1:5678 Matches devices in zone 1 using port 5678
- :5678 Matches devices that use port 5678
- /sdb1 Matches devices with the device name sdb1
- _shiny Matches devices with shiny in the meta data
- _"snet: 5.6.7.8" Matches devices with snet: 5.6.7.8 in the meta data
- [::1] Matches devices in any zone with the ip ::1
- z1-[::1]:5678 Matches devices in zone 1 with ip ::1 and port 5678
- Most specific example::
- d74r4z1-1.2.3.4:5678/sdb1_"snet: 5.6.7.8"
- """
- orig_search_value = search_value
- match = []
- if search_value.startswith('d'):
- ......
- if search_value.startswith('r'):
- ......
- if search_value.startswith('z'):
- ......
- if search_value.startswith('-'):
- ......
- if len(search_value) and search_value[0].isdigit():
- ......
- elif len(search_value) and search_value[0] == '[':
- ......
- if search_value.startswith(':'):
- ......
- if search_value.startswith('/'):
- ......
- if search_value.startswith('_'):
- ......
- if search_value:
- ......
- matched_devs = []
- for dev in self.devs:
- if not dev:
- continue
- matched = True
- for key, value in match:
- if key == 'meta':
- if value not in dev.get(key):
- matched = False
- elif dev.get(key) != value:
- matched = False
- if matched:
- matched_devs.append(dev)
- return matched_devs
复制代码
在这个方法中我们要注意的地方就是搜索指令的格式;
4.来看方法add:
- def add():
- """
- swift-ring-builder <builder_file> add
- [r<region>]z<zone>-<ip>:<port>/<device_name>_<meta> <weight>
- [[r<region>]z<zone>-<ip>:<port>/<device_name>_<meta> <weight>] ...
-
- 使用给定的信息添加新的设备到ring上;
- add操作不会分配partitions到新的设备上,只有运行了'rebalance'命令后,才会进行分区的分配;
- 因此,这种机制可以允许我们一次添加多个设备,并只执行一次'rebalance'实现对这些设备的分区分配;
-
- 使用 pickle 模块将对象转化为文件保存在磁盘上,以便在需要的时候再读取还原;
- 这里具体是把转化为字典格式的builder写入到argv[1]指定文件中;
- """
- if len(argv) < 5 or len(argv) % 2 != 1:
- print Commands.add.__doc__.strip()
- exit(EXIT_ERROR)
-
- # itertools.islice(iterable, start, stop[, step])
- # islice('ABCDEFG', 2) --> A B
- # islice('ABCDEFG', 2, 4) --> C D
- # islice('ABCDEFG', 2, None) --> C D E F G
- # islice('ABCDEFG', 0, None, 2) --> A C E G
-
- # 从命令行获取匹配的devstr和weightstr集合;
- devs_and_weights = izip(islice(argv, 3, len(argv), 2),islice(argv, 4, len(argv), 2))
- for devstr, weightstr in devs_and_weights:
- region = 1
- rest = devstr
-
- if devstr.startswith('r'):
- i = 1
- while i < len(devstr) and devstr[i].isdigit():
- i += 1
- region = int(devstr[1:i])
- rest = devstr[i:]
- else:
- stderr.write("WARNING: No region specified for %s. "
- "Defaulting to region 1.\n" % devstr)
-
- if not rest.startswith('z'):
- print 'Invalid add value: %s' % devstr
- exit(EXIT_ERROR)
-
- i = 1
- while i < len(rest) and rest[i].isdigit():
- i += 1
- zone = int(rest[1:i])
- rest = rest[i:]
-
- if not rest.startswith('-'):
- print 'Invalid add value: %s' % devstr
- print "The on-disk ring builder is unchanged.\n"
- exit(EXIT_ERROR)
-
- i = 1
- if rest[i] == '[':
- i += 1
- while i < len(rest) and rest[i] != ']':
- i += 1
- i += 1
- ip = rest[1:i].lstrip('[').rstrip(']')
- rest = rest[i:]
- else:
- while i < len(rest) and rest[i] in '0123456789.':
- i += 1
- ip = rest[1:i]
- rest = rest[i:]
-
- if not rest.startswith(':'):
- print 'Invalid add value: %s' % devstr
- print "The on-disk ring builder is unchanged.\n"
- exit(EXIT_ERROR)
-
- i = 1
- while i < len(rest) and rest[i].isdigit():
- i += 1
- port = int(rest[1:i])
- rest = rest[i:]
-
- if not rest.startswith('/'):
- print 'Invalid add value: %s' % devstr
- print "The on-disk ring builder is unchanged.\n"
- exit(EXIT_ERROR)
-
- i = 1
- while i < len(rest) and rest[i] != '_':
- i += 1
- device_name = rest[1:i]
- rest = rest[i:]
-
- meta = ''
- if rest.startswith('_'):
- meta = rest[1:]
-
- try:
- weight = float(weightstr)
- except ValueError:
- print 'Invalid weight value: %s' % weightstr
- print "The on-disk ring builder is unchanged.\n"
- exit(EXIT_ERROR)
-
- if weight < 0:
- print 'Invalid weight value (must be positive): %s' % weightstr
- print "The on-disk ring builder is unchanged.\n"
- exit(EXIT_ERROR)
-
- for dev in builder.devs:
- if dev is None:
- continue
- if dev['ip'] == ip and dev['port'] == port and \
- dev['device'] == device_name:
- print 'Device %d already uses %s:%d/%s.' % \
- (dev['id'], dev['ip'], dev['port'], dev['device'])
- print "The on-disk ring builder is unchanged.\n"
- exit(EXIT_ERROR)
-
- # 增加一个device到ring;
- # 这个方法不会马上执行ring的重新平衡操作,因为我们可能需要在重新平衡操作之前进行多次改变;
- # 这样做也是为了提高效率的;
- builder.add_dev({'region': region, 'zone': zone, 'ip': ip,
- 'port': port, 'device': device_name,
- 'weight': weight, 'meta': meta})
- new_dev = builder.search_devs('r%dz%d-%s:%s/%s' % (region, zone, ip, port, device_name))[0]['id']
-
- if ':' in ip:
- print(
- 'Device r%dz%d-[%s]:%s/%s_"%s" with %s weight got id %s' %
- (region, zone, ip, port,
- device_name, meta, weight, new_dev))
- else:
- print('Device r%dz%d-%s:%s/%s_"%s" with %s weight got id %s' %
- (region, zone, ip, port,
- device_name, meta, weight, new_dev))
-
- # 使用 pickle 模块将对象转化为文件保存在磁盘上,以便在需要的时候再读取还原;
- # 这里具体是把转化为字典格式的builder写入到argv[1]指定文件中;
- pickle.dump(builder.to_dict(), open(argv[1], 'wb'), protocol=2)
-
- exit(EXIT_SUCCESS)
复制代码
可以知道命令行格式:
swift-ring-builder <builder_file> add
[r<region>]z<zone>-<ip>:<port>/<device_name>_<meta> <weight>
[[r<region>]z<zone>-<ip>:<port>/<device_name>_<meta> <weight>]
这个方法主要做了以下几件事:
(1)验证命令行的正确性;
(2)从命令行获取匹配的devstr和weightstr集合;
调试示例:
argv = ['/usr/bin/swift-ring-builder', 'account.builder', 'add', 'z1-127.0.0.1:6012/sda3', '1']
izip(islice(argv, 3, len(argv), 2),islice(argv, 4, len(argv), 2))) = ('z1-127.0.0.1:6012/sda3', '1')
(3)遍历每一对匹配的devstr和weightstr,从devstr和weightstr中获取region、zone、ip、port、device_name、weight和meta的值。调用add_dev方法实现增加这个device信息到ring,这个方法不会马上执行ring的重新平衡操作,因为可能需要在重新平衡操作之前进行多次改变(正如这里是个遍历循环操作,如果命令行中一次增加多个设备信息,就需要执行多次add_dev方法)。这样做也是为了提高效率的。
(4)最后使用 pickle 模块将对象转化为文件保存在磁盘上,以便在需要的时候再读取还原。这里具体是把转化为字典格式的builder写入到argv[1]指定文件中。
5.来看方法set_weight:
- def set_weight():
- """
- swift-ring-builder <builder_file> set_weight <search-value> <weight>
- [<search-value> <weight] ...
- """
- if len(argv) < 5 or len(argv) % 2 != 1:
- print Commands.set_weight.__doc__.strip()
- print
- print builder.search_devs.__doc__.strip()
- exit(EXIT_ERROR)
-
- devs_and_weights = izip(islice(argv, 3, len(argv), 2),
- islice(argv, 4, len(argv), 2))
- for devstr, weightstr in devs_and_weights:
- devs = builder.search_devs(devstr)
- weight = float(weightstr)
- if not devs:
- print("Search value "%s" matched 0 devices.\n"
- "The on-disk ring builder is unchanged.\n"
- % devstr)
- exit(EXIT_ERROR)
- if len(devs) > 1:
- print 'Matched more than one device:'
- for dev in devs:
- print ' d%(id)sz%(zone)s-%(ip)s:%(port)s/%(device)s_' \
- '"%(meta)s"' % dev
- if raw_input('Are you sure you want to update the weight for '
- 'these %s devices? (y/N) ' % len(devs)) != 'y':
- print 'Aborting device modifications'
- exit(EXIT_ERROR)
- for dev in devs:
- # 设置device的weight值;
- # 这个方法不是仅仅直接在device字典中变更weight值;
- # 还有builder将会需要重新设置一些内部状态来反应weight值的改变;
- builder.set_dev_weight(dev['id'], weight)
- print 'd%(id)sz%(zone)s-%(ip)s:%(port)s/%(device)s_' \
- '"%(meta)s" weight set to %(weight)s' % dev
- pickle.dump(builder.to_dict(), open(argv[1], 'wb'), protocol=2)
- exit(EXIT_SUCCESS)
复制代码
这个方法实现了重新设置设备的weight。set_weight操作后,设备上的partition不会重新分配,只有运行了'rebalance'命令后才会进行分区的分配。因此,这种机制可以允许你一次添加多个设备,并只执行一次'rebalance'实现对这些设备的分区分配。
这个方法做了以下几件事:
(1)验证命令行正确性;
(2)从命令行获取匹配的devstr和weightstr集合;
(3)遍历每一对匹配的devstr和weightstr。调用search_devs方法根据devstr查询得到对应的devs,也就是要修改weight值的devs。遍历得到的devs,调用set_dev_weight方法为每一个dev重新设置weight值为float(weightstr)。所以每一组devs的weight的值都是相同的。
(4)最后使用 pickle 模块将对象转化为文件保存在磁盘上,以便在需要的时候再读取还原。这里具体是把转化为字典格式的builder写入到argv[1]指定文件中。
这里再来看看方法set_dev_weight,方法比较容易理解,具体解析如注释所示;
- def set_dev_weight(self, dev_id, weight):
- """
- 设置device的weight值;
- 这个方法不是仅仅直接在device字典中变更weight值;
- 还有builder将会需要重新设置一些内部状态来反应weight值的改变;
- """
- self.devs[dev_id]['weight'] = weight
- # _set_parts_wanted:方法根据dev的weight计算dev除了目前已经分配的partition数目而外,还要分配的partition数目;
- self._set_parts_wanted()
- # 设置devs_changed为TRUE,说明已经改变;
- self.devs_changed = True
- self.version += 1
复制代码
这里再来看看方法_set_parts_wanted,这个方法会在很多地方被调用,这里先来解析一下,具体解析如注释所示:
- def _set_parts_wanted(self):
- """
- 方法根据dev的weight计算dev除了目前已经分配的partition数目而外,还要分配的partition数目;
- 计算方法是:
- 1.首先计算每个partition的weight,即将partition数目乘以副本数得到总的partition数目;
- 然后除以现有dev的weight总和,得到每个partition的权重;
- 实际上得到的是单位权重对应的partition数目;
- 2.dev根据上述结构计算自己应该获取的partition数目;
- 计算方法:dev weight * weigh_of_one_part – 已经分配的partition数目;
- 具体解释就是单位权重对应的partition数目乘以权重值,得到权重dev['weight']对应的总的partition数目,
- 然后减去已经分配的partition数目,就得到了还要分配的partition数目;
-
- 由此可见,每次add设备操作都会引起partition分配的变化,但是真正的partition搬迁操作在rebalance执行时;
- """
-
- # weight_of_one_part:从所有设备的总权重(weight)中,返回计算出来的每一个分区的权重(weight);
- # 计算方法就是将partition数目乘以副本数得到总的partition数目,然后除以现有dev的weight总和,得到每个partition的权重;
- # 也就是说具有同样副本数的分区具有同样的权重值(weight);
- weight_of_one_part = self.weight_of_one_part()
-
- for dev in self._iter_devs():
- if not dev['weight']:
- dev['parts_wanted'] = -self.parts * self.replicas
- else:
- dev['parts_wanted'] = int(weight_of_one_part * dev['weight']) - dev['parts']
复制代码
此篇博文解析到这里吧,下一篇博文将会继续解析swift-ring-builder文件。
博文中不免有不正确的地方,欢迎朋友们不吝批评指正,谢谢大家了!
下一篇
OpenStack Swift源码分析(5)----swift-ring-builder源代码解析之二
|