Monthly Archives: September 2014

Monitoring SSDs

SSDs are getting more and more common, and we have started using them fairly heavily for database storage. For the performance side of things, see "4k Sector Size" and "What every programmer should know about solid-state drives". This post is mainly about monitoring SSDs.

We mostly use Intel SSDs, on servers from HP, IBM, DELL and others. The OS is mainly OEL 5 and OEL 6, and the tool is smartctl. (The smartmontools package contains two utility programs, smartctl and smartd, to control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology (SMART) built into most modern ATA and SCSI hard disks. In many cases these utilities will provide advance warning of disk degradation and failure.)

Use a reasonably recent smartmontools release, because some older versions do not support certain hardware or attributes. I went with 6.2 here and compiled one binary on each OS version, so a script can scp the right one to every machine automatically:
-rwxr-xr-x 1 mymonitor oinstall 1588025 Sep 12 17:03 smartctl5
-rwxr-xr-x 1 mymonitor oinstall 1744199 Sep 12 17:02 smartctl6

# ----------------------------------------------------------------------------------------
# Func : get_os_version
# ----------------------------------------------------------------------------------------
sub get_os_version {
    my $_ssh = shift;

    # Read /etc/issue on the remote host to tell OEL 5 from OEL 6; the eval's
    # value (the last thing returned inside it) becomes the sub's return value.
    return eval {
        my $osversion = $_ssh->capture2({timeout => 5}, 'cat /etc/issue | grep release');
        if ($_ssh->error) {
            $logger->warning("get os version error");
            return -1;
        }
        return 5 if index($osversion, 'release 5') > -1;
        return 6 if index($osversion, 'release 6') > -1;
        return -1;
    };
}
# ----------------------------------------------------------------------------------------
# Func : scp_smartctl
# ----------------------------------------------------------------------------------------
sub scp_smartctl {
    my $_ssh        = shift;
    my $_os_version = shift;
    my $_os_host    = shift;

    eval {
        # Only OEL 5 and OEL 6 are supported; push the binary built for that release
        # (smartctl5 or smartctl6) and rename it to smartctl on the target host.
        exit unless $_os_version == 5 || $_os_version == 6;

        $_ssh->scp_put("$workdir/smartctl$_os_version", "$workdir/smartctl");
        if ($_ssh->error) {
            $logger->error("scp smartctl$_os_version to $_os_host failed:" . $_ssh->error);
            exit;
        }
        $logger->info("scp smartctl$_os_version to $_os_host success.");

        $_ssh->capture2({timeout => 5}, "chmod +x $workdir/smartctl");
        if ($_ssh->error) {
            $logger->error("chmod +x $workdir/smartctl on $_os_host failed:" . $_ssh->error);
            exit;
        }
        $logger->info("chmod +x $workdir/smartctl on $_os_host success.");
    };
}

Reading disk information through a RAID controller with smartctl requires controller support, and each vendor behaves a bit differently; the main difference is HP's device type:
IBM: smartctl -a -d megaraid,6 /dev/sdd
HP: smartctl -a -d cciss,6 /dev/sdd
DELL:smartctl -a -d megaraid,6 /dev/sdd
Experimentation shows that the disk given at the end only has to exist on the server; it does not have to match the device id one-to-one.

# ----------------------------------------------------------------------------------------
# Func : get_machine_type
# ----------------------------------------------------------------------------------------
sub get_machine_type {
    my $_ssh = shift;

    # dmidecode -t1 reports the system manufacturer; map it to the vendor name
    # used later to pick the smartctl device type.
    return eval {
        my $machine_type = $_ssh->capture2({timeout => 5}, 'sudo /usr/sbin/dmidecode -t1 | grep "Manufacturer"');
        if ($_ssh->error) {
            $logger->warning("get machine type error");
            return "Other";
        }
        $machine_type = uc($machine_type);
        return "DELL"    if index($machine_type, 'DELL') > -1;
        return "HP"      if index($machine_type, 'HP') > -1;
        return "IBM"     if index($machine_type, 'IBM') > -1;
        return "FUJITSU" if index($machine_type, 'FUJITSU') > -1;
        return "Other";
    };
}

The other thing used above is the device id. With an LSI RAID card you could first install the vendor's utility (grab megacli-2.00.11-2.x86_64.rpm from the LSI site); see this link: http://blog.yufeng.info/archives/1096
We went with a more brute-force approach instead:

# ----------------------------------------------------------------------------------------
# Func : get_ssd_deviceid
# ----------------------------------------------------------------------------------------
sub get_ssd_deviceid {
    my $_ssh          = shift;
    my $_machine_type = shift;

    my $_devcount = 0;
    my $_msg;
    my $_ssd_deviceid;
    my $_parms_disk;
    my $_raid_type;

    return eval {
        # HP controllers use the cciss device type and may expose /dev/cciss/c0d0;
        # everything else is treated as megaraid on /dev/sda.
        if ($_machine_type eq "HP") {
            $_raid_type  = "cciss";
            $_devcount   = $_ssh->capture2('sudo df -h | grep "/dev/cciss/c0d0" | wc -l');
            $_parms_disk = $_devcount > 0 ? "/dev/cciss/c0d0" : "/dev/sda";
        } else {
            $_raid_type  = "megaraid";
            $_parms_disk = "/dev/sda";
        }

        # Brute force: probe device ids 1..20 and remember every one that reports
        # itself as a Solid State Device; join multiple ids with "|".
        for (my $_i = 1; $_i <= 20; $_i++) {
            $_msg = $_ssh->capture2("sudo $workdir/smartctl -a -d $_raid_type,$_i $_parms_disk | grep 'Solid State Device'");
            chomp($_msg);
            if (index($_msg, 'Solid State Device') > -1) {
                if (defined $_ssd_deviceid && "" ne $_ssd_deviceid) {
                    $_ssd_deviceid .= "|" . $_i;
                } else {
                    $_ssd_deviceid = $_i;
                }
            }
        }

        if (defined $_ssd_deviceid && "" ne $_ssd_deviceid) {
            return $_parms_disk . "," . $_ssd_deviceid;
        }
        return $_parms_disk . ",";
    };
}

Last are the monitoring attributes. We mainly picked these: Reallocated_Sector_Ct, Available_Reservd_Space, Media_Wearout_Indicator. For how to choose attributes, see: http://www.hellodb.net/2011/04/ssd-media-wear.html

foreach my $id (@resultids) {
    chomp($id);
    $logger->debug("$result[0] ssd check  deviceid:$id");
    next unless defined $id && "" ne $id;

    # Pull the SMART attribute table for this device id, keeping only the
    # attributes we care about ($result[12] holds the grep -E pattern).
    $chk_ssd_msg = $cssh->capture2("sudo $workdir/smartctl -a -d $raid_type,$id $parms_disk | grep -E \"$result[12]\"");
    $logger->debug("$result[0] ssd check msg:$chk_ssd_msg");

    foreach my $line (split /\n/, $chk_ssd_msg) {
        # Attribute rows look like: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if ($line =~ m{(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)}i) {
            $attrname       = $2;                # ATTRIBUTE_NAME
            $attrvalue      = $4;                # normalized VALUE
            $attrthresholds = $tabhash{"$2"};    # our own alert threshold for this attribute
            if ($ssdrpt == 1) {
                $logger->info("$result[0] ssd check: deviceid:$id;ATTRIBUTE_NAME:$attrname;VALUE:$attrvalue;Thresholds:$attrthresholds");
            }
            # Alert when the normalized value has dropped to or below our threshold.
            if ($attrvalue <= $attrthresholds && $result[14] > 0) {
                &send_report("$result[0]:SSD Check warning. deviceid:$id; $attrname:$attrvalue");
            }
        }
    }
}
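
Putting the pieces together, here is a minimal Python sketch of the same probe-and-check flow run locally on a single host instead of over SSH. The threshold values, the 1..20 device-id range, and smartctl/dmidecode being on the PATH are assumptions that simply mirror the Perl code above.

# Minimal sketch of the probe-and-check flow above, run locally rather than over SSH.
# The thresholds and the 1..20 device-id range are assumptions mirroring the Perl code.
import re
import subprocess

THRESHOLDS = {"Reallocated_Sector_Ct": 90,        # hypothetical alert thresholds
              "Available_Reservd_Space": 90,
              "Media_Wearout_Indicator": 90}

def run(cmd):
    # Ignore exit codes: smartctl returns non-zero for device ids that do not exist.
    p = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    return p.stdout.decode("utf-8", "replace")

def raid_type():
    # Same vendor test as get_machine_type(): HP controllers need the cciss device type.
    vendor = run("sudo /usr/sbin/dmidecode -t1 | grep Manufacturer").upper()
    return "cciss" if "HP" in vendor else "megaraid"

def ssd_device_ids(rtype, disk="/dev/sda"):
    # Brute-force probe, like get_ssd_deviceid(): keep the ids that report an SSD.
    return [i for i in range(1, 21)
            if "Solid State Device" in run("sudo smartctl -a -d %s,%d %s" % (rtype, i, disk))]

def check(rtype, dev_id, disk="/dev/sda"):
    out = run("sudo smartctl -a -d %s,%d %s" % (rtype, dev_id, disk))
    for line in out.splitlines():
        # Attribute rows: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH ...
        m = re.match(r"\s*\d+\s+(\S+)\s+\S+\s+(\d+)", line)
        if m and m.group(1) in THRESHOLDS and int(m.group(2)) <= THRESHOLDS[m.group(1)]:
            print("SSD check warning: deviceid:%d %s:%s" % (dev_id, m.group(1), m.group(2)))

if __name__ == "__main__":
    rtype = raid_type()
    for dev_id in ssd_device_ids(rtype):
        check(rtype, dev_id)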

Some thoughts on Dell's 13th-generation servers

Dell recently launched its 13G servers. The flagship model, the R730xd, packs a number of features that lay a good foundation for it to become a mainstream DB server and a building block for large storage clusters.

For details see: http://www.storagereview.com/dell_poweredge_13g_r730xd_review
http://www.storagereview.com/dell_poweredge_gen13_servers_released

The main enhancements are:

1. CPUs on Intel's latest Haswell architecture, reducing power consumption.
2. More drive bays, expandable to 18 slots for 1.8-inch SSDs, plus mixed-drive configurations.
3. DDR4 memory with higher clock speeds.
4. Smarter iDRAC-based OS deployment.
5. Expanded 10GbE network options.
6. iDRAC8-based automated management, including server performance monitoring, email alerts (and a mobile app), and more.
7. SanDisk caching technology replaces the previous LSI solution (perhaps related to Seagate's acquisition of LSI's flash business?).
8. An enhanced new-generation RAID controller with more cache memory and direct system log collection on the controller (still battery-backed).
9. NFC support (automatically scanning BIOS information, etc.).
10. NVMe protocol support (NVMe SSDs; fully embracing Intel?).

And so on.

According to Dell sales, the R730xd is positioned as the next-generation DB server, Hadoop server, and cloud computing server. I have reservations about the Hadoop part: although the 18 SSD bays raise total SSD capacity, whether SSDs can deliver their full performance for Hadoop-style workloads, or under Hadoop's current software architecture, is another question, and Facebook's tests give the answer.

http://hadoopblog.blogspot.com/2012/05/hadoop-and-solid-state-drives.html

Also, a SSD device can support 100K to 200K operations/sec while a spinning disk controller can possibly 
issue only 200 to 300 ops/sec. This means that random reads/writes are not a bottleneck on SSDs. 
On the other hand, most of our existing database technology is designed to store data in spinning disks, 
so the natural question is "can these databases harness the full potential of the SSDs"?

Putting the two charts together, the conclusion is:

[Chart: HdfsPreadImageCache4G]

The conclusion is that today's Hadoop/HBase cannot exploit SSD performance to the fullest: even after code changes Hadoop's bottleneck remains (the Java DFSClient), and HBase's thread locking keeps CPU utilization low. This traces back to traditional database designs built around spinning-disk I/O, something Oracle handles very well (on Unix/Linux, Oracle is a process-based database).

Finally, as Dhruba Borthakur put it:

@Sujoy: you are absolutely right. In fact, we currently run multiple servers instances per SSD 
just to be able to utilize all the IOPs. This is kindof-a-poor man's solution to the problem. 
Also, you have to have enough CPU power on the server to be able to drive multiple database 
instances on the same machine.

Facebook runs multiple server instances per machine to get the most out of the hardware at minimal cost. This resembles early MySQL: its multi-threaded architecture could not fully exploit the CPUs of SMP/NUMA machines, which gave rise to multi-instance NUMA deployments and various CPU-binding strategies. Making a traditional database architecture fit the latest hardware is not an easy thing.

As for threads versus processes: thread support was not very good in the Unix era, which is why databases such as Oracle and PostgreSQL chose processes, and MySQL's thread-based design also used CPUs poorly early on. For now we might say threads are the trend for modern databases (I am not sure how accurate that is), since threads do hold certain advantages over processes: shared memory, cheaper creation, and lower CPU context-switch cost.

About partial writes in MySQL and Oracle

I read this thread a long time ago and did not dig into it then. While reading up on MySQL internals these past few days the question came back to me, and after discussing it with a few friends I want to lay it out in some detail.

The original thread: https://community.oracle.com/thread/1087650?start=0&tstart=0

In the thread, someone asks how Oracle avoids partial writes, the so-called "half I/O" problem: an I/O torn partway through the write (a momentary disconnect, a sudden crash, a storage or machine failure, and so on). Although this rarely happens, it does come up in unconventional recovery work, for example when a block becomes internally inconsistent, what we call a logical corrupt block.
I went through the thread again carefully and found several of the answers below to be inaccurate, or to mix different things up.

First, let's look at what MySQL's doublewrite is:

—– quoted from the mysqlperformance blog

Why use doublewrite?
The goal is data safety in the face of a partial page write, i.e. a failure that strikes when a data page has only been partially written out. InnoDB does not log whole data pages; it uses what is called "physiological" logging, where a log record carries only the page number, the operation applied to the data (such as updating a row) and the log sequence number. This log structure has the advantage of writing less data to the log, but it requires pages to be internally consistent. The page version does not matter: it may be the current version (in which case InnoDB skips the update during crash recovery) or an earlier one (in which case InnoDB applies the update). But if the page is internally inconsistent, recovery cannot proceed.

Partial page writes
What is a partial page write, and why does it happen? It is when a data-page write handed to the operating system completes only partially, for example only 4KB of a 16KB InnoDB page is updated and the rest still holds old data. Most partial writes happen on power loss; they can also happen when the OS crashes, because the OS may split a 16KB write into several smaller writes (file fragmentation can cause this) and the failure lands in the middle of them. With software RAID a page may straddle a stripe boundary, requiring multiple I/Os, so a partial write is again possible. The same goes for hardware RAID without a battery-backed cache on power loss. When only a single write is sent to the disk itself, the hardware should in theory be able to finish it even on power loss, because the drive should hold enough charge internally to complete the operation; honestly I do not know whether that is really the case, it is hard to test, and it is not the only cause of partial writes anyway. All I know is that partial writes do happen, and before InnoDB implemented doublewrite I saw plenty of data corrupted for exactly this reason.

How does doublewrite work?
Think of the doublewrite buffer as a short-term log area allocated inside the InnoDB system tablespace, large enough for 100 data pages. When InnoDB flushes pages from the buffer pool it writes several of them at a time: the pages are first written sequentially into the doublewrite buffer and fsync() is called so they reach disk, and only then are the pages written to their real locations, followed by another fsync(). During recovery InnoDB compares the contents of the doublewrite buffer with the pages at their original locations: a page that is inconsistent inside the doublewrite buffer is simply discarded, and a page that is inconsistent at its original location is restored from the doublewrite buffer.

What does the doublewrite buffer cost MySQL?
Although doublewrite writes every page twice, the overhead is far less than 2x. Writes into the doublewrite buffer are sequential and therefore cheap. Doublewrite also lets InnoDB issue fewer fsync() calls: rather than fsync() after every page, several writes can be submitted and a single fsync() issued at the end, which lets the OS optimize write ordering and use multiple storage devices in parallel. Those optimizations could be had without doublewrite as well (in fact they were implemented together with it), so overall I would expect the performance overhead of doublewrite to be no more than 5-10%.

Can doublewrite be disabled?
If you do not care about data consistency (say you run RAID0) or your file system guarantees that partial writes cannot happen, you can disable doublewrite by setting innodb_doublewrite to 0. But that usually just buys you bigger trouble.

Let's set aside for now why MySQL without doublewrite is prone to torn writes. In MySQL the unit of a data write is the page (1 page = 16KB), while in Oracle it is the block, whose size is configurable. But writes reach the OS in OS-block units; a 16KB InnoDB page, for instance, spans four 4KB OS blocks, so a crash mid-flush can leave some of them new and the rest old. In other words, a partial write at the OS-block level produces the same logical-corruption problem.

Here is one of the answers from that thread:

It’s an interesting question. I think the clue is in the link you provided: “Such logging structure is geat as it require less data to be written to the log, however it requires pages to be internally consistent.”

What that’s saying (I think!) is that the contents of the innodb transaction log can only be replayed to datafile pages which are ‘clean’ -and that’s true for Oracle, too. You can’t apply Oracle redo to an Oracle database block that is internally corrupted because some of its consituent “os pages” were written at a different time from others. When such partial writes happen, you get what’s called a “fractured block”, warnings in the alert log …and the data file is regarded as corrupt from that point on.

Oracle’s fix to this potential problem, however, is also hinted at in the article you linked to: “Innodb does not log full pages to the log files”. That’s an interesting sentence because , you see, Oracle does write full pages to the logs! I should immediately qualify that: it only does so when you take a “hot backup” using O/S copy commands -because it’s only then that you have to worry about the problem. In other words, you only have to worry about the fact that you can only apply redo to an internally consistent database block if you’re actually in the business of applying redo… and you’re only doing that in the event of a recovery. And complete recoveries in Oracle (as opposed to mere crash recoveries) require you to have restored something from backup. So, it’s only during the backup process that you only have to worry about the problem of fractured blocks -and so it’s only then that Oracle says, ‘if you have put the tablespace into hot backup mode (alter tablespace X begin backup), then the first time a block of data is changed, the entire block should be written in a consistent state into the redo (transaction) logs. Then, if the datafile copy of the block in the backup turns out to be fractured, we’ve got a known good copy in the redo we can restore in its place. And once you have a clean block as a starting point, you can continue to apply redo from that point on’.

Oracle has an alternative (and more up to date) mechanism for achieving this “I know your data block is clean” starting state, though. It’s called RMAN -the Oracle backup and recovery tool. Unlike your OS copy command, it’s an Oracle utility… so it understands the concept of Oracle blocks, and it can therefore check that a block that’s been copied has been copied consistently, with all its constituent OS ‘pages’ written coherently to disk in the same state. It knows how to compare input and output in a way no OS command could ever hope to do. So when RMAN copies a data block hot, it reads the copy, compares it with the original -and if it sees the copy is fractured, it just has another go copying the block again. Repeat until the copy is indeed verified as a good copy of the original. No need to write the block into the transaction log at all, because you know that the backup file itself contains the necessary clean block copy.

So, putting that into practice. Let’s say your server corrupts data on the disk for whatever reason and, in the process, your Oracle instance dies. You try and restart Oracle, but you get told that recovery is needed (you might get a message that file 16, for example, can’t be read). So you restore file 16 from your hot backup taken with OS commands. In that backup, one of the blocks is fractured, because only part of the Oracle block had hit disk at the point the backup was performed. So you restore a fractured block. But that’s not a problem, because as redo is replayed, you’ll find the clean copy of the block in the redo stream, and restore that over the top of the fractured block. The rest of the redo can then be replayed without a problem. Or, you restore file 16 using RMAN… and what it restores cannot be fractured, because it checks for that before it reports the original backup a success. Therefore, you restore a clean copy of file 16, and can apply redo to it without drama. Either way, you get your database recovered.

So, the article you linked to nails the important matter: “It does not matter which page version it is – it could be “current” version in which case Innodb will skip page upate operation or “former” in which case Innodb will perform update. If page is inconsistent recovery can’t proceed.” Absolutely true of Oracle, too. But Oracle has two alternatives for ensuring that a clean version of the block is always available: write a whole block into redo if it’s changed whilst the database is being backed up with OS commands, or make sure you only write clean blocks into the backup if you’re using RMAN -and you achieve that by multiple reads of the block, as many as are necessary to ensure the output is clean.

Oracle’s solutions in these regards are, I think, a lot more efficient than double-writing every block all the time, because the only time you have to worry that what’s on disk isn’t consistent is, as your linked article again points out, when ‘power failure’ or ‘os crash’ happens. That is, during some sort of failure. And the response to failure that involves corruption is always to restore something from backup… so, it’s really only that backup that needs to worry about ‘clean pages’. Instead of writing everything twice to disk during normal running (which sounds like a potentially enormous overhead to me!), therefore, Oracle only has to employ protective measures during the backup process itself (which should, ordinarily, be a mere fraction of ‘normal running’ time). The overhead is therefore only encountered sparingly and not something you need worry about as a potentially-constant performance problem.

In closing, I’ll second Aman’s observation that it is generally and usually the case that any variation away from the default 8K block size is a bad idea. Not always, and there may be justification for it in extremis… but you will certainly be at risk of encountering more and weirder bugs than if you stick to the defaults.

This answer does not really hit the point. It mainly describes the ways Oracle avoids partial writes in one particular scenario, hot backup / RMAN backup, rather than the database's general mechanism. Let's look at how MySQL avoids the same problem in that scenario.
In a traditional Oracle hot backup, the backup reads files while DBWR is writing them, so it can read a fractured (inconsistent) block; the fix is that, for tablespaces in backup mode, the first change to a block also writes the full block image into the redo log, which is then used to repair fractured blocks. An RMAN backup checks the consistency of every page it reads and simply re-reads a page that is inconsistent. Percona's XtraBackup takes an RMAN-like approach.
Checking whether a page read by the backup is consistent is actually simple: both Oracle and InnoDB pages store matching SCN/checksum values in the page header and the page trailer. If the header and trailer SCN/checksum agree, the page is consistent; otherwise the page is fractured and the backup just re-reads it.
So this part of the answer is about how to avoid partial writes while taking a backup, because that is when fractured blocks or pages are most likely to be seen, whereas the original question was whether Oracle has a doublewrite-like mechanism to guard against ordinary partial writes.
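
As a small illustration of that header/trailer check, here is a minimal Python sketch for InnoDB datafiles. It assumes the classic InnoDB page layout (FIL_PAGE_LSN is the 8 bytes at offset 16 of the page header, and the last 4 bytes of the page repeat the low 32 bits of that LSN); the datafile path is hypothetical.

# Minimal sketch of the header/trailer consistency check described above.
# Assumption: classic InnoDB page layout, FIL_PAGE_LSN = 8 bytes at offset 16,
# trailer = last 8 bytes of the page, whose final 4 bytes hold the low 32 bits of the LSN.
import struct

PAGE_SIZE = 16 * 1024   # default InnoDB page size

def page_is_torn(page):
    header_lsn = struct.unpack(">Q", page[16:24])[0]      # FIL_PAGE_LSN in the header
    trailer_lsn_low = struct.unpack(">I", page[-4:])[0]   # low 32 bits of the LSN in the trailer
    return (header_lsn & 0xFFFFFFFF) != trailer_lsn_low

def scan(datafile):
    with open(datafile, "rb") as f:
        page_no = 0
        while True:
            page = f.read(PAGE_SIZE)
            if len(page) < PAGE_SIZE:
                break
            if page_is_torn(page):
                print("page %d looks fractured - re-read it or restore it from backup" % page_no)
            page_no += 1

if __name__ == "__main__":
    scan("/path/to/ibdata1")   # hypothetical datafile path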

Let's continue with the next answer:

user646745 wrote:
Thanks HJR for detailed analysis.

But the double write mechanism works without restore from a backup an apply redo: before write the dirty buffer blocks, innodb flush the blocks in the double write buffer to disk which is a sequential disk area (so it’s fast),
————————-
before write the dirty buffer blocks, logwr flush the blocks in the redo buffer to disk which is a sequential disk area (so it’s fast),

so even if partial write happen, the the blocks in double write buffer already persistent in disk,
————————-
so even if partial write happen, the the blocks in redo buffer already persistent in disk,

and when mysql restart, innodb can compare the blocks flushed from double write buffer with the datafile blocks,
————————-
and when mysql restart, smon can compare control file scn with the datafile blocks scn,

if they matched (means no partial write), then everything is fine, if not matched (partial write happen), just overwrite the datafile blocks with the blocks from double write buffer.
————————-
if they matched (means no partial write), then everything is fine, if not matched (partial write happen), just apply the redo from the redo logs.

So no media recover is required.
————————-
sounds like media recovery to me

Based on your anaysis, oracle needs media recover.
————————-
Based on your analysis, so does mysql. It just applies it in a very slightly different fashion, and calls it all something else.

This answer again has problems. He says "so even if partial write happen, the the blocks in double write redo buffer already persistent in disk", and that sentence is clearly misleading: when a partial write happens, redo cannot repair a block that is inconsistent within itself. Redo only drives recovery, and that recovery works at the transaction or media level, not inside a single block.

Oracle recovery splits into instance/crash recovery and media recovery. The essential difference is that instance recovery needs the online redo log files, i.e. it applies the redo generated after the last incremental checkpoint, while media recovery further splits into datafile and block media recovery, which in essence bring an old datafile or block forward and may need archived logs. Note the prerequisite: a backup must exist. The datafile/block can only be restored from a backup, and the subsequent recover step applies archived logs. It is this media recovery that corresponds to the fix for a partial write.

MySQL cannot use redo to repair an internally inconsistent page: applying redo presumes the data is consistent. If a failure hits while data is being flushed, say only 4KB of a 16KB page made it to disk, and the redo records are change vectors, i.e. logical records, then during InnoDB recovery redo has no valid starting point to run from.

For Oracle, redo alone is not enough for an internally inconsistent block either. We can detect logically broken blocks in the datafile and repair them, provided a backup exists; the method used is block recovery, which is part of media recovery.
We can simulate such logical corrupt blocks with BBED or some OS tools, and we usually find the database still opens normally; the logical error only surfaces when those blocks are actually accessed. So backups matter above everything else: without a backup, Oracle too is fairly helpless against these "half I/Os" or silent data loss in the storage layer.

db-topology: visualizing MySQL/MariaDB replication topology

Although pt-slave-find is a nice tool for discovering master-slave topologies, its command-line output is not very intuitive. db-topology is a small Python tool I wrote that shows MySQL/MariaDB replication topology as a graph, as in the figures below:
[Figure: one master, two slaves]

[Figure: cascading replication]

Basic idea:
Specify a user and password in the script; this user must be able to connect to every node and needs at least the REPLICATION SLAVE, REPLICATION CLIENT, and PROCESS privileges. The script then recursively discovers all of each node's masters and slaves, as sketched below.
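
A minimal sketch of that recursive walk, assuming Connector/Python and that every slave registers itself with report_host/report_port so SHOW SLAVE HOSTS can see it; the real db-topology script may differ in the details, and the seed address below is hypothetical.

# Minimal sketch of the recursive master/slave discovery. Assumptions: Connector/Python,
# and slaves set report_host/report_port so SHOW SLAVE HOSTS lists them.
import mysql.connector

USER, PASSWORD = 'yourname', 'yourpassword'

def neighbours(host, port):
    # Return the (host, port) of this node's master and of its registered slaves.
    found = set()
    cnx = mysql.connector.connect(user=USER, password=PASSWORD, host=host, port=port)
    cur = cnx.cursor(dictionary=True)

    cur.execute("SHOW SLAVE STATUS")
    for row in cur.fetchall():                # empty if this node is not a slave
        found.add((row['Master_Host'], int(row['Master_Port'])))

    cur.execute("SHOW SLAVE HOSTS")           # slaves that registered with report_host
    for row in cur.fetchall():
        found.add((row['Host'], int(row['Port'])))

    cnx.close()
    return found

def discover(seed_host, seed_port):
    # Walk the replication graph outward from one seed node.
    seen, todo, edges = set(), [(seed_host, seed_port)], set()
    while todo:
        node = todo.pop()
        if node in seen:
            continue
        seen.add(node)
        for other in neighbours(*node):
            edges.add(frozenset((node, other)))
            todo.append(other)
    return seen, edges

if __name__ == "__main__":
    print(discover('10.0.0.1', 3306))         # hypothetical seed node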

Dependencies:
Python3
Connector/Python
NetworkX

Usage

$ git clone https://github.com/leafonsword/db-topology.git
$ cd db-topology
$ chmod +x db-topology.py
Edit these lines in db-topology.py:
    user = 'yourname'
    password = 'yourpassword'
$ ./db-topology.py IP1:PORT [IP2:PORT ........]

A force.json file is then generated in the current directory; it holds the data for the graph. Open topology.html in a browser and you will see the interactive diagram.