All posts by vmcd

vmcd

About vmcd

Phone: +86 18666668061. Email & Gtalk: ylouis83@gmail.com. Personal blog: http://www.vmcd.org. 2010 Oracle Certified Database 10g Administrator Master. SHOUG member.

Generating a continuous range of dates with Arrow

Day-to-day work often requires generating a full year of per-day tables, such as:

table_20150101
table_20150102
……
……
table_20151231

Python's built-in datetime can do this, but it is a bit tedious; the code might look like this:

import datetime

dt = datetime.date(2015, 1, 1)
end = datetime.date(2015, 12, 31)
step = datetime.timedelta(days=1)

result = []

while dt <= end:  # <= so the last day (2015-12-31) is included
    result.append(dt.strftime('%Y-%m-%d'))
    dt += step

print(result)

The third-party module Arrow handles the scenario above much more conveniently:

import arrow

start = arrow.get('2015-01-01','YYYY-MM-DD')
end = arrow.get('2015-12-31','YYYY-MM-DD')


for dt in arrow.Arrow.range('day', start, end):
    print(dt.format('YYYYMMDD'))

This is just one small example of Arrow, but it already feels more convenient than datetime, time, calendar, dateutil and the other date/time modules. Strongly recommended! See the project home page for more usage.
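For completeness, the table names from the introduction can also be built with only the standard library; this is a minimal sketch, and `table_names` is a name made up for illustration:

```python
from datetime import date, timedelta

def table_names(year):
    """Yield table_YYYYMMDD names for every day of the given year."""
    dt = date(year, 1, 1)
    end = date(year, 12, 31)
    step = timedelta(days=1)
    while dt <= end:  # <= so table_20151231 is included
        yield 'table_' + dt.strftime('%Y%m%d')
        dt += step

names = list(table_names(2015))
print(names[0], names[-1])  # table_20150101 table_20151231
```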

Migrating from InnoDB to TokuDB with MyDumper

Recently the data volume of several WeChat buyer-related databases has been growing rapidly and space is tight, so I considered migrating to TokuDB with its very high compression ratio. A popular migration method is to export with mysqldump while rewriting the engine string to TokuDB:
mysqldump ...... | sed -e 's/^) ENGINE=InnoDB/) ENGINE=TokuDB/' > dump.sql

But restoring from the single file this produces is too slow. There are third-party tricks to make mysqldump emit multiple files, but they are complicated; mydumper, thanks to its parallelism, is a better migration tool. Before 0.6 it was practically unusable because it held locks for too long, but 0.6 greatly reduced the lock time, so this time I tried MyDumper to migrate InnoDB to TokuDB. My idea:

Because MyDumper writes schema and data to separate files, we can import the schema first, alter the tables to TokuDB, and only then load the data.

The steps:
1. Export the files
mydumper ...... -o /data/dump

2. Separate the schema and data files

mkdir /data/dump_schema
mv /data/dump/*schema* /data/dump_schema
cp /data/dump/metadata /data/dump_schema   # copy metadata before renaming the directory
mv /data/dump /data/dump_data

3. Import the schema
myloader ...... -d /data/dump_schema

4. Convert InnoDB to TokuDB
pt-find ...... --engine InnoDB --exec "ALTER TABLE %D.%N ENGINE=TokuDB"

5. Load the data
myloader ...... -d /data/dump_data
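An alternative to step 4 is to rewrite the engine directly in the schema dump files before importing them in step 3, so no ALTER is needed afterwards. A sketch under the assumption that the schema files follow mydumper's usual `db.table-schema.sql` naming (adjust the glob to your layout); `innodb_to_tokudb` is a name made up here:

```python
import re
from pathlib import Path

def innodb_to_tokudb(schema_dir):
    """Rewrite ENGINE=InnoDB to ENGINE=TokuDB in every *schema*.sql dump file."""
    pattern = re.compile(r'ENGINE=InnoDB', re.IGNORECASE)
    for sql in Path(schema_dir).glob('*schema*.sql'):
        text = sql.read_text()
        sql.write_text(pattern.sub('ENGINE=TokuDB', text))
```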

Linux tools part 10 – Monitoring process I/O with iopp

We can use the iopp tool to monitor per-process I/O performance.

Download from GitHub: https://github.com/markwkm/iopp

pid: The process id.
rchar: The number of bytes which this task has caused to be read from storage.
wchar: The number of bytes which this task has caused, or shall cause, to be written to disk.
syscr: Count of the number of read I/O operations.
syscw: Count of the number of write I/O operations.
rbytes rkb rmb reads: Count of the number of bytes which this process really did cause to be fetched from the storage layer.
wbytes wkb wmb writes: Count of the number of bytes which this process really did cause to be sent to the storage layer.
cwbytes cwkb cwmb cwrites: The number of bytes which this process caused to not happen, by truncating pagecache.
command: Filename of the executable.
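These counters come straight from /proc/&lt;pid&gt;/io, which iopp reads for every process; a minimal parser for that file format might look like this (the sample values are made up):

```python
def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(':')
        if value.strip():
            stats[key.strip()] = int(value)
    return stats

# Sample content in the format of /proc/<pid>/io
sample = """rchar: 3002
wchar: 0
syscr: 10
syscw: 0
read_bytes: 4096
write_bytes: 0
cancelled_write_bytes: 0"""

print(parse_proc_io(sample)['read_bytes'])  # 4096
```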

There is one small bug: you'll get different output from -i -k than you will from -i or from -i -m.

#iopp -i -k 2
#grep ^ /sys/class/block/*/dev |grep 224

/sys/class/block/sdo/dev:8:224

#lsof |grep "8,224" |more

bash 5265 root cwd DIR 8,224 57151488 427819009 /data/yp900
 redis-ser 19843 root cwd DIR 8,224 4096 469368833 /data/redis6379
 sh 27579 root cwd DIR 8,224 4096 579731457 /data/mysql3306
 mysqld 28324 mysql cwd DIR 8,224 4096 579731459 /data/mysql3306/data
 mysqld 28324 mysql 1w REG 8,224 1327901 579743077 /data/mysql3306/mysql-error.log
 mysqld 28324 mysql 2w REG 8,224 1327901 579743077 /data/mysql3306/mysql-error.log
 mysqld 28324 mysql 3u REG 8,224 23520 579743078 /data/mysql3306/binlog/mysql-bin.index
 mysqld 28324 mysql 4uW REG 8,224 1073741824 579743083 /data/mysql3306/data/ibdata0
 mysqld 28324 mysql 5u REG 8,224 0 579731461 /data/mysql3306/mysql-tmpdir/ib1io6G9 (deleted)
 mysqld 28324 mysql 6u REG 8,224 0 579731462 /data/mysql3306/mysql-tmpdir/ibnkifvg (deleted)
 mysqld 28324 mysql 7u REG 8,224 0 579731463 /data/mysql3306/mysql-tmpdir/ibHWjojn (deleted)
 mysqld 28324 mysql 8u REG 8,224 0 579731464 /data/mysql3306/mysql-tmpdir/ib1o8yHC (deleted)
 mysqld 28324 mysql 9uW REG 8,224 1073741824 579743084 /data/mysql3306/data/ibdata1
 mysqld 28324 mysql 10uW REG 8,224 1073741824 579743085 /data/mysql3306/data/ibdata2
 mysqld 28324 mysql 11uW REG 8,224 27850178560 579743086 /data/mysql3306/data/ibdata3
 mysqld 28324 mysql 12uW REG 8,224 536870912 579743087 /data/mysql3306/data/ib_logfile0
 mysqld 28324 mysql 13uW REG 8,224 536870912 579743088 /data/mysql3306/data/ib_logfile1
 mysqld 28324 mysql 14uW REG 8,224 2281701376 579733990 /data/mysql3306/data/mit/agent_info#P#p20150404.ibd
 mysqld 28324 mysql 15u REG 8,224 0 579731465 /data/mysql3306/mysql-tmpdir/ibBZSSPL (deleted)

A nice, smart tool with a few little bugs :)

MariaDB: bug when adding an index on a partitioned table

Version: 5.5.5-10.0.10-MariaDB-log MariaDB Server

When I used MariaDB as a multi-source slave DB I hit a strange problem. When creating an index on a very large partitioned table, MariaDB does a "copy to tmp table" on the slave side. It ran for over 9 hours and still had not finished.

Primary: MySQL 5.6.16

Add an index on a partitioned table:

CREATE TABLE `track_raw_wap211_log` (
 `table_id` varchar(100) DEFAULT NULL,
 `page_id` varchar(100) DEFAULT NULL,
 `banner_id` varchar(100) DEFAULT NULL,
 `button_id` varchar(100) DEFAULT NULL,
 `test_id` varchar(100) DEFAULT NULL,
 `classfication` varchar(100) DEFAULT NULL,
 `request_refer` varchar(100) DEFAULT NULL,
 `request_url` text,
 `title` varchar(100) DEFAULT NULL,
 `user_id` varchar(100) DEFAULT NULL,
 `language` varchar(100) DEFAULT NULL,
 `event` varchar(100) DEFAULT NULL,
 `event_desc` varchar(100) DEFAULT NULL,
 `event_type` varchar(100) DEFAULT NULL,
 `log_version` varchar(100) DEFAULT NULL,
 `project` varchar(100) DEFAULT NULL,
 `log_time` varchar(100) DEFAULT NULL,
 `unicookie` varchar(100) DEFAULT NULL,
 `session_id` varchar(100) DEFAULT NULL,
 `session_duration` varchar(100) DEFAULT NULL,
 `resolution` varchar(100) DEFAULT NULL,
 `channel` varchar(100) DEFAULT NULL,
 `refer` text,
 `user_agent` text,
 `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
 `httplogtime` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 `httpforwardip` varchar(100) DEFAULT NULL,
 `country` varchar(100) DEFAULT NULL,
 `province` varchar(100) DEFAULT NULL,
 `city` varchar(100) DEFAULT NULL,
 `area` varchar(100) DEFAULT NULL,
 `mapweidu` varchar(100) DEFAULT NULL,
 `mapjingdu` varchar(100) DEFAULT NULL,
 `sync_mysqlid` bigint(20) NOT NULL AUTO_INCREMENT,
 PRIMARY KEY (`sync_mysqlid`,`create_time`),
 KEY `ix_create_time` (`create_time`),
 KEY `ix_httplogtime` (`httplogtime`),
 KEY `ix_httplogtime_userid` (`httplogtime`,`user_id`)
 ) ENGINE=InnoDB AUTO_INCREMENT=23217118 DEFAULT CHARSET=utf8
 /*!50100 PARTITION BY RANGE ( UNIX_TIMESTAMP(`create_time`))
 (PARTITION p20150318 VALUES LESS THAN (1426694400) ENGINE = InnoDB,
 PARTITION p20150319 VALUES LESS THAN (1426780800) ENGINE = InnoDB,
 PARTITION p20150320 VALUES LESS THAN (1426867200) ENGINE = InnoDB,
 PARTITION p20150321 VALUES LESS THAN (1426953600) ENGINE = InnoDB,
 PARTITION p20150322 VALUES LESS THAN (1427040000) ENGINE = InnoDB,
 PARTITION p20150323 VALUES LESS THAN (1427126400) ENGINE = InnoDB,
 PARTITION p20150324 VALUES LESS THAN (1427212800) ENGINE = InnoDB,
 PARTITION p20150325 VALUES LESS THAN (1427299200) ENGINE = InnoDB,
 PARTITION p20150326 VALUES LESS THAN (1427385600) ENGINE = InnoDB,
 PARTITION p20150327 VALUES LESS THAN (1427472000) ENGINE = InnoDB,
 PARTITION p20150328 VALUES LESS THAN (1427558400) ENGINE = InnoDB,
 PARTITION p20150329 VALUES LESS THAN (1427644800) ENGINE = InnoDB,
 PARTITION p20150330 VALUES LESS THAN (1427731200) ENGINE = InnoDB,
 PARTITION p20150331 VALUES LESS THAN (1427817600) ENGINE = InnoDB,
 PARTITION p20150401 VALUES LESS THAN (1427904000) ENGINE = InnoDB,
 PARTITION p20150402 VALUES LESS THAN (1427990400) ENGINE = InnoDB,
 PARTITION p20150403 VALUES LESS THAN (1428076800) ENGINE = InnoDB,
 PARTITION p20150404 VALUES LESS THAN (1428163200) ENGINE = InnoDB,
 PARTITION p20150405 VALUES LESS THAN (1428249600) ENGINE = InnoDB,
 PARTITION p20150406 VALUES LESS THAN (1428336000) ENGINE = InnoDB,
 PARTITION p20150407 VALUES LESS THAN (1428422400) ENGINE = InnoDB,
 PARTITION p20150408 VALUES LESS THAN (1428508800) ENGINE = InnoDB,
 PARTITION p20150409 VALUES LESS THAN (1428595200) ENGINE = InnoDB,
 PARTITION p20150410 VALUES LESS THAN (1428681600) ENGINE = InnoDB,
 PARTITION p20150411 VALUES LESS THAN (1428768000) ENGINE = InnoDB,
 PARTITION p20150412 VALUES LESS THAN (1428854400) ENGINE = InnoDB,
 PARTITION p20150413 VALUES LESS THAN (1428940800) ENGINE = InnoDB,
 PARTITION p20150414 VALUES LESS THAN (1429027200) ENGINE = InnoDB,
 PARTITION p20150415 VALUES LESS THAN (1429113600) ENGINE = InnoDB,
 PARTITION p20150416 VALUES LESS THAN (1429200000) ENGINE = InnoDB,
 PARTITION p20150417 VALUES LESS THAN (1429286400) ENGINE = InnoDB,
 PARTITION p20150418 VALUES LESS THAN (1429372800) ENGINE = InnoDB,
 PARTITION p20150419 VALUES LESS THAN (1429459200) ENGINE = InnoDB,
 PARTITION p20150420 VALUES LESS THAN (1429545600) ENGINE = InnoDB,
 PARTITION p20150421 VALUES LESS THAN (1429632000) ENGINE = InnoDB,
 PARTITION p20150422 VALUES LESS THAN (1429718400) ENGINE = InnoDB,
 PARTITION p20150423 VALUES LESS THAN (1429804800) ENGINE = InnoDB,
 PARTITION p20150424 VALUES LESS THAN (1429891200) ENGINE = InnoDB,
 PARTITION p20150425 VALUES LESS THAN (1429977600) ENGINE = InnoDB,
 PARTITION p20150426 VALUES LESS THAN (1430064000) ENGINE = InnoDB,
 PARTITION p20150427 VALUES LESS THAN (1430150400) ENGINE = InnoDB,
 PARTITION p20150428 VALUES LESS THAN (1430236800) ENGINE = InnoDB,
 PARTITION p20150429 VALUES LESS THAN (1430323200) ENGINE = InnoDB,
 PARTITION p20150430 VALUES LESS THAN (1430409600) ENGINE = InnoDB,
 PARTITION p20150501 VALUES LESS THAN (1430496000) ENGINE = InnoDB,
 PARTITION p20150502 VALUES LESS THAN (1430582400) ENGINE = InnoDB,
 PARTITION p20150503 VALUES LESS THAN (1430668800) ENGINE = InnoDB,
 PARTITION p20150504 VALUES LESS THAN (1430755200) ENGINE = InnoDB,
 PARTITION p20150505 VALUES LESS THAN (1430841600) ENGINE = InnoDB,
 PARTITION p20150506 VALUES LESS THAN (1430928000) ENGINE = InnoDB,
 PARTITION p20150507 VALUES LESS THAN (1431014400) ENGINE = InnoDB,
 PARTITION p20150508 VALUES LESS THAN (1431100800) ENGINE = InnoDB,
 PARTITION p20150509 VALUES LESS THAN (1431187200) ENGINE = InnoDB,
 PARTITION p20150510 VALUES LESS THAN (1431273600) ENGINE = InnoDB,
 PARTITION p20150511 VALUES LESS THAN (1431360000) ENGINE = InnoDB) */ 

create index idx_tmp on track_raw_wap211_log (log_time);

Slave: MariaDB 5.5.5-10.0.15-MariaDB-log

21407 | root | localhost | jkgj_log | Query | 34 | copy to tmp table | create index idx_tmp on track_raw_wap211_log (log_time) | 0.000 |  

The DROP INDEX command has the same problem.

——————————————————————

When I repeated these steps on MariaDB 10.0.16, everything was OK.

This bug has been reported to MariaDB:

https://mariadb.atlassian.net/browse/MDEV-8038

Design for "seckilling" (flash sales) on MySQL

Today I read a nice article which proposes a new design on MySQL. In the article, the author (Mr. Lou) uses OneSQL (a branch of MySQL) for all the tests.

You can read it at: http://www.onexsoft.com/?page_id=378

The article explains how to use MySQL in a seckilling scenario (especially for e-commerce companies).

If you want to use a transaction in an RDBMS, you must manually start the transaction and commit at the end of your operation. The commit is therefore part of the total cost, and the network round-trip is the most important cost, as the author says:

(For simple update operations, ignoring I/O and lock conflicts, a statement executes in about 0.1ms, while network latency under normal conditions is 0.4-0.8ms, so the time spent waiting for the commit round-trip can be several times the actual SQL execution time; the network is the performance killer for single-row transactional updates.)

How do we avoid this? We can start a transaction and control the commit manually. When autocommit=1 is set on the MySQL server, an implicit commit happens after every SQL statement.

An explicit transaction is when statements are wrapped within an explicitly defined transaction code block; for MySQL, that's START TRANSACTION. It also requires an explicit COMMIT or ROLLBACK statement at the end of the transaction. Nested transactions are beyond the scope of this topic.

Implicit transactions are slightly different: they do not require explicitly defining a transaction. However, like explicit transactions, they still require a COMMIT or ROLLBACK statement to be supplied.
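The difference can be illustrated with SQLite from Python's standard library standing in for MySQL (the BEGIN/COMMIT semantics are analogous; this is only a sketch, not the author's setup):

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode: every statement
# commits on its own, like MySQL with autocommit=1.
conn = sqlite3.connect(':memory:', isolation_level=None)
conn.execute('CREATE TABLE stock (id INTEGER PRIMARY KEY, qty INTEGER)')
conn.execute('INSERT INTO stock VALUES (1, 100)')  # committed immediately

# Explicit transaction: both updates become durable only at COMMIT,
# so only one commit round-trip is paid for the whole block.
conn.execute('BEGIN')
conn.execute('UPDATE stock SET qty = qty - 1 WHERE id = 1')
conn.execute('UPDATE stock SET qty = qty - 1 WHERE id = 1')
conn.execute('COMMIT')

print(conn.execute('SELECT qty FROM stock WHERE id = 1').fetchone()[0])  # 98
```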

With OneSQL, the commit operation can be done on the MySQL server side to avoid the network round-trip, while the overall logic stays under your control.

In a seckilling scenario, this optimization gives a tremendous performance boost.

Actually, we can also use Redis or memcached for this situation; Redis's CAS feature (WATCH with MULTI) in a serial process would be a good choice.

Best practice: try to offload requests before they hit the database. Order creation should be asynchronous (putting the operations in a serial async queue is the best choice).

Building a data warehouse on a message queue


Most companies that shard their databases also need to think about aggregation, because many business requirements involve aggregating and classifying data.

The likely problems are:

1. How to get aggregated data in real time
2. How to build a data warehouse to support later analysis

My previous company's strategy was as follows:

Build an aggregation middle layer between the applications and the databases: every consumer gets its business view through in-memory computation, and SQL is split and routed to the different nodes to guarantee correctness. This is similar to TDDL.

At the warehouse level, a custom ETL extracts data from all sources into an Exadata cluster, while offline logs and similar data land in a Hadoop cluster, covering analysis needs across different time dimensions and categories.

Here is a new approach:

Combine Canal + roma (PDF posted earlier): by consuming roma's real-time message queue we can build a real-time DW, so simple real-time queries can be served directly from the DW without relying on the middle layer's in-memory aggregation.
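The consume-and-apply loop at the core of this design can be sketched with an in-memory queue standing in for roma; all names and the message shape here are made up for illustration:

```python
import queue

def apply_to_dw(dw, msg):
    """Apply one change message (table, pk, row) to the in-memory DW."""
    dw.setdefault(msg['table'], {})[msg['pk']] = msg['row']

def consume(q, dw):
    """Drain the queue and apply every message; Canal/roma would feed q."""
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            return
        apply_to_dw(dw, msg)

q = queue.Queue()
q.put({'table': 'orders', 'pk': 1, 'row': {'amount': 42}})
dw = {}
consume(q, dw)
print(dw['orders'][1]['amount'])  # 42
```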

The DW can be implemented on MySQL, using native replication with the front-end MySQL instances, which simplifies the whole ETL extraction process.

The drawback of this architecture: MySQL is not strong enough for truly massive data volumes, so Hadoop, Hive and other analysis platforms still have to be attached later to handle the various log-type data.

Most BI systems today are built on commercial software; for small and medium companies, this architecture can save a lot of cost.

Message aggregation on the client side: (diagram: etl_roma)

Splitting and aggregation: (diagrams: split1, split2)

Overall architecture: (diagram: DW_arch)