现在有个表t
mysql> desc t; +----------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +----------------+--------------+------+-----+---------+----------------+ | owner | varchar(30) | YES | | NULL | | | object_name | varchar(128) | YES | | NULL | | | subobject_name | varchar(30) | YES | | NULL | | | object_id | int | NO | PRI | NULL | auto_increment | | data_object_id | int | YES | | NULL | | | object_type | varchar(19) | YES | | NULL | | | created | datetime | YES | | NULL | | | last_ddl_time | datetime | YES | | NULL | | | timestamp | varchar(19) | YES | | NULL | | | status | varchar(7) | YES | | NULL | | | temporary | varchar(1) | YES | | NULL | | | generate | varchar(1) | YES | | NULL | | | secondary | varchar(1) | YES | | NULL | | | namespace | int | YES | | NULL | | | edition_name | varchar(30) | YES | | NULL | | +----------------+--------------+------+-----+---------+----------------+ 15 rows in set (0.00 sec)
一共有116w行数据
mysql> select count(*) from t; +----------+ | count(*) | +----------+ | 1167136 | +----------+ 1 row in set (0.17 sec)
想要执行update t set owner='SB';
肯定不能直接跑上面SQL,其一,这是一个大事务,会导致主从延迟,其次,SQL没有where过滤条件,会把整个表锁住
MySQL可以利用主键切片的方法对上面SQL进行切片: https://blog.csdn.net/robinson1988/article/details/106007292
对表根据主键切片之后,可以将数据分为多份,然后开多个窗口并行执行,这样就能加快UPDATE执行速度
下面是Python全自动主键切片+并行执行脚本,脚本里面是将数据切分为4分,开4个并行进程
from multiprocessing import Pool import pymysql import time import os import io def processData(txt): print('开始执行:', time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())) command='mysql -uroot -poracle -Dtest <'+txt os.system(command) print('执行完成:', time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())) return if __name__ == '__main__': conn = pymysql.connect("192.168.56.10", "scott", "tiger", "test") cur = conn.cursor() sql = "SELECT concat('update t set owner= ''SB'' where object_id>',avg_row * (n - 1),' and object_id<=', avg_row * n, ';') split_sql " \ "FROM (SELECT n, min_id, max_id, ceil(max_id / 100) avg_row FROM (WITH RECURSIVE x(n) AS (SELECT 1 UNION ALL SELECT n + 1 FROM x WHERE n < 100" \ ") SELECT * FROM x) a, (SELECT min(object_id) min_id FROM t) b, (SELECT max(object_id) max_id FROM t) c ) a" cur.execute(sql) rows = [] result = cur.fetchall() slice0 = open('slice0.txt', 'w', newline='', encoding='utf8') slice1 = open('slice1.txt', 'w', newline='', encoding='utf8') slice2 = open('slice2.txt', 'w', newline='', encoding='utf8') slice3 = open('slice3.txt', 'w', newline='', encoding='utf8') for i in range(1,len(result)+1): if i%4==0: slice0.writelines(result[i - 1][0] + '\n') elif i%4==1: slice1.writelines(result[i - 1][0] + '\n') elif i%4==2: slice2.writelines(result[i - 1][0] + '\n') elif i%4==3: slice3.writelines(result[i - 1][0] + '\n') slice0.close() slice1.close() slice2.close() slice3.close() cur.close() conn.close() with Pool(4) as pool: pool.map(processData,['slice0.txt','slice1.txt','slice2.txt','slice3.txt'])
单个窗口跑:
mysql> update t set owner='NC'; Query OK, 1167136 rows affected (40.34 sec) Rows matched: 1167136 Changed: 1167136 Warnings: 0
根据主键切片开并行跑:
[root@server ~]# python3 update.py 开始执行: 2020-05-29 15:53:38 开始执行: 2020-05-29 15:53:38 开始执行: 2020-05-29 15:53:38 开始执行: 2020-05-29 15:53:38 mysql: [Warning] Using a password on the command line interface can be insecure. mysql: [Warning] Using a password on the command line interface can be insecure. mysql: [Warning] Using a password on the command line interface can be insecure. mysql: [Warning] Using a password on the command line interface can be insecure. 执行完成: 2020-05-29 15:53:49 执行完成: 2020-05-29 15:53:49 执行完成: 2020-05-29 15:53:49 执行完成: 2020-05-29 15:53:50
可以看到开4个并行进程,总共只需要10秒钟就跑完了,单窗口跑要40秒,正好提升4倍性能
最后,要开多少个并行进程跟你机器CPU有关,不建议开超过CPU CORE这么多个并行进程
我来评几句
登录后评论已发表评论数()