Making Sense of Group Replication Consistency Levels

From the initial release, one of my biggest complaints about Group Replication was that it allowed “stale” reads, with no way to prevent them or even to know that you had read “stale” data. That was a huge limitation. Thankfully, Oracle released features to control the consistency level, and that was exactly a year ago! I don’t know about you, but I personally was confused by the naming: group_replication_consistency='AFTER' or 'BEFORE'.

So now I want to try to make sense of it and share my understanding (even if it is one year later).

Setup:

We will start with the default group_replication_consistency='EVENTUAL' and work from there. So let’s consider a very simple table:

CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `cnt` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

With over 10 million rows:

select count(*) from t1;
+----------+
| count(*) |
+----------+
| 10485760 |
+----------+

And we will do a very simple action. On Node 1 we will update the table and on Node 2 we will read data.

Node 1:

UPDATE t1 SET cnt=11;
Query OK, 10485760 rows affected (1 min 20.23 sec)
Rows matched: 10485760  Changed: 10485760  Warnings: 0

And immediately after that, on Node 2:

select cnt from t1 where id=10001;
+------+
| cnt  |
+------+
|   10 |
+------+
1 row in set (0.00 sec)

There are a few points to note:

  • On Node 1 it took 1 min 20 sec for the transaction to execute.
  • On Node 2 we got the result immediately, but it was essentially a “stale” read: the data had already been updated on Node 1, yet we got the old version of it.
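If you want a rough indication that Node 2 is behind at the moment of the read, one option is to look at the applier queue and the executed GTID sets. This is only a sketch and was not part of the test above; I am assuming MySQL 8.0 with the standard performance_schema tables, and the exact moment the counter drops back to zero may vary.

```sql
-- Run on Node 2. A non-zero applier queue suggests that remote transactions
-- have been received but not yet applied here, so local reads may be stale.
SELECT MEMBER_ID,
       COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE AS applier_queue
FROM performance_schema.replication_group_member_stats
WHERE MEMBER_ID = @@server_uuid;

-- Run on both nodes and compare the results by hand.
-- Node 2 is behind while its set is missing GTIDs that Node 1 already has.
SELECT @@GLOBAL.gtid_executed;
```

This only tells you that the node is lagging; it does not prevent the stale read.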

How can we get a better outcome?

group_replication_consistency='BEFORE';

Let’s look into the consistency level 'BEFORE'. It says that a transaction on Node 2 will wait until the transactions previously committed on Node 1 are also committed on Node 2. So let’s set session group_replication_consistency='BEFORE' on Node 2 and execute a similar update again.
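For reference, the change on Node 2 can be sketched like this. The SELECT is just a sanity check I added to confirm what the session is using.

```sql
-- On Node 2: change the consistency level for this session only.
SET SESSION group_replication_consistency = 'BEFORE';

-- Confirm the value the current session is using.
SELECT @@SESSION.group_replication_consistency;
```

Other sessions on Node 2 keep whatever level they already had.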

Node 1 (still group_replication_consistency='EVENTUAL'):

UPDATE t1 SET cnt=12;
Query OK, 10485760 rows affected (1 min 18.66 sec)
Rows matched: 10485760  Changed: 10485760  Warnings: 0

And after this, on Node 2 (with set session group_replication_consistency='BEFORE'):

select cnt from t1 where id=10001;
+------+
| cnt  |
+------+
|   12 |
+------+
1 row in set (1 min 11.45 sec)

So there are very notable changes:

  • The transaction on Node 2 now returned the correct result.
  • But it took 1 min 11 sec to return it (instead of 0 sec as in the previous case). Basically, the transaction on Node 2 waited for the transaction from Node 1 to be applied on Node 2.
  • The execution time on Node 1 did not change.
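To build some intuition for what the reader is waiting on, here is a rough manual analogy using WAIT_FOR_EXECUTED_GTID_SET(). It is not how 'BEFORE' is implemented, just a sketch of a similar wait done by hand, and the GTID set in it is a made-up placeholder.

```sql
-- On Node 1, right after the UPDATE returns: note the executed GTID set.
SELECT @@GLOBAL.gtid_executed;

-- On Node 2: block until that set has been applied locally,
-- or give up after 300 seconds.
-- The GTID set below is a hypothetical placeholder; paste the value from Node 1.
-- The function returns 0 once the set is applied and 1 on timeout.
SELECT WAIT_FOR_EXECUTED_GTID_SET('aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-42', 300);

-- Only now run the read.
SELECT cnt FROM t1 WHERE id = 10001;
```

With 'BEFORE' you get this kind of wait without having to carry GTID sets between connections.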

This mode allowed us to do exactly what we wanted: prevent stale reads. Great outcome! So what about group_replication_consistency='AFTER'?

group_replication_consistency='AFTER';

This mode says that a transaction on Node 1 will not return OK until it has made sure that the other nodes have applied the transaction. To see it in action, we will put Node 1 into group_replication_consistency='AFTER' and Node 2 into group_replication_consistency='EVENTUAL'.
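The session settings for this test can be sketched as follows, one statement per node.

```sql
-- On Node 1 (the writer): the commit does not return
-- until the other members have applied the change.
SET SESSION group_replication_consistency = 'AFTER';

-- On Node 2 (the reader): back to the default, so reads do not wait.
SET SESSION group_replication_consistency = 'EVENTUAL';
```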

Node 1 (in group_replication_consistency='AFTER'):

UPDATE t1 SET cnt=13;
Query OK, 10485760 rows affected (3 min 0.46 sec)
Rows matched: 10485760  Changed: 10485760  Warnings: 0

After that, Node 2 (in group_replication_consistency='EVENTUAL'):

select cnt from t1 where id=10001;
+------+
| cnt  |
+------+
|   13 |
+------+
1 row in set (0.00 sec)

Here the situation is different. The execution time on Node 1 more than doubled (3 min 0 sec instead of about 1 min 20 sec), as the transaction waits until it is committed on all nodes. After that, the read on Node 2 is immediate.

This mode still avoids “stale” reads, but in this case, we shifted the wait time from Node 2 to Node 1, and this is how we can view the difference between ‘BEFORE’ and ‘AFTER’ consistency modes in Group Replication.

Both modes provide a consistent view, but:

  • In ‘BEFORE’ mode the readers on secondary nodes are blocked until the consistent view is available, and
  • In ‘AFTER’ mode the writers are blocked until the other nodes get a consistent view.

So which mode to choose? Actually, I think it is good to have an option here. You can choose whether to put the wait time on your readers or on your writers; the decision depends on how your application is designed.
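The choice does not have to be all or nothing, because the variable has both session and global scope. A minimal sketch of the options follows; which sessions get which level is my assumption about a typical setup, not a recommendation.

```sql
-- Per session: only this connection pays the wait,
-- for example a read that must not be stale.
SET SESSION group_replication_consistency = 'BEFORE';

-- Server-wide default for sessions opened after this statement.
SET GLOBAL group_replication_consistency = 'AFTER';

-- Same as SET GLOBAL, and also kept across server restarts.
SET PERSIST group_replication_consistency = 'AFTER';
```

Sessions that are already open keep their current value after a SET GLOBAL, so existing connections have to reconnect or set the session value themselves.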
