Filling merge identity ranges – unexpectedly!

Suppose you do a simple single-row insert on a replicated table at the subscriber and find it fails with the message below:

The insert failed. It conflicted with an identity range check constraint in database 'TestSub', replicated table 'dbo.tCity', column 'ID'.
If the identity column is automatically managed by replication, update the range as follows: for the Publisher, execute sp_adjustpublisheridentityrange;
for the Subscriber, run the Distribution Agent or the Merge Agent.
The statement has been terminated.

The table is merge-replicated and we are using automatic range management of the identity column. OK – this might be normal behaviour. But what if we know we haven’t filled the whole range?

Running the code below returns the most recently added city – ID 24838 in this case.

select top 1 * from [dbo].[tCity]
order by id desc

While the merge check constraint on the table is:

CHECK NOT FOR REPLICATION (([ID]>(24501) AND [ID]<=(25001) OR [ID]>(25001) AND [ID]<=(25501)))

So – 24839 should be the next ID, which is fine according to the check constraint, and yet the insert is failing! Why?

According to select IDENT_CURRENT('tCity') the current identity value is 25504 – above the constraint’s upper bound of 25501 – so that is consistent with the check constraint complaining.

Well – this could be related to an insert that failed, as explained in a previous article here. If that is not relevant, you should check the SQL error log – if the SQL Server service was unexpectedly restarted, the identity cache will have been flushed. The identity cache is a set of pre-allocated identity values held in memory for optimisation purposes. If SQL Server restarts unexpectedly, all the unused identity values in the cache are lost, and SQL Server errs on the side of caution by resetting the current identity value to the maximum of the values it lost. You can find out more about this in the details of trace flag 272 here. We can configure this at the instance level using the trace flag, or at the database level from SQL Server 2017 onwards using a scoped configuration.
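
A minimal sketch of the database-level option (run in the user database; this assumes SQL Server 2017 or later):

-- SQL Server 2017+: disable identity caching for the current database
ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE = OFF;
-- On older versions, trace flag 272 disables identity caching
-- instance-wide; it is enabled by adding -T272 as a startup parameter.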

If you want to play around with a repro there are a few around but I found this one easy – just implement replication on top of the setup.

In the case above we have 2 options – we can run the merge agent, which will assign a new range, or we can reset the current identity value manually using DBCC CHECKIDENT. Remember we are talking about the subscriber here. If this were the publisher and the current identity value were the max of the current range, we would be lucky, because the merge trigger would assign a new range for us directly. Otherwise we again have to run the merge agent, reset manually, or run sp_adjustpublisheridentityrange.
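
A sketch of the manual reset, using the numbers from this example (24838 was the highest ID successfully inserted, so the next insert gets 24839):

-- Reseed so the next identity value falls back inside the allocated range
DBCC CHECKIDENT ('dbo.tCity', RESEED, 24838);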

Thanks to Dave Wilce for pointing this one out.

Reducing merge traffic using sp_addmergearticle

OK – there are a few ways we might try to reduce the amount of merge traffic. The most common in my experience is by using filters. I’ll do a whole other post on static and dynamic filters and join filters later on. I used them a lot in the seismic industry to ensure only data relevant to each ship is sent down. Here I introduce 2 other interesting methods, controlled by parameters to sp_addmergearticle.

(1) @subscriber_upload_options

This is the easiest. We are going to look at stopping some of the traffic going from the subscriber to the publisher. In fact we’ll look at how it can be stopped from being recorded altogether.

The option is exposed in the SSMS GUI when editing the publication articles. Slightly confusingly, it is represented in 2 ways. You can select the checkbox below:

Or you can set the “synchronization direction”, which shows all the available options. I’ll follow the numbering in sp_addmergearticle to be consistent:

  • Option 0 – Bidirectional. This is the default we are used to.
  • Option 1 – download-only, but allowing subscriber changes.
  • Option 2 – download-only, prohibiting subscriber changes. This is identical to the checkbox above.

So we have 3 options, and only options 1 and 2 differ from default merge behavior.
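
For scripted setups, a minimal sketch of adding a download-only article that prohibits subscriber changes (the publication and article names are illustrative):

-- @subscriber_upload_options: 0 = bidirectional, 1 = download-only
-- allowing subscriber changes, 2 = download-only prohibiting changes
EXEC sp_addmergearticle
    @publication = N'pubTestMerge',
    @article = N'CitiesDownloadOnlyProhibit',
    @source_owner = N'dbo',
    @source_object = N'CitiesDownloadOnlyProhibit',
    @subscriber_upload_options = 2;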

If you use the checkbox or options 1 or 2, you’ll notice that the table icons are grayed out, showing that there is some difference from the normal merge articles – this can be quite handy for a quick check.

Remember that merge is all about triggers, so triggers are again used to enforce this at the subscriber.

These 2 new behaviors are set up as follows. For option 1 there are simply no tracking triggers – we allow all changes at the subscriber but don’t record them in the merge metadata tables:

For option 2 we prohibit subscriber changes, so we have no need to record them because there aren’t any. In this case there will be a new trigger at the subscriber, in the format below, to roll back any attempted change.

CREATE trigger [dbo].[MSmerge_downloadonly_0584B82B40C543E8A8BF02D2AD6F3A11] on [dbo].[CitiesDownloadOnlyProhibit] for update, insert, delete
not for replication
as
set nocount on

if @@trancount > 0 rollback tran
raiserror (20063, 16, -1, '[dbo].[CitiesDownloadOnlyProhibit]')

return

There’s a slight gotcha with all this. You might set it up and find that it all behaves like regular bidirectional merge. You recheck the article properties, script them out etc. and convince yourself that it really is set up as option 1 or 2! It’s not a bug – in this case you have set up the subscription incorrectly. Download-only articles only work if the subscription is a “Client” one. The “Server” subscription is the default though, so be aware of this.
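
For reference, the SSMS “Client”/“Server” choice maps to the @subscriber_type parameter of sp_addmergesubscription (‘local’ = Client, ‘global’ = Server). A sketch with illustrative server and database names:

-- Create the push subscription as a "Client" (local) subscription
-- so that download-only articles behave as intended
EXEC sp_addmergesubscription
    @publication = N'pubTestMerge',
    @subscriber = N'SUBSERVER',
    @subscriber_db = N'TestSub',
    @subscription_type = N'push',
    @subscriber_type = N'local';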

(2) @delete_tracking

Unlike @subscriber_upload_options, this is one of those properties we have to set as a parameter to sp_addmergearticle, as it is not exposed in the SSMS publication properties GUI. I guess it is seen as a rarely used option.

So, when @delete_tracking is set to ‘false’, rows deleted at the Subscriber are not removed at the Publisher, and rows deleted at the Publisher are not removed at the Subscriber.

This is implemented slightly differently: there is still a delete trigger – it just doesn’t add a record to MSmerge_tombstone – so on synchronization there is no record of the delete.
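
A minimal sketch of adding an article with delete tracking disabled (the names are illustrative):

-- Deletes at either node will not replicate for this article
EXEC sp_addmergearticle
    @publication = N'pubTestMerge',
    @article = N'tCity',
    @source_owner = N'dbo',
    @source_object = N'tCity',
    @delete_tracking = N'false';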

(3) @partition_options

TBA

NB!!!

At this point it’s worth mentioning that options (1) and (2) fly in the face of what we are normally trying to ensure – here we are creating non-convergence on purpose, and when it comes to reinitialization we’ll lose these differences. This is quite unlike filtering, which results in different data at the publisher and subscriber but reinitializes the data without any losses. So we have to be aware that this is really for non-persistent data only.

Scheduling Validations

In merge we want to check that the data is in sync. This is done by setting up a validation:

Checking both the row counts and the checksums is the only surefire way to get a reliable comparison:

Nothing happens until we run the merge agent and synchronize. At that point if the data is out of sync the job fails and we’ll get an error in the merge agent history.

The best thing to do is to have a separate job which regularly triggers the validation. I have one which runs every Monday morning and contains the following sort of command (@level 3 requests a row count and binary checksum validation):

exec sp_validatemergepublication @publication = 'pubTestMerge', @level = 3

As a final point, if the merge agent is run again afterwards, there is a successful sync and no mention of the invalid data.

So – perhaps the support team re-runs the agent and the important info is lost. This happens quite often, because the merge agent might fail due to an innocuous network issue. Just something to be aware of.

Understanding Identity Ranges – when do they fill up?

For a table which is merge-replicated and has an identity column, what happens – how does it all work? Looking at the article properties we can see the range sizes allocated to the publisher and subscriber. Each one gets an allocated range with a different starting point, to make sure they don’t overlap. Notice that the publisher range is a lot bigger than the subscriber’s, but we can sort this out before publishing the table.

So – now for a quick question – what happens if I stop the merge agent from synchronizing, insert a single row in a loop at the publisher, and do the same at the subscriber? Let’s say I try to insert 50,000 records this way at each node. Remember the declared range sizes above are 10,000 and 1,000.
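
Something like this hypothetical loop, run at both nodes (the table and column names follow the example):

-- Single-row inserts, so each statement is checked against the
-- identity range constraint and fires the merge insert trigger
declare @i int = 1;
while @i <= 50000
begin
    insert into dbo.tCities ([City]) values (N'City ' + cast(@i as nvarchar(10)));
    set @i = @i + 1;
end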

Do you have an idea in mind? Will it work or fail? If it fails, at what point will it fail?

The answer, which may be surprising:

  • Publisher will insert all 50,000 records
  • Subscriber will insert 2,000 records then fail with the following message:

Msg 548, Level 16, State 2, Line 5
The insert failed. It conflicted with an identity range check constraint in database 'TestRepSub', replicated table 'dbo.tCities', column 'ID'. If the identity column is automatically managed by replication, update the range as follows: for the Publisher, execute sp_adjustpublisheridentityrange; for the Subscriber, run the Distribution Agent or the Merge Agent.
The statement has been terminated.

This is interesting. First of all, we can see that the range on the subscriber is actually twice the size we declared – this is always the case. Secondly, the publisher is able to assign itself a new range on the fly when needed, unlike the subscriber. For the subscriber we really need to synchronize to get the next range.

One final point is that this scenario involves single-row inserts. You’ll have a big problem if your code uses the following sort of logic to insert a lot of rows:

insert into tCities ([City])
select [City] from tCities

The relevant code to understand how this works (and fails) is in the merge triggers and check constraints, for those interested in looking a bit deeper. Note that for an insert, the check constraint is validated before the data change, and only after that does the insert trigger fire. This order of processing is important, because a failed insert still increases the IDENT_CURRENT value of the column. If we were at the boundary of the check constraint, after one failed insert all subsequent inserts will fail, because the trigger never gets an opportunity to allocate a new range (the check constraint rejects each insert, since the next identity value is too high for the defined ranges). The only options are to refresh the publisher range by calling sp_adjustpublisheridentityrange, to run the merge agent (which will do the same thing for you), or to manually reset the current identity value using DBCC CHECKIDENT.
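
A minimal sketch of the first option (run at the publisher):

-- Allocate a new identity range at the publisher without synchronizing
EXEC sp_adjustpublisheridentityrange
    @table_name = N'tCities',
    @table_owner = N'dbo';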

To avoid such issues on the publisher we can rework the logic to reduce the batch size. You need to hit the upper limit of the declared identity range and then do a single-row insert, and a new range will be allocated. In practice, the only way to ensure this is by using single-row inserts.

How do we know that a conflict has happened?

Conflicts will happen in merge replication. There is no error on synchronization, so we need to know when one occurs – before the users complain! You can continually refresh the conflict viewer all day if that’s your thing 🙂 or we can find something slightly more sophisticated. I have used 2 methods:

(1) Simply check the MSmerge_conflicts_info table using a select statement. If there are records in the table, send an email to the support team using sp_send_dbmail. This can easily be scripted into a regularly running job.
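
A sketch of such a check (the mail profile and recipient are illustrative):

-- Email the support team if any merge conflicts have been logged
if exists (select 1 from dbo.MSmerge_conflicts_info)
begin
    exec msdb.dbo.sp_send_dbmail
        @profile_name = N'DBA Mail',
        @recipients = N'support@example.com',
        @subject = N'Merge replication conflicts detected',
        @body = N'Rows exist in MSmerge_conflicts_info - check the conflict viewer.';
end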

(2) Create a custom alert. This is not as daunting as it seems. Here is one below so you can see which counters to use…

The response will be an email to the operator defined for the alert – just set up an operator for the support team. Now that there are drop-downs in this interface, there’s no problem creating this alert.
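
The scripted equivalent would look something like this (the alert and operator names are illustrative, and the counter instance name will depend on your publication):

-- Fire when the merge Conflicts/sec counter rises above zero
EXEC msdb.dbo.sp_add_alert
    @name = N'Merge conflicts detected',
    @performance_condition = N'SQLServer:Replication Merge|Conflicts/sec|pubTestMerge|>|0';

-- Notify the support team's operator by email
EXEC msdb.dbo.sp_add_notification
    @alert_name = N'Merge conflicts detected',
    @operator_name = N'SupportTeam',
    @notification_method = 1; -- 1 = email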

Merge Primary Key Conflicts – how do we fix them?

Suppose the publisher adds a record with PK = ‘1’, a subscriber also adds a record with PK = ‘1’, and then they synchronize.

As far as merge is concerned these are different rows because they have different rowguid values.

The synchronization tells us there is an issue. No error is recorded, but we can see something is wrong:

So next we check the conflict viewer and see more info there:

You have 3 options in the conflict viewer:

Which one would you use to fix the issue? Each one removes the conflict from the conflict monitor. However – the next synchronization will simply recreate the conflict! A PK–PK conflict like this cannot be solved using these tools. You need to either update the PK value on one node or delete one of the rows, and then sync up. If there are loads of FK records hanging off this record, it can become a huge pain, as you’ll need to tackle those first. In fact, the longer you leave it, the bigger the problem becomes. So – keep an eye on the conflicts and resolve them ASAP. Alternatively, add another column and use a composite PK to avoid the conflict in the first place :).
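
A sketch of the manual fix, with illustrative values – remove (or re-key) one side’s row, then run the merge agent:

-- At the subscriber: delete the conflicting row so the publisher's
-- version wins on the next synchronization
delete from dbo.tCity where ID = 1;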

Partition Realignment – is this what you want?

Just a heads-up on part of the merge replication behavior that you need to be aware of, if you are not already :).

Suppose we have three tables – Region, Company and Sales. These are related by Primary Key-Foreign Key relationships such that Regions have Companies and Companies have Sales.

In the publication we create a filter so that Subscriber “Sub_North” gets the North Region records and subscriber “Sub_South” gets the “South” records.

Suppose we change the ACME company from the North to the South region… what happens? Well, first of all, subscriber Sub_South will now get the ACME company and all the associated ACME sales. Meanwhile, subscriber Sub_North will have the ACME company deleted, along with all the associated ACME sales. It is as if all the data is moved from one subscriber to another. If you are new to merge replication this might catch you by surprise, and we have to be aware that such a change near the top of a hierarchy can result in the same volume of traffic as a reinitialization.

It is all controlled by the @allow_partition_realignment parameter of sp_addmergepublication and sp_changemergepublication.

By default this parameter is set to ‘true’, meaning that deletes are sent to the subscriber when a modification of a row at the publisher causes it to change partition. Setting it to ‘false’ leaves the data from the old partition at the Subscriber, for historical purposes. In practice, however, it is no longer really feasible to set it to ‘false’, as you would have to redesign the whole publication and set all the articles to disable uploads.
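
For completeness, a sketch of how the property would be changed on an existing publication (as noted above, this typically also requires the articles to be upload-disabled):

-- Leave rows from old partitions in place at subscribers
EXEC sp_changemergepublication
    @publication = N'pubTestMerge',
    @property = N'allow_partition_realignment',
    @value = N'false';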

So – although it isn’t practical to change the setting, we just need to be aware of this behavior in our design :).
What is a Compensating Change?

When the merge agent synchronizes, it might be the case that an insert, update or delete cannot be applied at the partner. This could be because of a particular constraint or trigger.

What happens in this case? By default a conflict is reported but no error is raised. At this stage the data is out of sync, and until we sort out the conflict it’ll stay like that.

There is an alternative though. The merge agent can send a compensating change back to the source partner to undo the failed change – i.e. delete the source record in the case of a failed insert at the partner. This all occurs in the same synchronization session and there is no error.

In sp_addmergearticle there is a special parameter which enables this behaviour: @compensate_for_errors. By default it is set to ‘false’.

This is not exposed in SQL Server Management Studio, so you’ll need to set it using scripts: either sp_changemergearticle or sp_addmergearticle.
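
A minimal sketch for an existing article (the publication and article names are illustrative):

-- Turn on compensating changes for one article
EXEC sp_changemergearticle
    @publication = N'pubTestMerge',
    @article = N'tCity',
    @property = N'compensate_for_errors',
    @value = N'true',
    @force_invalidate_snapshot = 0,
    @force_reinit_subscription = 0;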

When @compensate_for_errors is set to ‘true’, there will still be a conflict registered in the conflict viewer. In the case of an insert being compensated by a delete, the conflict viewer will show “A row insert at ‘xxx’ could not be propagated to ‘yyy’. This failure can be caused by a constraint violation. …”. The offending row still exists at the xxx partner. On the next synchronization the row is removed from xxx by a compensating delete, and at that stage the data has converged. The slight issue is that the conflict still exists in the conflict viewer and needs removing manually, but that’s no big deal.