Scheduling Validations

In merge replication we want to check that the data is in sync. This is done by setting up a validation:

Checking the rowcounts and the checksums is the only surefire way to get a comparison:

Nothing happens until we run the merge agent and synchronize. At that point if the data is out of sync the job fails and we’ll get an error in the merge agent history.
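
If you prefer a query to the UI, the validation results also show up in the merge agent history in the distribution database. Something along these lines works as a starting point (a sketch only; the exact comment text varies by version, so adapt the filter):

select session_id, agent_id, comments
from distribution.dbo.MSmerge_history
where comments like '%validation%'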

The best thing to do is to have a separate job which regularly triggers the validation. I have one which runs each Monday morning and has the following type of text in it:

exec sp_validatemergepublication @publication = 'pubTestMerge', @level = 3

As a final point, if the merge agent is run again afterwards, there is a successful sync and no mention of the invalid data.

So, perhaps the support team re-run the agent and the important info is lost. Re-running the agent is quite common, because the merge agent can also fail for an innocuous reason such as a transient network issue. Just something to be aware of.

Peer-to-Peer – the “hidden” column

Sometimes it's useful to know how peer-to-peer really works under the bonnet, whether for troubleshooting or just out of interest if you're feeling geeky :). To see how rows are tracked we'll need to make sure the dedicated admin connection (DAC) is working:

exec sp_configure 'show advanced options', 1
reconfigure
exec sp_configure 'remote admin connections', 1
reconfigure

Connect to your SQL instance using the DAC, e.g. ADMIN:ServerName.
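
If you can't get in via Object Explorer (SSMS only allows a DAC connection from a query window), sqlcmd works too. For example, assuming Windows authentication:

sqlcmd -S ADMIN:YourServerName -E -d TestP2Pdb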

Now you’ll have access to a special hidden column: $sys_p2p_cd_id. In the case of my table tRegion I run:

select *, $sys_p2p_cd_id as HiddenColumn from [TestP2Pdb]..tRegion

Lo and behold I can see some funky numbers in this column….

To find out what these values mean, and how this data is used, we run the following query:

select originator_node, res.*
from
(
    select *, $sys_p2p_cd_id as HiddenColumn,
        sys.fn_replvarbintoint($sys_p2p_cd_id) as OriginatorID,
        sys.fn_replp2pversiontotranid($sys_p2p_cd_id) as TranID
    from dbo.tRegion
) res
inner join dbo.MSpeer_originatorid_history poh
    on res.OriginatorID = poh.originator_id

The query above reflects the fact that the hidden column $sys_p2p_cd_id actually holds two separate bits of info: the ID of the node which made the change and the ID of the transaction itself.

The internal function sys.fn_replvarbintoint is particularly useful here as it returns the Originator_ID. In the query above I have joined the inner results to the MSpeer_originatorid_history table in order to get the actual peer hostname (originator_node) so you can see what is going on, but the numerical Originator_ID is important in itself. This is because it is used in conflict resolution if p2p_continue_onconflict is set to true: the Originator_ID with the highest value wins a conflict.

Looking a bit deeper…

At this stage you might ask: how can this possibly work? If the TranID changes whenever a row changes, the value of $sys_p2p_cd_id must change too, so the value on the node that made the change will naturally differ from the value on a node that hasn't changed yet. If we took any difference in $sys_p2p_cd_id as an indication of a conflict, then every data change would produce one!

Actually, the log reader does something quite clever here. When the data change is first made, the old and the new values of the $sys_p2p_cd_id column are stored in the MSRepl_Commands table. We can use sp_browsereplcmds to see that these values are added to the arguments for the internal stored procedure sp_MSupd_dbotRegion. This way the old value of $sys_p2p_cd_id can be checked against the value at the destination, which should still be identical. If it isn't, there is a conflict, and we then look at the two Originator_IDs and the highest value wins.
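
To see this for yourself, sp_browsereplcmds can be run against the distribution database, and the old and new $sys_p2p_cd_id values appear among the parameters passed to sp_MSupd_dbotRegion. The simplest form (a sketch; on a busy distributor you will want to add the filtering parameters) is:

exec distribution.dbo.sp_browsereplcmds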

Peer-to-Peer – interesting insert-insert conflicts

Understanding conflict resolution is important in peer-to-peer if you intend to enable it. There is an argument that you should enable conflict detection and not the resolution. Knowing that a conflict has happened allows you to investigate and then to initialize one of the nodes after plugging the hole in the application logic. Even so, some people have conflict resolution enabled. It certainly helps with update-update conflicts (provided you assign the OriginatorIDs correctly – see related post on this site). For insert-insert conflicts you’d expect a failure which can’t be resolved. That’s what used to happen.

Consider a simple table tRegion. We have a PK on the RegionID column and one site inserts “Rome” and the other node “Paris”. Both have a RegionID value of 2 so there will be a conflict when we synchronize.
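
To reproduce this, the setup is nothing more than the same key value inserted at each node before synchronizing. A sketch, assuming a second column called RegionName (my guess at the column name):

-- at the publisher
insert into dbo.tRegion (RegionID, RegionName) values (2, 'Rome')

-- at the subscriber
insert into dbo.tRegion (RegionID, RegionName) values (2, 'Paris')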

Publisher:

Subscriber:

Synchronization at each site looks pretty innocuous:

The nodes are defined as below – the publisher has an OriginatorID of 100 and the subscriber a value of 1 so the publisher will win any conflicts.

The conflict viewers show us an interesting story:

The crucial bit is that “The incoming insert was applied to peer 1”!!! In other words we now have data in sync and both regions are “Rome”. How is this possible?

Well, the magic is in the replication insert stored procedure, "sp_MSins_dbotRegion1335813640" in this case. Where there is an insert-insert conflict, it converts the insert into an update on the peer with the lower OriginatorID :).

Note that this could have unintended consequences! Foreign key records continue to point at this PK record. We could have, for example, the case where all the Paris streets added at the subscriber now belong to the Rome region. So be very careful if you are going to use conflict resolution!

Replicating Indexes, Constraints, Triggers and so on

One thing we need to be sure of, from the DBA point of view, is that the subscriber and publisher "look" the same. Are all aspects of the table the same at the subscriber?

We seem to be able to control this in the article properties. The defaults say that the FKs, Indexes, Triggers, Defaults etc go to the subscriber:

There is a related setting on the subscriber properties for the publication that says we want to replicate schema changes – by default set to true:

However, note that when we make a change on the publisher after initialization, not everything will go down to the subscriber. We can see what is ready to go in sysmergeschemachange. For the main changes we need to remember:

  • Replicated to the Subscriber: Primary Keys, Foreign Keys, Check Constraints, Default Constraints
  • Not Replicated to the Subscriber: Indexes, Triggers

So – be careful to manually keep the subscriber in sync when it comes to indexes and triggers!
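
If you want to check what has actually been queued up, a quick look at the table in the published database does the trick (a sketch; column names from my instance, so check yours):

select schemaversion, schematype, schematext
from dbo.sysmergeschemachange
order by schemaversion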

Understanding Identity Ranges – when do they fill up?

For a table which is merge replicated and has an identity column, how does it all work? Looking at the article properties we can see the range sizes allocated to the publisher and subscriber. Each one gets an allocated range with a different starting point to make sure they don't overlap. Notice that the publisher range is a lot bigger than the subscriber's, but we can sort this out before publishing the table.

So, now for a quick question: what happens if I stop the merge agent from synchronizing, insert a single row in a loop at the publisher, and do the same at the subscriber? Let's say I try to insert 50,000 records this way. Remember the declared range sizes above are 10,000 and 1,000.
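
By "insert a single row in a loop" I mean something along these lines at each node (a sketch, assuming tCities has an identity ID column and a City column):

declare @i int = 1
while @i <= 50000
begin
    insert into dbo.tCities (City) values ('City ' + cast(@i as varchar(10)))
    set @i = @i + 1
end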

Do you have an idea in mind? Will it work or fail? If it fails, at what point will it fail?

The answer, which may be surprising:

  • Publisher will insert all 50,000 records
  • Subscriber will insert 2000 records then fail with the following message:

Msg 548, Level 16, State 2, Line 5
The insert failed. It conflicted with an identity range check constraint in database 'TestRepSub', replicated table 'dbo.tCities', column 'ID'. If the identity column is automatically managed by replication, update the range as follows: for the Publisher, execute sp_adjustpublisheridentityrange; for the Subscriber, run the Distribution Agent or the Merge Agent.
The statement has been terminated.

This is interesting. First of all we can see that the range on the subscriber is actually twice the size we declared. This is always the case. Secondly, the publisher is able to assign a new range when needed on the fly, unlike the subscriber. For the subscriber we really need to synchronize to get the next range.

The logic for this is all in the merge triggers and check constraints for those interested.
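
If you're curious, the identity range check constraint mentioned in the error message is easy to inspect (a sketch):

select name, definition
from sys.check_constraints
where parent_object_id = object_id('dbo.tCities')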

One final point is that this scenario involves single-row inserts. You'll have a big problem if your code does the following sort of logic to insert a lot of rows:

insert into tCities ([City])
select City from tCities

Such an insert will fail even on the publisher and requires reworking of the logic to reduce the batch size. A new range is only allocated when the insert actually hits the upper limit of the declared identity range, and in reality the only way to ensure this is by using single-row inserts.

Managing Peer to Peer Identity Ranges

If you're replicating using the peer-to-peer transactional setup, at some stage you'll want to replicate a table which has identity columns. This is not prohibited but it comes with a warning from Microsoft: "We do not recommend the use of identity columns. When using identities, you must manually manage the ranges assigned to the tables at each participating database." This means that out of the box there is no support for identity columns, so you'll have to be careful to avoid any issues.

(1) Reseed the identity column at each peer

We can reseed the identity range manually using DBCC CHECKIDENT (there's a sketch after this list). With a little imagination you can assign seeds which will never overlap, or use more sophisticated methods based on the seed and increment, e.g. odd and even allocations or even positive and negative allocations. While this works well, there are some obvious issues:

  1. The solution requires a lot of thought if there are many subscribers, in order to assign mutually-exclusive ranges.
  2. There is a lack of transparency. Which peers have which ranges? Where is it all being managed from, a spreadsheet somewhere?
  3. Setting up new peers always involves manually reseeding the range.
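
As a sketch of the reseeding itself, with made-up ranges (say one peer keeps the default seed and the next starts at 1,000,000):

-- run at the second peer
dbcc checkident ('dbo.tRegion', reseed, 1000000)
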
(2) Use a composite key

The PK could be composed of 2 columns: the identity column and additionally a LocationID column. This means that the identity column is allowed to have overlapping values from different peers and there is no manual reseeding required – the default seed and increment of (1,1) is fine. The combination of the Identity column and the LocationID is always unique across peers, because each peer has a different LocationID. The added benefit is that the record is immediately identifiable in terms of the originator as we simply refer to the value of the LocationID.
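
A sketch of what this might look like (table and column names are made up; the only real point is the two-column PK and a per-peer default on LocationID):

create table dbo.tOrders
(
    ID         int identity(1,1) not null,
    LocationID int not null constraint DF_tOrders_Location default (10),  -- e.g. 10 at this peer, 20 at the next
    OrderRef   varchar(50) not null,
    constraint PK_tOrders primary key clustered (LocationID, ID)
)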

(3) Use a persisted computed column

This is similar to the solution above and I came across this from the excellent RBI Dev DBA Group. In this case we now have three columns: the Identity column, the LocationID column and a third persisted computed column that combines the two:

LocationID   ID   PK_ID
10000        1    10000 + 1 = 10001
10000        2    10000 + 2 = 10002

The PK_ID is the persisted computed column and is set as the primary key. Now we have a narrow primary key and also a guarantee of uniqueness across peers. To be honest I haven’t used persisted computed columns much and at first I thought that it wouldn’t work because the replication-generated insert stored procedure wouldn’t cater for the automatically-computed PK value, or that the updates to LocationID would be “confused” on synchronization with the PK itself having changed. Anyway, I tested it and it all works fine. The persisted column formula was simply the LocationID + Identity – with a large LocationID as a seed to ensure no overlap.
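
As a sketch (a made-up table; LocationID would get a per-peer default, one way of doing which is shown in the sketch further down):

create table dbo.tSample
(
    ID         int identity(1,1) not null,
    LocationID bigint not null,   -- per-peer value, e.g. 10000 here and 20000 at the next peer
    SomeData   varchar(50) not null,
    PK_ID      as (LocationID + ID) persisted not null,
    constraint PK_tSample primary key clustered (PK_ID)
)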

Conceptually this is very similar to (1), with the LocationID performing a similar function to the seed of the DBCC CHECKIDENT. However, IMO it is far preferable as it is completely transparent. If you choose to implement this solution you'll just need to allocate a different default LocationID to each peer and ensure they don't overlap. This can be done via an additional mapping table maintained in the publication which maps HostName to LocationID, and a scalar function, used as the default for the LocationID column, which performs the lookup. This means that a backup from one server can be used as a starting point for another peer with no intervention at all, assuming the mapping table is already populated.
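
A sketch of the mapping table and the scalar function used as the default (I'm using @@SERVERNAME to identify the peer, which is an assumption about how you record the host name; everything else is made up):

create table dbo.tLocationMap
(
    HostName   sysname not null primary key,
    LocationID bigint not null
)
go

-- one row per peer, e.g.
-- insert into dbo.tLocationMap (HostName, LocationID) values ('PEERSERVER1', 10000)

create function dbo.fn_MyLocationID()
returns bigint
as
begin
    return (select LocationID from dbo.tLocationMap where HostName = @@SERVERNAME)
end
go

-- use the function as the default for the LocationID column (replacing any literal default)
alter table dbo.tSample
    add constraint DF_tSample_Location default (dbo.fn_MyLocationID()) for LocationID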

How fast is my Transactional replication throughout the day?

If you want to test the speed of your Transactional replication there is a neat piece of functionality called Tracer Tokens. This will tell you how quick it is to get a record from the publisher to the distributor and from there to the subscriber. In the old days you'd have to create a dummy table to do all this, but since SQL 2005 it's out of the box :).

To add a tracer token use the replication monitor and edit the Transactional publication as below:

Basically, a token is written to the transaction log of the publisher. This is treated as a normal replicated command, and passes from the transaction log to the distribution database and then is ‘run’ by the distribution agent.

You’ll see the various latency times recorded for each subscription. If you have a distribution agent set to work on a schedule (i.e. not running continuously) you’ll see the following sort of scenario until the distribution agent runs.

To add one programmatically for reporting purposes just roll the following type of script:

DECLARE @myTokenID AS int
EXEC sp_posttracertoken
    @publication = 'YourPubName',
    @tracer_token_id = @myTokenID OUTPUT;
select @myTokenID

Trapping the TokenID is actually optional but is very useful if you want to check on the resulting latencies for this particular token. So,

EXEC sp_helptracertokenhistory
    @publication = 'YourPubName',
    @tracer_id = -2147483641

where -2147483641 is the ID of the tracer token.

 

Or just roll your own report based on:

select * from distribution..MStracer_tokens
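
For the latencies themselves, the history table alongside it is the useful one. A sketch (I believe these column names are right, but check on your version):

select t.publication_id,
       t.publisher_commit,
       h.subscriber_commit,
       datediff(second, t.publisher_commit, h.subscriber_commit) as total_latency_seconds
from distribution.dbo.MStracer_tokens t
inner join distribution.dbo.MStracer_history h
    on h.parent_tracer_id = t.tracer_id
order by t.publisher_commit desc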

This way you can create a custom alert so the support team know when there is an issue. There are SQL Agent alerts based on the Log Reader and Distribution Agents which can be configured, but the method above is a little more under our control and can also be used for reporting: add a tracer token every 10 minutes and plot performance over the course of a day.

How to know that a conflict has happened?

Conflicts will happen in Merge Replication. There is no error on synchronization, so we need to know when this occurs before users complain! You can continually refresh the conflict viewer all day if this is your thing 🙂, or we can find something slightly more sophisticated. I have used 2 methods:

(1) Simply check the table "MSmerge_conflicts_info" using a select statement. If there are records in the table, send an email to the support team using sp_send_dbmail. This can easily be scripted into a regularly-running job.
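
A sketch of the job step for (1), assuming Database Mail is set up and with made-up profile and recipient names:

if exists (select 1 from dbo.MSmerge_conflicts_info)
begin
    exec msdb.dbo.sp_send_dbmail
        @profile_name = 'YourMailProfile',
        @recipients   = 'support@yourcompany.com',
        @subject      = 'Merge replication conflicts detected',
        @body         = 'Rows found in MSmerge_conflicts_info - please check the conflict viewer.'
end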

(2) Create a custom alert. This is not as daunting as it seems. Here is one below so you can see which counters to use…

The response will be an email to the operator defined for the alert, so just set up an operator for the support team. Now that there are drop-downs in this interface there's no problem creating this alert.