SQL Data Sync – Group Changes and Schema Changes

As promised, here is the 2nd article in my series based on SQL Data Sync. SQL Data Sync is the replication framework provided as a cloud service in Azure. You can find more background details in my article on Conflicts in SQL Data Sync and I’ll do another article on the basics of setting it all up later on. In this article we’ll look at various Sync Group and schema changes – adding and removing tables and making changes to existing tables – and see how this works and what we have to be careful with.

Adding a new table

Just select the ellipsis button on the tables icon then put a tick in the box of the new table article. If it isn’t shown there you can refresh the schema.

Nothing happens immediately – you’ll need to Sync up for the additional table to propagate to the Member servers.

If you need to check exactly what is in the Sync group – tables, columns, triggers etc –  you can run this script at the Member :

SELECT config_data
FROM [DataSync].[scope_config_dss]

In our case this will return an xml string which can be opened in SSMS to show that the new table has been added to the group.

We have to be a little careful here – if the table already exists on the Member, it will not be dropped and recreated, unlike during the default application of a snapshot in traditional replication. There is no option to force a drop and create. SQL Data Sync assumes that if the table is already there, the data is in sync. So if there are data discrepancies they will remain in the Member table and can cause problems later on.

You could fix the data yourself provided you know how to find the data discrepancies. Updating all the rows on the Hub will add in any missing ones at the Member, and update any existing but different rows. Extra rows at the Member will be harder to sort out in a simple fashion.

So unless you are 100% sure you understand how to find and fix all data differences between the hub and the Member, it is safest to delete the table at the Member and then add it in from scratch by relying on an initial Sync.

As mentioned above, if the table doesn’t already exist on the Member, it will be created from scratch and populated, so all is well.

Removing a Table

Deselecting a table from the Sync Group and starting a Sync will remove the user table triggers and the tracking table at the Member. The user table itself is not removed from the Member.

Adding a Column

The new column must be added on the Hub via an Alter Table command. Making SQL Data Sync aware of this schema change is not automatic – there’s no equivalent of @replicate_ddl with SQL Data Sync. You can create a DDL trigger to alert you about schema changes if you are concerned it’ll happen without you being aware of it (details here). Once the extra column is added – in this example I add it to the Region table – you’ll need to let SQL Data Sync know it has been added. But we must first add the extra column to the Member servers as well as the Hub – all manually.

Once the extra column has been added, in the portal we click on “Refresh Schema” and put a tick in the box for the new column “testextracol”.

After running a Sync, changes will propagate as expected, because the new column is part of the SQL Data Sync schema. We can check that this is the case as before using:

SELECT config_data
FROM [DataSync].[scope_config_dss]

If you add the column on the Hub and then forget to add it to the Sync Group there is no real issue. The Sync Group treats it like a vertical partition. And remember that the tracking table only really holds the PK values – not all the other columns and data.

Removing a Column

Removing a column in the portal changes the scope_config – the table used when the comparisons are made at a Sync. The tracking table holds the PK values only, so the column removal doesn’t make a big change. Remember however that we are really changing the metadata of the Sync Group. Removing a column in the portal will not remove the actual column in the table at the Member. This is all well if the column is NULLable, but if eg I remove the RegionDescription column at the Hub and then do an insert, the insert will fail during the Sync as mentioned below. So the simplest solution is that either we replicate the whole table, or the schemas at the Member should match what is defined on the portal.

Changing a Column

Changing datatypes can be troublesome sometimes. Suppose on the Hub I increase the RegionDescription column from varchar(50) to varchar(500). Then I edit a RegionDescription value to be 100 characters long. Syncing up will result in a truncation error.

Although not mentioned explicitly 8152 means “String or binary data would be truncated”. So – we have to either keep the schemas in sync, or the data has to match the lowest column specifications.

Well – that’s all for now. I’ll add in any other schema change idiosyncrasies as I come across them.

SQL Data Sync and Conflicts

Hi all. I just wanted to do a few articles around the use of SQL Data Sync as most of my work these days seems to be based on the cloud versions of SQL in one form or another. For those who don’t know, SQL Data Sync is the native bi-directional replication framework which is provided for use in SQL Azure.

Data is replicated between a hub (publisher) and a member (subscriber). Funny how the groups who work on replication, data sync, log shipping and failover groups all choose quite different names for these same logical concepts! For those who haven’t seen SQL Data Sync before is a bit like a cross between merge and peer-to-peer transactional replication but it has a few extra idiosyncrasies. First of all the data changes are logged by insert, update and delete triggers to audit tables – one audit table for each user table. There is no requirement to have a rowguid column but each table must have a primary key. The sync framework which is used is provided as a service in Azure SQL so we have done away with the idea of traditional agents. The synchronization can be triggered manually in the portal or set up with a schedule – also in the portal. With SQL Data Sync the tables are created on the member servers without FK constraints if the corresponding PK table isn’t being included in the Synchronization group. I won’t go over all the details here – that’s just enough so that you can broadly compare this architecture to vanilla replication and see how I did the tests below. I’ll do other articles on setting it all up, making schema changes, logging and other topics later on. For now I want to concentrate on the different data change use cases, the conflict resolution methodology and the behavior we should expect. Remember that conflicts in merge replication and peer-to-peer transactional replication are sometimes resolved in strange ways and attempted conflict resolution can cause agent failures. Let’s see how the SQL Data Sync compares….

In my setup I included 2 simple Northwind tables – Regions and Territories.

For some conflict tests I needed a single table so only one of the two was included. I specified that in any conflict resolution the Hub would win over the Member. These are the scenarios I examined…

Update-Update

Here an update of the territorydescription on the Hub simply overwrites a different territorydescription update made on the Member. The Hub was set to win in conflicts so that is as expected.

Update-Delete

A Territories record was deleted on the Hub and the same record updated on the Member. The delete on the Hub removes the updated record on the Member. If we did the inverse – the update was done on the Hub and the delete at the Member – the updated record is reinserted on the Member. This is interesting and potentially very useful as it means that updates can be used to replace missing records. I’ll do another separate article soon on how records can become “missing” using SQL Data Sync!

Failed Syncs
(a) Failed Insert at Hub

The insert from Member could not be applied at the Hub. For this test I just used the Territories table in the Sync Group, but the Hub had both Territories and Region tables with a FK constraint between them. In this case an insert of a Territory on the Member fails on the Hub because the associated Region record (10) is missing:

Insert Into Territories Values (‘xxx’,’bbb’,10)

The error message below is returned.

Surprisingly, this is not resolved with a delete of the offending record at the Member. From the message we don’t know which is the problem record but remember that these changes are recorded. In this case the insert trigger will have written a row to the tracking table. To find the problem record we can examine the Territories_dss_tracking table and look at the latest records which haven’t been synced.

SELECT *
FROM [DataSync].[Territories_dss_tracking]
where scope_update_peer_key is null
order by last_change_datetime desc

Then we delete the offending record from the Territories table at the Member database, resync and all is well.

 

(b) Failed Delete at Hub

For this test I restricted the Sync Group to only include the Region table. Both tables were still on the Hub with a FK relationship between them, but only the Region table was on the Member. Deleting the “Eastern” region record on the Member syncs back to the Hub as a failed Sync due to a FK violation as there are related Territories. Again, the record is not reinserted at the Member and you’ll have to try to find the details of the offending delete and manually sort it out.

 

SELECT *
FROM [DataSync].[Region_dss_tracking]
where sync_row_is_tombstone = 1
order by last_change_datetime desc

Interestingly I had assumed that I’d have to edit the tracking table data. I’d assumed that a reinsert of the “Eastern” region at the Member would fail because syncing would try to add a duplicate “Eastern” region at the Hub. However that is not the case. You can reinsert at the Member, sync and all is ok. In fact you could add a region called “East” with a PK of 1 on the Member and this works fine and syncs up as an update at the Hub.

Conclusions

Hopefully you can get a sense of how data changes are being propagated. As I mentioned above it is quite distinct from peer-to-peer transactional and merge replication conflict behavior. In some cases this might work out of the box in an “everyone can change everything” sort of topology, but I’d expect not and you’ll need to understand what the logged messages mean and how to fix any data non-convergence afterwards. I’m hoping that this article will help in this sort of case!

Filling merge identity ranges – unexpectedly!

Suppose you do a simple single row insert on a replicated table at the subscriber and find it fails with the message below:

The insert failed. It conflicted with an identity range check constraint in database ‘TestSub’, replicated table ‘dbo.tCity’, column ‘ID’.
If the identity column is automatically managed by replication, update the range as follows: for the Publisher, execute sp_adjustpublisheridentityrange;
for the Subscriber, run the Distribution Agent or the Merge Agent.
The statement has been terminated.

The table is merge-replicated and we are using automatic range management of the identity column. OK – this might be normal behaviour. But what if we know we haven’t filled the whole range????

Running the code below returns the top city added.

select top 1 * from [dbo].[tCity]
order by id desc

While the merge check constraint on the table is:

CHECK NOT FOR REPLICATION (([ID]>(24501) AND [ID]<=(25001) OR [ID]>(25001) AND [ID]<=(25501)))

So – 24839 should be the next ID, is fine according to the check constraint and yet it is failing! Why?

According to select IDENT_CURRENT( ‘tCity’ ) the current value is 25504 so that is consistent with the check constraint complaining.

Well – this could be related to an insert that failed as explained in a previous article here. If this is not relevant, you should check the SQL Error log – if the SQL Service was unexpectedly restarted, the identity cache will have been flushed. The identity cache is a set of values held in memory for optimisation purposes. If SQL restarts unexpectedly all the unused identity values in the cache are lost, and SQL errs on the side of caution and resets the current identity value to be the maximum of the values it lost. You can find out more about this in the details of trace flag 272 here . We can configure it at the instance level using this trace flag or at the database level for SQL 2017 using scoped configurations.

If you want to play around with a repro there are a few around but I found this one easy – just implement replication on top of the setup.

In the case above we have 2 options – we can run the merge agent which will assign a new range, or we can reset the current identity value manually using DBCC CHECKIDENT. Remember we are talking about the subscriber here. If this is the publisher and the current identity value is the max of the current range we are lucky because the merge trigger will assign a new range for us directly. Otherwise we have to again run the merge agent, reset manually, or run sp_adjustpublisheridentityrange.

Thanks to Dave Wilce for pointing this one out.

SQL 2017 Distributor Changes

 

There’s not much new stuff in SQL 2017 for the replication administrator. One thing I did notice though is the distributor properties have changed. You can see 2 new options below:

Using sp_helpdistributor (or select * FROM msdb.dbo.MSdistributiondbs) you can more clearly see that there are 2 options to control deletions.

Remember that we retain the commands and transactions in the distribution database, so we’d expect that the deletions must relate to these records? There’s no mention of these parameters in the Books Online help guide so I did a little digging to confirm where they are used.

The distribution cleanup job executes dbo.sp_MSdistribution_cleanup.

This proc , through nested procs, calls:

(1) sp_MSdelete_publisherdb_trans

DELETE TOP(@deletebatchsize_commands) MSrepl_commands WITH (PAGLOCK) from MSrepl_commands

(2) sp_MSdelete_dodelete

delete TOP(@deletebatchsize_transactions) MSrepl_transactions WITH (PAGLOCK) from MSrepl_transactions

So – we previously only had a cutoff point to use in deleting old transactions. When there are millions of commands in this table it is time-consuming and prone to blocking issues. Now we have a neat way of doing batch deletes. I’ve recently been working with some tables having a billion rows where batching and decent indexing is essential so I can see exactly why this has been introduced.

Reducing merge traffic using sp_addmergearticle

OK – there are a few ways we might try to reduce the amount of merge traffic. The most common in my experience is by using filters. I’ll do a whole other post on static and dynamic filters and join filters later on. I used them a lot in the seismic industry to ensure only data relevant to each ship is sent down. Here I introduce 2 other interesting methods controlled in sp_addmergearticle.

(1) @subscriber_upload_options

This is the easiest. We are going to look at stopping some of the traffic going from the subscriber to the publisher. In fact we’ll look at how it can be stopped from being recorded all together.

The option is exposed in the SSMS gui when editing the publication articles. Slightly confusingly it is represented in 2 ways. You can select the checkbox below:

Or you select to set the “synchronization direction”. This way you’ll see all the available options. I’ll follow the numbering in sp_addmergearticle to be consistent. Option “0” is Bidirectional and is the default we are used to. Option “2” is identical to the checkbox above. Option “1” download-only but allowing subscriber changes – is the final option. So we have 3 options and only options 1 and 2 are anything different from default merge behavior.

If you use the checkbox or options 1 or 2, you’ll notice that the table icons are grayed out showing that there is some difference to the normal merge articles – can be quite handy for a quick check.

Remember that merge is all about triggers. So triggers are again used to enforce this at the subscriber.

These 2 new behaviors are set up by (option 1) having no triggers – we allow all changes at the subscriber but don’t record them in merge metadata tables:

Or (option 2) if we prohibit subscriber changes, we have no need to record them because there aren’t any. In this case there will be a new trigger at the subscriber which has the format below – to rollback any attempted changes.

CREATE trigger [dbo].[MSmerge_downloadonly_0584B82B40C543E8A8BF02D2AD6F3A11] on [dbo].[CitiesDownloadOnlyProhibit] for update, insert, delete
not for replication
as
set nocount on

if @@trancount > 0 rollback tran
raiserror (20063, 16, -1, ‘[dbo].[CitiesDownloadOnlyProhibit]’)

return

There’s a slight gotcha with all this. You might set it up and find that it all behaves like regular Bidirectional. You recheck the article properties, script it out etc and convince yourself that it is set up as option 1 or 2 though! It’s not a bug. In this case you have set up the subscription incorrectly. Download-only articles only work if the subscription is a “Client” one. The “Server” subscription is the default though so be aware of this.

(2) @delete_tracking

Unlike the subscriber_upload_options this is one of those properties we have to set as a parameter to sp_addmergearticle as it is not exposed in the SSMS publication properties GUI. I guess it is seen as a rarely used option.

So, when delete_tracking is set to false, rows deleted at the Subscriber are not removed at the Publisher, and rows deleted at the Publisher are not removed at the Subscriber.

This is implemented slightly differently. There is still a delete trigger – it just doesn’t add a record to msmerge_tombstone – so on sync there is no record of the delete.

(3) @partition_options

TBA

NB!!!

At this point it’s worth mentioning that the options (1) and (2) fly in the opposite direction of what we are normally trying to ensure – here we are creating non-convergence on purpose and when it comes to reinitialization we’ll lose these differences. Quite unlike filtering, which results in different data at the publisher and subscriber, but reinitializes the data without any losses. So we have to be aware that this is really for non-persistant data only.

Dropping Replicated Tables

I recently upgraded some of our SQL 2014 servers to SP2. When going through the list of hotfixes I came across this gem: “This article describes an improvement that allows a table that’s included as an article in transactional replication publication to be dropped from the database and the publication(s).”

You might have tried at some time to drop a replicated table. You’ll see this type of message…

Remember in SQL 2000 we used to have a similar case when altering columns on replicated tables unless we used sp_repladdcolumn and sp_repldropcolumn. Then along came the @replicate_ddl argument for sp_addpublication that made our lives easier and we could then make changes seamlessly. Well, there is now a similar possibility for dropping replicated tables:

exec sp_changepublication @publication = ‘<Publication Name>’, @property = ‘allow_drop‘, @value = ‘true’

There are some caveats to be aware of:

  • The publisher needs to be running SQL Server 2016 SP1 or 2014 SP2.
  • If the same article exists in more than one transactional publication, you have to enable this feature on all of them or you’ll still get the error that a table is in replication and can’t be dropped.
So, once you have set it up you can run the following type of statement with no errors:
drop table [dbo].[City]
Even if it is the last article in the publication and you have 100 subscribers this is still no problem.
The table is removed from the publication articles, deleted at the publisher and remains at the subscribers.
To see if it has been enabled you can run
exec sp_helppublication ‘mainpublication’
There is a new column returned “allow_drop” which will tell you it’s there.
Unfortunately if you need to check this programatically there isn’t a nice column in syspublications which shows the value. It is hidden in the options bitmap. I suppose this was the easiest place to put it – hopefully it’ll be explicit in SQL 2017. The value to look for is
0x80 (Books Online is not yet updated with this value or the “allow_drop” in sp_helppublication (as of 13th Sep 2017)).

Dropping Transactional Articles

In MSDN in the article “Add Articles to and Drop Articles from Existing Publications” there is a section which says “For snapshot or transactional publications, articles can be dropped with no special considerations prior to subscriptions being created. If an article is dropped after one or more subscriptions is created, the subscriptions must be dropped, recreated, and synchronized.”. This is pretty scary and sounds almost like a reinitialization is needed!

Let’s see what happens when we use SSMS. If you drop an article from the publication you’ll see the following dialogue:

“This will not impact existing subscriptions” sounds reassuring, but is slightly ambiguous – perhaps it is only referring to the invalidation of the snapshot. However there is clearly no mention that “the subscriptions must be dropped, recreated, and synchronized”. If you click on “Yes” you’ll find that the next time you start the distribution agent all runs fine. There will be some orphaned objects left on the subscriber – stored procedures related to the article and the like – but nothing to worry about. So perhaps the MSDN article is incorrect?

Not really! The SSMS interface is clever enough to do the hard work for you. If you try to drop the article yourself using this type of script:

EXEC sp_droparticle
@publication = ‘pubtestarticle’,
@article = ‘tcity3’,
@force_invalidate_snapshot = 1;
GO

You’ll receive an error:

Msg 14046, Level 16, State 1, Procedure sp_MSrepl_droparticle, Line 286 [Batch Start Line 0]
Could not drop article. A subscription exists on it.

What you need to do is to manually drop the subscriptions it has and then you can drop the article.

EXEC sp_dropsubscription ‘pubtestarticle’,’tcity3′,’desktop-nsonkmo’,’transub’

This is what SSMS does behind the scenes and this is all the MSDN article is really telling us – there is no need to reinitialize, just drop the relevant subscriptions and then the article and then synchronize.

Dropping replicated tables…..impossible?

Did you ever try to drop a replicated table? Of course this is disallowed! It was always that way right?

If you do try it what you’ll see is a variation on the following message:

Well actually this has now changed and you can merrily drop the table.

It’s not enabled by default but if you run the following you’ll be permitted to do it.

EXEC sp_changepublication
@publication = ‘yourpublicationname’,
@property = N’allow_drop’,
@value = ‘true’
GO

If when you run the above you get a nasty error message as follows:

Msg 21183, Level 16, State 1, Procedure sp_MSrepl_changepublication, Line 274 [Batch Start Line 0]
Invalid property name ‘allow_drop’.

……that means that you have other problems: your SQL server is not up to date! Naughty DBA! This facility came in with SQL 2014 Service Pack 2  and SQL 2016 Service Pack 1. Sorry but nothing doing for SQL 2012/2008R2/2008!

Once it is set, you can happily drop tables in the publication database and the replication agents won’t error. The log reader will clean up the metadata related the article and ignore any remaining unprocessed commands related to the table. It also removes the article from the publication. The distribution agent processes the article records already sent to it by the log reader and when it reaches the cleanup command it has completed the processing for this article.

Note that the distribution agent doesn’t report the fact that the drop table has been processed during a synchronization. So we will not really be aware this has happened. Also the distribution agent doesn’t send the “Drop” statement down to the Subscriber either so the table is left there intact. This is quite unlike other DDL changes which keep the publisher and subscriber in sync. It’s safer this way but it’s a pity an additional option to send the delete doesn’t exist.

Finally, once it is set you’ll not see it so easily in syspublications. Another explicit column for this setting would have been nice but instead it is hidden in the logic of the options bitmap. However you can run the following to check if it is enabled:

EXEC sp_helppublication @publication = ‘yourpublicationname’

And then you’ll have a clear report of the flag setting….