Replication and Temporal tables

Background

As a bit of background, there are a few good articles out there covering problems and solutions when mixing replication and temporal tables. Those articles all relate to getting temporal history onto the subscriber. Here we’ll go into that a bit and also look at other options, like getting the source data onto the subscriber, and how they relate.

Temporal History at the Subscriber

At first, before doing any research, I thought you could replicate the temporal history table as an indexed view to the subscriber. Although this is possible, I realised it isn’t a great solution, because of the way you are meant to interact with temporal tables.
There are several TSQL clauses, eg “FOR SYSTEM_TIME ALL”, to help you get the data you need; these effectively union the source table and history table data and then apply filters to the combined set.
If you don’t use these TSQL clauses and instead select data directly from the history table, you’ll see that inserts into the source table aren’t registered in the history table (because of the union mentioned above, that particular data comes from the source table). So after replicating the history table you have only a partial view of changes at the subscriber, ie you’ll see the deletes and updates only.

As this approach doesn’t provide all the info we need, the general solution is to not have temporal data on the publisher, replicate the source data, make the subscriber table system-versioned and have SQL Server generate the history there. Good article here:
https://www.mssqltips.com/sqlservertip/5281/sql-server-replication-for-temporal-tables/.
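
As a rough sketch of that approach, once the snapshot has been delivered the subscriber copy of the table is made system-versioned with something like the following (all object names here are placeholders, not from a real setup):

-- Add hidden period columns with defaults so existing rows remain valid,
-- then switch system versioning on so SQL Server maintains the history table.
ALTER TABLE dbo.SourceTable
ADD ValidFrom datetime2 GENERATED ALWAYS AS ROW START HIDDEN NOT NULL
        CONSTRAINT DF_SourceTable_ValidFrom DEFAULT SYSUTCDATETIME(),
    ValidTo datetime2 GENERATED ALWAYS AS ROW END HIDDEN NOT NULL
        CONSTRAINT DF_SourceTable_ValidTo DEFAULT CONVERT(datetime2, '9999-12-31 23:59:59.9999999'),
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo);

ALTER TABLE dbo.SourceTable
SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.SourceTable_History));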

Source (versioned) Table at the Subscriber

If the problem is pretty much the opposite – you need a copy of the system-versioned temporal table at the subscriber rather than the history – the solution is easier. First of all, you cannot replicate a system-versioned temporal table – it is one of the restrictions in place when dealing with temporal tables:

However, in this case indexed views do provide us with a viable solution.

The method is that we create an indexed view on the publisher and add it to a publication.

If you haven’t done this before you might assume that we can go ahead and publish this indexed view as though it were a table. However, if you do that, only the view definition is replicated (as an empty table) and there is no subsequent flow of data. This behaviour is not properly controlled in SSMS – it is one of those cases where the full article properties are not exposed in the GUI. To get the data flowing, we script out the publication and manually update the article type parameter in sp_addarticle:

@type = N'indexed view logbased'
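
For illustration, the scripted article call might end up looking something like this (publication and object names are made up):

EXEC sp_addarticle
@publication = N'MyPublication',
@article = N'vSourceIndexed',
@source_owner = N'dbo',
@source_object = N'vSourceIndexed',
@destination_owner = N'dbo',
@destination_table = N'vSourceIndexed',
@type = N'indexed view logbased'  -- the manual change that gets the data flowing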

Then we create the replication publication from this script and it’s ready to go.

Once this table is created on the subscriber, you’ll find that there are no keys, constraints or indexes, so you might want to add a clustered index on the PK and re-create any non-clustered indexes which existed on the source table. This could be done manually, or ideally in a post-snapshot script.
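
For illustration, a post-snapshot script along these lines would do the job (table, key and index names are made up):

-- Re-create the keys/indexes that the snapshot of the indexed view article doesn't bring across
ALTER TABLE dbo.vSourceIndexed
ADD CONSTRAINT PK_vSourceIndexed PRIMARY KEY CLUSTERED (Id);

CREATE NONCLUSTERED INDEX IX_vSourceIndexed_ModifiedDate
ON dbo.vSourceIndexed (ModifiedDate);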

Source (versioned) Table and Temporal Table at the Subscriber

As a final point, if you really must have both the source and history tables in both environments, theoretically you could achieve this using 2 indexed views – one of the source table and one of the history table.
However, although the data would all be there at the subscriber, you’d need to roll your own version of all the temporal TSQL clauses like “FOR SYSTEM_TIME ALL”, because the subscriber’s source table isn’t truly a system-versioned one.
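
For example, “FOR SYSTEM_TIME ALL” would roughly have to become a union of the two replicated copies, something like this (table and column names are placeholders):

-- Roll-your-own equivalent of SELECT ... FROM dbo.SourceTable FOR SYSTEM_TIME ALL
SELECT Id, SomeColumn, ValidFrom, ValidTo
FROM dbo.SourceTable          -- current rows (replicated source)
UNION ALL
SELECT Id, SomeColumn, ValidFrom, ValidTo
FROM dbo.SourceTable_History; -- historical rows (replicated history)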

To be able to use all the TSQL history-related clauses on both the publisher and the subscriber, the simplest way is to set up a readable secondary replica in an Always On Availability Group.

SQL 2017 Distributor Changes

 

There’s not much new stuff in SQL 2017 for the replication administrator. One thing I did notice, though, is that the distributor properties have changed. You can see the 2 new options below:

Using sp_helpdistributor (or SELECT * FROM msdb.dbo.MSdistributiondbs) you can see more clearly that there are 2 options to control deletions.
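
For reference, this is roughly what I ran to look at them (the two column names shown are how they appeared on my SQL 2017 distributor, so treat the exact names as an assumption if you’re on a different build):

-- Inspect the new batch-size settings alongside the existing retention values
SELECT name, deletebatchsize_xact, deletebatchsize_cmd
FROM msdb.dbo.MSdistributiondbs;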

Remember that we retain the commands and transactions in the distribution database, so we’d expect the deletions to relate to those records. There’s no mention of these parameters in the Books Online help, so I did a little digging to confirm where they are used.

The distribution cleanup job executes dbo.sp_MSdistribution_cleanup.

This proc, through nested procs, calls:

(1) sp_MSdelete_publisherdb_trans

DELETE TOP(@deletebatchsize_commands) MSrepl_commands WITH (PAGLOCK) from MSrepl_commands

(2) sp_MSdelete_dodelete

delete TOP(@deletebatchsize_transactions) MSrepl_transactions WITH (PAGLOCK) from MSrepl_transactions

So – previously we only had a cutoff point to use when deleting old transactions. When there are millions of commands in this table, that is time-consuming and prone to blocking issues. Now we have a neat way of doing batched deletes. I’ve recently been working with some tables of a billion rows, where batching and decent indexing are essential, so I can see exactly why this has been introduced.
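
If you want to tweak the batch sizes yourself, sp_changedistributiondb is the place. A quick sketch is below, where the property names match the columns mentioned above (again, treat the exact names as an assumption on older builds):

-- Commands deleted per cleanup batch
EXEC sp_changedistributiondb
@database = N'distribution',
@property = N'deletebatchsize_cmd',
@value = N'100000';

-- Transactions deleted per cleanup batch
EXEC sp_changedistributiondb
@database = N'distribution',
@property = N'deletebatchsize_xact',
@value = N'5000';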

Dropping Replicated Tables

I recently upgraded some of our SQL 2014 servers to SP2. When going through the list of hotfixes I came across this gem: “This article describes an improvement that allows a table that’s included as an article in transactional replication publication to be dropped from the database and the publication(s).”

You might have tried at some time to drop a replicated table. You’ll see this type of message…

Remember in SQL 2000 we used to have a similar restriction when altering columns on replicated tables, unless we used sp_repladdcolumn and sp_repldropcolumn. Then along came the @replicate_ddl argument for sp_addpublication, which made our lives easier and let us make changes seamlessly. Well, there is now a similar possibility for dropping replicated tables:

exec sp_changepublication @publication = '<Publication Name>', @property = 'allow_drop', @value = 'true'

There are some caveats to be aware of:

  • The publisher needs to be running SQL Server 2016 SP1 or 2014 SP2.
  • If the same article exists in more than one transactional publication, you have to enable this feature on all of them or you’ll still get the error that a table is in replication and can’t be dropped.
So, once you have set it up you can run the following type of statement with no errors:
drop table [dbo].[City]
Even if it is the last article in the publication and you have 100 subscribers this is still no problem.
The table is removed from the publication articles, deleted at the publisher and remains at the subscribers.
To see if it has been enabled you can run
exec sp_helppublication 'mainpublication'
There is a new column returned, “allow_drop”, which will tell you it’s there.
Unfortunately, if you need to check this programmatically there isn’t a nice column in syspublications which shows the value. It is hidden in the options bitmap. I suppose this was the easiest place to put it – hopefully it’ll be explicit in SQL 2017. The value to look for is 0x80 (Books Online is not yet updated with this value or the “allow_drop” column in sp_helppublication, as of 13th Sep 2017).
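If you do need a programmatic check, something along these lines works against the bitmap (run it in the published database, treating 0x80 as the flag value mentioned above):

SELECT name,
CASE WHEN options & 0x80 = 0x80 THEN 1 ELSE 0 END AS allow_drop
FROM dbo.syspublications;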

Dropping Transactional Articles

In MSDN in the article “Add Articles to and Drop Articles from Existing Publications” there is a section which says “For snapshot or transactional publications, articles can be dropped with no special considerations prior to subscriptions being created. If an article is dropped after one or more subscriptions is created, the subscriptions must be dropped, recreated, and synchronized.”. This is pretty scary and sounds almost like a reinitialization is needed!

Let’s see what happens when we use SSMS. If you drop an article from the publication you’ll see the following dialogue:

“This will not impact existing subscriptions” sounds reassuring, but is slightly ambiguous – perhaps it is only referring to the invalidation of the snapshot. However there is clearly no mention that “the subscriptions must be dropped, recreated, and synchronized”. If you click on “Yes” you’ll find that the next time you start the distribution agent all runs fine. There will be some orphaned objects left on the subscriber – stored procedures related to the article and the like – but nothing to worry about. So perhaps the MSDN article is incorrect?

Not really! The SSMS interface is clever enough to do the hard work for you. If you try to drop the article yourself using this type of script:

EXEC sp_droparticle
@publication = 'pubtestarticle',
@article = 'tcity3',
@force_invalidate_snapshot = 1;
GO

You’ll receive an error:

Msg 14046, Level 16, State 1, Procedure sp_MSrepl_droparticle, Line 286 [Batch Start Line 0]
Could not drop article. A subscription exists on it.

What you need to do is manually drop its subscriptions, and then you can drop the article.

EXEC sp_dropsubscription 'pubtestarticle', 'tcity3', 'desktop-nsonkmo', 'transub'

This is what SSMS does behind the scenes, and it is all the MSDN article is really telling us – there is no need to reinitialize, just drop the relevant subscriptions, then the article, and then synchronize.
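
So, putting the whole sequence together with the example names used above, it is just:

-- Drop the subscription(s) on the article first...
EXEC sp_dropsubscription
@publication = 'pubtestarticle',
@article = 'tcity3',
@subscriber = 'desktop-nsonkmo',
@destination_db = 'transub';

-- ...then the article itself
EXEC sp_droparticle
@publication = 'pubtestarticle',
@article = 'tcity3',
@force_invalidate_snapshot = 1;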

Dropping replicated tables…..impossible?

Did you ever try to drop a replicated table? Of course this is disallowed! It was always that way, right?

If you do try it what you’ll see is a variation on the following message:

Well actually this has now changed and you can merrily drop the table.

It’s not enabled by default but if you run the following you’ll be permitted to do it.

EXEC sp_changepublication
@publication = 'yourpublicationname',
@property = N'allow_drop',
@value = 'true'
GO

If, when you run the above, you get a nasty error message like the following:

Msg 21183, Level 16, State 1, Procedure sp_MSrepl_changepublication, Line 274 [Batch Start Line 0]
Invalid property name 'allow_drop'.

……that means you have other problems: your SQL Server is not up to date! Naughty DBA! This facility came in with SQL 2014 Service Pack 2 and SQL 2016 Service Pack 1. Sorry, but nothing doing for SQL 2012/2008R2/2008!

Once it is set, you can happily drop tables in the publication database and the replication agents won’t error. The log reader will clean up the metadata related to the article and ignore any remaining unprocessed commands for the table. It also removes the article from the publication. The distribution agent processes the article records already sent to it by the log reader, and when it reaches the cleanup command it has completed the processing for this article.

Note that the distribution agent doesn’t report the fact that the drop has been processed during a synchronization, so we will not really be aware this has happened. Also, the distribution agent doesn’t send the DROP statement down to the Subscriber, so the table is left there intact. This is quite unlike other DDL changes, which keep the publisher and subscriber in sync. It’s safer this way, but it’s a pity an additional option to send the drop doesn’t exist.

Finally, once it is set, you’ll not see it so easily in syspublications. Another explicit column for this setting would have been nice, but instead it is hidden in the logic of the options bitmap. However, you can run the following to check if it is enabled:

EXEC sp_helppublication @publication = 'yourpublicationname'

And then you’ll have a clear report of the flag setting….

 

Replicating to Memory-Optimised Tables

In SQL 2016 you might have wondered about the strange new setting when adding a new subscription:

This is the option for “Memory Optimised” tables and applies to Transactional and Snapshot subscriptions only. Memory Optimised tables are a newish (SQL 2014 and 2016) feature which can be really useful for reporting purposes, and this fits in nicely with the way many Transactional Replication systems are used, so it’s definitely worth taking a look at. As an additional bonus, in SQL 2016 it’s also available in any edition.

Just a quick clarification that we are talking about replication subscribers here, not publishers. We can’t replicate memory optimised tables from the publisher – we can only create them at the subscriber.

First of all I’ll mention that selecting the option above when adding a subscription does nothing on its own! You’ll just get a normal table on the subscriber, and we need to do a little more to get this to work.

You need to take a look at each table in the publication and set the article properties there. The 2 options you’ll need are shown below.

If you set “Enable Memory Optimisation” to true and leave the second option as false (the default), this will most likely fail. This is because primary keys are usually created with a clustered index, but Memory Optimised tables don’t support clustered indexes. If your primary keys have a non-clustered index and there are no other clustered indexes on the table you’ll be OK, but this is a pretty unlikely scenario. Anyway, the wizard will check for you, and you’ll get the following error if you’ve omitted the second option when it was required.

Really there’s no reason I can think of to leave the second option set to false.

So, having set the article properties and the subscription properties we are now almost ready to roll.

One final check – is the subscriber database set up to accommodate memory optimised tables? You can take a look in SSMS or run the following query, replacing “Northwind” with your database name:

select * from [Northwind].sys.filegroups where type = 'FX'

If nothing is returned, adapt and run the following replacing “Northwind” with your database name:

ALTER DATABASE northwind ADD FILEGROUP northwind_mod CONTAINS MEMORY_OPTIMIZED_DATA
ALTER DATABASE northwind ADD FILE (name='northwind_mod2', filename='c:\data\northwind_mod2') TO FILEGROUP northwind_mod
ALTER DATABASE northwind SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT=ON

At this stage we are good to go. Create the subscription and after initialization run the following at the subscriber to check for any memory optimised tables.

SELECT OBJECT_NAME(object_id) as TableName, * FROM sys.dm_db_xtp_table_memory_stats;  

All being well you’ll get a row returned like the following and we’re done :).

 

Distribution agents and another reason to not use continuous schedules

Well, strictly speaking this is a post to remind you not to accept the defaults when using continuous schedules, although the problem I’ll describe never arises when you use a repeating schedule.

So – first of all, are you running the distribution agent continuously? Remember back to when you set up the subscription – there was an option mentioning this…

Can’t remember? Just take a look at the job schedule of the Distribution Agent. You’ll see the description will tell you “Start automatically when SQL Server Agent starts”.

This means a continuous schedule. It is the default and therefore often the one selected. It also provides the lowest latency, so it seems like the obvious choice. However, what if you ever hit a data consistency error because someone has messed up the subscriber data? It happens! If a replication command cannot be applied to the subscriber, the job runs, retries, retries, retries and so on. No error is reported. Things just seem to hang. Perhaps it is the report user who tells us that the data seems stale, but being professionals we should know this before them!

The setting we need to be aware of is the one on the job step which runs the distribution agent. It is set to retry indefinitely by default – well, actually 2147483647 times, but that amounts to indefinitely. Pretty soon the subscription will expire and have to be reinitialized if we don’t spot that there is a problem.

How does this look with a repeating schedule? Is it any different? Here it is:

Far more reasonable – after 10 mins we have our error, receive a PagerDuty alert, and can then connect up and sort out the problem. Why is the number of retries different here compared to that set in a continuous schedule? Absolutely no reason I know of, and on every system I’ve worked on I reset them as soon as I realise people are using a continuous schedule.

So – either reset to something reasonable (10 seems ok) or use a repeating schedule.
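
If you want to fix the retries rather than change the schedule, a quick sketch of the job tweak is below (the job name is a placeholder, and check which step is the “Run agent.” step in your own job):

EXEC msdb.dbo.sp_update_jobstep
@job_name = N'YOURSERVER-YourPublication-YourSubscriber-1',  -- your distribution agent job
@step_id = 2,            -- usually the "Run agent." step
@retry_attempts = 10,
@retry_interval = 1;     -- minutes between retries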

Solving data consistency errors and the need for monitoring….

Let’s consider what happens when transactional replication fails. Say there’s been a dodgy release and the application is now pointing at a replication subscriber where data changes have happened. Realising the problem, the application is re-pointed at the publisher database and when we start the distribution agent our problems begin to surface. We’ll see some info about retrying commands and after 10 retries the distribution agent will fail. I’m assuming you have the distribution agent on a timed schedule and there’ll be another post to explain why this is a good idea :). Failed inserts and updates are the usual issues we’ll see. This has happened to me in a production environment and there are a few ways of dealing with it:

(1) Reinitialize. The safest option, but not always possible in production until we can negotiate some downtime. Sometimes we just need to get things working.

(2) Use sp_setsubscriptionxactseqno. We just need to get hold of the transaction number (xact_seqno) of the most recent failing command and tell the distribution agent to skip it and move on to the next command. We could use the -Output parameter in the distribution agent job (see here). This used to be a common way of getting xact_seqno values, but the sp_helpsubscriptionerrors stored proc is much easier:

sp_helpsubscriptionerrors @publisher = 'DESKTOP-NSONKMO',
@publisher_db = 'testtran',
@publication = 'ptesttran',
@subscriber = 'DESKTOP-NSONKMO\PAULSINST',
@subscriber_db = 'testtran'

The proc returns enough info to help us understand what has happened, eg “Violation of PRIMARY KEY constraint 'PK_tRegion'. Cannot insert duplicate key in object 'dbo.tRegion'. The duplicate key value is (3).” The main thing we need, though, is the xact_seqno, which is also returned.

Then we run the following stored procedure at the Subscriber on the subscription database:

sp_setsubscriptionxactseqno @publisher = 'DESKTOP-NSONKMO'
, @publisher_db = 'testtran'
, @publication = 'ptesttran'
, @xact_seqno = 0x00000022000005FF000300000000

Note that this can be a pain if there are a lot of errors: each time you skip the last failing command, you then have to find the next xact_seqno and skip that one too, so you could be doing this for a while.

(3) We can set the distribution agent profile to skip all these types of data consistency errors – missing rows and rows that already exist.

In fact this is what the previous admin had set as a default profile on the system I was looking at. Most often it is the fastest way to get things up and running and it avoids the iterative approach of sp_setsubscriptionxactseqno.

The issue I have is that once this is set, no error is recorded while the distribution agent happily skips commands which don’t work.

We should of course run regular validations, but in the meantime, if you are using this profile, what you need is a separate job which alerts you when commands have been skipped so that you can take action. Just create a regular job which checks for consistency errors and sends an email if any are found. The syntax for the check is below:

SELECT id, time, error_text, error_code
FROM MSrepl_errors msre
WHERE msre.id IN (SELECT error_id
                  FROM MSdistribution_history)
AND error_code IN ('2601', '2627', '20598')
ORDER BY msre.time DESC

MSrepl_errors.error_code is of type sysname and may contain non-integer values (for example 'HYT00' and '08S01'), so your query should specify the error values as string literals to avoid this error:

Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the nvarchar value '08S01' to data type int.
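
For completeness, a minimal sketch of the alerting job step might look like this (mail profile and recipient are placeholders, and it assumes Database Mail is already configured). Run it in the distribution database:

IF EXISTS (SELECT 1
           FROM MSrepl_errors msre
           WHERE msre.id IN (SELECT error_id FROM MSdistribution_history)
             AND msre.error_code IN ('2601', '2627', '20598'))
             -- you may want to add a time filter so old errors aren't re-alerted
BEGIN
    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = N'DBA Mail Profile',
        @recipients = N'dba-team@example.com',
        @subject = N'Replication consistency errors have been skipped',
        @body = N'Check MSrepl_errors in the distribution database for details.';
END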