SQL 2017 Distributor Changes

There's not much new stuff in SQL 2017 for the replication administrator. One thing I did notice, though, is that the distributor properties have changed. You can see 2 new options below:

Using sp_helpdistributor (or SELECT * FROM msdb.dbo.MSdistributiondbs) you can see more clearly that there are 2 new options to control deletions.
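If you want the raw values, a simple query works too. A minimal sketch – I'm assuming here that the new columns are named deletebatchsize_xact and deletebatchsize_cmd, matching the new distribution database properties:

-- assumes the SQL 2017 column names deletebatchsize_xact / deletebatchsize_cmd
SELECT name, deletebatchsize_xact, deletebatchsize_cmd
FROM msdb.dbo.MSdistributiondbs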

Remember that we retain the commands and transactions in the distribution database, so we'd expect the deletions to relate to these records. There's no mention of these parameters in Books Online, so I did a little digging to confirm where they are used.

The distribution cleanup job executes dbo.sp_MSdistribution_cleanup.
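On a default install the job step command typically looks like the one below (the retention values are in hours and will vary with your settings):

EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0, @max_distretention = 72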

This proc, through nested procs, calls:

(1) sp_MSdelete_publisherdb_trans

DELETE TOP(@deletebatchsize_commands) MSrepl_commands WITH (PAGLOCK) from MSrepl_commands

(2) sp_MSdelete_dodelete

delete TOP(@deletebatchsize_transactions) MSrepl_transactions WITH (PAGLOCK) from MSrepl_transactions

So – previously we only had a cutoff point to use when deleting old transactions. When there are millions of commands in this table, that is time-consuming and prone to blocking issues. Now we have a neat way of doing batched deletes. I've recently been working with some tables of a billion rows, where batching and decent indexing are essential, so I can see exactly why this has been introduced.
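If you want to tune the batch sizes away from the defaults, you don't have to touch the procs – they can be set through sp_changedistributiondb. A sketch, assuming the SQL 2017 property names below:

-- run at the distributor; property names assume the SQL 2017 additions
EXEC sp_changedistributiondb @database = N'distribution', @property = N'deletebatchsize_xact', @value = N'10000'
EXEC sp_changedistributiondb @database = N'distribution', @property = N'deletebatchsize_cmd', @value = N'100000'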

Dropping Replicated Tables

I recently upgraded some of our SQL 2014 servers to SP2. When going through the list of hotfixes I came across this gem: “This article describes an improvement that allows a table that’s included as an article in transactional replication publication to be dropped from the database and the publication(s).”

You might have tried at some time to drop a replicated table. You’ll see this type of message…

Remember that in SQL 2000 we had a similar situation when altering columns on replicated tables, unless we used sp_repladdcolumn and sp_repldropcolumn. Then along came the @replicate_ddl argument for sp_addpublication, which made our lives easier and allowed us to make changes seamlessly. Well, there is now a similar possibility for dropping replicated tables:

exec sp_changepublication @publication = '<Publication Name>', @property = 'allow_drop', @value = 'true'

There are some caveats to be aware of:

  • The publisher needs to be running SQL Server 2016 SP1 or 2014 SP2.
  • If the same article exists in more than one transactional publication, you have to enable this feature on all of them or you’ll still get the error that a table is in replication and can’t be dropped.
So, once you have set it up you can run the following type of statement with no errors:

drop table [dbo].[City]

Even if it is the last article in the publication and you have 100 subscribers, this is still no problem. The table is removed from the publication articles and deleted at the publisher, but remains at the subscribers.

To see if it has been enabled you can run:

exec sp_helppublication 'mainpublication'

There is a new column returned, "allow_drop", which will tell you it's there.
Unfortunately, if you need to check this programmatically there isn't a nice column in syspublications which shows the value. It is hidden in the options bitmap. I suppose this was the easiest place to put it – hopefully it'll be explicit in SQL 2017. The value to look for is 0x80 (Books Online is not yet updated with this value or the "allow_drop" column in sp_helppublication, as of 13th Sep 2017).
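Until Books Online catches up, a bitmap check is the way to do it programmatically. A minimal sketch using the 0x80 value above:

-- run in the published database
SELECT name,
       CASE WHEN options & 0x80 = 0x80 THEN 'allow_drop on' ELSE 'allow_drop off' END AS allow_drop_setting
FROM dbo.syspublications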

Replication Agents: Profiles and Parameters

OK – a few pointers to agent profiles, job parameters and such….

We usually find that our replication agents function quite nicely out of the box but at some stage we’ll need to tweak them. Perhaps they need to continue when an error is found, log more troubleshooting details to a file, have a longer query timeout and so on. This can be done by using one of the inbuilt profiles or by creating a custom one and setting up the values there. Either way we then assign the profile to the agent.

When we go down this route and compare the agent profile parameters to the list of agent executable parameters here, it seems there is a big difference. Many are missing from our profile! The first thing to do is untick the checkbox at the bottom of the profile (below) and, hey presto, many more appear. Still not the full list of agent parameters, but 90% are there now. I'll mention the missing 10% in a minute.

So, we create our profile, untick the checkbox, set values and set the agent to use it. Now maybe we have hundreds of agents and many different profiles. It is quite possible that local subscriptions are optimised to use a different profile from remote subscriptions with flaky connections. Some agents should continue on data consistency errors while others should fail and alert us. And so on. This one-to-many relationship between profiles and agents is not represented in the Replication Monitor – we have to go into each agent to see which profile it is using. You can use the query below to see what is what:

SELECT
    a.[publication] AS Publication,
    c.srvname AS SubscriberName,
    b.profile_name AS Profile,
    a.[name] AS DistributionJobName
FROM [distribution].[dbo].[MSdistribution_agents] a
INNER JOIN msdb.[dbo].[MSagent_profiles] b
    ON a.profile_id = b.profile_id
INNER JOIN master..sysservers c
    ON a.subscriber_id = c.srvid
ORDER BY b.profile_name

This returns a recordset which lists the agents and their profiles:

Remember that above I said 90% of parameters are available when we untick the checkbox on the profile? There are some others, such as "Output", "Hostname", "AltSnapshotFolder" and "UseInprocLoader", plus a few more, which have to be added to the job step directly as they don't exist in the profile.
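For those, you append the parameter to the command text of the Run Agent job step. A hedged example of what a distribution agent command might look like with -Output added – all the server, database and publication names here are made up:

-Subscriber [SUBSERVER] -SubscriberDB [SalesDB] -Publisher [PUBSERVER] -Distributor [PUBSERVER] -PublisherDB [SalesDB] -Publication [MainPublication] -Output C:\Temp\distrib.log -OutputVerboseLevel 2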

Now, with this level of complexity – agents, different profiles, job step parameters and so on – we have a tricky situation. When an agent runs, which profile and parameters did it use, and what job values were used? We can use my query above to help, but perhaps the job parameter values have been changed since the job was run, the agent might have been reset to use a different profile since then, or the profile itself might have been changed. You can take a look in the distribution agent history, but it is not detailed enough to help us and only gives basic info. To find out what our agent really used when it ran, we need to check the job history.

You can see below that the job history separates the output into 2 logical sets of parameters – "user-supplied" and "from agent profile". The profile name itself is not mentioned here (a pity), but the particular values it used are, and that is the main level of detail we need.

One other point to note: in the above example I've highlighted in blue the QueryTimeout parameter, which has a value of 7000 seconds. Why? Because as an experiment I had added it to the agent job step, and it was also in the profile with a different value of 3600 seconds. Now we see that the job step parameter takes priority and overrides the profile value when both are entered. Really we should use one or the other and not mix the two methods, and generally I'd advise using the profile as it is more transparent.

I look at this output to confirm the values being used are the ones I expected to see. There are a few reasons for this: I might have many agents using a particular profile and just want to change one agent on an ad-hoc basis (and remove the parameter afterwards!); I might want to investigate why my custom profile value is not being applied; or the parameter value is in the 10% not available in the profile and I just want to check my input value is being used.

Credentials and Proxies and why bother using them?

When we are setting up a subscriber there is a screen where we decide on the agent security.

The three-dots (ellipsis) button above takes us to a detailed setup form. Most often, I think, DBAs select the highlighted option in the security form below.

Certainly the installations I’ve seen imply that this has always been chosen. This was the only possible setup used in SQL 2000 and despite the warning that “this is not a recommended security best practice” it is still definitely the easiest option.

In this scenario distrib.exe or replmerg.exe is being run by the Windows account that SQL Server Agent is set to use. This account is impersonated to give access to the distributor and the subscriber. Often the SQL Agent account is a sysadmin on the publisher and a local Windows administrator. Sometimes this exact same account is also a sysadmin and local admin on the subscriber, and in some extreme cases it is a domain admin. I'm not recommending any of these – just pointing out what seems commonplace. This is the crux of the issue, as Microsoft always recommends using a least-privilege user account for services.

But why bother with this least-privilege philosophy for replication? Well, things can be run through the replication engine – nasty things. The most obvious example is sp_addscriptexec. You have to be db_owner to run it, but essentially a db_owner can become sysadmin through this means. Another example would be all those text files in the snapshot directory. Another would be the DDL change scripts. What this really amounts to is the possibility of implementing SQL injection via replication.
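To see why, consider what sp_addscriptexec does: it posts a script file that each subscriber's agent then executes under the agent's security context. A sketch – the publication and share names here are made up:

-- a db_owner at the publisher can run this; script.sql then executes at every subscriber
EXEC sp_addscriptexec @publication = N'MainPublication', @scriptfile = N'\\server\share\script.sql'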

So – if you really want to lock it all down and ensure that the replication engine cannot do too much harm in the hands of a malicious user, we need to consider using credentials and proxies and get away from impersonating the SQL Agent account. A credential refers to the details of a Windows user and password. This account should have reduced privileges compared to the SQL Agent service account – if the credential account and the SQL Agent account have the same level of access to SQL and to the OS then you are going down the wrong street!

A proxy is a reference to the credential to be used for a particular function – "Replication Distributor" above. You might want the same credential to be used for both merge and transactional agents – in this case, one credential and two proxies. Both credentials and proxies are easy to set up in SSMS (above); you'll just need to create the credential first and then the proxy.
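If you'd rather script it than click through SSMS, the equivalent T-SQL is sketched below – the account, password and proxy names are just examples:

-- 1. the credential holds the least-privilege windows account
USE master
CREATE CREDENTIAL [DistributorCredential]
WITH IDENTITY = N'MYDOMAIN\repl_distrib', SECRET = N'StrongPasswordHere'

-- 2. the proxy references the credential
EXEC msdb.dbo.sp_add_proxy @proxy_name = N'DistributorProxy', @credential_name = N'DistributorCredential', @enabled = 1

-- 3. allow the proxy to be used by Replication Distributor job steps
EXEC msdb.dbo.sp_grant_proxy_to_subsystem @proxy_name = N'DistributorProxy', @subsystem_name = N'Distribution'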

Once this is done the job step that runs the particular subscriber agent will have another entry in the drop down – the name of the Proxy account that you added. In the case below I select the second job step of the distribution job to use the “DistributorProxy” proxy account (sorry about my naming convention!).

You might wonder how this all relates to the option below in the subscription wizard which has the option to state an alternative Windows account. Seems similar?

Well, this does all the Proxy/Credential/Job configuration for you in the background! The only problem is that if you leave it to the wizard you’ll end up with a mess of duplicates and horrible proxy and credential names like the one below, so it’s definitely worth designing it yourself.

Replicating Indexes, Constraints, Triggers and so on

One thing we need to be sure of is that the subscriber and publisher “look” the same. This is from the DBA point of view. Are all aspects of the table the same at the subscriber?

We seem to be able to control this in the article properties. The defaults say that the FKs, Indexes, Triggers, Defaults etc go to the subscriber:

There is a related setting on the subscriber properties for the publication that says we want to replicate schema changes – by default set to true:

However, note that after initialization, when we make a change on the publisher, not everything will be propagated. We can see what is ready to go in sysmergeschemachange. For the main changes we need to remember:

  • Replicated to the Subscriber: Primary Keys, Foreign Keys, Check Constraints, Default Constraints
  • Not Replicated to the Subscriber: Indexes, Triggers

So – be careful to manually keep the subscriber in sync when it comes to indexes and triggers!
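You can also check exactly what an article was set to copy by looking at its schema_option bitmap. A sketch for a transactional article – the publication and article names are made up, and do verify the bit values against Books Online for your build (0x10 for clustered indexes and 0x40 for nonclustered indexes are the commonly documented ones):

-- run in the published database; look at the schema_option column
EXEC sp_helparticle @publication = N'MainPublication', @article = N'City'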

When initializing do I need to drop FKs at the subscriber?

This is an interesting question. In the publication properties there is the option on the snapshot tab to run extra scripts before and after the snapshot is applied. This is the same for both merge and transactional publications (below).

Many DBAs will have scripts which drop all the FKs on the subscriber and re-add them after the snapshot is applied, so the initialization runs smoothly and we don't get the following sort of error:

Could not drop object 'dbo.tCity' because it is referenced by a FOREIGN KEY constraint.

However, snapshot generation differs between transactional and merge. In transactional, all the FKs are dropped for you and re-added later on. This doesn't happen for merge. There's probably a good reason for it, but I can't see why they should run differently at present. Anyway, the message is that you don't need to roll your own logic to deal with subscriber FKs in transactional, but you still do in merge!
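For merge, then, the FK drop/re-add logic belongs in those pre- and post-snapshot scripts. A sketch of wiring them up on a merge publication – the publication name and script paths are placeholders:

-- scripts run before/after the snapshot is applied at the subscriber
EXEC sp_changemergepublication @publication = N'MainMergePublication', @property = N'pre_snapshot_script', @value = N'\\server\share\DropSubscriberFKs.sql'
EXEC sp_changemergepublication @publication = N'MainMergePublication', @property = N'post_snapshot_script', @value = N'\\server\share\AddSubscriberFKs.sql'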

Replication Backups

As a DBA we’re always considering disaster recovery, business continuity, backups etc. Replication is no different – we need to get a copy of the whole setup. Manually this can be done from SSMS:

This is OK if you remember to always do it when making changes.

In practice people forget or haven’t got the time.

So, the PowerShell script below will create a text file, and you can run this script as a job – easy!

########################################################################################################################
# Script created by:
# Date created:
# Changes:
# Purpose: Script creates the replication file for DR purposes
# Then it copies the file to the DR destination
# Example Usage: powershell -Command "C:\DBA\scripts\BackupReplicationScripts.ps1" -SourceReplicationFilename "\\servername\replication\Publications.sql" -DestinationReplicationFilename "\\servername2\replication\Publications.sql" -SQLServerWithPublications "mainserver,port" -ComputerWithPublications "mainserver"
########################################################################################################################

param(
[Parameter(Position=0,mandatory=$true)]
[string]$SourceReplicationFilename,
[Parameter(Position=1,mandatory=$true)]
[string]$DestinationReplicationFilename,
[Parameter(Position=2,mandatory=$true)]
[string]$SQLServerWithPublications,
[Parameter(Position=3,mandatory=$true)]
[string]$ComputerWithPublications
)

Try
{
write-host "$SourceReplicationFilename"
write-host "$DestinationReplicationFilename"
write-host "$SQLServerWithPublications"
write-host "$ComputerWithPublications"
$ScriptBlockContent = {

param ( $SourceReplicationFilename, $SQLServerWithPublications, $ComputerWithPublications)

#Reference RMO Assembly
[Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Rmo")
[Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Replication")

function errorhandler([string]$errormsg)
{
writetofile ("Replication Script Generator run at: " + (date)) $SourceReplicationFilename 1
writetofile ("[Replication Script ERROR] " + $errormsg) $SourceReplicationFilename 0
write-host ("[Replication Script ERROR] " + $errormsg) -Foregroundcolor Red
}
function writetofile([string]$text, [string]$myfilename, [int]$cr_prefix)
{
if ($cr_prefix -eq 1) { "" >> $myfilename }
$text >> $myfilename
}
function initializefile([string]$myfilename)
{
"" > $myfilename
}

$repsvr = New-Object "Microsoft.SqlServer.Replication.ReplicationServer" $SQLServerWithPublications

Clear-host

initializefile $SourceReplicationFilename

# if we don't have any replicated databases then there's no point in carrying on
if ($repsvr.ReplicationDatabases.Count -eq 0)
{
writetofile ("Replication Script Generator run at: " + (date)) $SourceReplicationFilename 0
writetofile "ZERO replicated databases on $SQLServerWithPublications!!!" $SourceReplicationFilename 1
EXIT
}
# similarly, if we don't have any publications then there's no point in carrying on
[int] $Count_Tran_Pub = 0

foreach($replicateddatabase in $repsvr.ReplicationDatabases)
{
$Count_Tran_Pub = $Count_Tran_Pub + $replicateddatabase.TransPublications.Count
}
if ($Count_Tran_Pub -eq 0)
{
writetofile ("Replication Script Generator run at: " + (date)) $SourceReplicationFilename 0
writetofile "ZERO Publications on $SQLServerWithPublications!!!" $SourceReplicationFilename 1
EXIT
}

# if we got this far we know that there are some publications so we'll script them out
# the $scriptargs controls exactly what the script contains
# for a full list of the available options see the Microsoft.SqlServer.Replication.ScriptOptions enumeration
$scriptargs =
[Microsoft.SqlServer.Replication.scriptoptions]::Creation `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::IncludePublications `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::IncludePublicationAccesses `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::IncludeGo `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::IncludeArticles `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::IncludePublisherSideSubscriptions `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::IncludeCreateSnapshotAgent `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::IncludeSubscriberSideSubscriptions `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::EnableReplicationDB `
-bor [Microsoft.SqlServer.Replication.scriptoptions]::IncludeCreateLogreaderAgent
writetofile ("-- Replication Script Generator run at: " + (date)) $SourceReplicationFilename 0
writetofile "-- PUBLICATIONS ON $SQLServerWithPublications" $SourceReplicationFilename 1
writetofile "-- TRANSACTIONAL PUBLICATIONS ($Count_Tran_Pub)" $SourceReplicationFilename 1
foreach($replicateddatabase in $repsvr.ReplicationDatabases)
{
if ($replicateddatabase.TransPublications.Count -gt 0)
{
foreach($tranpub in $replicateddatabase.TransPublications)
{
write-host "--********************************************************************************" -Foregroundcolor Blue
write-host ("--***** Writing to file script for publication: " + $tranpub.Name) -Foregroundcolor Blue
write-host "--********************************************************************************" -Foregroundcolor Blue
writetofile "--********************************************************************************" $SourceReplicationFilename 0
writetofile ("--***** Writing to file script for publication: " + $tranpub.Name) $SourceReplicationFilename 0
writetofile "--********************************************************************************" $SourceReplicationFilename 0
[string] $myscript=$tranpub.script($scriptargs)
writetofile $myscript $SourceReplicationFilename 0
}
}
}

}

Invoke-Command -ComputerName "$ComputerWithPublications" -ScriptBlock $ScriptBlockContent -ArgumentList $SourceReplicationFilename, $SQLServerWithPublications, $ComputerWithPublications

Copy-Item $SourceReplicationFilename $DestinationReplicationFilename

}
Catch [Exception]
{
write-host $_.Exception.Message
}
Finally
{

}

Dealing with large Snapshots

If you have poor network links (replicating to ships like I used to?), or you have large snapshots, or unluckily both together (been there!), you'll need to tweak things a bit. Here are some options I've used:

(1) Local Initialization

This is one of the biggest time-saving options. Chris Skorlinski explains the methodology here. If I can paraphrase: you run the Snapshot Agent on the publisher, and this creates a whole load of files in a snapshot folder. Transfer the folder to the subscriber using Robocopy or FTP. Set the Distribution Agent to use the -AltSnapshotFolder parameter. Strangely, this can be done even if you have a push subscription. You can run distrib.exe / replmerg.exe on the subscriber and sync up locally.
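Running the agent locally at the subscriber then looks something like the sketch below – every name is a placeholder, and the key bit is pointing -AltSnapshotFolder at the copied files:

distrib.exe -Publisher [PUBSERVER] -PublisherDB [SalesDB] -Publication [MainPublication] -Distributor [PUBSERVER] -Subscriber [SUBSERVER] -SubscriberDB [SalesDB] -AltSnapshotFolder "D:\CopiedSnapshot"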

(2) Use Compression

As part of the above process, zip up the snapshot folder (use 7-Zip or suchlike if this is going to be >2GB). Unzip at the subscriber. There is an option on the snapshot tab to do compression, but this is really slow for large snapshots, and a 3rd-party tool performs much better.

You'll be amazed how much time this saves. I had one publication that used to take a week to complete initialization, and it was changed to complete in a single day.

Things that didn’t work….

As a final note, I tested the UseInprocLoader parameter of the Distribution Agent and the Merge Agent. This parameter causes the agent to use the BULK INSERT command when the snapshot is applied. I found no improvement in agent synchronization times using this parameter. I also tried the -MaxBCPThreads parameter of the Snapshot, Merge and Distribution Agents. This parameter specifies the number of bulk-copy operations that can be performed in parallel when the snapshot is created and applied. Once again, I saw no improvement for transactional replication.