Metadata Cleanup Improvements in SQL Server 2005
This is a follow-on article to The Hidden Cost of Metadata Cleanup in Merge Replication, showing the advances that SQL2005 has made in this area.
More Metadata Tables in SQL2005
In SQL Server there is an ever-increasing number of tables storing merge replication metadata. As well as our old favourites MSmerge_contents, MSmerge_tombstone and MSmerge_genhistory, which have existed since SQL Server 7.0, we potentially have three more important tables when precomputed partitions are in use. No doubt this would lead you to believe that cleanup times would be longer than in SQL2000.
Core metadata tables required for merge replication:
- MSmerge_contents
- MSmerge_tombstone
- MSmerge_genhistory
Auxiliary metadata tables required for precomputed partitions:
- MSmerge_current_partition_mappings
- MSmerge_past_partition_mappings
- MSmerge_generation_partition_mappings**
**MSmerge_generation_partition_mappings is not discussed in BOL, but is cleaned up by the stored procedure sp_mergemetadataretentioncleanup.
I immediately groaned at the thought of yet more metadata to clean up – potentially slowing down sync times compared to SQL2000 – but I wanted to find out more about their uses and whether the cleanup times really did increase...
Anyway, to give you some idea of the size of these tables, I ran sp_spaceused against them on a production system with a lot of complex filtering in use.
New tables for pre-computed partitions
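If you want to check the sizes on your own system, sp_spaceused can be run against each metadata table. This is a minimal sketch – run it in the publication database, and note that the partition-mapping tables only exist when precomputed partitions are in use:

```sql
-- Report size and row counts for the merge metadata tables.
EXEC sp_spaceused 'dbo.MSmerge_contents'
EXEC sp_spaceused 'dbo.MSmerge_tombstone'
EXEC sp_spaceused 'dbo.MSmerge_genhistory'
-- Only present when precomputed partitions are in use:
EXEC sp_spaceused 'dbo.MSmerge_current_partition_mappings'
EXEC sp_spaceused 'dbo.MSmerge_past_partition_mappings'
EXEC sp_spaceused 'dbo.MSmerge_generation_partition_mappings'
```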
Behaviour Changes in Metadata Cleanup in SQL2005
When I started testing SQL2005 I was amazed at the lack of unwanted (and very frequent) cleanups compared to SQL2000. These SQL2000 cleanups can dramatically increase sync times with little practical benefit: the cleanup process is very IO- and CPU-intensive even when it is doing very little cleaning up. Anyway, was I seeing this right, or was the SQL2005 test environment fooling me? On further investigation we see a new column, metadatacleanuptime, in sysmergesubscriptions holding the datetime of the last metadata cleanup – things were looking promising.
SELECT metadatacleanuptime, * FROM sysmergesubscriptions WITH (NOLOCK)
Running Profiler then confirmed the good news – here is an extract from one of the commands run during a sync:
.....@dometadata_cleanup = case when sys.fn_add_units_to_date(-1, @retention_period_unit, getdate()) > metadatacleanuptime then 1....
Someone at Microsoft had decided to sort the problem – and decided to clean up data based on the new "retention period units" feature set by the administrator (see below). Very neat and simple. Even better, the previous design – which ran cleanup at the start of every sync for a scheduled merge, but only once at the very start of a continuous merge – has been dropped in favour of checking the cleanup time on every sync in both cases, resulting in much better and more consistent behaviour. In typical default circumstances (a 14-day retention period) cleanup runs about once per day, or on the next sync if the last sync was at least one day ago – which is exactly the right outcome for most users most of the time.
It’s likely that the Merge developers had little choice but to do some optimisation of metadata cleanup because purging those extra tables, with potentially very large numbers of records to scan, was proving just too expensive. Indeed, if you look at a sync using the profiler you can see other (undocumented) cleanup processes being called. The good news is that they did such a good job.
SQL 2005 retention period units
By the way, it is now possible to have different values for the retention period unit – the default is still days (the only option in SQL 2000), but you can also select week, month or year(!) if your publication compatibility level is at least 90RTM (according to BOL). If the publication compatibility level is lower than 90RTM, 'days' is still the only option for the retention period unit.
There is an error in BOL in this area, as it only mentions retention_period_unit values of 0 = day, 1 = week, 2 = month and 3 = year, but there appear to be 'undocumented' settings of 4 and 5 for hours and minutes, which are also available in the GUI. Why 'minutes' is offered as a sensible merge retention period unit is beyond me, especially as the merge process allows 24 hours' grace before expiration anyway.
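As an example, a publication can be given a two-week retention period at creation time. The publication name below is a placeholder and the parameters are cut down to the bare minimum – check sp_addmergepublication in BOL for the full parameter list on your build:

```sql
-- Retention of 2 units, with the unit set to weeks rather than the
-- default of days. Requires publication compatibility level 90RTM+.
EXEC sp_addmergepublication
     @publication           = N'MyMergePub',   -- placeholder name
     @retention             = 2,
     @retention_period_unit = N'week'
```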
Manually Invoking Metadata Cleanup without using sp_mergemetadataretentioncleanup (technical interest only!?)
If you want to clean up more frequently than the default schedule (I cannot think of any obvious reasons to), you could manually update the sysmergesubscriptions.metadatacleanuptime column to fool the system into thinking it has not been cleaned up recently.
On my test system (all subscriptions have the same expiration period of 14 days) this manual cleanup works fine without any obvious problems – I set the last cleanup time to be more than one day old, and the system cleans up on the next sync and sets the cleanup time back to now. However, I recommend great care and thorough testing if you decide to use this mechanism on a production system to clean up more frequently than the system thinks is necessary. The usual provisos apply: updating system tables in this manner is totally unsupported by Microsoft – rightly so, as you could put rubbish in them, and there may be other serious ramifications that I have not uncovered – so use this technique at your own risk. The supported way is to run sp_mergemetadataretentioncleanup 'manually', but the 'setting back the date' technique is easier to use in my opinion, since the 'approved' stored procedure needs to be run twice, once at the publisher and once at the subscriber. Note that if you are using subscribers earlier than SQL 2000 SP1, check BOL and use sp_mergecleanupmetadata instead.
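Both approaches can be sketched as follows. The two-day interval in the unsupported update is an arbitrary choice of mine, and – as stressed above – updating system tables directly is unsupported and entirely at your own risk:

```sql
-- UNSUPPORTED: push the recorded cleanup time back so the next sync
-- triggers a metadata cleanup. Test thoroughly before considering this.
UPDATE sysmergesubscriptions
SET metadatacleanuptime = DATEADD(day, -2, GETDATE())

-- Supported alternative: invoke the retention cleanup directly.
-- Run once at the publisher and once at the subscriber; the output
-- parameters report how many rows were removed from each table.
DECLARE @genhistory_rows int, @contents_rows int, @tombstone_rows int
EXEC sp_mergemetadataretentioncleanup
     @num_genhistory_rows = @genhistory_rows OUTPUT,
     @num_contents_rows   = @contents_rows   OUTPUT,
     @num_tombstone_rows  = @tombstone_rows  OUTPUT
```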
On the production system in question (without any manual tweaks/workarounds etc setup), sync times (on a 15 minute schedule) on a non filtered publication with small numbers of records to sync have dropped from 15 seconds with SQL 2000 to 2 seconds with SQL 2005, and system load is dramatically reduced. This is primarily due to the more sensible scheduling of metadata cleanup.
The Microsoft replication team has done a great job of optimising metadata cleanup in SQL2005. Workarounds that sometimes had to be used to increase system performance can be removed. In my case this new methodology saves at least 100,000,000 disk reads per day (yes, I have got the right number of zeroes!), which were a total waste of resource.
Paul Ibison, Copyright © 2013