Metadata Cleanup
Improvements in SQL2005
Article by Nigel
Maneffa
This is a follow on article to
The Hidden Cost of Metadata Cleanup in Merge Replication
showing the advances that SQL2005
has made in this area.
More Metadata Tables in SQL2005
In SQL there are an ever increasing
number of tables to store merge replication metadata. As well as our old
favourites msmerge_contents, msmerge_tombstone and msmerge_genhistory that
have existed since SQL Server 7.0, we potentially have 3 more important
tables when precomputed partitions are in use.
No doubt this would lead you to believe that cleanup times would be
increased over SQL2000.
Core Metadata tables required for
merge replication:
MSmerge_contents (Transact-SQL)
MSmerge_tombstone (Transact-SQL)
MSmerge_genhistory (Transact-SQL)
Auxiliary Metadata tables required
for precomputed partitions:
MSmerge_current_partition_mappings
MSmerge_past_partition_mappings (Transact-SQL)
MSmerge_generation_partition_mappings (Transact-SQL) **
**MSmerge_generation_partition_mappings
is not discussed in BOL, but is cleaned
up by the stored procedure sp_mergemetadataretentioncleanup.
I immediately groaned at the thought
of yet more metatdata to cleanup - potentially slowing down sync times over
SQL2000, but I wanted to find out more about their uses and if the cleanup
times really did increase.....
Anyway, to give you some idea of the
size of these tables, I ran sp_spacesued on these tables on a production
system with a lot of complex filtering in use.
| |
Rows
|
Size reserved |
|
MSmerge_contents |
465245
|
481672 KB |
|
MSmerge_tombstone |
44687
|
11072 KB |
|
MSmerge_genhistory |
99601
|
28000 KB |
New
tables for precomputed partitions
| |
Rows
|
Size reserved |
|
MSmerge_current_partition_mappings |
734865(!!)
|
148800 KB |
|
MSmerge_past_partition_mappings |
9045
|
3744 KB |
|
MSmerge_generation_partition_mappings |
206105
|
13760 KB |
Behaviour Changes in Metadata Cleanup in SQL2005
When I started testing
SQL2005 I was amazed at the lack of unwanted (and very frequent) cleanups
compared to SQL 2000. These SQL 2000 cleanups can
cause dramatically increased sync times with little practical benefit ie the
facts are that the cleanup process is very IO and CPU intensive, even when
it is doing very little cleaning up.
Anyway, was I
seeing this right, or was the SQL 2005 test environment fooling me? On further
investigation we see a new field in sysmergesubscriptions holding the last
datetime of metadatacleanup – things were looking promising.
SELECT
metadatacleanuptime,*
FROM sysmergesubscriptions
WITH (NOLOCK)
Then running profiler confirmed the
good news – here is an extract from one of the commands run during a sync
.....@dometadata_cleanup =
case when
sys.fn_add_units_to_date(-1,
@retention_period_unit,
getdate())
> metadatacleanuptime
then 1....
Someone at Microsoft had decided to
sort the problem – and decided to clean up data based on the (new feature)
"retention units" set by the administrator (see below). Very neat and simple. Even
better news is that the previous design that ran cleanup at the startup of
every sync for scheduled merge, but only once at the very start of a
continuous merge, has been dumped in favour of checking the cleanup time on
every sync in both circumstances resulting in a much better and more consistent behaviour.
In typical default circumstances (14 day retention period) cleanup is
about once per day which is exactly the right outcome for most users most of the
time, or every sync if the last sync was at least one day ago.
It’s likely that the Merge
developers had little choice but to do some optimisation of metadata cleanup
because purging those extra tables, with potentially very large numbers of
records to scan, was proving just too expensive. Indeed, if you look at
a sync using the profiler you can see other (undocumented) cleanup processes
being called. The good news is that they did such a good job.
SQL 2005 retention period units
By the way it is now possible to
have different values for the retention period units - the default is still
days (the only option in SQL 2000), but you can also select week, month or
year(!) if your publication compatibility level is at least 90RTM (according
to BOL). If the publication compatibility level is less than 90RTM then
‘days’ is still the only option for retention period unit.
There is an error in BOL in this
area as it only mentions a retention_period_unit of
0 = day, 1
= week, 2 = month and 3 = year but there appear to be
‘undocumented’ settings of 4 and 5 for hours and minutes, which are also
available in the GUI. Why ‘minutes’ is available as a sensible merge
retention period unit is beyond me, especially as the merge process allows
24 hours grace anyway before expiration.

Manually Invoking Metadata Cleanup without using
sp_mergemetadataretentioncleanup (technical interest only!?)
If you want to cleanup more
frequently than the default schedule (I cannot think of any obvious reasons)
you could manually update the sysmergesubscriptions.metadatacleanuptime
field to fool the system to think it has not been cleaned up recently.
On my test system (all subscriptions
have same expiration period of 14 days) this manual cleanup works fine
without any obvious problems – I set the last cleanuptime to be more than 1
day old and the system cleans up on next sync and sets the cleanuptime back
to now. However, I recommend you use great care and testing if you decide to
clean up more frequently than the system thinks is necessary with this
mechanism on a production system. The usual provisos apply - updating system
tables in this manner is totally unsupported my Microsoft – rightly so as
you could put rubbish etc in them, and there may be other serious
ramifications that I have not uncovered – use this technique at your own
risk. The supported way is to use sp_mergemetadataretentioncleanup
‘manually’ – but this ‘setting back the date’ technique is easier to use in
my opinion since the ‘approved’ stored procedure technique needs to be run
twice, once at the publisher and once at the subscriber. Note if you are
using subscribers earlier than SQL2000 SP1 check BOL and use
sp_mergecleanupmetadata instead.
Results
On the production system in question
(without any manual tweaks/workarounds etc setup), sync times (on a 15
minute schedule) on a non filtered publication with small numbers of records
to sync have dropped from 15 seconds with SQL 2000 to 2 seconds with SQL 2005,
and system load is dramatically reduced. This is primarily due to the more
sensible scheduling of metadata cleanup.
Conclusions
The Microsoft replication team has
done a great job of optimising metadata cleanup in SQL2005. Workarounds that
sometimes had to be used to increase system performance can be removed.
In my case
this new methodology saves at least 100,000,000 disk reads per day (yes I
have got the right number of zeroes!) which was a total waste of resource.