Average 28000 requests/month












The Hidden Cost of Metadata Cleanup in Merge Replication (SQL Server 2000)

Article by Nigel Maneffa

Introduction: A Brief History of Metadata Table Cleanup

Those using merge replication will know that it uses several system tables, three of which are particularly important: MSmerge_genhistory, MSmerge_contents and MSmerge_tombstone with the first 2 holding all the metadata relating to inserts and updates, and Msmerge_tombstone maintainin metadata on deletes.

Early adopters of Merge replication (introduced in SQL Server 7.0) found that these merge replication metadata tables would consistently grow over time, causing ever increasing slowdowns and intense locking/blocking issues. One possible 'fix' was to completely remove replication and subsequently reinstall it from scratch, which shows that this was without doubt real “version 1.0” stuff. Also, the fact that servers at that time were dramatically less powerful further magnified these issues.

Soon after there followed unofficial, non-Microsoft stored procedures (eg Michael Hotek) to clear out the unnecessary metadata, and ultimately an official MS stored procedure (sp_mergecleanupmetadata) which had the same aim. Having to run these type of procedures manually at infrequent intervals was a real pain as I remember – many years ago now thankfully. There was also the added problem that the system had to be quiesced during this process.

Eventually everyone's prayers were answered when SQL 2000 SP1 introduced automatic metadata cleanup at every synchronisation. The parameter “–MetadataRetentionCleanup” was added to the Merge profile and everyone was home and dry – cleaned up metadata by default and no more manual work. 

Performance Implications of Cleaning up the Metadata

Like most things in computing you never get something for nothing, and my findings are that the metadata cleanup process absorbs very large amounts of system processes so if you are running merge synchronizations on a frequent schedule the performance hit on the servers in question can be very noticeable and we need to consider the best ways to manage this impact. To put the problem in perspective, let’s look at a real OLTP system that I know very well. It consists of 1 publisher and 15 subscribers and is a 200 user system.

Looking at the main metadata table MSmerge_contents we find:

Publisher – 380,464 rows

Typical subscriber – 215,769 rows (there is partitioning etc)

This is typical of such an OLTP system with a 14 day (default) retention time. Very crudely we could guess that 27,000 metadata records will be flushed each day (380,000/14) on the publisher.

This system uses dial-up comms with scheduled synchronisation; each server synchronises every 15 minutes during business hours. As there are 4 synchronisations per hour we see a total of 60 publisher synchronisations in total per hour (or 1 synchronisation per minute from the viewpoint of the publisher).

To give you some idea of the performance hit here is a simple processor trace from task manager:

You can see the massive processor spike from standard user load to metadata clean up starting just over half-way through the trace. This server is a Xeon Dell 2600 with 2GB of RAM (almost all being used by SQL) and good disk IO. It is syncing with a similar server. The sync time is 20 seconds – ˝ of which is taken clearing up metadata! However this usually consists of only a handful of records.

This is the performance hit data (taken from profiler) when running sp_mergemetadataretentioncleanup

CPU(milliseconds)

Disk Reads

Duration(milliseconds)

Publisher

10234

192616

10830

Subscriber

10125

126026

10186

To ensure reliable answers I ran a dbcc dbreindex on merge tables shortly beforehand.

I repeat: over 50% of the 20 second sync time and large amounts of server resources is taken cleaning up a handful of merge records – happening once per minute in this topography. 

Solutions

OK, so what can we do to mitigate this?

Well, we need to initially distinguish between syncing actual data and the purging of metadata which occurs during the synchronization process. As a general rule, purging the metadata records once per day should be sufficient but how do we achieve this? 

The merge profile controls whether metadata cleanup occurs at synchronisation and has a default of “ON”. We can turn it "OFF" in the GUI via Enterprise Manager or using TSQL:

Run: sp_help_agent_profile, get the id of the profile in question, eg 16
Run: sp_change_agent_parameter @profile_id = 16,
@parameter_name = '-MetadataRetentionCleanup',
@parameter_value = '0'

You have now turned off metadataRetentionCleanup and the next sync will have much reduced load. So in the above case we have reduced server load but are not cleaning up metadata any more. The best of both worlds is to switch the profile on to cleanup on for a couple of syncs per day and then turn it off until the next day or whatever suits your environment. For example we could have 2 jobs:

J ob1 – 7AM - MetadataRetentionCleanup set to 1

All subscribers sync from 7 – 7.30 – cleaning up metadata – high server load

J ob2 – 7.30AM MetadataRetentionCleanup set to 0

No metadataRetentionCleanup load for the rest of the day.

This scenario allows the merge agents to concentrate on synchronizing just the data we are interested in. 

You can also create the above effect by modifying the job steps in the Merge Agents folder and add the MetadataRetentionCleanup parameter and any specific settings in the 'command line' will override the settings in the profile. . Here is an example:

-Publisher [MANEFFA_LAPTOP] -PublisherDB [Merge] -Publication [Merge] -Subscriber [MANEFFA_LAPTOP] -SubscriberDB [MergeSub] -Distributor [MANEFFA_LAPTOP] -MetadataRetentionCleanup 1 -DistributorSecurityMode 1

You will need to consider how frequently to reindex MSmerge_genhistory, MSmerge_contents and MSmerge_tombstone to keep optimum performance. Look at DBCC SHOWCONTIG information. You can expect 10% faster merge time (MS tests) by frequent reindexing. Remember that that the merge meta data tables are quite likely to be the most frequently modified tables in your whole system, so be careful when you reindex. The server in the example above takes about 1 minute to reindex the three metadata tables when MSmerge_contents has 500,000 records.

You might be wondering why you cannot schedule the sp_mergecleanupmetadata to run as a manual job. Microsoft state that this stored procedure should not be run standalone unless you have a system prior to SQL2000 sp1. The system needs to be in a quiesced state before running this procedure, almost impossible in practise.

The agent profile continuous setting has the opposite effect to scheduled mode; metadata is cleared up at the start of the synchronisation and not removed again until the agent is stopped and started again – this could be a considerable time (sometimes days or weeks), but in practise there is usually some stoppage that means the continuous profile will need to be restarted, either manually or by pointing the failed job step back at itself.

I personally prefer setting a one minute scheduled job with managed metadata cleanup to a continuous job with a one minute polling interval.

Metadata cleanup is modified somewhat in SQL Derver 2005. We will discuss that in another article.

Conclusions

Frequent scheduled merge agent synchronisations can cause very heavy server load with little benefit. We have investigated ways to mitigate this problem without loss of metadata clean up.

(Technical Editor Paul Ibison)

 
 

July 2007