Home  |  Articles  |  Scripts  |  Consultancy  |  FAQs  |  Links

 

Merge Filters Issue in SQL 2005/8 (17th Jan 2010)

Introduction

I've recently come across a non-convergence issue in merge replication. It only affects those people who are using filters and join filters, and they need to be set up in a particular way. Basically, the join filters need to be set up in a direction that opposes the DRI setup. I know this sounds complicated so please take a look at the schema below which'll hopefully clarify a bit:

If you have something like the setup above, but the publication filters are declared in reverse as below then you might have a problem.

In the above setup, we add 3 related records then remove them - all done on the publisher.

insert into mergepub.dbo.tCompany(cid,companyname) values(1,'acme')
insert into mergepub.dbo.tDepartment(did,departmentname,cid) values(1,'dept1',1)
insert into mergepub.dbo.tStaff([sid],staffname,did) values(1,'astaff1',1)

delete mergepub.dbo.tStaff
delete mergepub.dbo.tDepartment
delete mergepub.dbo.tCompany

Leaves no records remaining on the publisher or the subscriber. This is OK. However removing the records in one batch from the subscriber as follows and then synchronizing.....

delete mergesub.dbo.tStaff
delete mergesub.dbo.tDepartment
delete mergesub.dbo.tCompany

Leaves the company record remaining at the publisher!!!!

Background info

I thought I'd check the nicknames but there's nothing strange going on there:

select name, nickname from sysmergearticles order by nickname desc
-- merge processes DELETEs in descending article nickname order.

Further digging and collaborative research shows that the rows are processed in the order of tstaff, tdepartment, tcompany for deletes. The issue is that when the tdepartment delete is applied at publisher, the corresponding tcompany record has its lineage updated. When applying the delete on tcompany, an old cached value of lineage for tcompany is compared with the new lineage value for tcompany.  When the lineages differ, the tcompany delete fails because it thinks there is a metadata conflict. Consequently the tcompany row on the publisher never gets deleted. 

So - how to fix this?

The Solutions

(1) set @use_partition_groups = 'false' is the parameter value that correctly propagates the delete of tCompany to the publisher from the subscriber (with @partition_options = 0 for the 3 articles). This forces the use of the SetupBelongs code set which is slower but seems more robust for this particular use. Using @use_partition_groups = 'true' leaves the final row in tCompany at the publisher and this is the default.

(2) Deleting the records at the subscriber one by one starting with the tStaff record idoes not work, as the removal of the tStaff record also removes all the related tDepartment and tCompany records at the subscriber on synchronization. This means that the tDepartment and tCompany records are left hanging at the publisher. Deleting the records at the publisher one by one starting with the tStaff record works fine and all publisher and subscriber records are removed. This can be done to remediate the issue afterwards if required.

(3) this problem only manifests where there are more than 2 tables involved. For just 2 tables I haven't seen an issue. Not really a solution as such because you might simply need 3 tables in the filters.

(4) increasing downloadgenerationsperbatch and uploadgenerationsperbatch? might be relevant here. I'll investigate more, but the problem first manifested when these were set to very small values (1 for each). Possibly a larger value is less likely to cause a problem because it'd mean that the company delete might end up in the same batch as the other deletes.

Finally

I'll update this page once the issue has been sorted out in a hotfix.