| |
Replication Republishing
Introduction
Republishing in a replication scenario occurs
when a subscriber to articles from a publisher is itself a publisher of these
same articles. In transactional replication, changes made at the publisher will
propagate in a double-hop to the final subscriber. In merge replication, changes
may be made by any link in the chain and will in turn propagate to all other
links. This sounds like an unnecessarily complicated setup, so why would you
want to arrange this topology? The most obvious case is when you have a
particularly slow network connection. By using republishing, you ensure that
data is only sent once over the slow link, and faster distribution occurs at the
other side of the link:

The shipping industry is the main
area where I have come across the need for this technology. The satellite links
to a ship are slow and often unreliable (large waves obscure the receiver and
consequently break the connection), so rather than replicating from head office to several
databases on each ship, we replicate once to the ship and then republish.
This article will examine how to set up republishing on SQL Server 2005 although
the basic setup is much the same for SQL Server 2000.
Setting it up
For the most part, the setup is straightforward,
provided you pay attention to a few key areas. Let's look at the initial data
migration then the required
'tweaks' to transactional and merge replication separately:
The Initialization Process
Provided the topology is set up from left to
right in the above diagram, there are not really any difficulties, and normal
initializations can be used. Of course this is the easiest method. In the case
of P1 being set up later than P2, we could use a backup and restore method,
however this becomes very complicated and is IMO best left alone. For a start,
P2 must be treated as RO or else there will be non-convergence of data. Next,
there will be issues of identity columns because in the correct setup only
tables on P1 should have the identity attribute, and not P2, S3 or S4.
Merge
(a) type of subscription
There are 2 types of merge subscription in SQL
Server 2005: "Client" and "Server". Client subscribers
assume the priority of the publisher during conflict resolution with other
subscribers, while Server subscribers are each assigned priority numbers, with
the highest values winning. In SQL Server 2000 these are referred to as
"Local" and "Global" (Local=Client, Global=Server). Only
Server subscribers (those having priority numbers) can republish data to
other subscribers. So, in the diagram above, P2(S) should be set up as
a subscriber to P1 with a priority number assigned before it is allowed to
republish. Failure to do so will result in the following message (at the
bottom). This error message still uses the terminology of SQL Server 2000,
although the meaning is clear.

So, the subscriber P2(S) must have a priority
number assigned, as must S3 and S4. The particular values must also be chosen
carefully: P1 > P2(S) > S3/4. If you've partitioned the data to ensure no
conflicts, this is probably a bit academic, however it is still mandatory.

(b) Identity Ranges
Identity ranges are managed by the publisher
and propagated to subscribers by the merge agent. As we have a republishing
hierarchy, the ranges are managed both by the root publisher and the republisher.
As per usual, the identity values are assigned from a pool at the publisher,
but the subscribers S3 and S4 must be assigned their range from the P2(S)
values. If the subscriber PS(2) has a range of 100 assigned to it from the
Publisher P1, this means that PS(2), S3 and S4 (and any other outer subscribers)
must together amount to 100 (controlled by the article properties of the
republisher). This means that the size of the range itself limits the number of
outer subscribers - something worth noting if you are planning for a lot of
subscribers. Fortunately the default size of the ranges has increased vastly in
SQL Server 2005 to 1000 per subscriber by default so forgetting to check this is
less likely to be a problem in cases with a few republishing subscribers.
When S3 runs out of values it requests them from P2
and when all of P2(S)s have run out or when S3 requests another range and there
aren't sufficient values available, a new range is requested by PS(2) from P1.
Whether this is from the merge agent or via a call to
sp_adjustpublisheridentityrange doesn't really matter - it follows the same
rules as for a normal merge subscription.
Transactional
Transactional replication is limited to the
standard vanilla setup ie you are not able to use updating Subscribers. Also
worth noting is the fact that in transactional replication, tracer tokens are
not forwarded by republishing Subscribers. In this case we could add a tracer
token to P1 and add one to P2(S) independently.
Schema Changes, Retention Periods and SQL Server 2000 Considerations
Schema
changes should be made only at the publisher, not at the republishing
subscriber. Changes propagated to a republishing subscriber are by
default propagated to its subscribers. In SQL Server 2000 we
would use sp_repladdcolumn and sp_repldropcolumn but these are NOT
supported in a republishing tolology: in the above diagram, if we added a column
to P1 it would be added to P1 and P2(S) but NOT S3 or S4, and so result in a
need to reinitialize. In SQL Server 2005 the ALTER TABLE statement can be used
provided @replicate_ddl is set to 1, and new columns added to P1
will successfully be added to P2(S) and subsequently to S3 and S4. The retention
period for any republisher must be set to a value equal to or less than the
retention period set at the original publisher and using different values may
lead to non-convergence.
|
|