| |
Using the TABLEDIFF Utility
Introduction
This article details the use of the new TableDiff
utility which is used to solve problems associated with non-convergence. Typically, this is related to replication, where the
subscriber and publisher's data doesn't match (and once synchronised, you
subsequently do a nosync initialization), however the utility could really be used in any comparable situation. We won't go into the exact causes of the
non-convergence - very generally speaking these can include lax DBA protocols, internal replication errors and intentional use of
the (non-updatable) subscriber - the focus here is what can this tool do and how to best use it.
NB It is possible to run TableDiff as a
standalone utility if you simply copy the executable to another machine - the
only requirement is that TableDiff needs the correct version of the .NET
Framework installed. It will be possible in this way to use the utility to
compare SQL Server 2000 databases (thx to Lee Greene for passing this on).
TABLEDIFF Arguments Listed
First of all, where is it? This tablediff.exe utility is a command-line program
which exists in the
C:\Program Files\Microsoft SQL Server\90\COM folder.
To find the available options we use "tablediff -?" and the output is:
User-specified agent parameter values:
-?
Replication Diff Tool Command Line Options
usage: tablediff
-- Source Options --
-sourceserver Source Host
-sourcedatabase Source Database
-sourceschema Source Schema Name
-sourcetable Source Table or View
-sourceuser Source Login
-sourcepassword Source Password
-sourcelocked Lock the source table/view durring tablediff
-- Destination Options --
Note: For multiple Destinations use a comma delimited list
-destinationserver Destination Host
-destinationdatabase Destination Database
-destinationschema Destination Schema Name
-destinationtable Destination Table or View
-destinationuser Destination Login
-destinationpassword Destination Password
-destinationlocked Lock the destination table/view durring tablediff
-- Misc Options --
-t Timeout
-c Column Level Diff
-f Generate Fix SQL (You may also specify a file name and path)
-q Quick Row Count
-et Specify a table to put the errors into
-dt Drop the error table if it exists
-o Output file
-b Number of bytes to read for blob data types (Default 1000)
-strict Strict compare of source and destination schema
Note that this is slightly different to the options shown in BOL, where "-bf" is
additionally included (explanation below).
A typical execution is shown below:

TABLEDIFF Argument Details
In the above example, I
was comparing 2 tables in the same database, for testing purposes. The tables are
called tCustomers and tCustomersCopy. They had very simple table schemas and
consisted of 2 rows which differed slightly. The actual command that was run is
shown at the top, which is then formatted in the output to reiterate the
selections made - a useful facility if you decide to redirect the output to a
file (using '>'). After that comes a more interesting part and we discover
that there are 2 differences along with the execution time. To find details of
the actual differing rows, some useful TSQL to synchronize the data, output the
results to a file and several other facilities, we must provide more than the
default arguments, so let's examine all the options available.
Connection Arguments.
The connection details (-source... and -destination... options)
are mostly self-explanatory. The login options refer to SQL authentication, and for
trusted security you simply omit the login names and passwords entirely, assuming
you are logged in with a domain user that exists as a windows login in each SQL
Server.
No options / -o option. With no additional arguments the entire output
is simply written to the cmd window as above. An identical output is sent to a separate text
output file if the -o argument is used.
-c option This returns the column names which are different. In
the case where I have rows differing in the SName column I receive:
Err ID Col
Mismatch 2 SName
Note that although this is defined as 'Column Level Diff', it has nothing to do with differing
numbers of columns or differing column orders. Different column orders is not a
problem, however horizontally partitioned tables (different numbers of columns)
are not permitted.
-f option generates the TSQL to synchronize the two tables.
Note that this creates a script to synchronize the subscriber with the publisher ie it
works in one direction only. If you need it to be used in the other direction, you can swap the
connection parameter values for the publisher and the subscriber. The filename
and path is by default of the type:
"C:\Program Files\Microsoft SQL Server\90\COM\DIFFIX.632722665324687500"
This can be changed by using the -o option and specifying a filename. In either case the file output looks
like that shown below:
-- Host: sql2005pc
-- Database: [pub1]
-- Table: [dbo].[tcustomerscopy]
SET IDENTITY_INSERT [dbo].[tcustomerscopy] ON
UPDATE [dbo].[tcustomerscopy] SET [SName]='Smith' WHERE [ID] = 2
SET IDENTITY_INSERT [dbo].[tcustomerscopy] OFF
You might ask how
the tool 'knows' which where clause to create. To understand, if you try to compare two tables
consisting only of simple columns, you'll receive the error message below:
The replication table difference tool requires the comparison tables/views to have either a primary key, identity, rowguid or unique key column.
-b option. This is the number of bytes to compare for large object data type
columns. Basically, there is a max of 8000 bytes and data after that isn't
compared.
-q option. This performs a simple rowcount, so even though the 2 rows are different we receive the
following output:
Table [pub1].[dbo].[tCustomers] on sql2005pc and Table
[pub1].[dbo].[tcustomerscopy] on sql2005pc are identical.
Table [pub1].[dbo].[tCustomers] on sql2005pc has 2 rows.
Table [pub1].[dbo].[tcustomerscopy] on sql2005pc has 2 rows.
The requested operation took 0.234375 seconds.
-et option. This allows us to put the output into a table. The table is created
on the subscriber, and in the case of -f, it has the following type of
structure, which is probably the lost useful aspect of this tool:
|
ID
|
MSdifftool_ErrorCode
|
MSdifftool_ErrorDescription
|
MSdifftool_FixSQL
|
MSdifftool_OffendingColumns
|
|
2
|
0
|
Mismatch
|
UPDATE [dbo].[tcustomerscopy] SET [SName]='Smith' WHERE [ID] =
2
|
[SName]
|
-strict option Is used to disallow implicit datatype conversion mappings
(such as int -> bigint, varchar(max) -> text) and therefore perform a 'strict' validation.
Unlike normally, when you use the -strict option, the destination table must also have a
primary key, identity, or ROWGUID column - NB this is not necessarily there in the case of transactional replication.
The resulting error message is shown below:
Table [pub1].[dbo].[tCustomers] on sql2005pc and Table
[pub1].[dbo].[tcustomerscopy] on sql2005pc have different schemas and cannot be compared.
The requested operation took 1.0625 seconds.
-f -bf#num option. When this is used, several files are created each with #num commands in
them in order to synchronize the tables. Note that this command is not included in
list of parameters shown by issuing -?. This facility could be important for huge tables with loads of mismatched
rows, where the maximum filesize might be exceeded.
-destinationlocked and -sourcelocked
options
Why do we want to lock the tables, and how does
TABLEDIFF work by default? To find the rows that differ, behind the scenes
the tool uses the following query:
SELECT [dbo].[tCustomers].[ID],BINARY_CHECKSUM([dbo].[tCustomers].[FName],[dbo].[tCustomers].[ID],[dbo].[tCustomers].[SName])
as MShash_54267293 FROM [dbo].[tCustomers] WITH (READUNCOMMITTED) ORDER BY [dbo].[tCustomers].[ID]
Once the different checksums were identified, we see this query to find one of the
actual column values:
SELECT [dbo].[tCustomers].[FName],[dbo].[tCustomers].[ID],[dbo].[tCustomers].[SName] FROM [dbo].[tCustomers] WITH
(READUNCOMMITTED) WHERE [ID] = 2
So, this is useful in that we notice it used dirty reads which explains the usefulness of
the final parameters: -destinationlocked and -sourcelocked.
Comments
I think this is a pretty useful, albeit simple
tool. I currently use Redgate's DataCompare which is in several ways a more
sophisticated tool, but on those client sites which don't have a Redgate licence
I'll definitely consider using it.
|
|