VSSADMIN命令无法列出SQL writer的问题

 

我们通常有两种方式来备份SQL
Server的数据库。一种是通过在SQL
Server中运行BACKUP
DATABASE的T-SQL语句来备份数据库。另一种方法是使用SQL
writer service来备份数据库。SQL
writer service提供了一种通过Volume
Shadow Copy Service (VSS)框架来备份SQL
Server数据库的途径。第三方厂商可以开发自己的备份应用程序,通过调用SQL
writer service来实现SQL数据库的备份。微软自己的一些产品,比如Data
Protection Manager (DPM)也是使用的SQL
writer service来做备份的。

关于SQL
Writer service和VSS的更多内容,可以参考:

https://technet.microsoft.com/en-us/library/cc966520.aspx

https://msdn.microsoft.com/en-us/library/ms175536.aspx

 

当我们遇到SQL
writer service无法正常工作的问题时,我们通过会通过以下命令来对SQL
writer service做一个快速而简单的测试:

vssadmin list writers

该命令会列出服务器上所有使用VSS的writers,如果SQL
writer service无法被列出,那么自然证明SQL
writer service和VSS之间的协同工作有问题。那么SQL
writer service无法正常工作也在情理之中了。

要进一步研究为什么SQL
writer service无法被列出,一个很有效的方法是使用SQL
writer的trace。开启trace步骤如下:

(1). 以下步骤都在SQL Server上执行

(2).备份注册表键值“HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\SQLWriter\”.

(3). 向注册表中添加如下键值:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\SQLWriter\Debug\Tracing]

"TraceEnterExit"=dword:00000001

"TraceFile"="c:\\trace.txt"

"TraceFileLineInfo"=dword:00000001

"TraceForceFlush"=dword:00000001

"TraceLevel"=dword:ffffffff

"TraceTimestamp"=dword:00000001

"TraceToDebugger"=dword:00000001

"TraceToFile"=dword:00000001

(4). 重启SQL Writer Service。此时我们配置的trace就会开始运行。

(5). 打开命令行,运行:

vssadmin list writers

(6). 运行完成后,打开C:\trace.txt文件来查看是否有任何错误。如果需要关闭trace,只要还原之前备份的键值,并且重启SQL writer service即可。

 

 

根据经验,大多数SQL
writer service无法被列出的问题是由以下两种问题造成的:

(1)     
当VSSADMIN尝试列出SQL writer service时,他们尝试去连接服务器上“所有”的SQL
Server实例,一旦有一个实例无法连接上,则SQL
writer service就无法被列出。

(2)     
当VSSADMIN连接上SQL Server后,他会运行一句查询语句,如果这句查询语句失败,则SQL
writer service就无法被列出。

这两个问题都可以在trace文件里发现。以下是一个trace文件的示例。

---------------------------------------------------------------------------------

[0082394203,0x0008b0:0x1c10:0xfc9d47b3]
sqlwriter.yukon\sqllib\sqlconnect.cpp(0589): SqlConnection::Connect: Connecting
to server A000S-ITSQL05...

[0082394234,0x0008b0:0x1c10:0xfc9d47b3]
sqlwriter.yukon\sqllib\sqlconnect.cpp(0630): SqlConnection::Connect: Connected

[0082394250,0x0008b0:0x1c10:0xfc9d47b3]
sqlwriter.yukon\sqllib\sqlconnect.cpp(0652): SqlConnection::Connect: Version:
09.00.3042

[0082394265,0x0008b0:0x1c10:0xfc9d47b3]
EXIT {SqlConnection::Connect}: hr: 0x00000000

[0082394281,0x0008b0:0x1c10:0xfc9d47b3]
ENTER {SqlConnection::SetCommand}:

[0082394296,0x0008b0:0x1c10:0xfc9d47b3]
ENTER {SqlConnection::ReleaseRowset}:

[0082394312,0x0008b0:0x1c10:0xfc9d47b3]
EXIT {SqlConnection::ReleaseRowset}: hr: 0x00000000

[0082394328,0x0008b0:0x1c10:0xfc9d47b3]
sqlwriter.yukon\sqllib\sqlconnect.cpp(0916): SqlConnection::SetCommand: SetCommand (select
rtrim(physical_name),rtrim(type_desc),rtrim(state_desc), is_name_reserved from
sys.master_files where DB_ID(N'RFTHSOWMS ') = database_id)

[0082394359,0x0008b0:0x1c10:0xfc9d47b3]
EXIT {SqlConnection::SetCommand}: hr: 0x00000000

[0082394375,0x0008b0:0x1c10:0xfc9d47b3]
ENTER {SqlConnection::ExecCommand}:

[0082394390,0x0008b0:0x1c10:0xfc9d47b3]
ENTER {SqlConnection::ReleaseRowset}:

[0082394406,0x0008b0:0x1c10:0xfc9d47b3]
EXIT {SqlConnection::ReleaseRowset}: hr: 0x00000000

[0082394453,0x0008b0:0x1c10:0xfc9d47b3]
EXIT {SqlConnection::ExecCommand}: hr: 0x00000000

[0082394468,0x0008b0:0x1c10:0xfc9d47b3]
ENTER {SqlConnection::FetchFirst}:

[0082394484,0x0008b0:0x1c10:0xfc9d47b3]
EXIT {SqlConnection::FetchFirst}: hr: 0x00000000

[0082394500,0x0008b0:0x1c10:0xfc9d47b3]
EXIT {SqlEnumerator::FirstFile}: hr: 0x00000000

[0082394531,0x0008b0:0x1c10:0xfc9d47b3]
ENTER {SqlEnumerator::NextFile}:

[0082394546,0x0008b0:0x1c10:0xfc9d47b3]
ENTER {SqlConnection::FetchNext}:

[0082394578,0x0008b0:0x1c10:0xfc9d47b3]
EXIT {SqlConnection::FetchNext}: hr: 0x00000000

[0082394593,0x0008b0:0x1c10:0xfc9d47b3]
EXIT {SqlEnumerator::NextFile}: hr: 0x00000000

[0082394640,0x0008b0:0x1c10:0xfc9d47b3]
ENTER {SqlEnumerator::NextFile}:

[0082394656,0x0008b0:0x1c10:0xfc9d47b3]
ENTER {SqlConnection::FetchNext}:

[0082394687,0x0008b0:0x1c10:0xfc9d47b3]
EXIT {SqlConnection::FetchNext}: hr: 0x00040ec6

[0082394718,0x0008b0:0x1c10:0xfc9d47b3]
EXIT {SqlEnumerator::NextFile}: hr: 0x00040ec6

[0082394750,0x0008b0:0x1c10:0xfc9d47b3]
EXIT {CSqlWriter::CheckFilesBeforeAddComponent}: hr: 0x00040ec6

[0082394765,0x0008b0:0x1c10:0xfc9d47b3]
ENTER {CSqlWriter::AddComponentAndFiles}:

[0082397921,0x0008b0:0x1c10:0xfc9d47b3]
ENTER {SqlEnumerator::FirstFile}:

[0082397937,0x0008b0:0x1c10:0xfc9d47b3]
ENTER {SqlConnection::FetchFirst}:

[0082397968,0x0008b0:0x1c10:0xfc9d47b3]
EXIT {SqlConnection::FetchFirst}: hr: 0x00000000

[0082397984,0x0008b0:0x1c10:0xfc9d47b3]
EXIT {SqlEnumerator::FirstFile}: hr: 0x00000000

[0082401375,0x0008b0:0x1c10:0xfc9d47b3]
sqlwriter.yukon\sqlwriter\sqlwriter.cpp(2233):
CSqlWriter::AddComponentAndFiles: IVssCreateWriterMetadata::AddFilesToFileGroup
failed. hr = 0x80042308

---------------------------------------------------------------------------------

 

首先我们可以看到SQL
Server实例已经被成功连接到了(SqlConnection::Connect:
Connected)。如果这里显示连接失败的话,我们可以到SQL
errorlog里查看是否在对应时间里有login
failed的错误(error
18456)被记录。如果有的话,我们就可以做进一步工作来检查为什么会有这样的连接失败产生。

连接成功后,VSSADMIN就开始执行语句:

SqlConnection::SetCommand: SetCommand (select
rtrim(physical_name),rtrim(type_desc),rtrim(state_desc), is_name_reserved from
sys.master_files where DB_ID(N'RFTHSOWMS ') = database_id

 

在上面我们给出的实例中,这句语句的运行失败了。于是导致了VSSADMIN无法列出SQL
writer service。

为什么查询会失败呢?该查询语句中,数据库名“RFTHSOWMS ”在末尾处有一个空格。就是这个空格导致了查询的时候。事实上,只要数据库名的开头或者末尾有空格,又或是数据库名包含打印不出的字符的话,就会导致查询失败。关于这个问题,在微软的知识库文章中已经有讲到过,大家有兴趣的话可以去看看这篇文章:

 

https://support.microsoft.com/default.aspx?scid=kb;en-US;2014054

 

SQL writer
service的trace是非常有用的。它不但可以发现vssadmin
list writers时产生的问题,对于一些在使用SQL
writer或者MSDE
writer备份数据库的过程中发生的问题也能提供非常有用的信息。(MSDE
writer是SQL
Server 2000的时候SQL使用VSS框架的writer。)

 

我以前遇到过一个使用MSDE
writer备份数据库报错的问题。我使用了trace来排查这个问题。打开MSDE
writer的trace的方法和SQL
writer service类似,只是注册表的键值不同。对于MSDE
writer,我们要添加的键值是:

  [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\VSS\Debug\Tracing]
"TraceFile"="c:\\trace.txt"
"TraceLevel"=dword:ffffffff
"TraceEnterExit"=dword:00000001
"TraceToFile"=dword:00000001
"TraceToDebugger"=dword:00000000
"TraceFileLineInfo"=dword:00000001
"TraceForceFlush"=dword:00000000

 

添加完后,重启MSDE
writer service使trace生效。然后运行backup软件来重现问题。然后查看Trace文件,发现:

 

---------------------------------------------------------------------------------

[3012184656,0x0011f8:0x1290:0x5d93de09]
modules\sqllib\sqlconnect.cpp(0198): DumpErrorInfo: Error state: 2

Severity:
14

[3012184656,0x0011f8:0x1568:0x5d93de09]
modules\sqllib\sqlconnect.cpp(0239): DumpErrorInfo: Error
message:
Database
'ISALOG_20081021_FWS_000' cannot be opened due to inaccessible files or
insufficient memory or disk space. See the SQL Server errorlog for
details.

[3012184671,0x0011f8:0x131c:0x5d93de09]
modules\sqllib\sqlconnect.cpp(0605): SqlConnection::SetCommand: SetCommand
(BACKUP DATABASE [ISALOG_20100512_FWS_000] TO
VIRTUAL_DEVICE='{D431F72B-C2B1-49C0-9B80-01D7B2095BF9}9' WITH
SNAPSHOT,BUFFERCOUNT=1,BLOCKSIZE=1024)

[3012184671,0x0011f8:0x15a8:0x5d93de09]
ENTER {SqlConnection::ExecCommand}:

[3012184671,0x0011f8:0x0cc0:0x5d93de09]
EXIT {SqlConnection::ReleaseRowset}: hr: 0x00000000

[3012184671,0x0011f8:0x0a9c:0x5d93de09]
modules\sqllib\sqlconnect.cpp(0151): DumpErrorInfo:
HRESULT: 0X80040E14

[3012184687,0x0011f8:0x0b7c:0x5d93de09]
modules\sqllib\sqlconnect.cpp(0198): DumpErrorInfo: Error state: 1

Severity:
16

[3012184687,0x0011f8:0x164c:0x5d93de09]
modules\sqllib\sqlconnect.cpp(0239): DumpErrorInfo: Error
message:
BACKUP DATABASE is
terminating abnormally.

[3012184687,0x0011f8:0x0b98:0x5d93de09]
modules\sqllib\sqlconnect.cpp(0158): DumpErrorInfo: SQLSTATE: 42000

---------------------------------------------------------------------------------

 

从错误上看问题应该是发生在SQL
Server内部。于是看了SQL
Server的errorlog。Errorlog里在相应的时间里记载了如下信息:

2010-08-05 18:30:33.51
spid10 This SQL Server has been optimized for 8 concurrent
queries. This limit has been exceeded by 10 queries and performance may be
adversely affected.

2010-08-05 18:30:34.65
spid63 BackupMedium::ReportIoError: write failure on backup
device '{D431F72B-C2B1-49C0-9B80-01D7B2095BF9}9'. Operating system error
995(error not found).

2010-08-05 18:30:34.65
spid63 Internal I/O request 0x43333C28: Op: Write, pBuffer:
0x04430400, Size: 1024, Position: 0, UMS: Internal: 0x0, InternalHigh: 0x0,
Offset: 0x10001, OffsetHigh: 0x0, m_buf: 0x00030001, m_len: 0, m_actualBytes:
0, m_errcode: 995, BackupFile: {D431F72B-C2B1-49C0-9B80-01D7B2095BF9}9

2010-08-05 18:30:34.65
backup BACKUP failed to complete the command BACKUP DATABASE
[ISALOG_20100512_FWS_000] TO VIRTUAL_DEVICE='{D431F72B-C2B1-49C0-9B80-01D7B2095BF9}9'
WITH SNAPSHOT,BUFFERCOUNT=1,BLOCKSIZE=1024

原来客户使用的SQL
Server是MSDE的版本。MSDE版本的SQL
Server限制最大工作线程数是8个。但是客户使用了备份工具同时备份了多个数据库。每个数据库的备份都占用了工作线程,最后导致有部分数据库备份的请求由于得不到工作线程而失败。