November 9, 2017 SQLCORE

[Repost] Corrupção de dados direto das trincheiras

Olá,

Hoje venho com mais um caso de corrupção de dados extraído de um ambiente real, direto das “trincheiras”.

Tive um caso onde há alguns dias o SQL vinha gerando dumps de um problema até então não identificado. Como os dumps não vinham sendo verificados o problema ficou por alguns dias e só fomos “informados” dele quando um dia o SQL foi derrubado pelo erro. Vamos à investigação.

Primeiro erro crítico encontrado, que derrubou o SQL:

=====================================================================
BugCheck Dump
=====================================================================

This file is generated by Microsoft SQL Server
version 9.00.5000.00
upon detection of fatal unexpected error. Please return this file,
the query or program that produced the bugcheck, the database and
the error log, and any other pertinent information with a Service Request.

Computer type is AT/AT COMPATIBLE.
Current time is 01:53:50 09/20/12.
8 Unknown CPU 9., 3 Mhz processor (s).
Windows NT 5.2 Build 3790 CSD Service Pack 2.

Memory
MemoryLoad = 87%
Total Physical = 8189 MB
Available Physical = 1048 MB
Total Page File = 9779 MB
Available Page File = 2785 MB
Total Virtual = 8388607 MB
Available Virtual = 8379991 MB
***Stack Dump being sent to E:\MSSQL.1\MSSQL\LOG\SQLDump0129.txt
SqlDumpExceptionHandler: Process 63 generated fatal exception c0000005 EXCEPTION_ACCESS_VIOLATION. SQL Server is
terminating this process.
* *******************************************************************************
*
* BEGIN STACK DUMP:
*   09/20/12 01:53:50 spid 63
*
*
*   Exception Address = 0000000001D20C81 Module(sqlservr+0000000000D20C81)
*   Exception Code    = c0000005 EXCEPTION_ACCESS_VIOLATION
*   Access Violation occurred writing address 0000000000000002

Destaquei os pontos que me chamaram a atenção. Pensei, provavelmente algo grave mas vamos continuar investigando… Este erro ocorreu ainda mais uma vez e depois começaram a surgir dumps com esta característica:


Begin transaction log dump for database 'dbcorporativo'; for Index maintenance - filter on ObjId.

Current LSN = 002907c7:000019ea:01da
Operation = LOP_MODIFY_ROW
Context = LCX_CLUSTERED
Transaction ID = 0000:7241332b
Tag Bits = 0000
Log Record Fixed Length = 62
Log Record Length = 116
Previous LSN = 002907c7:000019ea:01d8
Flag Bits = 0200
AllocUnitId = 372292930109440
AllocUnitName = dbo.mk45_pf.pk_mk45_pf
Page ID = 0001:004cfa63
Slot ID = 3
Previous Page LSN = 002907c6:00039c20:015a
Number of Locks = 3
Lock Information = HoBt 372292930109440:ACQUIRE_LOCK_IX OBJECT: 6:1385771994:0 ;ACQUIRE_LOCK_IX PAG
E: 6:1:5044835;ACQUIRE_LOCK_X KEY: 6:372292930109440 (0f0088d0eb93)
Description =
Log Record = 00003e00c7072900ea190000d80102002b3341720000040263fa4c0001000300da339952c6072900
209c03005a0101000000da33995201009c000100000004000100010008001a00cd00772ad18a2e48160fe84c020100000101000c0000da339
9520000010200040204000a0f0088d0eb939952

Current LSN = 002907c7:000019ea:01d9
Operation = LOP_MODIFY_COLUMNS
Context = LCX_CLUSTERED
Transaction ID = 0000:7241331c
Tag Bits = 0000
Log Record Fixed Length = 62
Log Record Length = 144
Previous LSN = 002907c7:000019ea:01d7
Flag Bits = 0200
AllocUnitId = 372292930109440
AllocUnitName = dbo.mk45_pf.pk_mk45_pf
Page ID = 0001:0059445a
Slot ID = 3
Previous Page LSN = 002907c7:000019ea:01d7
Number of Locks = 3
Lock Information = HoBt 372292930109440:ACQUIRE_LOCK_IX OBJECT: 6:1385771994:0 ;ACQUIRE_LOCK_IX PAG
E: 6:1:5850202;ACQUIRE_LOCK_X KEY: 6:372292930109440 (5200e0c454d0)
Description =
Log Record = 00003e00c7072900ea190000d70102001c334172000006025a44590001000300da339952c7072900
ea190000d70101000000da339952010000000000000008000800040008001a000200020001000100330033009c009c000200010016521b530
20100000101000c0000da3399520000010200040204000a5200e0c454d04172530104020a004e00d0000100d1339952

E no final deste dump, a mensagem:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

**Dump thread - spid = 63, PSS = 0x00000000BB17DEA0, EC = 0x00000000BB17DEB0
***Stack Dump being sent to E:\MSSQL.1\MSSQL\LOG\SQLDump0131.txt
* *******************************************************************************
*
* BEGIN STACK DUMP:
* 09/20/12 03:05:40 spid 63
*
* CPerIndexMetaQS::ErrorAbort - Index corruption

Já fiquei um pouco mais tranquilo. Trata-se de uma corrupção de índice. Ótimo! Algo relativamente simples de ser tratado. Restava ainda ter absoluta certeza de que se trava de uma corrupção de índice e tratá-la. Vamos ao DBCC CHECKDB. O resultado do CHECKDB na base em questão foi o seguinte (resumo):

Msg 8951, Level 16, State 1, Line 1
Table error: table 'mk45_pf' (ID 1385771994). Data row does not have a matching index row in the index 'IK_mk45_pf' (ID 24). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:4206449:9) identified by (cd_pes = 34916789) with index values 'cd_cpf_pes = 16614763254. and cd_sgm_pes_pri = '006' and cd_sgm_pes_sec = 'AKO' and cd_pes = 34916789'.
Msg 8951, Level 16, State 1, Line 1
Table error: table 'mk45_pf' (ID 1385771994). Data row does not have a matching index row in the index 'IK_mk45_pf' (ID 24). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:5060407:26) identified by (cd_pes = 38871114) with index values 'cd_cpf_pes = NULL and cd_sgm_pes_pri = '006' and cd_sgm_pes_sec = 'AKO' and cd_pes = 38871114'.
Msg 8951, Level 16, State 1, Line 1
Table error: table 'mk45_pf' (ID 1385771994). Data row does not have a matching index row in the index 'IK_mk45_pf' (ID 24). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:5061350:3) identified by (cd_pes = 38871291) with index values 'cd_cpf_pes = NULL and cd_sgm_pes_pri = '106' and cd_sgm_pes_sec = 'AUB' and cd_pes = 38871291'.
...
CHECKDB found 0 allocation errors and 28 consistency errors in table 'mk45_pf' (object ID 1385771994).
CHECKDB found 0 allocation errors and 28 consistency errors in database 'dbcorporativo'.
repair_rebuild is the minimum repair level for the errors found by DBCC CHECKDB (dbcorporativo).

No dump, ele gerou o resultado do DBCC CHECKDB desta forma:

Table error: table 'mk45_pf' (ID 1385771994). Index row in index 'IK_mk45_pf' (ID 24) does not match any data row. Possible extra or invalid keys for:
Index row (1:15540904:69) with values (cd_cpf_pes = 9746759701
. and cd_sgm_pes_pri = '006' and cd_sgm_pes_sec = '080' and cd_pes = 46266481) pointing to the data row identified by (cd_pes = 46266481).
Table error: table 'mk45_pf' (ID 1385771994). Index row in index 'IK_mk45_pf' (ID 24) does not match any data row. Possible extra or invalid keys for:
Index row (1:15577461:126) with values (cd_cpf_pes = 859179826
00. and cd_sgm_pes_pri = '106' and cd_sgm_pes_sec = '133' and cd_pes = 38872341) pointing to the data row identified by (cd_pes = 38872341).
CHECKDB found 0 allocation errors and 28 consistency errors in table 'mk45_pf' (object ID 1385771994).
CHECKDB found 0 allocation errors and 28 consistency errors in database 'dbcorporativo'.
DBCC CHECKDB (dbcorporativo) WITH all_errormsgs, no_infomsgs executed by BSBR\dbadmin found 28 errors and repaired 0 errors. Elapsed time: 1 hours 9 minutes 23 seconds.
repair_rebuild is the minimum repair level for the errors foun
d by DBCC CHECKDB (dbcorporativo).

**Dump thread - spid = 59, PSS = 0x0000000193777C00, EC = 0x0000000193777C10
***Stack Dump being sent to E:\MSSQL.1\MSSQL\LOG\SQLDump0161.txt
* *******************************************************************************
*
* BEGIN STACK DUMP:
* 10/05/12 15:08:22 spid 59
*
* DBCC database corruption
*
* Input Buffer 142 bytes -
* dbcc checkdb(dbcorporativo) with all_errormsgs, no_infomsgs
*
* *******************************************************************************

Problema diagnosticado com exatidão. Realmente trata-se de uma corrupção de índice. A minha suspeita seria de que este é um índice NON-CLUSTERED e alguma referência da PK foi corrompida, ou mesmo o próprio dado do índice. Não tive dúvidas, confirmei que se tratava de um índice NC (exec sp_helpindex ‘minha_tabela’), dropei o índice e o recriei logo em seguida.

Rodei o DBCC CHECKDB na sequência e, sucesso. Problema resolvido. Exp + 1. =)

Abraços e bons estudos.
Erickson Ricci

Leave a Reply

Your email address will not be published. Required fields are marked *