數據庫平台:SunOS 5.8 Generic_108528-23 sun4u sparc SUNW,Ultra-Enterprise
數據庫版本:8.1.5.0.0
數據庫症狀:數據庫響應緩慢,應用請求無法返回,業務操作陷於停頓,此時需要DBA介入並進行問題診斷及故障處理。
1. 登錄數據庫進行檢查
首先我們登錄數據庫,檢查故障現象。
經過檢查發現,數據塊的所有重做日志組除current外都處於active狀態:
oracle:/oracle/oracle8>sqlplus "/ as sysdba"
SQL*Plus: Release 8.1.5.0.0 - Production on Thu Jun 23 18:56:06 2005
(c) Copyright 1999 Oracle Corporation. All rights reserved.
Connected to:
Oracle8i Enterprise Edition Release 8.1.5.0.0 - Production
With the Partitioning and Java options
PL/SQL Release 8.1.5.0.0 - Production
SQL> select * from v$log;
GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
1 1 520403 31457280 1 NO ACTIVE 1.3861E+10 23-JUN-05
2 1 520404 31457280 1 NO ACTIVE 1.3861E+10 23-JUN-05
3 1 520405 31457280 1 NO ACTIVE 1.3861E+10 23-JUN-05
4 1 520406 31457280 1 NO CURRENT 1.3861E+10 23-JUN-05
5 1 520398 31457280 1 NO ACTIVE 1.3860E+10 23-JUN-05
6 1 520399 31457280 1 NO ACTIVE 1.3860E+10 23-JUN-05
7 1 520400 104857600 1 NO ACTIVE 1.3860E+10 23-JUN-05
8 1 520401 104857600 1 NO ACTIVE 1.3860E+10 23-JUN-05
9 1 520402 104857600 1 NO ACTIVE 1.3861E+10 23-JUN-05
9 rows selected.
SQL> /
GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
1 1 520403 31457280 1 NO ACTIVE 1.3861E+10 23-JUN-05
2 1 520404 31457280 1 NO ACTIVE 1.3861E+10 23-JUN-05
3 1 520405 31457280 1 NO ACTIVE 1.3861E+10 23-JUN-05
4 1 520406 31457280 1 NO CURRENT 1.3861E+10 23-JUN-05
5 1 520398 31457280 1 NO ACTIVE 1.3860E+10 23-JUN-05
6 1 520399 31457280 1 NO ACTIVE 1.3860E+10 23-JUN-05
7 1 520400 104857600 1 NO ACTIVE 1.3860E+10 23-JUN-05
8 1 520401 104857600 1 NO ACTIVE 1.3860E+10 23-JUN-05
9 1 520402 104857600 1 NO ACTIVE 1.3861E+10 23-JUN-05
9 rows selected.
我們知道,當數據庫發生日志切換時(Log Switch),Oracle會觸發一個檢查點(Checkpoint),檢查點進程(Checkpoint Process,CKPT)會通知DBWR(Database?Writer)進程去執行寫操作。在日志文件所保護的處於Buffer cache中的髒數據(dirty buffer)未寫回磁盤之前,日志文件不能被覆蓋或重用。
如果數據庫異常繁忙,或者DBWR的寫出過慢,就可能出現檢查點未完成,Oracle卻已經用完所有日志文件的情況。在這種情況下,數據庫的日志無法生成,整個數據庫將處於停頓狀態,此時日志文件中會記錄類似如下信息:
Mon Jan 23 16:11:39 2006Thread 1 cannot allocate new log,
檢查v$session_wait視圖,我們可以從中看到很多session處於log file switch (checkpoint incomplete) 的等待。
sequence 5871Checkpoint not complete Current log# 2 seq# 5870 mem# 0:
+ORADG/danaly/onlinelog/group_2.260.600173851
Current log# 2 seq# 5870 mem# 1:
+ORADG/danaly/onlinelog/group_2.261.600173853