oracle中config shard的状态不正常的处理

我写关于12.2 sharding database的文章已经好久了,今天再次把环境启动了起来,启动了主机之后,依次启动了listener和shardcat数据库和shard node数据库实例。检查shard状态的时候,发现报错:

(a)问题一:status显示warnings:

GDSCTL>config shard
Name                Shard Group         Status    State       Region    Availability
----                -----------         ------    -----       ------    ------------
sh1                 primary_shardgroup  Ok        Deployed    region1   ONLINE      
sh2                 primary_shardgroup  Warnings  Deployed    region1   -           
 
GDSCTL>

遇到这样的情况,我们可以使用recover shard的命令。注意recover shard -h出来的帮助是错误的。

GDSCTL>recover shard -h
Syntax
RECOVER SHARD -gdspool pool -database db_name [-skip_first|-ignore_first]      
[-full]                                                                        
 
Purpose
Executes all DDL statements on specified database starting from the one, that  
was previusly executed with errors. The command is intended to perform all     
skipped DDL changes after database administrator fixes shard issues.           
 
Usage Notes
Use SKIP_FIRST to skip first DDL. This is typically required after manual fix  
done by database administrator. For example, if CREATE TABLE statement fails   
because of lack of space, database administrator fixes the issue and re-executes
CREATE TABLE. To avoid ORA-39151 (table exists)in RECOVER SHARD he has to      
specify -SKIP_FIRST.                                                           
 
Use IGNORE_FIRST to mark first DDL as obsolete. This is required when wrong DDL
statement was specified and failed on all shards. Thus you need to mark it down
as obsolete.                                                                   
 
Keywords and Parameters
full:         Full recovery mode.
gdspool:      the GDS pool (If not specified and there is only one gdspool with access
              granted to user, it will be used by default)
ignore_first: make first failed DDL statement obsolete.
shard:        The name of the shard.
skip_first:   skip first failed DDL statement
 
Examples
RECOVER SHARD -database shd1
 
GDSCTL>
正确的用法应该是recover shard -shard 。

我们运行这个命令,发现是gsm没有启动:

GDSCTL>recover shard -shard sh2
GSM-45076: GSM is not running
GDSCTL>
GDSCTL>start gsm
GSM is started successfully
GDSCTL>

启动后,稍等片刻(大约10秒),检查状态就恢复成正常了(在这里,我们还没有真正用到recover shard,在下面一个问题时才正常用到):

GDSCTL>config shard
Name                Shard Group         Status    State       Region    Availability
----                -----------         ------    -----       ------    ------------
sh1                 primary_shardgroup  Ok        Deployed    region1   ONLINE      
sh2                 primary_shardgroup  Warnings  Deployed    region1   -           
 
GDSCTL>
GDSCTL>--等待大约10秒
GDSCTL>
GDSCTL>config shard
Name                Shard Group         Status    State       Region    Availability
----                -----------         ------    -----       ------    ------------
sh1                 primary_shardgroup  Ok        Deployed    region1   ONLINE      
sh2                 primary_shardgroup  Ok        Deployed    region1   ONLINE      
 
GDSCTL>
所以,重新启动之后,需要启动的有database,listener,和gsm。

(b)问题二:state显示ddl error
然后,在接着的实验中,我drop掉shardcat上的一个duplicate table,发现此时shard状态又不正常了。config shard的state显示ddl error。

首先,在shardcat上操作,drop掉duplicate表

SQL> conn app_schema/oracle
Connected.
SQL>
SQL> drop table products;
drop table products
*
ERROR at line 1:
ORA-02557: cannot operate on sharded objects when shard DDL is disabled
 
 
SQL>                               
SQL>
SQL>
SQL> alter session enable shard ddl;
 
Session altered.
 
SQL> select count(*) from products;
 
  COUNT(*)
----------
      1000
 
SQL>
SQL> drop table products;
 
Table dropped.
 
SQL>
发现此时数据并没有同步,在shard node上,还是可以查询到数据:

SQL> !hostname
sdb2
 
SQL>
SQL> conn app_schema/oracle
Connected.
SQL>
SQL> select count(*) from products;
 
  COUNT(*)
----------
      1000
 
SQL>
 
 
 
SQL> !hostname
sdb3
 
SQL>
SQL> conn app_schema/oracle
Connected.
SQL>
SQL> select count(*) from products;
 
  COUNT(*)
----------
      1000
 
SQL>
此时检查shard状态,变成DDL error:

GDSCTL>config shard
Name                Shard Group         Status    State       Region    Availability
----                -----------         ------    -----       ------    ------------
sh1                 primary_shardgroup  Ok        DDL error   region1   ONLINE      
sh2                 primary_shardgroup  Ok        DDL error   region1   ONLINE      
 
GDSCTL>
我们可以在gsm的alertlog中看到一些提示信息:
(注:gsm的日志位置在/u01/ora12c/app/oracle/diag/gsm/sdb1/sharddirector1/trace/alert_shardcat.log)

……
07-NOV-2016 01:35:19 * (CONNECT_DATA=(SERVICE_NAME=GDS$CATALOG.oradbcloud)(CID=(PROGRAM=oracle)(HOST=sdb3)(USER=oracle12c))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.56.23)(PORT=34247)) * establish * GDS$CATALOG.oradbcloud * 0
2016-11-07T01:35:22.021482+08:00
07-NOV-2016 01:35:22 * service_update * shardcat%11 * 0
07-NOV-2016 01:35:22 * service_update * shardcat%1 * 0
2016-11-07T01:35:35.543868+08:00
GSM-40135: Catalog request: "DDL REQUEST" (80). Target -"33". Pool- - "". Id="72".
Payload:"A"
GSM-40135: Catalog request: "DDL REQUEST" (80). Target -"34". Pool- - "". Id="73".
Payload:"E"
2016-11-07T01:35:36.652686+08:00
GSM-40148: Database task failed:database:"sh1", status 4, message:"ORA-01031: insufficient privileges
ORA-06512: at "SYS.EXECASUSER", line 35
ORA-06512: at "SYS.DBMS_SYS_SQL", line 1434
ORA-06512: at "SYS.EXECASUSER", line 23
ORA-06512: at "GSMADMIN_INTERNAL.EXECUTEDDL", line 68
ORA-06512: at line 1 \(ngsmoci_execute\)
 
"
GSM-40148: Database task failed:database:"sh2", status 4, message:"ORA-01031: insufficient privileges
ORA-06512: at "SYS.EXECASUSER", line 35
ORA-06512: at "SYS.DBMS_SYS_SQL", line 1434
ORA-06512: at "SYS.EXECASUSER", line 23
ORA-06512: at "GSMADMIN_INTERNAL.EXECUTEDDL", line 68
ORA-06512: at line 1 \(ngsmoci_execute\)
 
"
2016-11-07T01:35:36.700593+08:00
GSM-40157: GSM response Id=72 Type=80. Status:"Success with info".
2016-11-07T01:35:37.010475+08:00
GSM-40135: Catalog request: "RUNTIME DATABASE" (40). Target -"sh2". Pool- - "". Id="74".
Payload:"(PARAMETERS=(DDLID=33)(AUTORCV=0))"
GSM-40135: Catalog request: "RUNTIME DATABASE" (40). Target -"sh1". Pool- - "". Id="75".
Payload:"(PARAMETERS=(DDLID=33)(AUTORCV=0))"
GSM-40157: GSM response Id=73 Type=80. Status:"Success".
2016-11-07T01:35:41.834561+08:00
07-NOV-2016 01:35:41 * service_update * SHARDDIRECTOR1 * 0
……
其实,从这个log中,我们只是看到有ORA-01031 insufficient privileges的报错,但究竟是什么权限不足,我们还是得通过config shard命令来进一步看:

GDSCTL>config shard -shard sh1
Name: sh1
Shard Group: primary_shardgroup
Status: Ok
State: Deployed
Region: region1
Connection string: sdb2:1521/sh1:dedicated
SCAN address:
ONS remote port: 0
Disk Threshold, ms: 20
CPU Threshold, %: 75
Version: 12.2.0.0
Last Failed DDL: alter database link "PRODUCTSD...
DDL Error: ORA-01031: insufficient privileges
ORA-06512: at "SYS.EXECASUSER", line 35
ORA-06512: at "SYS.DBMS_SYS_SQL", line 1434
ORA-06512: at "SYS.EXECASUSER", line 23
ORA-06512: at "GSMADMIN_INTERNAL.EXECUTEDDL", line 68
ORA-06512: at line 1 \(ngsmoci_execute\)

Failed DDL id: 33
Availability: ONLINE

Supported services
------------------------
Name                                                            Preferred Status   
----                                                            --------- ------   
oltp_rw_srvc                                                    Yes       Enabled  

 

GDSCTL>

我们看到这个last failed ddl,是alter database link的时候出现权限不足。

所以我们找到了原因,进行修复:

--在sh1上:
[oracle12c@sdb2 ~]$ sqlplus "/ as sysdba"

SQL*Plus: Release 12.2.0.0.2 Beta on Mon Nov 7 01:50:25 2016

Copyright (c) 1982, 2015, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.0.2 - 64bit Beta
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> grant alter database link to app_schema;

Grant succeeded.

SQL>

--在sh2上
[oracle12c@sdb3 ~]$ sqlplus "/ as sysdba"

SQL*Plus: Release 12.2.0.0.2 Beta on Mon Nov 7 01:51:13 2016

Copyright (c) 1982, 2015, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.0.2 - 64bit Beta
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> grant alter database link to app_schema;

Grant succeeded.

SQL>

--在shardcat上,gsm_env下:
GDSCTL>config shard
Name                Shard Group         Status    State       Region    Availability
----                -----------         ------    -----       ------    ------------
sh1                 primary_shardgroup  Ok        DDL error   region1   ONLINE      
sh2                 primary_shardgroup  Ok        DDL error   region1   ONLINE      

GDSCTL>
GDSCTL>recover shard -shard sh1
GDSCTL>
GDSCTL>recover shard -shard sh2  
GDSCTL>
GDSCTL>config shard
Name                Shard Group         Status    State       Region    Availability
----                -----------         ------    -----       ------    ------------
sh1                 primary_shardgroup  Ok        Deployed    region1   ONLINE      
sh2                 primary_shardgroup  Ok        Deployed    region1   ONLINE      

GDSCTL>

--在sh1上:
[oracle12c@sdb2 ~]$ sqlplus "/ as sysdba"
 
SQL*Plus: Release 12.2.0.0.2 Beta on Mon Nov 7 01:50:25 2016
 
Copyright (c) 1982, 2015, Oracle.  All rights reserved.
 
 
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.0.2 - 64bit Beta
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
 
SQL> grant alter database link to app_schema;
 
Grant succeeded.
 
SQL>
 
--在sh2上
[oracle12c@sdb3 ~]$ sqlplus "/ as sysdba"
 
SQL*Plus: Release 12.2.0.0.2 Beta on Mon Nov 7 01:51:13 2016
 
Copyright (c) 1982, 2015, Oracle.  All rights reserved.
 
 
Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.0.2 - 64bit Beta
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
 
SQL> grant alter database link to app_schema;
 
Grant succeeded.
 
SQL>
 
--在shardcat上,gsm_env下:
GDSCTL>config shard
Name                Shard Group         Status    State       Region    Availability
----                -----------         ------    -----       ------    ------------
sh1                 primary_shardgroup  Ok        DDL error   region1   ONLINE      
sh2                 primary_shardgroup  Ok        DDL error   region1   ONLINE      
 
GDSCTL>
GDSCTL>recover shard -shard sh1
GDSCTL>
GDSCTL>recover shard -shard sh2  
GDSCTL>
GDSCTL>config shard
Name                Shard Group         Status    State       Region    Availability
----                -----------         ------    -----       ------    ------------
sh1                 primary_shardgroup  Ok        Deployed    region1   ONLINE      
sh2                 primary_shardgroup  Ok        Deployed    region1   ONLINE      
 
GDSCTL>
可以看到,在修复问题的基础上,运行recover shard -shard 之后,就恢复了正常。

注意需要注意的是,如果root cause没找到,不知道是database link的权限问题,仅仅运行recover shard是没有效果的。

时间: 2024-10-01 20:00:52

oracle中config shard的状态不正常的处理的相关文章

oracle中11.2 RAC安装新主机,识别老存储-3-配置老存储的数据库

安装Oracle 11.2.0.4数据库软件,然后执行root.sh,这个没有特别的东西,略. 之后,我们需要修改ORACLE RDBMS的oracle二进制文件的权限,让oracle 数据库进程可以获取ASM磁盘组. [root@lunar5 ~]# su - grid [grid@lunar5 ~]$ $ORACLE_HOME/bin/setasmgidwrap o=/u01/app/oracle/product/11.2.0.4/dbhome_1/bin/oracle [grid@luna

在ASP.NET应用程序中捕捉身份验证状态的变化

asp.net|程序 前年我写过一篇随笔抱怨Microsoft在ASP.NET架构中Session_End事件上处理,说来惭愧,其实当年我对ASP.NET运行时的复杂性理解不足.实话说,捕捉通过身份验证和注销身份验证对我来说,意义重大.例如: 在登录前先检查是否已经存在SSO提供器: 登录完成后加载相关的权限,这些加载过程可能与具体应用项目完全无关: 登录结束后通知SSO提供器清除Cookie内容: ...... 目前的ASP.NET提供的解决方案是在Global.cs中加上FormsAuthe

Oracle中Kill session的研究

oracle|session Oracle中Kill session的研究   作者: Eygle link: http://www.eygle.com/faq/Kill_Session.htm 我们知道,在Oracle数据库中,可以通过kill session的方式来终止一个进程,其基本语法结构为: alter system kill session 'sid,serial#' ;   被kill掉的session,状态会被标记为killed,Oracle会在该用户下一次touch时清除该进程

Oracle中基于SCHEDULER的job

基于scheduler的job的设置,基本的创建脚本: sys.dbms_scheduler.create_job( job_name => '"SYS"."REBUILD_JOB1"', program_name => '"SYS"."EMP_IND_REBUILD"', schedule_name => '"SYS"."DAILYREBUILD"', job_cla

如何在oracle中手工创建数据库

oracle中手工创建数据库的步骤: 1. 创建必要的相关目录 2. 创建初始化参数文件 3. 设置环境变量Oracle_sid 4. 创建实例 5. 创建口令文件 6. 启动数据库到nomount(实例)状态 7. 执行建库脚本 8. 执行catalog脚本创建数据字典 9. 执行catproc创建package包 10.      执行pupbld 11.      由初始化参数文件创建spfile文件 12.      执行scott脚本创建scott模式 13.      用命令测试数据

Oracle中的锁(LOCK)机制

 本文结合示例简要的介绍了一下Oracle中锁的机制. 为了解决多用户环境下并发操作相同的资源而造成的错误修改数据的问题.单用户环境下不需要考虑锁,因为所有操作都是串行的.下面的文章简要的介绍了一下 锁的分类异常复杂,enqueue.latch.mutex等,都是为了解决并发存在的,自己也有些混乱,所以也不过多解释了.下面列举一些对于lock的要点内容. l 排他锁: 不允许相关的资源被共享.一个资源在一个时间点内只有一个事务能够获取该资源的排他锁,只有持有该锁的事务能够修改相关的资源, 其他想

Oracle中怎样用自治事务保存日志表

数据库事务是一种单元操作,要么是全部操作都成功,要么全部失败.在Oracle中,一个事务是从执行第一个数据治理语言(DML)语句开始,直到执行一个COMMIT语句,提交保存这个事务,或者执行一个ROLLBACK语句,放弃此次操作结束. 事务的"要么全部完成,要么什么都没完成"的本性会使将错误信息记入数据库表中变得很困难,因为当事务失败重新运行时,用来编写日志条目的INSERT语句还未完成. 针对这种困境,Oracle提供了一种便捷的方法,即自治事务.自治事务从当前事务开始,在其自身的语

ORACLE中死锁的知识点总结

  死锁的概念       什么是死锁呢? 其实我们生活中也有很多类似死锁的例子. 我先举一个生活中的例子:过年回家,父亲买了一把水弹枪,儿子和侄子争抢着要先玩,谁也不让谁,拆开包装后,一个抢了枪, 一个逮住了子弹和弹夹.两个都争着要先玩,但是都互不相让.结果两个人都玩不了.如果儿子要先玩,就必须让侄子把子弹和弹夹给他,如果侄子要先玩,就必须让儿子把枪给侄子.他们就这样对峙了十几分钟,互不相让. 我出来调停,让儿子把枪先给侄子玩,每个人玩十分钟.然后两个人开开心心一起玩起来.其实这就是一个活生生

Oracle中删除用户和表空间的常见问题(比如:ORA-01940无法删除当前已连接用户的解决方案)

这时候以管理员身份进入sqlplus命令窗口. 在删除用户的时候有时候会出现以下问题: ORA-01940无法删除当前已连接用户 这时候的解决方案是: 1)查看用户的连接状况   select username,sid,serial# from v$session (2)找到要删除用户的sid,和serial,并删除 例如:你要删除用户'WUZHQ',可以这样做: alter system kill session'532,4562'; 这里的532表示的是sid,4562表示的是seria.