
Thursday, March 26, 2009

The ORA-12737 Error

Today a guy in the group ran into ORA-12737. The setup: the server runs on Windows 2003, and the client uses Instant Client (on Linux).

ORA-12737: Instant Client Light: unsupported server character set ZHS16GBK

So it seems the Instant Client does not support the server's ZHS16GBK character set.

Oracle's explanation:

ORA-12737: Instant Client Light: unsupported server character set string
Cause: The character set specified is not allowed for this operation or is invalid. Instant Client Light has only minimal character sets.
Action: Do not use Instant Client Light for this character set

In plain words: Instant Client Light ships with only a minimal set of character sets, so don't use it for this one; for ZHS16GBK you have to fall back to the regular Instant Client (or the full client).
Sometimes Oracle is just infuriating, giving explanations like this.
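By the way, you can confirm the server character set with a quick query (run it through any full client that can connect):

select value
from nls_database_parameters
where parameter = 'NLS_CHARACTERSET';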

Wednesday, March 18, 2009

A Script to Determine Whether a Tablespace Can Still Be Shrunk

declare
   -- all datafiles, skipping invalid ones
   cursor c_dbfile is
         select  tablespace_name
                 ,file_name
                 ,file_id
                 ,bytes
         from    sys.dba_data_files
         where   status != 'INVALID'
         order   by tablespace_name, file_id;
   -- free extents in a given file, scanned from the end of the file backwards
   cursor c_space(v_file_id in number) is
         select block_id, blocks
         from   sys.dba_free_space
         where  file_id = v_file_id
         order  by block_id desc;
  blocksize       number;
  filesize        number;
  extsize         number;

  begin
    select value
    into   blocksize
    from   v$parameter
    where  name = 'db_block_size';
    for c_rec1 in c_dbfile
    loop
      filesize := c_rec1.bytes;
      <<outer_loop>>
      for c_rec2 in c_space(c_rec1.file_id)
      loop
        extsize := ((c_rec2.block_id - 1)*blocksize + c_rec2.blocks*blocksize);
        -- if the free extent reaches the current end of the file,
        -- the file can shrink down past it
        if extsize = filesize
        then
          filesize := (c_rec2.block_id - 1)*blocksize;
        else
          exit outer_loop;
        end if;
      end loop outer_loop;
      if filesize = c_rec1.bytes
      then
        dbms_output.put_line('Tablespace: '
        ||' '||c_rec1.tablespace_name||' Datafile: '||c_rec1.file_name);
        dbms_output.put_line('Can not be resized, no free space at end of file.');
        dbms_output.put_line('.');
      else
        if filesize < 2*blocksize
        then
          dbms_output.put_line('Tablespace: '
          ||' '||c_rec1.tablespace_name||' Datafile: '||c_rec1.file_name);
          dbms_output.put_line('Can be resized to: '||2*blocksize
          ||' Bytes, Actual size: '||c_rec1.bytes||' Bytes');
          dbms_output.put_line('.');
        else
          dbms_output.put_line('Tablespace: '
          ||' '||c_rec1.tablespace_name||' Datafile: '||c_rec1.file_name);
          dbms_output.put_line('Can be resized to: '||filesize
          ||' Bytes, Actual size: '||c_rec1.bytes||' Bytes');
          dbms_output.put_line('.');
        end if;
      end if;
    end loop;
  end;
  /

Code written by Uncle WU from the group, used to determine how far a tablespace's datafiles can be shrunk. Once you know the resize target, just run alter database datafile '......' resize 15000M.
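Two usage notes. The block reports through dbms_output, so turn on serveroutput before running it; then shrink the file with the size it reports. A sketch (the datafile path and size below are made-up examples):

set serveroutput on size 1000000
-- run the block above, then:
alter database datafile '/u01/oradata/orcl/users01.dbf' resize 200M;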

Wednesday, March 11, 2009

Oracle SQL Tuning (3)

Using SPM

SQL Plan Management (SPM) is a new feature introduced in Oracle 11g. A SQL profile is essentially a set of hints and statistics that help the optimizer arrive at a better plan; a stored outline stores a stable execution plan for later reuse; SPM, by contrast, keeps the generated plans in a managed repository, and with SPM in place the optimizer will not suddenly pick a badly performing plan.

Capturing SQL plan baselines:
Automatic capture of SQL plan baselines is off by default, so turn it on:
alter session
set optimizer_capture_sql_plan_baselines = true;

Run the following statements:
select * /* ARUP */ from sales
where quantity_sold > 1 order by cust_id;
-- Change the optimizer mode
alter session set optimizer_mode = first_rows;
-- Second execution. Opt Mode changed
select * /* ARUP */ from sales
where quantity_sold > 1 order by cust_id;
-- Gather stats now
begin
dbms_stats.gather_table_stats (
ownname => 'SH',
tabname => 'SALES',
cascade => TRUE,
no_invalidate => FALSE,
method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO',
granularity => 'GLOBAL AND PARTITION',
estimate_percent => 10,
degree => 4
);
end;
/
-- Third execution. After stats
select * /* ARUP */ from sales
where quantity_sold > 1 order by cust_id;

/* ARUP */ is not a SQL hint; it is just there to make the statement easy to identify.

Then in EM, go to the Server tab -> SQL Plan Control -> SQL Plan Baseline and enable it.
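If you'd rather stay in SQL*Plus, the captured baselines can be checked from the data dictionary; a quick sketch (assuming the three executions above were captured under your schema):

select sql_handle, plan_name, enabled, accepted
from dba_sql_plan_baselines
where sql_text like '%ARUP%';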

Using Baselines:

alter session set
optimizer_use_sql_plan_baselines = true;


SQL> explain plan for select * /* ARUP */ from sales
2 where quantity_sold > 1 order by cust_id;
Explained.

SQL> select * from table(dbms_xplan.display(null, null, 'basic'));
PLAN_TABLE_OUTPUT
---------------------------
Plan hash value: 143117509
--------------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT ORDER BY | |
| 2 | PARTITION RANGE ALL | |
| 3 | TABLE ACCESS BY LOCAL INDEX ROWID| SALES |
| 4 | BITMAP CONVERSION TO ROWIDS | |
| 5 | BITMAP INDEX FULL SCAN | SALES_TIME_BIX |
--------------------------------------------------------------

-- disable baselines

SQL> alter session set optimizer_use_sql_plan_baselines = false;
Session altered.

SQL> explain plan for select * /* ARUP */ from sales
2 where quantity_sold > 1 order by cust_id;
Explained.

SQL> select * from table(dbms_xplan.display(null, null, 'basic'));

PLAN_TABLE_OUTPUT
----------------------------
Plan hash value: 3803407550

--------------------------------------
| Id | Operation | Name |
--------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT ORDER BY | |
| 2 | PARTITION RANGE ALL | |
| 3 | TABLE ACCESS FULL | SALES |
--------------------------------------

In fact, many of these operations can be done entirely inside EM; for details see:
http://www.oracle.com/technology/obe/11gr1_db/manage/spm/spm.htm

Oracle SQL Tuning (2)

Stored Outlines

A stored outline is essentially a stable execution plan that Oracle has stored.
Step 1:
Creating Outlines
Modify the system/session parameters:
-- Switch on automatic creation of stored outlines.
ALTER SYSTEM SET create_stored_outlines=TRUE;
ALTER SESSION SET create_stored_outlines=TRUE;

-- Switch off automatic creation of stored outlines.
ALTER SYSTEM SET create_stored_outlines=FALSE;
ALTER SESSION SET create_stored_outlines=FALSE;
Grant the user the necessary privileges:
-- Grant the necessary privileges.
CONN sys/password AS SYSDBA
GRANT CREATE ANY OUTLINE TO SCOTT;
GRANT EXECUTE_CATALOG_ROLE TO SCOTT;

CONN scott/tiger

-- Create an outline for a specific SQL statement.
CREATE OUTLINE emp_dept FOR CATEGORY scott_outlines
ON SELECT e.empno, e.ename, d.dname FROM emp e, dept d WHERE e.deptno = d.deptno;

-- Check that the outline was created correctly.
COLUMN name FORMAT A30
SELECT name, category, sql_text FROM user_outlines WHERE category = 'SCOTT_OUTLINES';

NAME CATEGORY
------------------------------ ------------------------------
SQL_TEXT
--------------------------------------------------------------------------------
EMP_DEPT SCOTT_OUTLINES
SELECT e.empno, e.ename, d.dname FROM emp e, dept d WHERE e.deptno = d.deptno


1 row selected.

-- List all the hints in the outline.
COLUMN hint FORMAT A50
SELECT node, stage, join_pos, hint FROM user_outline_hints WHERE name = 'EMP_DEPT';

NODE STAGE JOIN_POS HINT
---------- ---------- ---------- --------------------------------------------------
1 1 0 NO_EXPAND(@"SEL$1" )
1 1 0 PQ_DISTRIBUTE(@"SEL$1" "E"@"SEL$1" NONE NONE)
1 1 0 USE_MERGE(@"SEL$1" "E"@"SEL$1")
1 1 0 LEADING(@"SEL$1" "D"@"SEL$1" "E"@"SEL$1")
1 1 0 NO_STAR_TRANSFORMATION(@"SEL$1" )
1 1 0 NO_FACT(@"SEL$1" "E"@"SEL$1")
1 1 0 NO_FACT(@"SEL$1" "D"@"SEL$1")
1 1 2 FULL(@"SEL$1" "E"@"SEL$1")
1 1 1 INDEX(@"SEL$1" "D"@"SEL$1" ("DEPT"."DEPTNO"))
1 1 0 NO_REWRITE(@"SEL$1" )
1 1 0 NO_REWRITE(@"SEL$1" )

11 rows selected.
The following creates an outline using DBMS_OUTLN.CREATE_OUTLINE.

-- Run the statement
SELECT e.empno, e.ename, d.dname, e.job FROM emp e, dept d WHERE e.deptno = d.deptno AND d.dname = 'SALES';

EMPNO ENAME DNAME JOB
---------- ---------- -------------- ---------
7499 ALLEN SALES SALESMAN
7698 BLAKE SALES MANAGER
7654 MARTIN SALES SALESMAN
7900 JAMES SALES CLERK
7844 TURNER SALES SALESMAN
7521 WARD SALES SALESMAN

6 rows selected.

-- Get the hash_value
SELECT hash_value, child_number, sql_text FROM v$sql WHERE sql_text LIKE 'SELECT e.empno, e.ename, d.dname, e.job%';

HASH_VALUE CHILD_NUMBER
---------- ------------
SQL_TEXT
----------------------------------------------------------------------------------------------------
3909283366 0
SELECT e.empno, e.ename, d.dname, e.job FROM emp e, dept d WHERE e.deptno = d.deptno AND d.dname = '
SALES'


1 row selected.

-- Create the outline.
BEGIN
DBMS_OUTLN.create_outline(
hash_value => 3909283366,
child_number => 0,
category => 'SCOTT_OUTLINES');
END;
/

-- Check.
COLUMN name FORMAT A30
SELECT name, category, sql_text FROM user_outlines WHERE category = 'SCOTT_OUTLINES';

NAME CATEGORY
------------------------------ ------------------------------
SQL_TEXT
--------------------------------------------------------------------------------
SYS_OUTLINE_05092314510581419 SCOTT_OUTLINES
SELECT e.empno, e.ename, d.dname, e.job FROM emp e, dept d WHERE e.deptno = d.de

EMP_DEPT SCOTT_OUTLINES
SELECT e.empno, e.ename, d.dname FROM emp e, dept d WHERE e.deptno = d.deptno


2 rows selected.

-- List the hints.
COLUMN hint FORMAT A50
SELECT node, stage, join_pos, hint FROM user_outline_hints WHERE name = 'SYS_OUTLINE_05092314510581419';


NODE STAGE JOIN_POS HINT
---------- ---------- ---------- --------------------------------------------------
1 1 0 NO_EXPAND(@"SEL$1" )
1 1 0 PQ_DISTRIBUTE(@"SEL$1" "E"@"SEL$1" NONE NONE)
1 1 0 USE_MERGE(@"SEL$1" "E"@"SEL$1")
1 1 0 LEADING(@"SEL$1" "D"@"SEL$1" "E"@"SEL$1")
1 1 0 NO_STAR_TRANSFORMATION(@"SEL$1" )
1 1 0 NO_FACT(@"SEL$1" "E"@"SEL$1")
1 1 0 NO_FACT(@"SEL$1" "D"@"SEL$1")
1 1 2 FULL(@"SEL$1" "E"@"SEL$1")
1 1 1 INDEX(@"SEL$1" "D"@"SEL$1" ("DEPT"."DEPTNO"))
1 1 0 NO_REWRITE(@"SEL$1" )

10 rows selected.

Using Outlines

-- Check whether the outlines have been used
SELECT name, category, used FROM user_outlines;

NAME CATEGORY USED
------------------------------ ------------------------------ ------
SYS_OUTLINE_05092314510581419 SCOTT_OUTLINES UNUSED
EMP_DEPT SCOTT_OUTLINES UNUSED

2 rows selected.

-- Run the statements (outlines are not enabled yet)
SELECT e.empno, e.ename, d.dname FROM emp e, dept d WHERE e.deptno = d.deptno;
SELECT e.empno, e.ename, d.dname, e.job FROM emp e, dept d WHERE e.deptno = d.deptno AND d.dname = 'SALES';

-- Check outline usage
SELECT name, category, used FROM user_outlines;

NAME CATEGORY USED
------------------------------ ------------------------------ ------
SYS_OUTLINE_05092314510581419 SCOTT_OUTLINES UNUSED
EMP_DEPT SCOTT_OUTLINES UNUSED

2 rows selected.

-- Enable outline usage
ALTER SESSION SET query_rewrite_enabled=TRUE;
ALTER SESSION SET use_stored_outlines=SCOTT_OUTLINES;

-- Run the statements
SELECT e.empno, e.ename, d.dname FROM emp e, dept d WHERE e.deptno = d.deptno;
SELECT e.empno, e.ename, d.dname, e.job FROM emp e, dept d WHERE e.deptno = d.deptno AND d.dname = 'SALES';

-- Check usage again
SELECT name, category, used FROM user_outlines;

NAME CATEGORY USED
------------------------------ ------------------------------ ------
SYS_OUTLINE_05092314510581419 SCOTT_OUTLINES USED
EMP_DEPT SCOTT_OUTLINES USED

2 rows selected.


Dropping Outlines

BEGIN
DBMS_OUTLN.drop_by_cat (cat => 'SCOTT_OUTLINES');
END;
/
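Outlines can also be moved between categories rather than dropped; for example, a sketch that promotes our test outlines into the DEFAULT category so they apply without setting use_stored_outlines explicitly:

BEGIN
  DBMS_OUTLN.update_by_cat(oldcat => 'SCOTT_OUTLINES', newcat => 'DEFAULT');
END;
/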

Monday, March 9, 2009

Several SQL Query Patterns

Small result set, direct criteria (access by index)
Small result set, indirect criteria (join)
Intersection of several broad criteria yielding a small result set
Intersection of several broad criteria (not necessarily small; consider SQL hints)

Large result set (full table scan + parallel query)

Self-join on a single table (consider analytic functions; see the sketch below)

Result set obtained through aggregation (keep the volume of aggregated data as small as possible)

Date-based searches (consider optimizing with subqueries)
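To make the self-join pattern concrete, a sketch against scott.emp that compares each employee's salary with the department average, first as a self-join and then in a single pass with an analytic function:

-- self-join version: scans emp twice
select e.empno, e.sal, d.avg_sal
from scott.emp e,
     (select deptno, avg(sal) avg_sal from scott.emp group by deptno) d
where e.deptno = d.deptno;

-- analytic version: one scan, no join
select empno, sal,
       avg(sal) over (partition by deptno) as avg_sal
from scott.emp;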

Oracle SQL Tuning (1)

SQL Profiles
Basic usage:
fy@ORCL> drop table t;

Table dropped.

fy@ORCL> create table t(id constraint t_pk primary key,pad) as
2 select rownum,lpad('*',4000,'*')
3 from all_objects
4 where rownum<=10000;

Table created.

sys@ORCL> grant advisor to fy;

fy@ORCL> var tn varchar2(30)
fy@ORCL> declare
2 l_sqltext clob:='select count(*) from t where id+42=126';
3 begin
4 :tn:=dbms_sqltune.create_tuning_task(sql_text=>l_sqltext);
5 dbms_sqltune.execute_tuning_task(:tn);
6 end;
7 /

PL/SQL procedure successfully completed.

fy@ORCL> select dbms_sqltune.report_tuning_task(:tn) from dual;


DBMS_SQLTUNE.REPORT_TUNING_TASK(:TN)
-------------------------------------------------------------------------------

GENERAL INFORMATION SECTION
-------------------------------------------------------------------------------
Tuning Task Name : TASK_122
Tuning Task Owner : FY
Workload Type : Single SQL Statement
Scope : COMPREHENSIVE
Time Limit(seconds): 1800
Completion Status : COMPLETED
Started at : 03/10/2009 10:34:02
Completed at : 03/10/2009 10:34:02

-------------------------------------------------------------------------------
Schema Name: FY
SQL ID : 8m7h2qakgcajz
SQL Text : select count(*) from t where id+42=126

-------------------------------------------------------------------------------
FINDINGS SECTION (3 findings)
-------------------------------------------------------------------------------

1- SQL Profile Finding (see explain plans section below)
--------------------------------------------------------
A potentially better execution plan was found for this statement.

Recommendation (estimated benefit<=10%)
---------------------------------------
- Consider accepting the recommended SQL profile.
execute dbms_sqltune.accept_sql_profile(task_name => 'TASK_122',
task_owner => 'FY', replace => TRUE);

Validation results
------------------
The SQL profile was tested by executing both its plan and the original plan
and measuring their respective execution statistics. A plan may have been
only partially executed if the other could be run to completion in less time.

Original Plan With SQL Profile % Improved
------------- ---------------- ----------
Completion Status: COMPLETE COMPLETE
Elapsed Time(ms): 0 0
CPU Time(ms): 0 0
User I/O Time(ms): 0 0
Buffer Gets: 23 21 8.69%
Disk Reads: 0 0
Direct Writes: 0 0
Rows Processed: 1 1
Fetches: 1 1
Executions: 1 1

2- Index Finding (see explain plans section below)
--------------------------------------------------
The execution plan of this statement can be improved by creating one or more
indices.

Recommendation (estimated benefit: 95.23%)
------------------------------------------
- Consider running the Access Advisor to improve the physical schema design
or creating the recommended index.
create index FY.IDX$$_007A0001 on FY.T("ID"+42);

Rationale
---------
Creating the recommended indices significantly improves the execution plan
of this statement. However, it might be preferable to run "Access Advisor"
using a representative SQL workload as opposed to a single statement. This
will allow to get comprehensive index recommendations which takes into
account index maintenance overhead and additional space consumption.

3- Restructure SQL finding (see plan 1 in explain plans section)
----------------------------------------------------------------
The predicate "T"."ID"+42=126 used at line ID 2 of the execution plan
contains an expression on indexed column "ID". This expression prevents the
optimizer from selecting indices on table "FY"."T".

Recommendation
--------------
- Rewrite the predicate into an equivalent form to take advantage of
indices. Alternatively, create a function-based index on the expression.

Rationale
---------
The optimizer is unable to use an index if the predicate is an inequality
condition or if there is an expression or an implicit data type conversion
on the indexed column.

-------------------------------------------------------------------------------
EXPLAIN PLANS SECTION
-------------------------------------------------------------------------------

1- Original With Adjusted Cost
------------------------------
Plan hash value: 454320086

------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 7 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | INDEX FAST FULL SCAN| T_PK | 13 | 52 | 7 (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("ID"+42=126)

2- Original With Adjusted Cost
------------------------------
Plan hash value: 454320086

------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Tim


The following query generates recommendations similar to what EM shows:
select a.command as type,
f.message as findings,
a.message as recommendation,
t.message as rationale
from dba_advisor_actions a,
dba_advisor_recommendations r,
dba_advisor_findings f,
dba_advisor_rationale t
where
a.task_id=122
and a.task_id=r.task_id
and a.rec_id=r.rec_id
and a.task_id=t.task_id
and a.rec_id=t.rec_id
and f.task_id=r.task_id
and f.finding_id=r.finding_id;

fy@ORCL> alter session set sqltune_category=TEST;

Session altered.

fy@ORCL> begin
2 dbms_sqltune.accept_sql_profile(
3 task_name=>'TASK_122',
4 name=>'first_rows.sql',
5 category=>'TEST',
6 force_match=>true
7 );
8 end;
9 /

PL/SQL procedure successfully completed.

begin
dbms_sqltune.alter_sql_profile(
name=>'first_rows.sql',
attribute_name=>'CATEGORY',
value=>'DEFAULT');
end;
/
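To check which profiles exist and what category they belong to (a profile only fires when its category matches the session's sqltune_category):

select name, category, status
from dba_sql_profiles;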

For the basic operations on SQL profiles, have a look at the dbms_sqltune package (it seems profiles can even be imported directly); tomorrow I'll dig into how SQL profiles actually work.

Sunday, March 8, 2009

Using Oracle Advanced Query Rewrite

When the SQL statement itself cannot be changed, there are several ways to tune it:
stored outlines,
SQL profiles,
advanced rewrite

This post focuses on the advanced query rewrite feature.
First create a test table and some data:
create table t1(c1, c2, c3, c4)
as
select to_char(level), to_char(level), to_char(level), to_char(level)
from dual
connect by level <= 1000000
;

create index t1_n1 on t1(c1); -- c1 has index;

Looking at the execution plan of the following statement, we can see it does a full table scan:
fy@ORCL> set autot traceonly explain
fy@ORCL> select * from t1 where c1 like '%11%';

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50000 | 1318K| 1257 (2)| 00:00:16 |
|* 1 | TABLE ACCESS FULL| T1 | 50000 | 1318K| 1257 (2)| 00:00:16 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("C1" LIKE '%11%')

Rewrite the query as:

fy@ORCL> select /*+ leading(x) use_nl(x t1) */ t1.* from
(select /*+ index_ffs(t1) */ rowid as row_id, c1 from t1 where c1 like '%11%') x, t1
where t1.rowid = x.row_id;

Execution Plan
----------------------------------------------------------
Plan hash value: 2525156207

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
|
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50000 | 2246K| 50704 (1)| 00:10:09 |
| 1 | NESTED LOOPS | | 50000 | 2246K| 50704 (1)| 00:10:09 |
|* 2 | INDEX FAST FULL SCAN | T1_N1 | 50000 | 927K| 688 (2)| 00:00:09 |
| 3 | TABLE ACCESS BY USER ROWID| T1 | 1 | 27 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("C1" LIKE '%11%')

You can see the query now goes through the index, although the result is not actually better (note the much higher cost).

Grant the required privileges and change the session setting:

grant execute on dbms_advanced_rewrite to fy;
grant create materialized view to fy;
alter session set query_rewrite_integrity = trusted;

Create the advanced rewrite equivalence:
begin
sys.dbms_advanced_rewrite.declare_rewrite_equivalence (
name => 'rewrite2',
source_stmt =>
'select * from t1 where c1 like ''%11%''',
destination_stmt =>
'select /*+ leading(x) use_nl(x t1) */ t1.* from (select /*+ index_ffs(t1) */ rowid as row_id, c1 from t1 where c1 like ''%11%'') x,t1 where t1.rowid = x.row_id',
validate => false,
rewrite_mode => 'text_match');
end;
/
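To see which equivalences are registered, there is a dictionary view for this (user_rewrite_equivalences, if memory serves):

select name, source_stmt, destination_stmt, rewrite_mode
from user_rewrite_equivalences;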

Then run the first statement again:
fy@ORCL> select * from t1 where c1 like '%11%';

Execution Plan
----------------------------------------------------------
Plan hash value: 2525156207

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
|
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50000 | 1660K| 50704 (1)| 00:10:09 |
| 1 | NESTED LOOPS | | 50000 | 1660K| 50704 (1)| 00:10:09 |
|* 2 | INDEX FAST FULL SCAN | T1_N1 | 50000 | 341K| 688 (2)| 00:00:09 |
| 3 | TABLE ACCESS BY USER ROWID| T1 | 1 | 27 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("C1" LIKE '%11%')

Finally, drop the rewrite equivalence:
begin
sys.dbms_advanced_rewrite.drop_rewrite_equivalence( name=> 'rewrite2');
end;
/

I had originally wanted to rewrite the query to use a full-text index, but rewriting it that way throws an error:
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 5280
Session ID: 170 Serial number: 8


No idea why; I plan to ask the experts on OTN.

Tuesday, February 24, 2009

A Little Story about Incremental Backups

Incremental backup, as the name suggests, means each backup touches only the data blocks that have "changed". RMAN offers two flavors of incremental backup: differential and cumulative.

Differential is quite entertaining; think of it this way. There is a mafia outfit named Differential. They are democratic, free, kind, generous, loyal, honest and upright (5,000 more flattering adjectives omitted). Anyway, they collect protection money from you on the schedule you agreed to. The organization is strictly tiered: members are ranked 0, 1 and 2, with level 0 being the boss. The boss has status, many underlings and heavy expenses, so if the level-0 boss comes to collect in person there is nothing to negotiate: whenever he shows up, he re-counts your entire estate and takes every cent owed from the day you were founded up to now. Every visit works this way. A level-1 member is far gentler: he only asks you to pay what has accrued since the last level-0 collection. He will even honor the receipt if the previous collector was another level-1 member, his equal. But if the last collector was a level-2 member, he refuses to recognize it; he is, after all, a man of standing, and he has never honored receipts written by anyone below his rank. At minimum he collects everything accrued since the most recent visit by a member of his own rank or higher. The level-2 members follow the level-1s' lead.

The outfit named Cumulative is quite different from Differential. They also collect on the agreed schedule, but their style is distinctly more brutal. Using the same setup, suppose they too have ranks 0, 1 and 2; the level-0 boss behaves exactly like Differential's. The lower-ranked members, however, differ a lot from Differential's. They still honor receipts from members ranked above them, but nobody else's, not even their peers'. Even if the last collector was the very same person, he can turn around and repudiate it. Say a level-1 member came yesterday and collected everything since the last level-0 visit; if the same fellow comes again today, he denies everything he did yesterday and insists on counting from the last level-0 collection all over again.

Note: the non-level-0 members of both families share one quirk. If one of them comes to collect and discovers that their own level-0 boss has never once visited since you were founded, then, out of loyal devotion to the boss, he turns your whole estate upside down and does the grand reckoning on the boss's behalf.
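Translated back into RMAN, the three collectors look roughly like this (a sketch, assuming a connected target database):

RMAN> BACKUP INCREMENTAL LEVEL 0 DATABASE;            # the level-0 boss: everything
RMAN> BACKUP INCREMENTAL LEVEL 1 DATABASE;            # differential: changes since the last level 1 or 0
RMAN> BACKUP INCREMENTAL LEVEL 1 CUMULATIVE DATABASE; # cumulative: changes since the last level 0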

A guy in the group posted this today; pretty funny.

Monday, February 23, 2009

A Small Question about Jobs

Someone asked in the group today:

There is a job scheduled to run every 5 seconds. If one run takes longer than 5 seconds and hasn't finished, what happens?

My explanation:
It queues up and waits.
When run n reaches the 5-second mark and a new job has been added, the database starts another process or thread to run job n+1, and likewise for n+2 and so on, up to the maximum number of job processes configured for the database.

The initialization parameter job_queue_processes adjusts the maximum number of job processes (the cap is 1000):
sql> alter system set job_queue_processes=n;

Because Oracle handles jobs this way, concurrency is improved; but in a real system you also have to think about the issues MVCC brings.
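For reference, a job like the one in the question might be submitted as follows (a sketch; my_proc is a hypothetical procedure):

variable jobno number
begin
  dbms_job.submit(
    job       => :jobno,
    what      => 'my_proc;',            -- hypothetical stored procedure
    next_date => sysdate,
    interval  => 'sysdate + 5/86400');  -- every 5 seconds
  commit;
end;
/

Note that dbms_job never runs two instances of the same job concurrently, so an overrunning job simply delays its own next run.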

Thursday, February 12, 2009

MapReduce: A Major Step Backward

This is reposted from the PostgreSQL wiki; see the original here.

Preface

The database heavyweights at databasecolumn (among them Michael Stonebraker, PostgreSQL's original leader at Berkeley) recently wrote a piece criticizing the currently red-hot MapReduce, setting off a fierce debate. I'm reposting it here so we can study it together.

Translator's note: a Tanenbaum-vs-Linus style debate like this naturally produces heated argument. But honestly, judging from how the Tanenbaum vs. Linus debate actually unfolded, Linux went on to absorb and apply, in its own way, more and more of the experience of Tanenbaum and other OS researchers (rather than rejecting it); so one hopes the MapReduce vs. DBMS debate will likewise enlighten those who come after, rather than entrench two camps.

Original article:
http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html

MapReduce: A major step backwards

Note: the authors are David J. DeWitt and Michael Stonebraker.

On January 8, a Database Column reader asked for our views on new distributed database research efforts, and we'll begin here with our views on MapReduce. This is a good time to discuss it, since the recent trade press has been filled with news of the revolution of so-called "cloud computing." This paradigm entails harnessing large numbers of (low-end) processors working in parallel to solve a computing problem. In effect, this suggests constructing a data center by lining up a large number of "jelly beans" rather than utilizing a much smaller number of high-end servers.


For example, IBM and Google have announced plans to make a 1,000 processor cluster available to a few select universities to teach students how to program such clusters using a software tool called MapReduce [1]. Berkeley has gone so far as to plan on teaching their freshman how to program using the MapReduce framework.


As both educators and researchers, we are amazed at the hype that the MapReduce proponents have spread about how it represents a paradigm shift in the development of scalable, data-intensive applications. MapReduce may be a good idea for writing certain types of general-purpose computations, but to the database community, it is:


1. A giant step backward in the programming paradigm for large-scale data intensive applications
2. A sub-optimal implementation, in that it uses brute force instead of indexing
3. Not novel at all -- it represents a specific implementation of well known techniques developed nearly 25 years ago
4. Missing most of the features that are routinely included in current DBMS
5. Incompatible with all of the tools DBMS users have come to depend on


First, we will briefly discuss what MapReduce is; then we will go into more detail about our five reactions listed above.


What is MapReduce?

The basic idea of MapReduce is straightforward. It consists of two programs that the user writes called map and reduce plus a framework for executing a possibly large number of instances of each program on a compute cluster.


The map program reads a set of "records" from an input file, does any desired filtering and/or transformations, and then outputs a set of records of the form (key, data). As the map program produces output records, a "split" function partitions the records into M disjoint buckets by applying a function to the key of each output record. This split function is typically a hash function, though any deterministic function will suffice. When a bucket fills, it is written to disk. The map program terminates with M output files, one for each bucket.


In general, there are multiple instances of the map program running on different nodes of a compute cluster. Each map instance is given a distinct portion of the input file by the MapReduce scheduler to process. If N nodes participate in the map phase, then there are M files on disk storage at each of N nodes, for a total of N * M files; Fi,j, 1 ≤ i ≤ N, 1 ≤ j ≤ M.


The key thing to observe is that all map instances use the same hash function. Hence, all output records with the same hash value will be in corresponding output files.


The second phase of a MapReduce job executes M instances of the reduce program, Rj, 1 ≤ j ≤ M. The input for each reduce instance Rj consists of the files Fi,j, 1 ≤ i ≤ N. Again notice that all output records from the map phase with the same hash value will be consumed by the same reduce instance -- no matter which map instance produced them. After being collected by the map-reduce framework, the input records to a reduce instance are grouped on their keys (by sorting or hashing) and feed to the reduce program. Like the map program, the reduce program is an arbitrary computation in a general-purpose language. Hence, it can do anything it wants with its records. For example, it might compute some additional function over other data fields in the record. Each reduce instance can write records to an output file, which forms part of the "answer" to a MapReduce computation.


To draw an analogy to SQL, map is like the group-by clause of an aggregate query. Reduce is analogous to the aggregate function (e.g., average) that is computed over all the rows with the same group-by attribute.

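Translator's note: in SQL terms, the canonical MapReduce word-count example collapses into a single aggregate query, with map emitting the group-by key and reduce being the aggregate function. A sketch, where words is a hypothetical table with one row per word occurrence:

select word, count(*) as occurrences
from words
group by word;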

We now turn to the five concerns we have with this computing paradigm.


1. MapReduce is a step backwards in database access

As a data processing paradigm, MapReduce represents a giant step backwards. The database community has learned the following three lessons from the 40 years that have unfolded since IBM first released IMS in 1968.

* Schemas are good.
* Separation of the schema from the application is good.
* High-level access languages are good.


MapReduce has learned none of these lessons and represents a throw back to the 1960s, before modern DBMSs were invented.


The DBMS community learned the importance of schemas, whereby the fields and their data types are recorded in storage. More importantly, the run-time system of the DBMS can ensure that input records obey this schema. This is the best way to keep an application from adding "garbage" to a data set. MapReduce has no such functionality, and there are no controls to keep garbage out of its data sets. A corrupted MapReduce dataset can actually silently break all the MapReduce applications that use that dataset.


It is also crucial to separate the schema from the application program. If a programmer wants to write a new application against a data set, he or she must discover the record structure. In modern DBMSs, the schema is stored in a collection of system catalogs and can be queried (in SQL) by any user to uncover such structure. In contrast, when the schema does not exist or is buried in an application program, the programmer must discover the structure by an examination of the code. Not only is this a very tedious exercise, but also the programmer must find the source code for the application. This latter tedium is forced onto every MapReduce programmer, since there are no system catalogs recording the structure of records -- if any such structure exists.


During the 1970s the DBMS community engaged in a "great debate" between the relational advocates and the Codasyl advocates. One of the key issues was whether a DBMS access program should be written:

* By stating what you want - rather than presenting an algorithm for how to get it (relational view)
* By presenting an algorithm for data access (Codasyl view)


The result is now ancient history, but the entire world saw the value of high-level languages and relational systems prevailed. Programs in high-level languages are easier to write, easier to modify, and easier for a new person to understand. Codasyl was rightly criticized for being "the assembly language of DBMS access." A MapReduce programmer is analogous to a Codasyl programmer -- he or she is writing in a low-level language performing low-level record manipulation. Nobody advocates returning to assembly language; similarly nobody should be forced to program in MapReduce.

MapReduce advocates might counter this argument by claiming that the datasets they are targeting have no schema. We dismiss this assertion. In extracting a key from the input data set, the map function is relying on the existence of at least one data field in each input record. The same holds for a reduce function that computes some value from the records it receives to process.

Writing MapReduce applications on top of Google's BigTable (or Hadoop's HBase) does not really change the situation significantly. By using a self-describing tuple format (row key, column name, {values}) different tuples within the same table can actually have different schemas. In addition, BigTable and HBase do not provide logical independence, for example with a view mechanism. Views significantly simplify keeping applications running when the logical schema changes.

2. MapReduce is a poor implementation

All modern DBMSs use hash or B-tree indexes to accelerate access to data. If one is looking for a subset of the records (e.g., those employees with a salary of 10,000 or those in the shoe department), then one can often use an index to advantage to cut down the scope of the search by one to two orders of magnitude. In addition, there is a query optimizer to decide whether to use an index or perform a brute-force sequential search.


MapReduce has no indexes and therefore has only brute force as a processing option. It will be creamed whenever an index is the better access mechanism.


One could argue that value of MapReduce is automatically providing parallel execution on a grid of computers. This feature was explored by the DBMS research community in the 1980s, and multiple prototypes were built including Gamma [2,3], Bubba [4], and Grace [5]. Commercialization of these ideas occurred in the late 1980s with systems such as Teradata.


In summary to this first point, there have been high-performance, commercial, grid-oriented SQL engines (with schemas and indexing) for the past 20 years. MapReduce does not fare well when compared with such systems.

There are also some lower-level implementation issues with MapReduce, specifically skew and data interchange.

One factor that MapReduce advocates seem to have overlooked is the issue of skew. As described in "Parallel Database System: The Future of High Performance Database Systems," [6] skew is a huge impediment to achieving successful scale-up in parallel query systems. The problem occurs in the map phase when there is wide variance in the distribution of records with the same key. This variance, in turn, causes some reduce instances to take much longer to run than others, resulting in the execution time for the computation being the running time of the slowest reduce instance. The parallel database community has studied this problem extensively and has developed solutions that the MapReduce community might want to adopt.

There is a second serious performance problem that gets glossed over by the MapReduce proponents. Recall that each of the N map instances produces M output files -- each destined for a different reduce instance. These files are written to a disk local to the computer used to run the map instance. If N is 1,000 and M is 500, the map phase produces 500,000 local files. When the reduce phase starts, each of the 500 reduce instances needs to read its 1,000 input files and must use a protocol like FTP to "pull" each of its input files from the nodes on which the map instances were run. With 100s of reduce instances running simultaneously, it is inevitable that two or more reduce instances will attempt to read their input files from the same map node simultaneously -- inducing large numbers of disk seeks and slowing the effective disk transfer rate by more than a factor of 20. This is why parallel database systems do not materialize their split files and use push (to sockets) instead of pull. Since much of the excellent fault-tolerance that MapReduce obtains depends on materializing its split files, it is not clear whether the MapReduce framework could be successfully modified to use the push paradigm instead.

Given the experimental evaluations to date, we have serious doubts about how well MapReduce applications can scale. Moreover, the MapReduce implementers would do well to study the last 25 years of parallel DBMS research literature.

3. MapReduce is not novel

The MapReduce community seems to feel that they have discovered an entirely new paradigm for processing large data sets. In actuality, the techniques employed by MapReduce are more than 20 years old. The idea of partitioning a large data set into smaller partitions was first proposed in "Application of Hash to Data Base Machine and Its Architecture" [11] as the basis for a new type of join algorithm. In "Multiprocessor Hash-Based Join Algorithms," [7], Gerber demonstrated how Kitsuregawa's techniques could be extended to execute joins in parallel on a shared-nothing [8] cluster using a combination of partitioned tables, partitioned execution, and hash based splitting. DeWitt [2] showed how these techniques could be adopted to execute aggregates with and without group by clauses in parallel. DeWitt and Gray [6] described parallel database systems and how they process queries. Shatdal and Naughton [9] explored alternative strategies for executing aggregates in parallel.

Teradata has been selling a commercial DBMS utilizing all of these techniques for more than 20 years; exactly the techniques that the MapReduce crowd claims to have invented.

While MapReduce advocates will undoubtedly assert that being able to write MapReduce functions is what differentiates their software from a parallel SQL implementation, we would remind them that POSTGRES supported user-defined functions and user-defined aggregates in the mid 1980s. Essentially, all modern database systems have provided such functionality for quite a while, starting with the Illustra engine around 1995.

4. MapReduce is missing features

All of the following features are routinely provided by modern DBMSs, and all are missing from MapReduce:

* Bulk loader -- to transform input data in files into a desired format and load it into a DBMS
* Indexing -- as noted above
* Updates -- to change the data in the data base
* Transactions -- to support parallel update and recovery from failures during update
* Integrity constraints -- to help keep garbage out of the data base
* Referential integrity -- again, to help keep garbage out of the data base
* Views -- so the schema can change without having to rewrite the application program


In summary, MapReduce provides only a sliver of the functionality found in modern DBMSs.


5. MapReduce is incompatible with the DBMS tools

A modern SQL DBMS has available all of the following classes of tools:

* Report writers (e.g., Crystal reports) to prepare reports for human visualization
* Business intelligence tools (e.g., Business Objects or Cognos) to enable ad-hoc querying of large data warehouses
* Data mining tools (e.g., Oracle Data Mining or IBM DB2 Intelligent Miner) to allow a user to discover structure in large data sets
* Replication tools (e.g., Golden Gate) to allow a user to replicate data from on DBMS to another
* Database design tools (e.g., Embarcadero) to assist the user in constructing a data base.


MapReduce cannot use these tools and has none of its own. Until it becomes SQL-compatible or until someone writes all of these tools, MapReduce will remain very difficult to use in an end-to-end task.

In Summary

It is exciting to see a much larger community engaged in the design and implementation of scalable query processing techniques. We, however, assert that they should not overlook the lessons of more than 40 years of database technology -- in particular the many advantages that a data model, physical and logical data independence, and a declarative query language, such as SQL, bring to the design, implementation, and maintenance of application programs. Moreover, computer science communities tend to be insular and do not read the literature of other communities. We would encourage the wider community to examine the parallel DBMS literature of the last 25 years. Last, before MapReduce can measure up to modern DBMSs, there is a large collection of unmet features and required tools that must be added.


We fully understand that database systems are not without their problems. The database community recognizes that database systems are too "hard" to use and is working to solve this problem. The database community can also learn something valuable from the excellent fault-tolerance that MapReduce provides its applications. Finally we note that some database researchers are beginning to explore using the MapReduce framework as the basis for building scalable database systems. The Pig[10] project at Yahoo! Research is one such effort.


References

[1] "MapReduce: Simplified Data Processing on Large Clusters," Jeff Dean and Sanjay Ghemawat, Proceedings of the 2004 OSDI Conference, 2004.

[2] "The Gamma Database Machine Project," DeWitt, et. al., IEEE Transactions on Knowledge and Data Engineering, Vol. 2, No. 1, March 1990.

[4] "Gamma - A High Performance Dataflow Database Machine," DeWitt, D, R. Gerber, G. Graefe, M. Heytens, K. Kumar, and M. Muralikrishna, Proceedings of the 1986 VLDB Conference, 1986.

[5] "Prototyping Bubba, A Highly Parallel Database System," Boral, et. al., IEEE Transactions on Knowledge and Data Engineering,Vol. 2, No. 1, March 1990.

[6] "Parallel Database System: The Future of High Performance Database Systems," David J. DeWitt and Jim Gray, CACM, Vol. 35, No. 6, June 1992.

[7] "Multiprocessor Hash-Based Join Algorithms," David J. DeWitt and Robert H. Gerber, Proceedings of the 1985 VLDB Conference, 1985.

[8] "The Case for Shared-Nothing," Michael Stonebraker, Data Engineering Bulletin, Vol. 9, No. 1, 1986.

[9] "Adaptive Parallel Aggregation Algorithms," Ambuj Shatdal and Jeffrey F. Naughton, Proceedings of the 1995 SIGMOD Conference, 1995.

[10] "Pig", Chris Olston, http://research.yahoo.com/project/90

[11] "Application of Hash to Data Base Machine and Its Architecture," Masaru Kitsuregawa, Hidehiko Tanaka, Tohru Moto-Oka, New Generation Comput. 1(1): 63-74 (1983)

Tuesday, February 10, 2009

Resolving EXP-00003 When Exporting Across Oracle Versions

Author: eygle (when reposting, please cite the source and author with a hyperlink, and keep this notice)
Link: http://www.eygle.com/archives/2009/02/oracle_exp_00003.html

When exporting an Oracle database with an exp utility of a different version, you sometimes run into errors like the following:

EXP-00003: no storage definition found for segment(11, 307)
EXP-00003: no storage definition found for segment(11, 523)
EXP-00003: no storage definition found for segment(11, 643)
EXP-00003: no storage definition found for segment(11, 275)

"EXP-00003: no storage definition found for segment ....."错误,和Oracle的一个小Bug相关,可以通过修改一个导出相关的View来解决。
在目标数据库使用sys用户,执行如下SQL创建新的view:

CREATE OR REPLACE VIEW exu9tne (
tsno, fileno, blockno, length) AS
SELECT ts#, segfile#, segblock#, length
FROM sys.uet$
WHERE ext# = 1
UNION ALL
SELECT * FROM SYS.EXU9TNEB
/

The export will then run normally. After exp completes, it is best to restore the view (as Metalink recommends):

CREATE OR REPLACE VIEW exu9tne (
tsno, fileno, blockno, length) AS
SELECT ts#, segfile#, segblock#, length
FROM sys.uet$
WHERE ext# = 1
/

The EXU9TNE view is initially created by the $ORACLE_HOME/rdbms/admin/catexp.sql script.

-The End-

I'd hit this problem before too, and never solved it. Thanks, eygle.

Tuesday, January 13, 2009

A Few Oracle 11g New Features (for Developers)

1. The pivot operator
This was presumably borrowed from SQL Server, but to be fair, it is genuinely useful. Put simply, it turns rows into columns. Let me illustrate with an example. To run the examples below you need the emp and dept tables from the scott schema; if you don't have them, run:
create table emp as select * from scott.emp;
create table dept as select * from scott.dept;

Traditionally we pivot rows to columns like this (typically for reports):
select deptno,
sum( decode( job, 'CLERK', sal ) ) clerk,
sum( decode( job, 'SALESMAN', sal ) ) salesman,
sum( decode( job, 'MANAGER', sal ) ) manager,
sum( decode( job, 'ANALYST', sal ) ) analyst,
sum( decode( job, 'PRESIDENT', sal ) ) president
from scott.emp
group by deptno
order by deptno;

The query produces:
DEPTNO CLERK SALESMAN MANAGER ANALYST PRESIDENT
---------- ---------- ---------- ---------- ---------- ----------
10 1300 2450 5000
20 1900 2975 6000
30 950 5600 2850

In 11g we can write the query as:
select *
from (select deptno, job, sal
from scott.emp ) e
pivot( sum(sal) for job in
( 'CLERK', 'SALESMAN', 'MANAGER',
'ANALYST', 'PRESIDENT' ) )
order by deptno;

Better still, we no longer have to hard-code the value list, as long as we use the XML form of pivot (a plain pivot still requires a literal IN list; a subquery is only allowed with pivot xml, which returns the result as XML):
select *
from (select deptno, job, sal
from scott.emp ) e
pivot xml( sum(sal) for job in
(select distinct job from emp))
order by deptno;

2. Sequences
Before 11g, referencing a sequence value in PL/SQL required a SELECT:
select t_seq.nextval into var from dual;
In 11g you can assign it directly:
var := t_seq.nextval;

3. The log_error script
This is an error-logging script provided by Oracle, particularly suited to the WHEN OTHERS THEN situations you hit in PL/SQL programming.

CREATE OR REPLACE FUNCTION getosuser RETURN VARCHAR2 IS
vOSUser user_users.username%TYPE;
BEGIN
SELECT osuser
INTO vOSUser
FROM sys.v_$session
WHERE sid = (
SELECT sid
FROM sys.v_$mystat
WHERE rownum = 1);

RETURN vOSUser;
EXCEPTION
WHEN OTHERS THEN
RETURN 'UNK';
END getosuser;

CREATE TABLE errorlog (
procname VARCHAR2(61),
loadfilename VARCHAR2(40),
runtime DATE DEFAULT SYSDATE,
osuser VARCHAR2(30),
mesgtext VARCHAR2(250));

CREATE OR REPLACE PROCEDURE log_error (
pProcName VARCHAR2,
pLoadFile VARCHAR2,
pMesgText VARCHAR2)
IS

PRAGMA AUTONOMOUS_TRANSACTION;

BEGIN
INSERT INTO errorlog
(procname, loadfilename, osuser, mesgtext)
VALUES
(pProcName, pLoadFile, getOSUser, pMesgText);
COMMIT;

-- No exception handler intentionally. Why?

END log_error;
/
This script also happens to be a good example of an autonomous transaction.

Here is another exception-handling script from Oracle:
CREATE TABLE error_log (
error_timestamp TIMESTAMP(9),
database_name VARCHAR(50),
instance_number NUMBER,
error_number NUMBER,
error_message VARCHAR2(255),
logged_on_as VARCHAR2(30),
client_host VARCHAR2(50),
service_name VARCHAR2(30));

CREATE OR REPLACE PROCEDURE error_trap IS
odbname VARCHAR2(50); -- Oracle database name
oinst NUMBER; -- Oracle instance number
enum NUMBER; -- Error Message number
emsg VARCHAR2(250); -- Error text
curschema VARCHAR2(30);
clihost VARCHAR2(50);
serv_name VARCHAR2(30);

-- PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
enum := sqlcode;
emsg := sqlerrm;

odbname := ora_database_name;
oinst := ora_instance_num;

SELECT sys_context('USERENV', 'CURRENT_SCHEMA')
INTO curschema
FROM dual;

SELECT sys_context('USERENV', 'HOST')
INTO clihost
FROM dual;

SELECT sys_context('USERENV', 'SERVICE_NAME')
INTO serv_name
FROM dual;

INSERT INTO error_log
(error_timestamp, database_name, instance_number,
error_number, error_message, logged_on_as,
client_host, service_name)
VALUES
(SYSTIMESTAMP, odbname, oinst, enum, emsg,
curschema, clihost, serv_name);
COMMIT;
END error_trap;
/

CREATE OR REPLACE TRIGGER error_trig
AFTER SERVERERROR ON DATABASE
CALL error_trap
/

BEGIN
RAISE zero_divide;
END;
/

CREATE TABLE errors (
module VARCHAR2(50),
seq_number NUMBER,
error_number NUMBER,
error_mesg VARCHAR2(100),
error_stack VARCHAR2(2000),
call_stack VARCHAR2(2000),
timestamp DATE);

ALTER TABLE errors
ADD CONSTRAINT pk_errors
PRIMARY KEY (module, seq_number)
USING INDEX
TABLESPACE indx_sml;

CREATE TABLE call_stacks (
module VARCHAR2(50),
seq_number NUMBER,
call_order NUMBER,
object_handle VARCHAR2(10),
line_num NUMBER,
object_name VARCHAR2(80));

ALTER TABLE call_stacks
ADD CONSTRAINT pk_call_stacks
PRIMARY KEY (module, seq_number, call_order)
USING INDEX
TABLESPACE indx_sml;

ALTER TABLE call_stacks
ADD CONSTRAINT fk_cs_errors
FOREIGN KEY (module, seq_number)
REFERENCES errors (module, seq_number)
ON DELETE CASCADE;

CREATE TABLE error_stacks (
module VARCHAR2(50),
seq_number NUMBER,
error_order NUMBER,
facility CHAR(3),
error_number NUMBER(5),
error_mesg VARCHAR2(100));

ALTER TABLE error_stacks
ADD CONSTRAINT pk_error_stacks
PRIMARY KEY (module, seq_number, error_order)
USING INDEX
TABLESPACE indx_sml;

ALTER TABLE error_stacks
ADD CONSTRAINT fk_es_errors
FOREIGN KEY (module, seq_number)
REFERENCES errors (module, seq_number)
ON DELETE CASCADE;

CREATE SEQUENCE error_seq
START WITH 1
INCREMENT BY 1;

CREATE OR REPLACE PACKAGE ErrorPkg AS

/* Generic error handling package, using DBMS_UTILITY.FORMAT_ERROR_STACK and DBMS_UTILITY.FORMAT_CALL_STACK. This package stores general error information in the errors table, with detailed call stack and error stack information in the call_stacks and error_stacks tables, respectively.

Entry point for handling errors. HandleAll should be called from all exception handlers where you want the error to be logged. p_Top should be TRUE only at the topmost level of procedure nesting. It should be FALSE at other levels.
*/

PROCEDURE HandleAll(p_Top BOOLEAN);

/*
Prints the error and call stacks (using DBMS_OUTPUT) for the given module and sequence number.
*/

PROCEDURE PrintStacks(p_Module IN errors.module%TYPE,
p_SeqNum IN errors.seq_number%TYPE);

/*
Unwinds the call and error stacks, and stores them in the errors and call stacks tables. Returns the sequence number under which the error is stored. If p_CommitFlag is TRUE,
then the inserts are committed. In order to use StoreStacks, an error must have been handled. Thus HandleAll should have been called with p_Top = TRUE.
*/

PROCEDURE StoreStacks(p_Module IN errors.module%TYPE,
p_SeqNum OUT errors.seq_number%TYPE,
p_CommitFlag BOOLEAN DEFAULT FALSE);

END ErrorPkg;
/

CREATE OR REPLACE PACKAGE BODY ErrorPkg IS

v_NewLine CONSTANT CHAR(1) := CHR(10);

v_Handled BOOLEAN := FALSE;
v_ErrorStack VARCHAR2(2000);
v_CallStack VARCHAR2(2000);

PROCEDURE HandleAll(p_Top BOOLEAN) IS

BEGIN
IF p_Top THEN
v_Handled := FALSE;
ELSIF NOT v_Handled THEN
v_Handled := TRUE;
v_ErrorStack := DBMS_UTILITY.FORMAT_ERROR_STACK;
v_CallStack := DBMS_UTILITY.FORMAT_CALL_STACK;
END IF;
END HandleAll;
--===================================================
PROCEDURE PrintStacks(
p_Module IN errors.module%TYPE,
p_SeqNum IN errors.seq_number%TYPE)
IS

v_TimeStamp errors.timestamp%TYPE;
v_ErrorMsg errors.error_mesg%TYPE;

CURSOR c_CallCur IS
SELECT object_handle, line_num, object_name
FROM call_stacks
WHERE module = p_Module
AND seq_number = p_SeqNum
ORDER BY call_order;

CURSOR c_ErrorCur IS
SELECT facility, error_number, error_mesg
FROM error_stacks
WHERE module = p_Module
AND seq_number = p_SeqNum
ORDER BY error_order;

BEGIN
SELECT timestamp, error_mesg
INTO v_TimeStamp, v_ErrorMsg
FROM errors
WHERE module = p_Module
AND seq_number = p_SeqNum;

-- Output general error information.
dbms_output.put_line(TO_CHAR(v_TimeStamp, 'DD-MON-YY HH24:MI:SS'));
dbms_output.put(' Module: ' || p_Module);
dbms_output.put(' Error #' || p_SeqNum || ': ');
dbms_output.put_line(v_ErrorMsg);

-- Output the call stack.
dbms_output.put('Complete Call Stack:');
dbms_output.put(' Object Handle Line Number Object Name');
dbms_output.put_line(' ------------- ----------- -----------');

FOR v_CallRec in c_CallCur
LOOP
dbms_output.put(RPAD(' ' || v_CallRec.object_handle, 15));
dbms_output.put(RPAD(' ' || TO_CHAR(v_CallRec.line_num), 13));
dbms_output.put_line(' ' || v_CallRec.object_name);
END LOOP;

-- Output the error stack.
dbms_output.put_line('Complete Error Stack:');

FOR v_ErrorRec in c_ErrorCur
LOOP
dbms_output.put(' ' || v_ErrorRec.facility || '-');
dbms_output.put(TO_CHAR(v_ErrorRec.error_number) || ': ');
dbms_output.put_line(v_ErrorRec.error_mesg);
END LOOP;
END PrintStacks;
--===================================================
PROCEDURE StoreStacks(p_Module IN errors.module%TYPE,
p_SeqNum OUT errors.seq_number%TYPE,
p_CommitFlag BOOLEAN DEFAULT FALSE)
IS

v_SeqNum NUMBER;
v_Index NUMBER;
v_Length NUMBER;
v_End NUMBER;
v_Call VARCHAR2(100);
v_CallOrder NUMBER := 1;
v_Error VARCHAR2(120);
v_ErrorOrder NUMBER := 1;

v_Handle call_stacks.object_handle%TYPE;
v_LineNum call_stacks.line_num%TYPE;
v_ObjectName call_stacks.object_name%TYPE;

v_Facility error_stacks.facility%TYPE;
v_ErrNum error_stacks.error_number%TYPE;
v_ErrMsg error_stacks.error_mesg%TYPE;

v_FirstErrNum errors.error_number%TYPE;
v_FirstErrMsg errors.error_mesg%TYPE;

BEGIN
-- Get the error sequence number.
SELECT error_seq.nextval
INTO v_SeqNum
FROM dual;

p_SeqNum := v_SeqNum;

-- Insert first part of header info. into the errors table
INSERT INTO errors
(module, seq_number, error_stack, call_stack, timestamp)
VALUES
(p_Module, v_SeqNum, v_ErrorStack, v_CallStack, SYSDATE);

/*
Unwind the error stack to get each error out by scanning the
error stack string. Start with the index at the beginning of
the string
*/
v_Index := 1;

/*
Loop through the string, finding each newline
A newline ends each error on the stack
*/
WHILE v_Index < LENGTH(v_ErrorStack) LOOP
-- v_End is the position of the newline.
v_End := INSTR(v_ErrorStack, v_NewLine, v_Index);

-- The error is between the current index and the newline
v_Error := SUBSTR(v_ErrorStack, v_Index, v_End - v_Index);

-- Skip over the current error, for the next iteration
v_Index := v_Index + LENGTH(v_Error) + 1;

/* An error looks like 'facility-number: mesg'. Get each
piece out for insertion. The facility is the first 3
characters of the error.
*/

v_Facility := SUBSTR(v_Error, 1, 3);

-- Remove the facility and the dash (always 4 characters)
v_Error := SUBSTR(v_Error, 5);

-- Next get the error number
v_ErrNum := TO_NUMBER(SUBSTR(v_Error, 1, INSTR(v_Error,
':') - 1));

-- Remove the error number, colon & space (always 7 chars)
v_Error := SUBSTR(v_Error, 8);

-- What's left is the error message
v_ErrMsg := v_Error;

/*
Insert the errors, and grab the first error number and
message while we're at it
*/

INSERT INTO error_stacks
(module, seq_number, error_order, facility, error_number,
error_mesg)
VALUES
(p_Module, p_SeqNum, v_ErrorOrder, v_Facility, v_ErrNum,
v_ErrMsg);

IF v_ErrorOrder = 1 THEN
v_FirstErrNum := v_ErrNum;
v_FirstErrMsg := v_Facility || '-' || TO_CHAR(v_ErrNum)
|| ': ' || v_ErrMsg;
END IF;

v_ErrorOrder := v_ErrorOrder + 1;
END LOOP;

-- Update the errors table with the message and code
UPDATE errors
SET error_number = v_FirstErrNum,
error_mesg = v_FirstErrMsg
WHERE module = p_Module
AND seq_number = v_SeqNum;

/*
Unwind the call stack to get each call out by scanning the
call stack string. Start with the index after the first call
on the stack. This will be after the first occurrence of
'name' and the newline.
*/
v_Index := INSTR(v_CallStack, 'name') + 5;

/* Loop through the string, finding each newline. A newline
ends each call on the stack.
*/
WHILE v_Index < LENGTH(v_CallStack) LOOP
-- v_End is the position of the newline
v_End := INSTR(v_CallStack, v_NewLine, v_Index);

-- The call is between the current index and the newline
v_Call := SUBSTR(v_CallStack, v_Index, v_End - v_Index);

-- Skip over the current call, for the next iteration
v_Index := v_Index + LENGTH(v_Call) + 1;

/*
Within a call, we have the object handle, then the line
number, then the object name, separated by spaces. Separate
them out for insertion.

-- Trim white space from the call first.
*/
v_Call := TRIM(v_Call);

-- First get the object handle
v_Handle := SUBSTR(v_Call, 1, INSTR(v_Call, ' '));

-- Remove the object handle,then the white space
v_Call := SUBSTR(v_Call, LENGTH(v_Handle) + 1);
v_Call := TRIM(v_Call);

-- Get the line number
v_LineNum := TO_NUMBER(SUBSTR(v_Call,1,INSTR(v_Call,' ')));

-- Remove the line number, and white space
v_Call := SUBSTR(v_Call, LENGTH(v_LineNum) + 1);
v_Call := TRIM(v_Call);

-- What is left is the object name
v_ObjectName := v_Call;

-- Insert all calls except the call for ErrorPkg
IF v_CallOrder > 1 THEN
INSERT INTO call_stacks
(module, seq_number, call_order, object_handle, line_num,
object_name)
VALUES
(p_Module, v_SeqNum, v_CallOrder, v_Handle, v_LineNum,
v_ObjectName);
END IF;

v_Callorder := v_CallOrder + 1;
END LOOP;

IF p_CommitFlag THEN
COMMIT;
END IF;
END StoreStacks;

END ErrorPkg;
/

4. Cursor conversion
DBMS_SQL.TO_CURSOR_NUMBER
DBMS_SQL.TO_REFCURSOR
These two functions convert between ref cursors and DBMS_SQL cursor numbers.
declare
l_rcursor sys_refcursor;
l_colCnt number;
l_descTbl dbms_sql.desc_tab;
begin
open l_rcursor for select * from all_users;

dbms_sql.describe_columns
( c => dbms_sql.to_cursor_number(l_rcursor),
col_cnt => l_colCnt,
desc_t => l_descTbl );

for i in 1 .. l_colCnt
loop
dbms_output.put_line( l_descTbl(i).col_name );
end loop;
end;
/

declare
l_cursor number := dbms_sql.open_cursor;
l_refcursor sys_refcursor;
l_data varchar2(30);
begin
dbms_sql.parse
( l_cursor,
'select username from all_users where rownum <= 5',
dbms_sql.native );
if ( dbms_sql.execute( l_cursor ) != 0 )
then
raise program_error;
end if;
l_refcursor := dbms_sql.to_refcursor(l_cursor);
loop
fetch l_refcursor into l_data;
exit when l_refcursor%notfound;
dbms_output.put_line( l_data );
end loop;
close l_refcursor;
end;
/

5 Native compilation
Natively compiled code takes longer to compile and cannot be debugged, but it runs faster than interpreted code. Compare:
alter procedure p compile
PLSQL_Warnings = 'enable:all'
PLSQL_Optimize_Level = 2
PLSQL_CCFlags = 'Simple:true'
PLSQL_Code_Type = interpreted
reuse settings
/

alter procedure p compile
PLSQL_Warnings = 'enable:all'
PLSQL_Optimize_Level = 2
PLSQL_CCFlags = 'Simple:false'
PLSQL_Code_Type = native
reuse settings
/
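To check how a unit actually ended up compiled, you can query the compiler settings recorded in the dictionary. A minimal sketch (10g and later; the procedure name P matches the examples above):

select name, plsql_code_type, plsql_optimize_level
from user_plsql_object_settings
where name = 'P';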

6 Compound triggers
New in 11g: a single trigger body covers all four timing points and can share state between them (the g_data variable here):
create trigger t for update on t compound trigger
  g_data varchar2(200) := 'begin state';

  before statement is
  begin
    dbms_output.put_line( g_data );
    dbms_output.put_line( 'before statement...' );
    g_data := 'before statement, ';
  end before statement;

  before each row is
  begin
    dbms_output.put_line( 'before each row...' );
    g_data := g_data || 'before each row, ';
  end before each row;

  after each row is
  begin
    dbms_output.put_line( 'after each row...' );
    g_data := g_data || 'after each row, ';
  end after each row;

  after statement is
  begin
    dbms_output.put_line( 'after statement...' );
    g_data := g_data || 'after statement';
    dbms_output.put_line( g_data );
  end after statement;
end;
/

7 Named and mixed notation in SQL
New in 11g: PL/SQL functions invoked from SQL can now be called with named and mixed parameter notation, not just positional:
create or replace function f
( x in number default 1,
y in number default 2,
z in number default 3 )
return number
as
begin
return x+y+z;
end;
/

select f(4,5,6) from dual;
select f(10, z=>200) from dual;
select f(z=>10) from dual;
select f(y=>20,z=>10) from dual;
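These four calls return 15, 212, 13 and 31 respectively: parameters you do not supply take their defaults, and in mixed notation the positional arguments must come before the named ones.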

8 Flashback data archive
This feature is very powerful: with enough archived flashback data, you can flash back even to data from ten years ago.
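A minimal sketch of setting one up (the tablespace fda_ts, the archive name fda1, and the table scott.emp are all made-up names for illustration):

create flashback archive fda1 tablespace fda_ts retention 10 year;
alter table scott.emp flashback archive fda1;
-- later, query the history as usual:
select * from scott.emp as of timestamp (systimestamp - interval '1' day);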

Using the show_space tool

This tool was also written by Tom Kyte, and it is a handy way to see how much space a table or index occupies: just log in as the owning user and run the script. Usage should be obvious from the script body itself.

create or replace procedure show_space(
p_segname in varchar2,
p_owner in varchar2 default user,
p_type in varchar2 default 'TABLE')
as
l_free_blks number;

l_total_blocks number;
l_total_bytes number;
l_unused_blocks number;
l_unused_bytes number;
l_lastusedextfileid number;
l_lastusedextblockid number;
l_last_used_block number;

begin
dbms_space.free_blocks(
segment_owner => p_owner,
segment_name => p_segname,
segment_type => p_type,
freelist_group_id => 0,
free_blks => l_free_blks );

dbms_space.unused_space(
segment_owner => p_owner,
segment_name => p_segname,
segment_type => p_type,
total_blocks => l_total_blocks,
total_bytes => l_total_bytes,
unused_blocks => l_unused_blocks,
unused_bytes => l_unused_bytes,
last_used_extent_file_id => l_lastusedextfileid,
last_used_extent_block_id => l_lastusedextblockid,
last_used_block => l_last_used_block );

dbms_output.put_line( 'Free Blocks');
dbms_output.put_line( l_free_blks );
dbms_output.put_line( 'Total Blocks');
dbms_output.put_line( l_total_blocks );
dbms_output.put_line( 'Total Bytes');
dbms_output.put_line( l_total_bytes );
dbms_output.put_line( 'Unused Blocks');
dbms_output.put_line( l_unused_blocks );
dbms_output.put_line( 'Unused Bytes');
dbms_output.put_line( l_unused_bytes );
dbms_output.put_line( 'Last Used Ext FileId');
dbms_output.put_line( l_lastusedextfileid );
dbms_output.put_line( 'Last Used Ext BlockId');
dbms_output.put_line( l_lastusedextblockid );
dbms_output.put_line( 'Last Used Block');
dbms_output.put_line( l_last_used_block );
end;
/
grant execute on show_space to public;
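A hypothetical session (the table EMP and index EMP_PK are assumed to exist in the current schema; note that this version calls dbms_space.free_blocks, so it only works for segments in freelist-managed, non-ASSM tablespaces):

set serveroutput on
exec show_space( 'EMP' )
exec show_space( 'EMP_PK', user, 'INDEX' )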

Using Tom Kyte's run_stats tool

I have been using Tom Kyte's run_stats tool for a long time now, and it is very useful when comparing several ways of doing the same thing in the database, so here is a short introduction.

1 Installation
1) Privileges
run_stats needs SELECT access to V$STATNAME, V$MYSTAT, V$TIMER and V$LATCH, so log in as SYS and grant the installing user the corresponding privileges:

grant select on sys.v_$timer to public;
grant select on sys.v_$mystat to public;
grant select on sys.v_$statname to public;
grant select on sys.v_$latch to public;

2) Installation
Log in as the user that will own run_stats and run the following statements:
create global temporary table run_stats
( runid varchar2(15),
name varchar2(80),
value int )
on commit preserve rows;

create or replace view stats
as select 'STAT...' || a.name name, b.value
from v$statname a, v$mystat b
where a.statistic# = b.statistic#
union all
select 'LATCH.' || name, gets
from v$latch
union all
select 'STAT...Elapsed Time', hsecs from v$timer;

create or replace package runstats_pkg
as
procedure rs_start;
procedure rs_middle;
procedure rs_stop( p_difference_threshold in number default 0 );
end;
/

create or replace package body runstats_pkg
as

g_start number;
g_run1 number;
g_run2 number;

procedure rs_start
is
begin
delete from run_stats;

insert into run_stats
select 'before', stats.* from stats;

g_start := dbms_utility.get_time;
end;

procedure rs_middle
is
begin
g_run1 := (dbms_utility.get_time-g_start);

insert into run_stats
select 'after 1', stats.* from stats;
g_start := dbms_utility.get_time;

end;

procedure rs_stop(p_difference_threshold in number default 0)
is
begin
g_run2 := (dbms_utility.get_time-g_start);

dbms_output.put_line
( 'Run1 ran in ' || g_run1 || ' hsecs' );
dbms_output.put_line
( 'Run2 ran in ' || g_run2 || ' hsecs' );
dbms_output.put_line
( 'run 1 ran in ' || round(g_run1/g_run2*100,2) ||
'% of the time' );
dbms_output.put_line( chr(9) );

insert into run_stats
select 'after 2', stats.* from stats;

dbms_output.put_line
( rpad( 'Name', 30 ) || lpad( 'Run1', 12 ) ||
lpad( 'Run2', 12 ) || lpad( 'Diff', 12 ) );

for x in
( select rpad( a.name, 30 ) ||
to_char( b.value-a.value, '999,999,999' ) ||
to_char( c.value-b.value, '999,999,999' ) ||
to_char( ( (c.value-b.value)-(b.value-a.value)), '999,999,999' ) data
from run_stats a, run_stats b, run_stats c
where a.name = b.name
and b.name = c.name
and a.runid = 'before'
and b.runid = 'after 1'
and c.runid = 'after 2'
-- and (c.value-a.value) > 0
and abs( (c.value-b.value) - (b.value-a.value) )
> p_difference_threshold
order by abs( (c.value-b.value)-(b.value-a.value))
) loop
dbms_output.put_line( x.data );
end loop;

dbms_output.put_line( chr(9) );
dbms_output.put_line
( 'Run1 latches total versus runs -- difference and pct' );
dbms_output.put_line
( lpad( 'Run1', 12 ) || lpad( 'Run2', 12 ) ||
lpad( 'Diff', 12 ) || lpad( 'Pct', 10 ) );

for x in
( select to_char( run1, '999,999,999' ) ||
to_char( run2, '999,999,999' ) ||
to_char( diff, '999,999,999' ) ||
to_char( round( run1/run2*100,2 ), '99,999.99' ) || '%' data
from ( select sum(b.value-a.value) run1, sum(c.value-b.value) run2,
sum( (c.value-b.value)-(b.value-a.value)) diff
from run_stats a, run_stats b, run_stats c
where a.name = b.name
and b.name = c.name
and a.runid = 'before'
and b.runid = 'after 1'
and c.runid = 'after 2'
and a.name like 'LATCH%'
)
) loop
dbms_output.put_line( x.data );
end loop;
end;

end;
/
grant execute on runstats_pkg to public;

2 Usage
Before running the first approach, execute:
exec runStats_pkg.rs_start;
Run the first approach, then execute:
exec runStats_pkg.rs_middle;
Run the second approach, then execute:
exec runStats_pkg.rs_stop;
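For example, a sketch comparing row-by-row inserts against a single bulk insert (the table t_cmp is a made-up name; the argument to rs_stop hides statistics whose two runs differ by less than that threshold):

create table t_cmp ( x int );

exec runStats_pkg.rs_start;
insert into t_cmp select rownum from dual connect by level <= 100000;
commit;
exec runStats_pkg.rs_middle;
begin
    for i in 1 .. 100000 loop
        insert into t_cmp values ( i );
    end loop;
    commit;
end;
/
exec runStats_pkg.rs_stop(1000);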

A note on ORA_ROWSCN

Author: Jiang Feng | [When reposting, please credit the original source and author with a hyperlink and keep this notice]
URL: http://rdc.taobao.com/blog/dba/html/243_about_ora_rowscn.html

The quoted article follows:

Oracle 10g introduced a new pseudocolumn, ORA_ROWSCN, which reports the SCN of the last change to each row. It can be very useful in certain settings, such as optimistic locking or incremental data extraction. By default, though, each row's ORA_ROWSCN is block-based, unless row-level tracking was enabled when the table was created (create table ... rowdependencies).

First, a quick look at how ORA_ROWSCN is implemented. Every block records in its header the SCN of the most recent transaction against that block, so in the default mode Oracle simply reads the value straight from the header, at no extra cost. But this is clearly imprecise: a block holds many rows, and a transaction rarely touches all of them, so the value is a very rough estimate. Every row in a given block reports the same ORA_ROWSCN, which makes the default mode of little practical value.

If row-level tracking is enabled at table creation, Oracle records an exact SCN for each row, which obviously can no longer come straight from the block header. Getting richer information always has a price: Oracle must physically store that SCN with every row. This is why the option can only be specified at create time and cannot be switched on with alter table on an existing table; doing so would mean rewriting the physical storage format of every row, and the cost of that is easy to imagine.

A simple experiment shows how the row format differs once row-level tracking is on. Create two tables, one with norowdependencies (the default) and one with rowdependencies, then dump the corresponding data blocks:

create table t1(i int);
insert into t1 values(1);
insert into t1 values(2);
commit;
create table t2 rowdependencies as select * from t1;

norowdependencies:

block_row_dump:
tab 0, row 0, @0x1f9a
tl: 6 fb: --H-FL-- lb: 0x1 cc: 1
col 0: [ 2] c1 02
tab 0, row 1, @0x1f94
tl: 6 fb: --H-FL-- lb: 0x1 cc: 1
col 0: [ 2] c1 03
end_of_block_dump

rowdependencies:

block_row_dump:
tab 0, row 0, @0x1f7c
tl: 12 fb: --H-FL-- lb: 0x0 cc: 1
dscn 0x0000.00029ae4
col 0: [ 2] c1 02
tab 0, row 1, @0x1f6d
tl: 12 fb: --H-FL-- lb: 0x0 cc: 1
dscn 0x0000.00029ae4
col 0: [ 2] c1 03
end_of_block_dump

Once you have a row's SCN, the SCN_TO_TIMESTAMP function converts it to a time:

SQL> select SCN_TO_TIMESTAMP(ora_rowscn) from t2 where rownum<2;

SCN_TO_TIMESTAMP(ORA_ROWSCN)
---------------------------------------------------------------------------
06-JAN-09 05.31.20.000000000 PM

Optimistic locking and ORA_ROWSCN
Where you would otherwise use select ... for update for pessimistic locking, ORA_ROWSCN lets you switch to optimistic locking: fetch ORA_ROWSCN along with the data at the start, and before writing changes back compare it with the row's current value to tell whether the data changed in the meantime (a sketch follows below). Tom also covers this in his book Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions.
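A minimal sketch of the idea (emp is a hypothetical table created with rowdependencies, so the SCN is tracked per row):

variable scn number
begin
    select ora_rowscn into :scn from emp where empno = 7788;
end;
/
-- ... the user edits the data offline ...
update emp
   set sal = 5000
 where empno = 7788
   and ora_rowscn = :scn;  -- updates 0 rows if the row changed meanwhile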

Incremental extraction and ORA_ROWSCN
Record the highest ORA_ROWSCN after each extraction, and base the next run on that SCN to pick up only the rows modified since (sketched below). Before 10g, systems implementing incremental extraction had to mine the logs, add triggers, or add a timestamp column to the table. ORA_ROWSCN is effectively that third approach, except the column is maintained by Oracle itself, which closes the loophole of applications updating rows while bypassing the timestamp. One caveat: if the system uses a logical standby or a Streams-style logical replica and extraction runs against the replica, ORA_ROWSCN can distort later analysis, because the derived time is when the change was applied on the replica, not on the source. If you only extract data and never use that time for analysis this matters little, but you still have to plan how the incremental extraction would be redone if the logical standby fails and must be rebuilt.
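A sketch of the bookmark approach (the table src and the bind variable :last_scn are assumptions; ORA_ROWSCN can be used directly in a predicate):

-- after each run, remember the high-water SCN:
select nvl(max(ora_rowscn), 0) from src;
-- the next run extracts only rows changed since then:
select * from src where ora_rowscn > :last_scn;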

A search for ORA_ROWSCN on Metalink turns up quite a few related bugs, so be careful about using it on production systems. For example, on one of my test databases on 64-bit Linux, the following statement hits an ORA-07445:

SQL> select ora_rowscn from x$bh where rownum<2;
select ora_rowscn from x$bh where rownum<2
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

ORA-07445: exception encountered: core dump [qkafix()+212]
[SIGSEGV] [Address not mapped to object] [0x000000004] [] []

My own observations and tests follow:

In my view the problem is not limited to OLAP systems; OLTP systems suffer just as much, because it inflates the number of SQL executions. With the default block-level SCN, modifying any single row in a block defeats the optimistic check for every other row in that block, so those users' operations miss and must be retried, sharply increasing the number of SQL executions; when hot blocks are concentrated, this can become a serious performance problem. With row-level SCNs the problem goes away, but as the author says, "to get enough information you must pay a price: Oracle has to physically store that SCN for every row." Put bluntly, a row-level SCN column is still a trade of space for time.

For example:
create table t1(i number) pctfree 99;
insert into t1 select rownum from all_objects where rownum<=100;
commit;
create table t2 rowdependencies as select * from t1;

fy@ORCL> select rowid,i,ora_rowscn from t1 where rownum<=20 order by i;

ROWID I ORA_ROWSCN
------------------ ---------- ----------
AAARC+AAFAAAAAZAAA 64 970530
AAARC+AAFAAAAAZAAB 65 970530
AAARC+AAFAAAAAZAAC 66 970530
AAARC+AAFAAAAAZAAD 67 970530
AAARC+AAFAAAAAZAAE 68 970530
AAARC+AAFAAAAAZAAF 69 970530
AAARC+AAFAAAAAZAAG 70 970530
AAARC+AAFAAAAAaAAA 71 970530
AAARC+AAFAAAAAaAAB 72 970530
AAARC+AAFAAAAAaAAC 73 970530
AAARC+AAFAAAAAaAAD 74 970530
AAARC+AAFAAAAAaAAE 75 970530
AAARC+AAFAAAAAaAAF 76 970530
AAARC+AAFAAAAAaAAG 77 970530
AAARC+AAFAAAAAbAAA 78 970530
AAARC+AAFAAAAAbAAB 79 970530
AAARC+AAFAAAAAbAAC 80 970530
AAARC+AAFAAAAAbAAD 81 970530
AAARC+AAFAAAAAbAAE 82 970530
AAARC+AAFAAAAAbAAF 83 970530

20 rows selected.

fy@ORCL> select rowid,i,ora_rowscn from t2 where rownum<=20 order by i;

ROWID I ORA_ROWSCN
------------------ ---------- ----------
AAARC/AAFAAAAAtAAA 64 970587
AAARC/AAFAAAAAtAAB 65 970587
AAARC/AAFAAAAAtAAC 66 970587
AAARC/AAFAAAAAtAAD 67 970587
AAARC/AAFAAAAAtAAE 68 970587
AAARC/AAFAAAAAtAAF 69 970587
AAARC/AAFAAAAAtAAG 70 970587
AAARC/AAFAAAAAtAAH 71 970587
AAARC/AAFAAAAAtAAI 72 970587
AAARC/AAFAAAAAtAAJ 73 970587
AAARC/AAFAAAAAtAAK 74 970587
AAARC/AAFAAAAAtAAL 75 970587
AAARC/AAFAAAAAtAAM 76 970587
AAARC/AAFAAAAAtAAN 77 970587
AAARC/AAFAAAAAtAAO 78 970587
AAARC/AAFAAAAAtAAP 79 970587
AAARC/AAFAAAAAtAAQ 80 970587
AAARC/AAFAAAAAtAAR 81 970587
AAARC/AAFAAAAAtAAS 82 970587
AAARC/AAFAAAAAtAAT 83 970587

20 rows selected.

Now suppose a user wants to update the row in t1 with ROWID AAARC+AAFAAAAAZAAA, and runs:
fy@ORCL> update t1 set i=10000 where rowid='AAARC+AAFAAAAAZAAA';

1 row updated.

fy@ORCL> commit;

Commit complete.

Querying the table now shows:
fy@ORCL> select rowid,i,ora_rowscn from t1 where rownum<=20 order by i;

ROWID I ORA_ROWSCN
------------------ ---------- ----------
AAARC+AAFAAAAAZAAB 65 970638
AAARC+AAFAAAAAZAAC 66 970638
AAARC+AAFAAAAAZAAD 67 970638
AAARC+AAFAAAAAZAAE 68 970638
AAARC+AAFAAAAAZAAF 69 970638
AAARC+AAFAAAAAZAAG 70 970638
AAARC+AAFAAAAAaAAA 71 970530
AAARC+AAFAAAAAaAAB 72 970530
AAARC+AAFAAAAAaAAC 73 970530
AAARC+AAFAAAAAaAAD 74 970530
AAARC+AAFAAAAAaAAE 75 970530
AAARC+AAFAAAAAaAAF 76 970530
AAARC+AAFAAAAAaAAG 77 970530
AAARC+AAFAAAAAbAAA 78 970530
AAARC+AAFAAAAAbAAB 79 970530
AAARC+AAFAAAAAbAAC 80 970530
AAARC+AAFAAAAAbAAD 81 970530
AAARC+AAFAAAAAbAAE 82 970530
AAARC+AAFAAAAAbAAF 83 970530
AAARC+AAFAAAAAZAAA 10000 970638

20 rows selected.
You can see that the SCN changed for every row in block AAARC+AAFAAAAAZ. Now suppose a second user also wants to update the row with ROWID AAARC+AAFAAAAAZAAB, starting just a little later. Nobody actually modified that row, but the first user's update already changed its SCN, so the second user's optimistic check misses and the operation has to be redone. That is clearly not the outcome we want.

Row-level SCNs, by contrast:
fy@ORCL> update t2 set i=10000 where rowid='AAARC/AAFAAAAAtAAA';

1 row updated.

fy@ORCL> commit;

Commit complete.
Querying the table:
fy@ORCL> select rowid,i,ora_rowscn from t2 where rownum<=20 order by i;

ROWID I ORA_ROWSCN
------------------ ---------- ----------
AAARC/AAFAAAAAtAAB 65 970587
AAARC/AAFAAAAAtAAC 66 970587
AAARC/AAFAAAAAtAAD 67 970587
AAARC/AAFAAAAAtAAE 68 970587
AAARC/AAFAAAAAtAAF 69 970587
AAARC/AAFAAAAAtAAG 70 970587
AAARC/AAFAAAAAtAAH 71 970587
AAARC/AAFAAAAAtAAI 72 970587
AAARC/AAFAAAAAtAAJ 73 970587
AAARC/AAFAAAAAtAAK 74 970587
AAARC/AAFAAAAAtAAL 75 970587
AAARC/AAFAAAAAtAAM 76 970587
AAARC/AAFAAAAAtAAN 77 970587
AAARC/AAFAAAAAtAAO 78 970587
AAARC/AAFAAAAAtAAP 79 970587
AAARC/AAFAAAAAtAAQ 80 970587
AAARC/AAFAAAAAtAAR 81 970587
AAARC/AAFAAAAAtAAS 82 970587
AAARC/AAFAAAAAtAAT 83 970587
AAARC/AAFAAAAAtAAA 10000 970673

20 rows selected.
Only the last row's SCN changed, so in this case the problem above clearly cannot arise.

Of course, if you want to avoid the storage growth that row-level SCNs bring, you can instead spread the data across more data blocks, which requires the DBA to keep an eye on hot-block distribution; one crude way to do it is sketched below.
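A sketch of spreading rows out by rebuilding with a high pctfree (90 is an arbitrary illustrative value; the move rewrites the table and invalidates its indexes, which must then be rebuilt):

alter table t1 move pctfree 90;
-- then: alter index ... rebuild; for each index on t1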

Questions and discussion are welcome.

Wednesday, January 7, 2009

A diagram of the ORACLE kernel



Reposted from eygle. Today was the first time I had seen this diagram, so this is what the ORACLE kernel actually looks like. I had always assumed the operating-system-dependent layer sat outside the kernel, so this was a real eye-opener.

Sunday, December 28, 2008

A first look at ORACLE dynamic sampling

I had always assumed that ORACLE's dynamic sampling was useful only when a table had no statistics; after reading one of Tom Kyte's articles today, I learned a great deal.

Dynamic sampling has been available since ORACLE 9i R2. During a hard parse, the CBO uses it to collect statistics on the tables involved and to correct its own guesses. Because it happens only at hard parse time and produces fresher, more accurate statistics for the CBO, it is called dynamic sampling.

The optimizer takes many inputs when producing a plan: constraints on the tables, system statistics, and statistics on the objects the query references. It uses those statistics to estimate cardinality, and cardinality is the most important variable in cost calculation; whether the cardinality estimate is right directly determines which plan gets chosen. That is the direct motivation for dynamic sampling: help the optimizer estimate cardinality correctly, and the correct plan follows.

Dynamic sampling offers 11 levels (0 through 10), each explained in detail below. The default level is 1 in ORACLE 9i R2 and 2 in 10g and 11g.

1 How to enable dynamic sampling
At the database level, set the optimizer_dynamic_sampling parameter; at the session level, use alter session; within a single query, use the dynamic_sampling hint. All three scopes are sketched below.
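A sketch of the three scopes (level 4 is an arbitrary choice here):

alter system set optimizer_dynamic_sampling = 2;      -- database level
alter session set optimizer_dynamic_sampling = 4;     -- session level
select /*+ dynamic_sampling(t 4) */ count(*) from t;  -- single statement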

2 A few examples
First, a table with no statistics:
SQL> create table t
2 as
3 select owner, object_type
4 from all_objects
5 /
Table created.

SQL> select count(*) from t;

COUNT(*)
------------------------
68076

The following query disables dynamic sampling:
SQL> set autotrace traceonly explain
SQL> select /*+ dynamic_sampling(t 0) */ * from t;

Execution Plan
------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 16010 | 437K| 55 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| T | 16010 | 437K| 55 (0)| 00:00:01 |
--------------------------------------------------------------------------
The following query uses dynamic sampling:
SQL> select * from t;

Execution Plan
------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 77871 | 2129K| 56 (2)| 00:00:01 |
| 1 | TABLE ACCESS FULL| T | 77871 | 2129K| 56 (2)| 00:00:01 |
--------------------------------------------------------------------------

Note
------------------------------------------
- dynamic sampling used for this statement

Note that with dynamic sampling the estimated cardinality (77871) is far closer to the actual row count (68076), so this plan is the more trustworthy one.

SQL>set autot off

Of course, the estimate can also end up wildly different:

SQL> delete from t;
68076 rows deleted.

SQL> commit;
Commit complete.

SQL> set autotrace traceonly explain
Dynamic sampling disabled:
SQL> select /*+ dynamic_sampling(t 0) */ * from t;

Execution Plan
------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 16010 | 437K| 55 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| T | 16010 | 437K| 55 (0)| 00:00:01 |
--------------------------------------------------------------------------
Dynamic sampling enabled:
SQL> select * from t;

Execution Plan
-----------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 28 | 55 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| T | 1 | 28 | 55 (0)| 00:00:01 |
--------------------------------------------------------------------------

Note
---------------------------------------
- dynamic sampling used for this statement
In fact the table now holds no data at all, but because delete was used rather than truncate, the HWM was never reset, so the database guessed wrong; the dynamically sampled cardinality is far better than the unsampled one.

Both examples above involved tables with no statistics. Does dynamic sampling still help once statistics exist? Consider the following: suppose an EMP table has two relevant columns, birth_day (a date) and zodiac sign (a varchar2), and holds 1,440,000 rows (a design that clearly violates second normal form, since the sign is derivable from the birth date). How many rows will the CBO estimate for a query on people born in January whose sign is Libra? The answer is 10,000, yet in reality not one row can be returned, since Libra contains no January dates. It is exactly when the table design is this badly flawed that dynamic sampling shows its charm again.

SQL> create table t
2 as select decode( mod(rownum,2), 0, 'N', 'Y' ) flag1,
3 decode( mod(rownum,2), 0, 'Y', 'N' ) flag2, a.*
4 from all_objects a
5 /
Table created.

SQL > create index t_idx on t(flag1,flag2);
Index created.

SQL > begin
2 dbms_stats.gather_table_stats
3 ( user, 'T',
4 method_opt=>'for all indexed columns size 254' );
5 end;
6 /
PL/SQL procedure successfully completed.
Statistics have now been gathered, with histograms. Some of the numbers:

SQL> select num_rows, num_rows/2,
num_rows/2/2 from user_tables
where table_name = 'T';

NUM_ROWS NUM_ROWS/2 NUM_ROWS/2/2
-------- ---------- ------------
68076 34038 17019

SQL> set autotrace traceonly explain
SQL> select * from t where flag1='N';

Execution Plan
------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 33479 | 3432K| 292 (1)| 00:00:04 |
|* 1 | TABLE ACCESS FULL| T | 33479 | 3432K| 292 (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("FLAG1"='N')

SQL> select * from t where flag2='N';

Execution Plan
----------------------------
Plan hash value: 1601196873

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 34597 | 3547K| 292 (1)| 00:00:04 |
|* 1 | TABLE ACCESS FULL| T | 34597 | 3547K| 292 (1)| 00:00:04 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("FLAG2"='N')

Both of these plans are accurate: each predicate alone returns around 34,000 rows, roughly half of the table. Next, run the following query:

SQL> select * from t where flag1 = 'N' and flag2 = 'N';

Execution Plan
----------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 17014 | 1744K| 292 (1)| 00:00:04 |
|* 1 | TABLE ACCESS FULL| T | 17014 | 1744K| 292 (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
----------------------------------------------------

1 - filter("FLAG1" = 'N' AND "FLAG2" = 'N')

This time the CBO estimates about a quarter of the table's rows, but in reality not a single row matches, and the bad cardinality estimate leads to a bad plan. Now look at the following query, which forces the CBO to sample dynamically:

SQL> select /*+ dynamic_sampling(t 3) */ * from t where flag1 = 'N' and flag2 = 'N';

Execution Plan
-----------------------------
Plan hash value: 470836197

------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 630 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 6 | 630 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T_IDX | 6 | | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
----------------------------------------------------

2 - access("FLAG1"='N' AND "FLAG2"='N')

The CBO now estimates 6 rows and picks a cheap index range scan; overall, a satisfying plan. As for why the estimated cardinality is 6 rather than 0, the explanation is fairly involved, so I will not go into it here.

3 CBO dynamic sampling levels:
Level 0: do not use dynamic sampling.
Level 1: sample all tables referenced by the query when: (1) the query contains at least one unanalyzed table; (2) the unanalyzed table is joined to another table, or appears in a subquery or a non-mergeable view; (3) the table has no indexes; (4) the table has more blocks than the dynamic sample size.
Level 2: apply dynamic sampling to all unanalyzed tables, using a sample of twice the default number of blocks.
Level 3: all the level 2 criteria, plus any table whose standard selectivity estimate had to guess at a predicate that dynamic sampling could evaluate.
Level 4: all the level 3 criteria, plus tables that have a single-table predicate referencing two or more columns.
Levels 5, 6, 7, 8 and 9: all the criteria of the previous level, using 2, 4, 8, 32 and 128 times the default sample size respectively.
Level 10: all the level 9 criteria, sampling every block in the table.

My rendering of the 11 levels above may not be entirely accurate; if you are interested, see the Oracle Database Performance Tuning Guide.

4 When and how to use dynamic sampling:
(1) Use it freely in OLAP systems.
(2) Avoid it in OLTP systems as far as possible.
(3) If you do want it in an OLTP system, use it via SQL profiles, as sketched below; unfortunately that feature only exists from 10g onward. Conceptually, a SQL profile is a bit like running the sampling queries once and storing the results in the dictionary, which cuts both the dynamic sampling time and the hard parse time.
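A sketch of the SQL profile route (assumes the Tuning Pack is licensed; the sql_id 'abcd123xyz456' is made up for illustration):

declare
    l_task varchar2(64);
begin
    l_task := dbms_sqltune.create_tuning_task( sql_id => 'abcd123xyz456' );
    dbms_sqltune.execute_tuning_task( l_task );
    dbms_output.put_line( dbms_sqltune.report_tuning_task( l_task ) );
    -- if the report recommends a SQL profile, accept it:
    -- dbms_sqltune.accept_sql_profile( task_name => l_task );
end;
/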

Questions and discussion are welcome. Tom has a much more detailed treatment of this subject; it is well worth reading if you are interested.