关于字符集的修改–续

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> select userenv('language') from dual;

USERENV('LANGUAGE')
----------------------------------------------------
AMERICAN_AMERICA.ZHS16GBK

SQL> create table ht01(name varchar2(4));

Table created.

SQL> insert into ht01 values('海天');

1 row created.

SQL> commit;

Commit complete.

SQL> begin
  2    for i in 1..100 loop
  3      insert /*+ append */into ht01 select * from ht01;
  4      commit;
  5    end loop;
  6  end;
  7  /
begin
*
ERROR at line 1:
ORA-01653: unable to extend table ROGER.HT01 by 128 in tablespace ROGER
ORA-06512: at line 3

SQL> commit;

Commit complete.

SQL> select count(*) from ht01;

  COUNT(*)
----------
   1048576

SQL> analyze table ht01 compute statistics;

Table analyzed.

SQL> select owner,segment_name,bytes/1024/1024
  2  from dba_segments
  3  where segment_name='HT01';

OWNER                          SEGMENT_NAME         BYTES/1024/1024
------------------------------ -------------------- ---------------
ROGER                          HT01                              19

SQL> select dump('海天') from dual;

DUMP('????')
-------------------------
Typ=96 Len=4: 63,63,63,63
我们可以看到,在GBK字符集下,一个汉字占据2个字节,而海天2个字则占据4个字节了。
[oracle@roger oracle]$ ls -ltr scan*

-rw-r--r--  1 oracle dba  5684 Jul  9 18:57 scan.txt
-rw-r--r--  1 oracle dba  1357 Jul  9 18:57 scan.err
-rw-r--r--  1 oracle dba 75231 Jul  9 18:57 scan.out

[oracle@roger oracle]$ cat scan.txt

Database Scan Summary Report

Time Started  : 2011-07-09 18:54:10
Time Completed: 2011-07-09 18:57:26

Process ID         Time Started       Time Completed
---------- -------------------- --------------------
         1  2011-07-09 18:54:53  2011-07-09 18:57:25
---------- -------------------- --------------------

[Database Size]

Tablespace                           Used            Free           Total       Expansion
------------------------- --------------- --------------- --------------- ---------------
SYSTEM                            475.00M           5.00M         480.00M            .00K
UNDOTBS1                           20.38M           4.63M          25.00M            .00K
SYSAUX                            240.19M           9.81M         250.00M            .00K
TEMP                                 .00K            .00K            .00K            .00K
USERS                             448.00K           4.56M           5.00M            .00K
ROGER                              19.13M         896.00K          20.00M            .00K
------------------------- --------------- --------------- --------------- ---------------
Total                             755.13M          24.88M         780.00M            .00K

The size of the largest CLOB is 3250228 bytes

[Database Scan Parameters]

Parameter                      Value
------------------------------ ------------------------------------------------
CSSCAN Version                 v2.1
Instance Name                  alex
Database Version               10.2.0.4.0
Scan type                      Full database
Scan CHAR data?                YES
Database character set         ZHS16GBK
FROMCHAR                       ZHS16GBK
TOCHAR                         UTF8
Scan NCHAR data?               NO
Array fetch buffer size        1024000
Number of processes            1
Capture convertible data?      NO
------------------------------ ------------------------------------------------

[Scan Summary]

All character type data in the data dictionary remain the same in the new character set
All character type application data remain the same in the new character set

[Data Dictionary Conversion Summary]

Datatype                    Changeless      Convertible       Truncation            Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2                     1,919,100                0                0                0
CHAR                             1,104                0                0                0
LONG                           149,170                0                0                0
CLOB                            20,335                0                0                0
VARRAY                          22,422                0                0                0
--------------------- ---------------- ---------------- ---------------- ----------------
Total                        2,112,131                0                0                0
Total in percentage            100.000%           0.000%           0.000%           0.000%

The data dictionary can be safely migrated using the CSALTER script

[Application Data Conversion Summary]

Datatype                    Changeless      Convertible       Truncation            Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2                     1,078,683                0                0                0
CHAR                                 0                0                0                0
LONG                                 0                0                0                0
CLOB                                 0                0                0                0
VARRAY                           1,575                0                0                0
--------------------- ---------------- ---------------- ---------------- ----------------
Total                        1,080,258                0                0                0
Total in percentage            100.000%           0.000%           0.000%           0.000%

[Distribution of Convertible, Truncated and Lossy Data by Table]

USER.TABLE                                              Convertible       Truncation            Lossy
-------------------------------------------------- ---------------- ---------------- ----------------
-------------------------------------------------- ---------------- ---------------- ----------------

[Distribution of Convertible, Truncated and Lossy Data by Column]

USER.TABLE|COLUMN                                       Convertible       Truncation            Lossy
-------------------------------------------------- ---------------- ---------------- ----------------
-------------------------------------------------- ---------------- ---------------- ----------------

[Indexes to be Rebuilt]

USER.INDEX on USER.TABLE(COLUMN)
-----------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------
[oracle@roger oracle]$ sqlplus "/as sysdba"

SQL*Plus: Release 10.2.0.4.0 - Production on Sat Jul 9 18:57:55 2011
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> @ ?/rdbms/admin/csalter.plb

0 rows created.

Function created.
Function created.
Procedure created.

This script will update the content of the Oracle Data Dictionary.
Please ensure you have a full backup before initiating this procedure.
Would you like to proceed (Y/N)?y
old   6:     if (UPPER('&conf') <> 'Y') then
new   6:     if (UPPER('y') <> 'Y') then
Checking data validility...
Sorry only one session is allowed to run this script

PL/SQL procedure successfully completed.

Checking or Converting phrase did not finish successfully
No database (national) character set will be altered
CSALTER finished unsuccessfully.

PL/SQL procedure successfully completed.

0 rows deleted.

Function dropped.

Function dropped.

Procedure dropped.

从上看,具体转换没有成功,太奇怪了,根据前面的scan.txt来,应该是完全ok的。
我尝试将库重启一下,然后再试;
SQL> shutdown abort;
ORACLE instance shut down.

SQL> startup

ORACLE instance started.
Total System Global Area  167772160 bytes
Fixed Size                  1266392 bytes
Variable Size              88083752 bytes
Database Buffers           75497472 bytes
Redo Buffers                2924544 bytes
Database mounted.
Database opened.
SQL> @ ?/rdbms/admin/csalter.plb;

0 rows created.

Function created.

Function created.

Procedure created.

This script will update the content of the Oracle Data Dictionary.
Please ensure you have a full backup before initiating this procedure.
Would you like to proceed (Y/N)?Y
old   6:     if (UPPER('&conf') <> 'Y') then
new   6:     if (UPPER('Y') <> 'Y') then
Checking data validility...
begin converting system objects

PL/SQL procedure successfully completed.

Alter the database character set...
declare
*
ERROR at line 1:
ORA-12721: operation cannot execute when other sessions are active
ORA-06512: at "SYS.DBMS_SYS_SQL", line 909
ORA-06512: at "SYS.DBMS_SQL", line 39
ORA-06512: at "SYS.CSM$RUNDML", line 6
ORA-06512: at line 22

0 rows deleted.

Function dropped.

Function dropped.

Procedure dropped.

SQL> @ ?/rdbms/admin/csalter.plb;

0 rows created.

Function created.

Function created.

Procedure created.

This script will update the content of the Oracle Data Dictionary.
Please ensure you have a full backup before initiating this procedure.
Would you like to proceed (Y/N)?Y
old   6:     if (UPPER('&conf') <> 'Y') then
new   6:     if (UPPER('Y') <> 'Y') then
Checking data validility...
begin converting system objects

PL/SQL procedure successfully completed.

Alter the database character set...
CSALTER operation completed, please restart database

PL/SQL procedure successfully completed.

0 rows deleted.

Function dropped.

Function dropped.

Procedure dropped.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup

ORACLE instance started.
Total System Global Area  167772160 bytes
Fixed Size                  1266392 bytes
Variable Size              88083752 bytes
Database Buffers           75497472 bytes
Redo Buffers                2924544 bytes
Database mounted.
Database opened.

SQL> select userenv('language') from dual;

USERENV('LANGUAGE')
----------------------------------------------------
AMERICAN_AMERICA.UTF8

这里字符集修改成功了,比较怪异,第一次居然失败了,真是匪夷所思,下面我们来看看在UTF8下的情况:
SQL> conn roger/roger
Connected.

SQL> select dump('海天') from dual;

DUMP('????')
--------------------------------------------------------------
Typ=96 Len=12: 239,191,189,239,191,189,239,191,189,239,191,189

SQL> select owner,segment_name,bytes/1024/1024
  2  from dba_segments
  3  where segment_name='HT01';

OWNER                          SEGMENT_NAME         BYTES/1024/1024
------------------------------ -------------------- ---------------
ROGER                          HT01                              19
从上面的测试来看,从GBK到UTF8后,ht01表的大小并没有发生变化。

用如下命令再次更改字符集:
csscan system/oracle full=y FROMCHAR=UTF8 TOCHAR=AL32UTF8  ARRAY=1024000 PROCESS=1

SQL> conn /as sysdba
Connected.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup

ORACLE instance started.
Total System Global Area  167772160 bytes
Fixed Size                  1266392 bytes
Variable Size              83889448 bytes
Database Buffers           79691776 bytes
Redo Buffers                2924544 bytes
Database mounted.
Database opened.

SQL> @ ?/rdbms/admin/csalter.plb;

0 rows created.
Function created.
Function created.
Procedure created.

This script will update the content of the Oracle Data Dictionary.
Please ensure you have a full backup before initiating this procedure.
Would you like to proceed (Y/N)?Y
old   6:     if (UPPER('&conf') <> 'Y') then
new   6:     if (UPPER('Y') <> 'Y') then
Checking data validility...
begin converting system objects

PL/SQL procedure successfully completed.

Alter the database character set...
CSALTER operation completed, please restart database

PL/SQL procedure successfully completed.

0 rows deleted.
Function dropped.
Function dropped.
Procedure dropped.

SQL> shutdown abort;

ORACLE instance shut down.

SQL> startup

ORACLE instance started.
Total System Global Area  167772160 bytes
Fixed Size                  1266392 bytes
Variable Size              83889448 bytes
Database Buffers           79691776 bytes
Redo Buffers                2924544 bytes
Database mounted.
Database opened.

SQL> select userenv('language') from dual;

USERENV('LANGUAGE')
----------------------------------------------------
AMERICAN_AMERICA.AL32UTF8

SQL> conn roger/roger
Connected.

SQL> analyze table ht01 compute statistics;

Table analyzed.

SQL> select owner,segment_name,bytes/1024/1024
  2  from dba_segments
  3  where segment_name='HT01';

OWNER                          SEGMENT_NAME         BYTES/1024/1024
------------------------------ -------------------- ---------------
ROGER                          HT01                              19

SQL> select dump('海天') from dual;

DUMP('????')
--------------------------------------------------------------
Typ=96 Len=12: 239,191,189,239,191,189,239,191,189,239,191,189

SQL> select length('海天') from dual;

LENGTH('????')
--------------
             4
SQL> SELECT DISTINCT z.owner_name
  2  || '.'
  3  || z.table_name
  4  || '('
  5  || z.column_name
  6  || ') - '
  7  || z.column_type
  8  || ' ' LossyColumns
  9  FROM csmig.csmv$errors z
 10  WHERE z.error_type ='DATA_LOSS'
 11  ORDER BY LossyColumns
 12  /

no rows selected
最后转换为 以后,发生数据量大小仍然没有变化,请教了一下老白,他说还需要执行什么脚本?
我找遍了所有的文档也没发现,最后无奈只能看看这个csalter.plb的内容了:
[oracle@roger admin]$ more csalter.plb
set serveroutput on;
/* Insert common schema into csm$dictusers.
   All other data dictionary schema is already inserted into csm$dictuses
   when csm$dictuser is created.
*/
insert into csm$dictusers
  select distinct u.user#, u.name from sys.user$ u
  where u.name in ('HR', 'OE', 'SH', 'PM');
/* this function runs a DML statement. It returns how many rows updated */
create or replace function csm$rundml wrapped
a000000
1
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
8
102 f3
Ku7Of/ePHTQvbCjNgzvJsAv6Wegwg1zQLcusZ3RAkBAYy8vIL1sftvQddpSWNxxhv9vwNYBS
UM75HMFscMa79rVhSYCXJ4S4EtATscFCLCz0castOwK8cDLRMJzhjDJZDvrg8HSjPKaXf1mU
OLmzKFZp9A0bluiLavPlwo2vsCAuByEwGxtH11HVVHwIG9Yw6WFLcyKyzXTZfDRRwN9ZY7Xe
5X+sUnLYhoI/ggmNZLznejUF

/
/* This function create a sql statment to update the clob/nvarchar2 for a table
   The sql statement will be something like:
      UPDATE USERNAME.TABLENAME SET
       COL1 := CONVERT(COL1, TOCHAR, FROMCHAR'), COL2:=CONVERT(COL2, TOCHAR, FROMCHAR)
      WHERE ROWID >=MINROWID AND ROWID  'Y') then
       return;
    end if;

    csm$main();
    return;

end;
/
declare
   svalue   varchar2(30);
   sqlstmt  varchar2(1024);
   tochar   varchar2(30);
   rowcnt   number;
begin


    select value into svalue from csm$parameters where name='CSALTER_FINISHED';
    if (svalue <> 'TRUE') then
       dbms_output.put_line('Checking or Converting phrase did not finish successfully');
       dbms_output.put_line('No database (national) character set will be altered');
       dbms_output.put_line('CSALTER finished unsuccessfully.');
       return;
    end if;


    select value into svalue from csm$parameters where name='SCAN_CHAR';
    if (svalue = 'YES') then
       dbms_output.put_line('Alter the database character set...');
       select value into tochar from csm$parameters where name='TO_CHARSET_NAME';
       rowcnt := csm$rundml('alter database character set internal_use '||tochar);
       commit;
    end if;


    select value into svalue from csm$parameters where name='SCAN_NCHAR';
    if (svalue = 'YES') then
       dbms_output.put_line('Alter the national database character set...');
       select value into tochar from csm$parameters where name='TO_NCHARSET_NAME';
       rowcnt := csm$rundml('alter database national character set internal_use '||tochar);
       commit;
    end if;

    dbms_output.put_line('CSALTER operation completed, please restart database');
end;
/
/* restore csm$dictuser */
delete from csm$dictusers where username in ('PM', 'SH', 'HR', 'OE');
drop function csm$rundml;
drop function csm$createsql;
drop procedure csm$main;

郁闷的是这几个主要的function都加密的,我汗。。。。

那么对于新创建的表呢?是如何存储的?下面来看看,
SQL> conn roger/roger
Connected.

SQL> create table ht03 as select * from ht01;

Table created.

SQL> set lines 120
SQL> col segment_name for a20
SQL> select owner,segment_name,bytes/1024/1024
  2  from dba_segments
  3  where owner='ROGER';

OWNER                          SEGMENT_NAME         BYTES/1024/1024
------------------------------ -------------------- ---------------
ROGER                          HT03                              13
ROGER                          HT02                           .0625
ROGER                          HT01                              19
我们发现ht03跟ht01完全一样,只是ht01是做字符集转换以前建的,ht03是字符集转换以后再
创建的,当前的数据库字符集为AL32UTF8。

由于前面也转换过为UTF8,那么我们有必要来看看UTF8和AL32UTF8的区别:
SQL> select dump(convert('测试', 'AL32UTF8')) from dual;

DUMP(CONVERT('测试','AL32UTF8
----------------------------
Typ=1 Len=4: 178,226,202,212

SQL> select dump(convert('测试', 'ZHS16GBK')) from dual;

DUMP(CONVERT('测试','ZHS
-----------------------
Typ=1 Len=3: 63,163,191

SQL> select dump(convert('测试', 'UTF8')) from dual;

DUMP(CONVERT('测试','UTF8'))
------------------------------------
Typ=1 Len=6: 239,191,189,239,191,189
The difference between UTF8 and AL32UTF8 are:
UTF8 stores Unicode characters with code points > U+FFFF as two surrogate characters, three bytes each

AL32UTF8 stores Unicode characters with code points > U+FFFF as one four-byte character

UTF8 will not be updated anymore when new Unicode versions are released, only AL32UTF8 and AL16UTF16 will.

Due to compatibility problems with pre-9i versions use UTF8 if you have Oracle8i clients connecting to the database. Use AL32UTF8 in pure Oracle9i environment.

UTF-8 encoding is variable-width. In UTF-8, each character can be represented by either one, two, or three bytes.

UTF8 is a varying width 1-3 bytes per character Unicode encoding. It is supported for both database and national character sets. It is a binary superset of US7ASCII. UTF8 corresponds to Unicode CESU-8 encoding.

AL32UTF8 is a varying width 1-4 bytes per character. It is supported for CHAR, VARCHAR2, LONG and CLOB only (database character set). It is a binary superset of UTF8 (in 9.2 only) and US7ASCII. AL32UTF8 corresponds to Unicode UTF-8 encoding.

This is what Metalink says. In Note: 237593.1

There is a possible problem for 817 and lower versions: Problems connecting to AL32UTF8 databases from older versions 8i and lower.

The default UTF8 characterset for 9i/10G is AL32UTF8, however this characterset is NOT recognised by any pre-9i clients/server systems.

Oracle recommends that you use UTF8 instead of AL32UTF8 as database characterset if you have 8i (or older) servers and clients connecting to the 9i/10g system until you can upgrade the older versions.

UTF8 is unicode revision 3.0 in 8.1.7 and up. AL32UTF8 is Unicode 3.0 in 9.0.1, Unicode 3.1 in 9.2, Unicode 3.2 in 10.1 and Unicode 4.01 in 10.2

Besides the difference in Unicode version the "big difference" is that AL32UTF8 has build in support for "Surrogate Pairs" (also known as Surrogate characters or "Supplementary characters").
Practically this means that in 99% of the cases you can use UTF8 instead of AL32UTF8 without any problem.

There are only a few situations where Surrogate Pairs are already used on client side. Windows system with HKSCS2001 (hong kong extension) is one of those. Note that you actually can *store* Surrogate Pairs in UTF8 but will store 2 * 3 byte characters and not like AL32UTF8 one 4 byte character.

Note that if you now use UTF8 as database characterset and -in the future you do a roll out of new 9i or higher clients and all your other databases are upgraded to 9i or higher, you can simply do a alter database characterset to go from UTF8 to AL32UTF8 so downtime
will be limited to a few minutes if the need to go to AL32UTF8 should arise. There is no performance impact on staying on UTF8

NOTE:
This note is ONLY relevant if you have already a 9i AL32UTF8 database with data in. If you still need to create the 9i system then choose UTF8 instead of AL32UTF8 as database characterset in the database creation assistant.

So, *IF* you have already a 9i system running with AL32UTF8 then you can use the following steps in this note to change the database characterset to UTF8 without losing data.

You can't simply use "ALTER DATABASE CHARACTERSET" to go from AL32UTF8 to UTF8 because UTF8 is a SUB-set of AL32UTF8 (some codepoints which are correct in AL32UTF8 are not known in UTF8)

But again, UTF8 *contains* all characters know in AL32UTF8, the difference between them is pure the way some characters are stored (AL32UTF8 is a bit more efficient for some characters)


So you will run into ORA-12712 if you try alter database ...
SQL> create table ht04(name varchar2(12));

Table created.

SQL> insert into ht04 values('北京上海');

1 row created.

SQL> insert into ht04 values('北京上海a');

1 row created.

SQL> insert into ht04 values('北京上海ab');

1 row created.

SQL> insert into ht04 values('北京上海abc');

1 row created.

SQL> insert into ht04 values('北京上海abcd');

1 row created.

SQL> insert into ht04 values('北京上海abcde');
insert into ht04 values('北京上海abcde')
                        *
ERROR at line 1:
ORA-12899: value too large for column "ROGER"."HT04"."NAME" (actual: 13,
maximum: 12)

SQL> insert into ht04 values('北京上海人');

1 row created.

SQL> insert into ht04 values('北京上海人权');

1 row created.

SQL> insert into ht04 values('北京上海人权宣');
insert into ht04 values('北京上海人权宣')
                        *
ERROR at line 1:
ORA-12899: value too large for column "ROGER"."HT04"."NAME" (actual: 14,
maximum: 12)
SQL> commit;

Commit complete.

SQL> select * from ht04;

NAME
------------
北京上海
北京上海a
北京上海ab
北京上海abc
北京上海abcd
北京上海人
北京上海人权

7 rows selected.

SQL> select dump(name) from ht04;

DUMP(NAME)
--------------------------------------------------------------------------------
Typ=1 Len=8: 177,177,190,169,201,207,186,163
Typ=1 Len=9: 177,177,190,169,201,207,186,163,97
Typ=1 Len=10: 177,177,190,169,201,207,186,163,97,98
Typ=1 Len=11: 177,177,190,169,201,207,186,163,97,98,99
Typ=1 Len=12: 177,177,190,169,201,207,186,163,97,98,99,100
Typ=1 Len=10: 177,177,190,169,201,207,186,163,200,203
Typ=1 Len=12: 177,177,190,169,201,207,186,163,200,203,200,168

7 rows selected.
这个问题纠结了我2天,最后恍然大悟,其实跟字段类型有关系,要看一个数据库字符集
转换以后,容量是否会发生变化,我们通过scan.txt来看就行了,如下:
[Database Size]

Tablespace                           Used            Free           Total       Expansion
------------------------- --------------- --------------- --------------- ---------------
SYSTEM                            473.00M           7.00M         480.00M            .00K
UNDOTBS1                           21.25M           3.75M          25.00M            .00K
SYSAUX                            165.13M          74.88M         240.00M            .00K
TEMP                                 .00K            .00K            .00K            .00K
USERS                             448.00K           4.56M           5.00M            .00K
ROGER                              64.00K         199.94M         200.00M            .00K
------------------------- --------------- --------------- --------------- ---------------
Total                             659.88M         290.13M         950.00M            .00K
由于前面的测试表字段类型都是varchar2,我们也值得varchar2是可变的,

UTF8      是1~3 个bytes
AL32UTF8  是1~4 个bytes

从上面的scan.txt来看,就能发现字符集转换以后,ht01 ht02所在的表空间roger完全没有变化。

评论

  1. Hello Website Owner! I found your blog on Google and I really like it. My team provides professional article writing, and we are able to do it for $0.01 per word – that’s $4 for a 400 word article. All of our writers are based in the United States, and all of our articles passes the Copyscape test. If you are interested in using our service, or simply want to give us a try, please check out website out http://www.contentwriters.us

  2. Hello Website Owner! I really like your blog, I found you through Google so I thought I’ll share this tip with you. There’s a WordPress add-on for SEO that does automatic SEO for your blog, automatic SEO plugins like this are new in the blog scene so getting on this now would give your page a big traffic jump for sure. If you are serious about making your blog grow and make money then check it out @ http://tiny.cc/0ej3z. Thanks, keep up the good work.

  3. Webmaster, I am the admin at SEOPlugins.org. We profile SEO Plugins for WordPress blogs for on-site and off-site SEO. I’d like to invite you to check out our recent profile for a pretty amazing plugin which can double or triple traffic for a Worpdress blog. You can delete this comment, I didn’t want to comment on your blog, just wanted to drop you a personal message. Thanks, Rich

  4. Very nice, i suggest webmaster can set up a forum, so that we can talk and communicate.

  5. I love this, beautiful colors and great blogs. Good work!

  6. Alex (Said he knows you) told me that you could help with a “permalink” problem I was having with wordpress. Can you take a look for me?

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注