SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
SQL> select userenv('language') from dual;
USERENV('LANGUAGE')
----------------------------------------------------
AMERICAN_AMERICA.ZHS16GBK
SQL> create table ht01(name varchar2(4));
Table created.
SQL> insert into ht01 values('海天');
1 row created.
SQL> commit;
Commit complete.
SQL> begin
2 for i in 1..100 loop
3 insert /*+ append */into ht01 select * from ht01;
4 commit;
5 end loop;
6 end;
7 /
begin
*
ERROR at line 1:
ORA-01653: unable to extend table ROGER.HT01 by 128 in tablespace ROGER
ORA-06512: at line 3
SQL> commit;
Commit complete.
SQL> select count(*) from ht01;
COUNT(*)
----------
1048576
SQL> analyze table ht01 compute statistics;
Table analyzed.
SQL> select owner,segment_name,bytes/1024/1024
2 from dba_segments
3 where segment_name='HT01';
OWNER SEGMENT_NAME BYTES/1024/1024
------------------------------ -------------------- ---------------
ROGER HT01 19
SQL> select dump('海天') from dual;
DUMP('????')
-------------------------
Typ=96 Len=4: 63,63,63,63
我们可以看到,在GBK字符集下,一个汉字占据2个字节,而海天2个字则占据4个字节了。
[oracle@roger oracle]$ ls -ltr scan*
-rw-r--r-- 1 oracle dba 5684 Jul 9 18:57 scan.txt
-rw-r--r-- 1 oracle dba 1357 Jul 9 18:57 scan.err
-rw-r--r-- 1 oracle dba 75231 Jul 9 18:57 scan.out
[oracle@roger oracle]$ cat scan.txt
Database Scan Summary Report
Time Started : 2011-07-09 18:54:10
Time Completed: 2011-07-09 18:57:26
Process ID Time Started Time Completed
---------- -------------------- --------------------
1 2011-07-09 18:54:53 2011-07-09 18:57:25
---------- -------------------- --------------------
[Database Size]
Tablespace Used Free Total Expansion
------------------------- --------------- --------------- --------------- ---------------
SYSTEM 475.00M 5.00M 480.00M .00K
UNDOTBS1 20.38M 4.63M 25.00M .00K
SYSAUX 240.19M 9.81M 250.00M .00K
TEMP .00K .00K .00K .00K
USERS 448.00K 4.56M 5.00M .00K
ROGER 19.13M 896.00K 20.00M .00K
------------------------- --------------- --------------- --------------- ---------------
Total 755.13M 24.88M 780.00M .00K
The size of the largest CLOB is 3250228 bytes
[Database Scan Parameters]
Parameter Value
------------------------------ ------------------------------------------------
CSSCAN Version v2.1
Instance Name alex
Database Version 10.2.0.4.0
Scan type Full database
Scan CHAR data? YES
Database character set ZHS16GBK
FROMCHAR ZHS16GBK
TOCHAR UTF8
Scan NCHAR data? NO
Array fetch buffer size 1024000
Number of processes 1
Capture convertible data? NO
------------------------------ ------------------------------------------------
[Scan Summary]
All character type data in the data dictionary remain the same in the new character set
All character type application data remain the same in the new character set
[Data Dictionary Conversion Summary]
Datatype Changeless Convertible Truncation Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2 1,919,100 0 0 0
CHAR 1,104 0 0 0
LONG 149,170 0 0 0
CLOB 20,335 0 0 0
VARRAY 22,422 0 0 0
--------------------- ---------------- ---------------- ---------------- ----------------
Total 2,112,131 0 0 0
Total in percentage 100.000% 0.000% 0.000% 0.000%
The data dictionary can be safely migrated using the CSALTER script
[Application Data Conversion Summary]
Datatype Changeless Convertible Truncation Lossy
--------------------- ---------------- ---------------- ---------------- ----------------
VARCHAR2 1,078,683 0 0 0
CHAR 0 0 0 0
LONG 0 0 0 0
CLOB 0 0 0 0
VARRAY 1,575 0 0 0
--------------------- ---------------- ---------------- ---------------- ----------------
Total 1,080,258 0 0 0
Total in percentage 100.000% 0.000% 0.000% 0.000%
[Distribution of Convertible, Truncated and Lossy Data by Table]
USER.TABLE Convertible Truncation Lossy
-------------------------------------------------- ---------------- ---------------- ----------------
-------------------------------------------------- ---------------- ---------------- ----------------
[Distribution of Convertible, Truncated and Lossy Data by Column]
USER.TABLE|COLUMN Convertible Truncation Lossy
-------------------------------------------------- ---------------- ---------------- ----------------
-------------------------------------------------- ---------------- ---------------- ----------------
[Indexes to be Rebuilt]
USER.INDEX on USER.TABLE(COLUMN)
-----------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------
[oracle@roger oracle]$ sqlplus "/as sysdba"
SQL*Plus: Release 10.2.0.4.0 - Production on Sat Jul 9 18:57:55 2011
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> @ ?/rdbms/admin/csalter.plb
0 rows created.
Function created.
Function created.
Procedure created.
This script will update the content of the Oracle Data Dictionary.
Please ensure you have a full backup before initiating this procedure.
Would you like to proceed (Y/N)?y
old 6: if (UPPER('&conf') <> 'Y') then
new 6: if (UPPER('y') <> 'Y') then
Checking data validility...
Sorry only one session is allowed to run this script
PL/SQL procedure successfully completed.
Checking or Converting phrase did not finish successfully
No database (national) character set will be altered
CSALTER finished unsuccessfully.
PL/SQL procedure successfully completed.
0 rows deleted.
Function dropped.
Function dropped.
Procedure dropped.
从上看,具体转换没有成功,太奇怪了,根据前面的scan.txt来,应该是完全ok的。
我尝试将库重启一下,然后再试;
SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 167772160 bytes
Fixed Size 1266392 bytes
Variable Size 88083752 bytes
Database Buffers 75497472 bytes
Redo Buffers 2924544 bytes
Database mounted.
Database opened.
SQL> @ ?/rdbms/admin/csalter.plb;
0 rows created.
Function created.
Function created.
Procedure created.
This script will update the content of the Oracle Data Dictionary.
Please ensure you have a full backup before initiating this procedure.
Would you like to proceed (Y/N)?Y
old 6: if (UPPER('&conf') <> 'Y') then
new 6: if (UPPER('Y') <> 'Y') then
Checking data validility...
begin converting system objects
PL/SQL procedure successfully completed.
Alter the database character set...
declare
*
ERROR at line 1:
ORA-12721: operation cannot execute when other sessions are active
ORA-06512: at "SYS.DBMS_SYS_SQL", line 909
ORA-06512: at "SYS.DBMS_SQL", line 39
ORA-06512: at "SYS.CSM$RUNDML", line 6
ORA-06512: at line 22
0 rows deleted.
Function dropped.
Function dropped.
Procedure dropped.
SQL> @ ?/rdbms/admin/csalter.plb;
0 rows created.
Function created.
Function created.
Procedure created.
This script will update the content of the Oracle Data Dictionary.
Please ensure you have a full backup before initiating this procedure.
Would you like to proceed (Y/N)?Y
old 6: if (UPPER('&conf') <> 'Y') then
new 6: if (UPPER('Y') <> 'Y') then
Checking data validility...
begin converting system objects
PL/SQL procedure successfully completed.
Alter the database character set...
CSALTER operation completed, please restart database
PL/SQL procedure successfully completed.
0 rows deleted.
Function dropped.
Function dropped.
Procedure dropped.
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 167772160 bytes
Fixed Size 1266392 bytes
Variable Size 88083752 bytes
Database Buffers 75497472 bytes
Redo Buffers 2924544 bytes
Database mounted.
Database opened.
SQL> select userenv('language') from dual;
USERENV('LANGUAGE')
----------------------------------------------------
AMERICAN_AMERICA.UTF8
这里字符集修改成功了,比较怪异,第一次居然失败了,真是匪夷所思,下面我们来看看在UTF8下的情况:
SQL> conn roger/roger
Connected.
SQL> select dump('海天') from dual;
DUMP('????')
--------------------------------------------------------------
Typ=96 Len=12: 239,191,189,239,191,189,239,191,189,239,191,189
SQL> select owner,segment_name,bytes/1024/1024
2 from dba_segments
3 where segment_name='HT01';
OWNER SEGMENT_NAME BYTES/1024/1024
------------------------------ -------------------- ---------------
ROGER HT01 19
从上面的测试来看,从GBK到UTF8后,ht01表的大小并没有发生变化。
用如下命令再次更改字符集:
csscan system/oracle full=y FROMCHAR=UTF8 TOCHAR=AL32UTF8 ARRAY=1024000 PROCESS=1
SQL> conn /as sysdba
Connected.
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 167772160 bytes
Fixed Size 1266392 bytes
Variable Size 83889448 bytes
Database Buffers 79691776 bytes
Redo Buffers 2924544 bytes
Database mounted.
Database opened.
SQL> @ ?/rdbms/admin/csalter.plb;
0 rows created.
Function created.
Function created.
Procedure created.
This script will update the content of the Oracle Data Dictionary.
Please ensure you have a full backup before initiating this procedure.
Would you like to proceed (Y/N)?Y
old 6: if (UPPER('&conf') <> 'Y') then
new 6: if (UPPER('Y') <> 'Y') then
Checking data validility...
begin converting system objects
PL/SQL procedure successfully completed.
Alter the database character set...
CSALTER operation completed, please restart database
PL/SQL procedure successfully completed.
0 rows deleted.
Function dropped.
Function dropped.
Procedure dropped.
SQL> shutdown abort;
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 167772160 bytes
Fixed Size 1266392 bytes
Variable Size 83889448 bytes
Database Buffers 79691776 bytes
Redo Buffers 2924544 bytes
Database mounted.
Database opened.
SQL> select userenv('language') from dual;
USERENV('LANGUAGE')
----------------------------------------------------
AMERICAN_AMERICA.AL32UTF8
SQL> conn roger/roger
Connected.
SQL> analyze table ht01 compute statistics;
Table analyzed.
SQL> select owner,segment_name,bytes/1024/1024
2 from dba_segments
3 where segment_name='HT01';
OWNER SEGMENT_NAME BYTES/1024/1024
------------------------------ -------------------- ---------------
ROGER HT01 19
SQL> select dump('海天') from dual;
DUMP('????')
--------------------------------------------------------------
Typ=96 Len=12: 239,191,189,239,191,189,239,191,189,239,191,189
SQL> select length('海天') from dual;
LENGTH('????')
--------------
4
SQL> SELECT DISTINCT z.owner_name
2 || '.'
3 || z.table_name
4 || '('
5 || z.column_name
6 || ') - '
7 || z.column_type
8 || ' ' LossyColumns
9 FROM csmig.csmv$errors z
10 WHERE z.error_type ='DATA_LOSS'
11 ORDER BY LossyColumns
12 /
no rows selected
最后转换为 以后,发生数据量大小仍然没有变化,请教了一下老白,他说还需要执行什么脚本?
我找遍了所有的文档也没发现,最后无奈只能看看这个csalter.plb的内容了:
[oracle@roger admin]$ more csalter.plb
set serveroutput on;
/* Insert common schema into csm$dictusers.
All other data dictionary schema is already inserted into csm$dictuses
when csm$dictuser is created.
*/
insert into csm$dictusers
select distinct u.user#, u.name from sys.user$ u
where u.name in ('HR', 'OE', 'SH', 'PM');
/* this function runs a DML statement. It returns how many rows updated */
create or replace function csm$rundml wrapped
a000000
1
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
abcd
8
102 f3
Ku7Of/ePHTQvbCjNgzvJsAv6Wegwg1zQLcusZ3RAkBAYy8vIL1sftvQddpSWNxxhv9vwNYBS
UM75HMFscMa79rVhSYCXJ4S4EtATscFCLCz0castOwK8cDLRMJzhjDJZDvrg8HSjPKaXf1mU
OLmzKFZp9A0bluiLavPlwo2vsCAuByEwGxtH11HVVHwIG9Yw6WFLcyKyzXTZfDRRwN9ZY7Xe
5X+sUnLYhoI/ggmNZLznejUF
/
/* This function create a sql statment to update the clob/nvarchar2 for a table
The sql statement will be something like:
UPDATE USERNAME.TABLENAME SET
COL1 := CONVERT(COL1, TOCHAR, FROMCHAR'), COL2:=CONVERT(COL2, TOCHAR, FROMCHAR)
WHERE ROWID >=MINROWID AND ROWID 'Y') then
return;
end if;
csm$main();
return;
end;
/
declare
svalue varchar2(30);
sqlstmt varchar2(1024);
tochar varchar2(30);
rowcnt number;
begin
select value into svalue from csm$parameters where name='CSALTER_FINISHED';
if (svalue <> 'TRUE') then
dbms_output.put_line('Checking or Converting phrase did not finish successfully');
dbms_output.put_line('No database (national) character set will be altered');
dbms_output.put_line('CSALTER finished unsuccessfully.');
return;
end if;
select value into svalue from csm$parameters where name='SCAN_CHAR';
if (svalue = 'YES') then
dbms_output.put_line('Alter the database character set...');
select value into tochar from csm$parameters where name='TO_CHARSET_NAME';
rowcnt := csm$rundml('alter database character set internal_use '||tochar);
commit;
end if;
select value into svalue from csm$parameters where name='SCAN_NCHAR';
if (svalue = 'YES') then
dbms_output.put_line('Alter the national database character set...');
select value into tochar from csm$parameters where name='TO_NCHARSET_NAME';
rowcnt := csm$rundml('alter database national character set internal_use '||tochar);
commit;
end if;
dbms_output.put_line('CSALTER operation completed, please restart database');
end;
/
/* restore csm$dictuser */
delete from csm$dictusers where username in ('PM', 'SH', 'HR', 'OE');
drop function csm$rundml;
drop function csm$createsql;
drop procedure csm$main;
郁闷的是这几个主要的function都加密的,我汗。。。。
那么对于新创建的表呢?是如何存储的?下面来看看,
SQL> conn roger/roger
Connected.
SQL> create table ht03 as select * from ht01;
Table created.
SQL> set lines 120
SQL> col segment_name for a20
SQL> select owner,segment_name,bytes/1024/1024
2 from dba_segments
3 where owner='ROGER';
OWNER SEGMENT_NAME BYTES/1024/1024
------------------------------ -------------------- ---------------
ROGER HT03 13
ROGER HT02 .0625
ROGER HT01 19
我们发现ht03跟ht01完全一样,只是ht01是做字符集转换以前建的,ht03是字符集转换以后再
创建的,当前的数据库字符集为AL32UTF8。
由于前面也转换过为UTF8,那么我们有必要来看看UTF8和AL32UTF8的区别:
SQL> select dump(convert('测试', 'AL32UTF8')) from dual;
DUMP(CONVERT('测试','AL32UTF8
----------------------------
Typ=1 Len=4: 178,226,202,212
SQL> select dump(convert('测试', 'ZHS16GBK')) from dual;
DUMP(CONVERT('测试','ZHS
-----------------------
Typ=1 Len=3: 63,163,191
SQL> select dump(convert('测试', 'UTF8')) from dual;
DUMP(CONVERT('测试','UTF8'))
------------------------------------
Typ=1 Len=6: 239,191,189,239,191,189
The difference between UTF8 and AL32UTF8 are:
UTF8 stores Unicode characters with code points > U+FFFF as two surrogate characters, three bytes each
AL32UTF8 stores Unicode characters with code points > U+FFFF as one four-byte character
UTF8 will not be updated anymore when new Unicode versions are released, only AL32UTF8 and AL16UTF16 will.
Due to compatibility problems with pre-9i versions use UTF8 if you have Oracle8i clients connecting to the database. Use AL32UTF8 in pure Oracle9i environment.
UTF-8 encoding is variable-width. In UTF-8, each character can be represented by either one, two, or three bytes.
UTF8 is a varying width 1-3 bytes per character Unicode encoding. It is supported for both database and national character sets. It is a binary superset of US7ASCII. UTF8 corresponds to Unicode CESU-8 encoding.
AL32UTF8 is a varying width 1-4 bytes per character. It is supported for CHAR, VARCHAR2, LONG and CLOB only (database character set). It is a binary superset of UTF8 (in 9.2 only) and US7ASCII. AL32UTF8 corresponds to Unicode UTF-8 encoding.
This is what Metalink says. In Note: 237593.1
There is a possible problem for 817 and lower versions: Problems connecting to AL32UTF8 databases from older versions 8i and lower.
The default UTF8 characterset for 9i/10G is AL32UTF8, however this characterset is NOT recognised by any pre-9i clients/server systems.
Oracle recommends that you use UTF8 instead of AL32UTF8 as database characterset if you have 8i (or older) servers and clients connecting to the 9i/10g system until you can upgrade the older versions.
UTF8 is unicode revision 3.0 in 8.1.7 and up. AL32UTF8 is Unicode 3.0 in 9.0.1, Unicode 3.1 in 9.2, Unicode 3.2 in 10.1 and Unicode 4.01 in 10.2
Besides the difference in Unicode version the "big difference" is that AL32UTF8 has build in support for "Surrogate Pairs" (also known as Surrogate characters or "Supplementary characters").
Practically this means that in 99% of the cases you can use UTF8 instead of AL32UTF8 without any problem.
There are only a few situations where Surrogate Pairs are already used on client side. Windows system with HKSCS2001 (hong kong extension) is one of those. Note that you actually can *store* Surrogate Pairs in UTF8 but will store 2 * 3 byte characters and not like AL32UTF8 one 4 byte character.
Note that if you now use UTF8 as database characterset and -in the future you do a roll out of new 9i or higher clients and all your other databases are upgraded to 9i or higher, you can simply do a alter database characterset to go from UTF8 to AL32UTF8 so downtime
will be limited to a few minutes if the need to go to AL32UTF8 should arise. There is no performance impact on staying on UTF8
NOTE:
This note is ONLY relevant if you have already a 9i AL32UTF8 database with data in. If you still need to create the 9i system then choose UTF8 instead of AL32UTF8 as database characterset in the database creation assistant.
So, *IF* you have already a 9i system running with AL32UTF8 then you can use the following steps in this note to change the database characterset to UTF8 without losing data.
You can't simply use "ALTER DATABASE CHARACTERSET" to go from AL32UTF8 to UTF8 because UTF8 is a SUB-set of AL32UTF8 (some codepoints which are correct in AL32UTF8 are not known in UTF8)
But again, UTF8 *contains* all characters know in AL32UTF8, the difference between them is pure the way some characters are stored (AL32UTF8 is a bit more efficient for some characters)
So you will run into ORA-12712 if you try alter database ...
SQL> create table ht04(name varchar2(12));
Table created.
SQL> insert into ht04 values('北京上海');
1 row created.
SQL> insert into ht04 values('北京上海a');
1 row created.
SQL> insert into ht04 values('北京上海ab');
1 row created.
SQL> insert into ht04 values('北京上海abc');
1 row created.
SQL> insert into ht04 values('北京上海abcd');
1 row created.
SQL> insert into ht04 values('北京上海abcde');
insert into ht04 values('北京上海abcde')
*
ERROR at line 1:
ORA-12899: value too large for column "ROGER"."HT04"."NAME" (actual: 13,
maximum: 12)
SQL> insert into ht04 values('北京上海人');
1 row created.
SQL> insert into ht04 values('北京上海人权');
1 row created.
SQL> insert into ht04 values('北京上海人权宣');
insert into ht04 values('北京上海人权宣')
*
ERROR at line 1:
ORA-12899: value too large for column "ROGER"."HT04"."NAME" (actual: 14,
maximum: 12)
SQL> commit;
Commit complete.
SQL> select * from ht04;
NAME
------------
北京上海
北京上海a
北京上海ab
北京上海abc
北京上海abcd
北京上海人
北京上海人权
7 rows selected.
SQL> select dump(name) from ht04;
DUMP(NAME)
--------------------------------------------------------------------------------
Typ=1 Len=8: 177,177,190,169,201,207,186,163
Typ=1 Len=9: 177,177,190,169,201,207,186,163,97
Typ=1 Len=10: 177,177,190,169,201,207,186,163,97,98
Typ=1 Len=11: 177,177,190,169,201,207,186,163,97,98,99
Typ=1 Len=12: 177,177,190,169,201,207,186,163,97,98,99,100
Typ=1 Len=10: 177,177,190,169,201,207,186,163,200,203
Typ=1 Len=12: 177,177,190,169,201,207,186,163,200,203,200,168
7 rows selected.
这个问题纠结了我2天,最后恍然大悟,其实跟字段类型有关系,要看一个数据库字符集
转换以后,容量是否会发生变化,我们通过scan.txt来看就行了,如下:
[Database Size]
Tablespace Used Free Total Expansion
------------------------- --------------- --------------- --------------- ---------------
SYSTEM 473.00M 7.00M 480.00M .00K
UNDOTBS1 21.25M 3.75M 25.00M .00K
SYSAUX 165.13M 74.88M 240.00M .00K
TEMP .00K .00K .00K .00K
USERS 448.00K 4.56M 5.00M .00K
ROGER 64.00K 199.94M 200.00M .00K
------------------------- --------------- --------------- --------------- ---------------
Total 659.88M 290.13M 950.00M .00K
由于前面的测试表字段类型都是varchar2,我们也值得varchar2是可变的,
UTF8 是1~3 个bytes
AL32UTF8 是1~4 个bytes
从上面的scan.txt来看,就能发现字符集转换以后,ht01 ht02所在的表空间roger完全没有变化。
发表回复