
Find the Datafile Checkpoint SCN in Linux


Today I was asked how to find the datafile checkpoint SCN in Linux, reading the datafile directly. Of course, there are easier ways to do it if you can use Oracle tools, either with RMAN or by querying the V$ views. But if you just have the datafile and for whatever reason can't use Oracle tools, here is a shell script to find the SCN:

SQL> select CHECKPOINT_CHANGE# from v$datafile_header where file#=1;

CHECKPOINT_CHANGE#
------------------
           2502608

1 row selected.

SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

$ file="system01.dbf"
$ a=`od -x -N 10000 $file | grep ^0020740 | tr -s " " | cut -f5 -d" "`
$ b=`od -x -N 10000 $file | grep ^0020740 | tr -s " " | cut -f4 -d" "` 
$ printf "%d\n" 0x$a$b
2502608

I tested it against 11g and 12c datafiles. I have no idea whether it also works in older versions, or will keep working in newer ones. It depends on Oracle not changing the absolute position of the SCN in the datafile header.

I also had no opportunity to test it on a Big Endian platform, such as AIX. My guess is that you just need to swap the $a and $b variables in the code for it to work.
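
Here is a minimal sketch that wraps the commands above and adds that byte-swap guess as an option. The offset (the line starting at octal 0020740 within the first 10000 bytes) and the Big Endian branch are assumptions, exactly as discussed above, so treat it as a starting point rather than a tested tool:

#!/bin/bash
# Usage: ./datafile_scn.sh <datafile> [little|big]
# Reads the two 16-bit words at the assumed SCN position in the datafile header.
file="$1"
endian="${2:-little}"

a=$(od -x -N 10000 "$file" | grep ^0020740 | tr -s " " | cut -f5 -d" ")
b=$(od -x -N 10000 "$file" | grep ^0020740 | tr -s " " | cut -f4 -d" ")

if [ "$endian" = "big" ]; then
  # Untested assumption: on Big Endian platforms the two words come swapped.
  printf "%d\n" 0x$b$a
else
  printf "%d\n" 0x$a$b
fi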

If you can test it, please let me know the results, especially on a Big Endian platform.

Regards.




Understanding CPU Metrics in a Linux environment


The purpose of this blog post is just to compile some information I have been gathering over the last few weeks, while producing reports on some Oracle database environments for one of our clients.

When it comes to CPU metrics, there is always some confusion about the real meaning of each metric.

I am not an Operating System specialist, so my intention here is not to produce a long article explaining all the details. I will just put together some notes that may be useful to me in the future, and hopefully to someone else who reads them. I will also point to some links that explain the metrics better; they helped me a lot to understand them.

CPU Metrics

The CPU has 3 main states:

sy (Sys): time the CPUs spent running system (kernel-related, operating system) processes.
us (User): time the CPUs spent running user (i.e., not kernel-related) processes.
id (Idle): time the CPUs were doing nothing.

But these break down further into sub-types, composing the eight metrics shown in the “top” command:

ni (Nice): time the CPUs spent running user processes that have been niced.
wa (I/O Wait): time the CPUs were doing nothing while there were processes waiting for I/O operations to complete.
hi (Hardware Interrupts): time the CPUs spent dealing with hardware interrupts.
si (Software Interrupts): time the CPUs spent dealing with software interrupts.
st (Steal Time): (only for Virtual Machines) time the virtual CPUs spent waiting to be served with a real CPU from the Hypervisor.

The sum of those eight metrics is 100%.
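
If you want to see where these numbers come from, they are all exposed in /proc/stat, which is what “top” itself reads. The sketch below is just an illustration, assuming a modern kernel with the steal field present: it samples /proc/stat twice and prints the same eight states (the 5-second interval is arbitrary):

#!/bin/bash
# Field order in the first line of /proc/stat:
# cpu user nice system idle iowait irq softirq steal (guest fields ignored)
read -r _ u1 n1 s1 i1 w1 h1 q1 st1 _ < /proc/stat
sleep 5
read -r _ u2 n2 s2 i2 w2 h2 q2 st2 _ < /proc/stat

total=$(( (u2-u1)+(n2-n1)+(s2-s1)+(i2-i1)+(w2-w1)+(h2-h1)+(q2-q1)+(st2-st1) ))
pct() { awk -v d="$1" -v t="$total" 'BEGIN { printf "%.1f", 100*d/t }'; }

echo "us=$(pct $((u2-u1)))%  ni=$(pct $((n2-n1)))%  sy=$(pct $((s2-s1)))%  id=$(pct $((i2-i1)))%"
echo "wa=$(pct $((w2-w1)))%  hi=$(pct $((h2-h1)))%  si=$(pct $((q2-q1)))%  st=$(pct $((st2-st1)))%"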

Load Average

Load Average is the average number of active processes during a period of time. This includes ALL of the following:

  • Those that are really running.
  • Those that are waiting for a CPU to become available.
  • Those that are waiting for an I/O operation to complete.

The value usually accepted as a healthy maximum for the Load Average is 70% of the number of CPU threads. So, if you have a server with, say, 4 CPU cores with 2 threads each, the Load Average should not be higher than 5.6 (70% of 8).
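
As a quick illustration of that rule of thumb, the sketch below compares the 1-minute Load Average from /proc/loadavg with 70% of the thread count reported by nproc. The 70% threshold is just the convention discussed above, not something the kernel enforces:

#!/bin/bash
# Compare the 1-minute Load Average against 70% of the CPU thread count.
threads=$(nproc)
load1=$(cut -d" " -f1 /proc/loadavg)
awk -v l="$load1" -v t="$threads" 'BEGIN {
  limit = 0.7 * t
  printf "load=%.2f  threads=%d  70%% threshold=%.1f  -> %s\n",
         l, t, limit, (l > limit ? "above the threshold" : "ok")
}'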

My Summary and Conclusions

  • Beware of average values. A “good” average could hide huge and even long spikes.
  • If the Load Average is between 70% and 100% of the CPU Threads, then you probably have processes wasting part of their time waiting for CPU.
  • If the Load Average is over 100% of the CPU Threads, then you probably have a serious CPU bottleneck.
  • If you have a high Load Average and low CPU Idle, then you have a CPU bottleneck at that moment/period.
  • If you have a high Load Average and high I/O Wait Time but also high Idle Time, then the processes in the run queue are really waiting for I/O, not CPU.
  • If you don’t have I/O Wait Time, it does not mean you don’t have I/O bottlenecks.
    • The I/O Wait Time only shows up in the CPU metrics when there is some CPU Idle time.
    • If the CPUs are all loaded, then I/O Wait is ZERO even if there are processes waiting for I/O. In this case, those processes would still show up in the Load Average.

Links

Understanding Linux CPU stats

Understanding Linux CPU Load – when should you be worried?

Linux performance metric myths (Load Average, % IO Wait)

That’s it. If you find something wrong here, or want to add something, let me know.

I will update this post whenever I find more relevant information on the subject.

See you!



Use Statspack in 11g/12c (and show SQL_ID in the Report)


Over the last few weeks I have been preparing a performance report for a client that does not have Diagnostics Pack licenses :-(. This means we cannot query any data from AWR or ASH, which makes diagnosing and solving performance issues much more difficult.

In this situation, one of the tools available to get AWR-like information is the old Statspack (which is actually the predecessor of AWR). Yes, it still works in 11g and 12c.
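
For reference, the basic workflow has not changed: the install and report scripts still ship under $ORACLE_HOME/rdbms/admin. A minimal reminder (adjust the PERFSTAT password and tablespaces to your environment):

SQL> @?/rdbms/admin/spcreate.sql      -- one-time install, creates the PERFSTAT schema
SQL> exec statspack.snap;             -- take a snapshot as PERFSTAT; repeat after the workload
SQL> @?/rdbms/admin/spreport.sql      -- generate the report between two snapshots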

Statspack reports are very similar to those produced by its newer brother, AWR. Of course, there are some sections and much more detail in AWR that are not present in Statspack.

But one of my main concerns when I tried to use Statspack was that it does not show SQL_IDs in the report. It only shows the OLD_HASH_VALUE to help identify a query.

Well, I've made some tests and realized that Statspack already saves the SQL_ID for the captured queries. It just doesn't show it in the report (as of 11.2.0.1; I didn't test newer versions):

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

5 rows selected.

SQL> desc stats$sqltext
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OLD_HASH_VALUE                            NOT NULL NUMBER
 TEXT_SUBSET                               NOT NULL VARCHAR2(31)
 PIECE                                     NOT NULL NUMBER
 SQL_ID                                             VARCHAR2(13)
 SQL_TEXT                                           VARCHAR2(64)
 ADDRESS                                            RAW(8)
 COMMAND_TYPE                                       NUMBER
 LAST_SNAP_ID                                       NUMBER

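
This means that, for a one-off lookup, a direct query against STATS$SQLTEXT already gives you the mapping between the Old Hash Value printed in the report and the SQL_ID. A minimal sketch, assuming the default PERFSTAT owner (the hash value is the one from the sample report further below):

SQL> select distinct old_hash_value, sql_id
     from   perfstat.stats$sqltext
     where  old_hash_value = 2522684317;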
So what I did was to tweak the original Statspack report to show SQL_IDs for the queries.

Here is the original report:

^LSQL ordered by CPU  DB/Inst: XXXX/XXXX  Snaps: 1-2
-> Total DB CPU (s):               1
-> Captured SQL accounts for   99.8% of Total DB CPU
-> SQL reported below exceeded  1.0% of Total DB CPU

    CPU                  CPU per             Elapsd                     Old
  Time (s)   Executions  Exec (s)  %Total   Time (s)    Buffer Gets  Hash Value
---------- ------------ ---------- ------ ---------- --------------- ----------
      0.80            1       0.80   99.6       1.81          15,830 2522684317
Module: sqlplus@localhost.localdomain (TNS V1-V3)
BEGIN statspack.snap; END;

          -------------------------------------------------------------
^LSQL ordered by Elapsed time for DB: XXXX  Instance: XXXX  Snaps: 1 -2
-> Total DB Time (s):               2
-> Captured SQL accounts for   99.9% of Total DB Time

  Elapsed                Elap per            CPU                        Old
  Time (s)   Executions  Exec (s)  %Total   Time (s)  Physical Reads Hash Value
---------- ------------ ---------- ------ ---------- --------------- ----------
      1.81            1       1.81   99.7       0.80              19 2522684317
Module: sqlplus@localhost.localdomain (TNS V1-V3)
BEGIN statspack.snap; END;

          -------------------------------------------------------------

And here is the one with my changes:

^LSQL ordered by CPU  DB/Inst: XXXX/XXXX  Snaps: 1-2
-> Total DB CPU (s):               1
-> Captured SQL accounts for   99.8% of Total DB CPU
-> SQL reported below exceeded  1.0% of Total DB CPU

    CPU                  CPU per             Elapsd                     Old
  Time (s)   Executions  Exec (s)  %Total   Time (s)    Buffer Gets  Hash Value        SQL_ID
---------- ------------ ---------- ------ ---------- --------------- ---------- -------------
      0.80            1       0.80   99.6       1.81          15,830 2522684317 4qk3ay7bq4pab
Module: sqlplus@localhost.localdomain (TNS V1-V3)
BEGIN statspack.snap; END;

                 -------------------------------------------------------------
^LSQL ordered by Elapsed time for DB: XXXX  Instance: XXXX  Snaps: 1 -2
-> Total DB Time (s):               2
-> Captured SQL accounts for   99.9% of Total DB Time
-> SQL reported below exceeded  1.0% of Total DB Time

  Elapsed                Elap per            CPU                        Old
  Time (s)   Executions  Exec (s)  %Total   Time (s)  Physical Reads Hash Value        SQL_ID
---------- ------------ ---------- ------ ---------- --------------- ---------- -------------
      1.81            1       1.81   99.7       0.80              19 2522684317 4qk3ay7bq4pab
Module: sqlplus@localhost.localdomain (TNS V1-V3)
BEGIN statspack.snap; END;

                 -------------------------------------------------------------

The changes were done in two files: sprepcon.sql (just the linesize setting) and sprepins.sql.

As I already mentioned, the changes were made based on files taken from an 11.2.0.1 installation, so the required changes may differ in other versions. Please verify, and of course save a backup of the files before making any change.

I built a table explaining every change I made, how to make it, and which shell command you could use to perform it automatically. I was not able to automate all the changes; some of them I had to make manually (I didn't spend much time struggling with shell script specifics to make everything work).

Here is the table:

FILE         CONTENT TO FIND                    REPLACE THIS WITH THIS                                                                 BASH COMMAND
------------ ---------------------------------- ------------ ------------------------------------------------------------------------  --------------------------------------------------------------------------------------------------------
sprepcon.sql define linesize_fmt = 80           entire row   define linesize_fmt = 94                                                  sed -i 's/linesize_fmt = 80/linesize_fmt = 94/g' sprepcon.sql
sprepins.sql col aa format a80 heading -        entire row   col aa format a100 heading -                                              sed -i 's/col aa format a80/col aa format a100/g' sprepins.sql
sprepins.sql rpad('Module: '||topn.module,80    entire row   rpad('Module: '||topn.module,94                                           sed -i "s/topn\.module\,80/topn\.module\,94/g" sprepins.sql
sprepins.sql col hashval   format 99999999999   Hash Value   Hash_Value                                                                
sprepins.sql Hash Value                         Hash Value   Hash Value        SQL_ID                                                  sed -i "s/Hash Value/Hash Value        SQL_ID/g" sprepins.sql
sprepins.sql lpad(topn.old_hash_value           entire row   lpad(topn.old_hash_value,10)  || ' ' || lpad(st.sql_id, 13)  || ' ' ||    
sprepins.sql col aa format                      none         include at the end of the string: " -------------" (space + 13 dashes)
sprepins.sql truncate table STATS$TEMP_SQLSTATS entire row   truncate table perfstat.STATS$TEMP_SQLSTATS                               sed -i 's/truncate table STATS$TEMP_SQLSTATS/truncate table perfstat.STATS$TEMP_SQLSTATS/g' sprepins.sql

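If you prefer to apply the automatable changes in one shot, the sketch below runs the sed commands from the table after taking backups. Note the single quotes around the STATS$TEMP_SQLSTATS pattern, so the shell does not try to expand $TEMP_SQLSTATS as a variable; the remaining (manual) edits from the table still have to be done by hand:

#!/bin/bash
# Run in the directory holding sprepcon.sql and sprepins.sql
# (usually $ORACLE_HOME/rdbms/admin). Keep backups of the originals first.
cp sprepcon.sql sprepcon.sql.bak
cp sprepins.sql sprepins.sql.bak

sed -i 's/linesize_fmt = 80/linesize_fmt = 94/g' sprepcon.sql
sed -i 's/col aa format a80/col aa format a100/g' sprepins.sql
sed -i 's/topn\.module,80/topn\.module,94/g' sprepins.sql
sed -i 's/Hash Value/Hash Value        SQL_ID/g' sprepins.sql
sed -i 's/truncate table STATS$TEMP_SQLSTATS/truncate table perfstat.STATS$TEMP_SQLSTATS/g' sprepins.sql
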
That’s it! I hope you enjoy 😉 .



Zone Maps in Oracle 11g / non-Exadata


After my last post, my boss and great Oracle guy Rodrigo Righetti told me he had started to build a kind of manual Zone Map for Oracle 11g in non-Exadata environments, but his work is not finished yet due to lack of time 😦 . So I was inspired to do what is presented here.

If you don't know Zone Maps and Attribute Clustering, they are exciting new features that only work in 12c/Exadata environments. This blog post covers the topic very well and was a source for my research.

The idea is to build a Zone Map for a table, which stores MIN/MAX values for a column and can help prune regions of the table in a FULL TABLE SCAN operation. Only the regions that could possibly contain the needed rows would be scanned.

So, let’s start.

Drop the tables to start a fresh test

SQL> drop table scott.invoices;

Table dropped.

SQL> drop table scott.invoices_ac;

Table dropped.

SQL> drop table scott.invoices_ac_zone_maps;

Table dropped.

Create and populate the sample table

SQL> create table scott.invoices (invoice_id number, region number, invoice_value number(10,2));

Table created.

SQL> begin
 for i in 1..10 loop
 insert into scott.invoices
 select rownum+100000*(i-1), mod(rownum,1000), dbms_random.value(100,1000)
 from dual connect by level <= 100000;
 end loop;
end;
/

PL/SQL procedure successfully completed.

Create the Attribute Clustered table

In Oracle 12c, you can change an attribute on a table to make it clustered by a column (or columns). After that, all new data inserted into the table will be clustered based on that column's value; if the table already has data, it must be moved (please see this blog post for more details).

Here, in Oracle 11g, we don’t have this feature. So, the table should be clustered manually, by inserting data in an ordered fashion based on the target column value.

To simulate a clustered table, I will create a copy of my sample table, but inserting the data ordered:

SQL> create table scott.invoices_ac as select * from scott.invoices order by region;

Table created.

SQL> exec dbms_stats.gather_table_stats('SCOTT','INVOICES_AC')

PL/SQL procedure successfully completed.

Create the Zone Maps table

I will reuse part of the query I created in my previous post, and create the Zone Maps table storing the EXTENT_ID, the MIN/MAX ROWID, the MIN/MAX column value, and the target COLUMN_NAME as well:

SQL> create table scott.invoices_ac_zone_maps as
select 'REGION' column_name, extent_id, min_rowid, max_rowid, min(REGION) min_value, max(REGION) max_value
from
(
  select
    o.owner, o.object_name table_name, e.extent_id,
    dbms_rowid.rowid_create(1,o.data_object_id,e.file_id,e.block_id,0) min_rowid,
    dbms_rowid.rowid_create(1,o.data_object_id,e.file_id,e.block_id+e.blocks-1,32767) max_rowid
  from dba_objects o, dba_extents e 
  where e.segment_name = o.object_name and e.owner = o.owner
  and o.object_name = 'INVOICES_AC' and o.owner = 'SCOTT'
  and e.PARTITION_NAME is null
) a
join SCOTT.INVOICES_AC b on b.rowid between a.min_rowid and a.max_rowid
group by owner, table_name, extent_id, min_rowid, max_rowid
/

Table created.

Check the Zone Maps table

SQL> col owner for a10
SQL> col table_name for a20
SQL> col column_name for a6
SQL> set linesize 200
SQL> select * from scott.invoices_ac_zone_maps order by extent_id;

COLUMN EXTENT_ID MIN_ROWID          MAX_ROWID          MIN_VALUE MAX_VALUE
------ --------- ------------------ ------------------ --------- ---------
REGION         0 AAASNjAAEAAAAIIAAA AAASNjAAEAAAAIPH//         0         2
REGION         1 AAASNjAAEAAAAIQAAA AAASNjAAEAAAAIXH//         2         5
REGION         2 AAASNjAAEAAAAIYAAA AAASNjAAEAAAAIfH//         5         8
REGION         3 AAASNjAAEAAAAIgAAA AAASNjAAEAAAAInH//         8        11
REGION         4 AAASNjAAEAAAAIoAAA AAASNjAAEAAAAIvH//        11        14
REGION         5 AAASNjAAEAAAAIwAAA AAASNjAAEAAAAI3H//        14        17
REGION         6 AAASNjAAEAAAAI4AAA AAASNjAAEAAAAI/H//        17        20
REGION         7 AAASNjAAEAAAAJAAAA AAASNjAAEAAAAJHH//        20        23
REGION         8 AAASNjAAEAAAAJIAAA AAASNjAAEAAAAJPH//        23        26
REGION         9 AAASNjAAEAAAAJQAAA AAASNjAAEAAAAJXH//        26        29
REGION        10 AAASNjAAEAAAAJYAAA AAASNjAAEAAAAJfH//        29        32
REGION        11 AAASNjAAEAAAAJgAAA AAASNjAAEAAAAJnH//        32        35
REGION        12 AAASNjAAEAAAAJoAAA AAASNjAAEAAAAJvH//        35        38
REGION        13 AAASNjAAEAAAAJwAAA AAASNjAAEAAAAJ3H//        38        41
REGION        14 AAASNjAAEAAAAJ4AAA AAASNjAAEAAAAJ/H//        41        44
REGION        15 AAASNjAAEAAAAKAAAA AAASNjAAEAAAAKHH//        44        47
REGION        16 AAASNjAAEAAAA0AAAA AAASNjAAEAAAA1/H//        47        98
REGION        17 AAASNjAAEAAAA2AAAA AAASNjAAEAAAA3/H//        98       146
REGION        18 AAASNjAAEAAAA6AAAA AAASNjAAEAAAA7/H//       146       194
REGION        19 AAASNjAAEAAAA8AAAA AAASNjAAEAAAA9/H//       194       242
REGION        20 AAASNjAAEAAAA+AAAA AAASNjAAEAAAA//H//       242       290
REGION        21 AAASNjAAEAAABAAAAA AAASNjAAEAAABB/H//       290       338
REGION        22 AAASNjAAEAAABCAAAA AAASNjAAEAAABD/H//       338       386
REGION        23 AAASNjAAEAAABEAAAA AAASNjAAEAAABF/H//       386       434
REGION        24 AAASNjAAEAAABGAAAA AAASNjAAEAAABH/H//       434       482
REGION        25 AAASNjAAEAAABIAAAA AAASNjAAEAAABJ/H//       482       530
REGION        26 AAASNjAAEAAABKAAAA AAASNjAAEAAABL/H//       530       578
REGION        27 AAASNjAAEAAABMAAAA AAASNjAAEAAABN/H//       578       626
REGION        28 AAASNjAAEAAABOAAAA AAASNjAAEAAABP/H//       626       674
REGION        29 AAASNjAAEAAABQAAAA AAASNjAAEAAABR/H//       674       722
REGION        30 AAASNjAAEAAABSAAAA AAASNjAAEAAABT/H//       722       770
REGION        31 AAASNjAAEAAABUAAAA AAASNjAAEAAABV/H//       770       818
REGION        32 AAASNjAAEAAABWAAAA AAASNjAAEAAABX/H//       818       866
REGION        33 AAASNjAAEAAABYAAAA AAASNjAAEAAABZ/H//       866       914
REGION        34 AAASNjAAEAAABaAAAA AAASNjAAEAAABb/H//       914       962
REGION        35 AAASNjAAEAAABcAAAA AAASNjAAEAAABd/H//       962       999

36 rows selected.

Compare the access WITH and WITHOUT the help of Zone Maps

Now let’s compare a regular FULL TABLE SCAN in our sample table with an operation helped by the Zone Maps created.

First, the ordinary operation:

SQL> set autotrace on
SQL> select count(*), sum(invoice_value) from scott.INVOICES_AC where region = 7;

 COUNT(*) SUM(INVOICE_VALUE)
--------- ------------------
     1000          560668.92

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1529924733

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |     1 |     9 |   729   (1)| 00:00:09 |
|   1 |  SORT AGGREGATE    |             |     1 |     9 |            |          |
|*  2 |   TABLE ACCESS FULL| INVOICES_AC |  1000 |  9000 |   729   (1)| 00:00:09 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("REGION"=7)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2616  consistent gets
          0  physical reads
          0  redo size
        612  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Please execute the query a few times in order to have all the blocks in memory and better compare the gets (consistent gets + db block gets). Do the same for all upcoming queries.

Note that reading our table entirely required 2616 gets.

Now let’s see how it goes with the Zone Maps:

SQL> select count(*), sum(invoice_value)
from SCOTT.INVOICES_AC a join scott.invoices_ac_zone_maps b
on (a.REGION between b.min_value and b.max_value and a.rowid between b.min_rowid and b.max_rowid)
where a.region = 7 and b.column_name = 'REGION' 
/

 COUNT(*) SUM(INVOICE_VALUE)
--------- ------------------
     1000          560668.92

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3513419715

-----------------------------------------------------------------------------------------------
| Id  | Operation             | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                       |     1 |    67 |   734   (2)| 00:00:09 |
|   1 |  SORT AGGREGATE       |                       |     1 |    67 |            |          |
|   2 |   MERGE JOIN          |                       |    32 |  2144 |   734   (2)| 00:00:09 |
|*  3 |    TABLE ACCESS FULL  | INVOICES_AC           |  1000 | 14000 |   729   (1)| 00:00:09 |
|*  4 |    FILTER             |                       |       |       |            |          |
|*  5 |     SORT JOIN         |                       |     3 |   159 |     4  (25)| 00:00:01 |
|*  6 |      TABLE ACCESS FULL| INVOICES_AC_ZONE_MAPS |     3 |   159 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("A"."REGION"=7)
   4 - filter("A"."REGION"<="B"."MAX_VALUE" AND "A".ROWID>=CHARTOROWID("B"."MIN_ROWID") AND "A".ROWID<=CHARTOROWID("B"."MAX_ROWID"))
   5 - access(INTERNAL_FUNCTION("A"."REGION")>=INTERNAL_FUNCTION("B"."MIN_VALUE"))
       filter(INTERNAL_FUNCTION("A"."REGION")>=INTERNAL_FUNCTION("B"."MIN_VALUE"))
   6 - filter("B"."MIN_VALUE"<=7 AND "B"."MAX_VALUE">=7 AND "B"."COLUMN_NAME"='REGION')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         16  consistent gets
          0  physical reads
          0  redo size
        612  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Wow, did you see the difference? From 2616 to just 16 gets!

Some final comments on this:

  • Although the execution plan does not show the TABLE ACCESS BY ROWID RANGE operation for the INVOICES_AC table, it is reducing the gets, so I suppose it is doing that behind the scenes. If you perform the same tests in 12c you will be able to see the ROWID range scan operation in the plan.
  • In my tests, the behavior was as expected only when the region value maps to a single ROWID range. If I used the value 5 (which maps to extents 1 and 2) instead of 7, it didn't work well. The same applied if I tried to use a range of values with a BETWEEN clause. I did manage to build a different query that worked for these cases: I had to remove the JOIN and read the Zone Maps table twice (to read MIN_ROWID and MAX_ROWID separately). I will show this alternative query here, but to me it already seems like too much alchemy. Also, the query groups all the ranges into one, which assumes that the extents are ordered by region, something that is of course not feasible in a real situation. Maybe someone could come up with a better solution :-). Anyway, here it is:
SQL> WITH
range AS (
	select min(min_rowid) min_rowid, max(max_rowid) max_rowid
	from scott.invoices_ac_zone_maps
	where (4 <= MAX_VALUE) and (6 >= MIN_VALUE)
)
select count(*), sum(invoice_value)
from SCOTT.INVOICES_AC a
where
	rowid between
	(select min_rowid from range) and
	(select max_rowid from range) and 
	a.region between 4 and 6
/
  • Maybe something could be done to change the query transparently to the user. Who knows, maybe a future post?
  • In this approach, each region in the Zone Maps table corresponds to one extent. So, if you have few extents, or extents of very different sizes, this could negatively impact the results.
  • If the table does not have similar values of the target column stored together, there will be little or no benefit.
  • Again, I didn't take partitioned tables into consideration, but it would not be difficult to adapt.
  • Whenever the table extents change OR the key column values are updated, the Zone Map may become stale and must be rebuilt (see the staleness-check sketch right after this list). Because of this, the technique is recommended only for Data Warehouse environments, just like the original 12c Zone Maps.
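
A simple way to detect the extent-layout part of that staleness is to compare the stored ranges against DBA_EXTENTS again. This is my own assumption, not anything related to the real 12c feature, and it does not detect updates to the key column values (those still require dropping and recreating the zone-map table with the same CTAS used above):

SQL> select count(*) stale_ranges
     from (
       select extent_id, rowidtochar(min_rowid) min_rowid
       from   scott.invoices_ac_zone_maps
       minus
       select e.extent_id,
              rowidtochar(dbms_rowid.rowid_create(1,o.data_object_id,e.file_id,e.block_id,0))
       from   dba_objects o, dba_extents e
       where  e.segment_name = o.object_name and e.owner = o.owner
       and    o.object_name = 'INVOICES_AC' and o.owner = 'SCOTT'
       and    e.partition_name is null
     );

Any result greater than zero means the stored ranges no longer line up with the current extents, and the zone-map table should be rebuilt.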

That’s it! I hope you enjoy 😉 .



How to manually break a FULL TABLE SCAN into ranges of ROWID


I was at a client site some days ago, where they were migrating an SAP application from Oracle to HANA (the in-memory database from SAP).

To migrate the data they were using a tool named SAPSLT, which performs the initial load and then keeps the data synced until the cut-over.

Some tables were very large, and this caused some issues due to SAPSLT limitations. For example, the largest table had 20 billion rows and 1 billion blocks, WITHOUT PARTITIONING! The query in Oracle had to do a FULL TABLE SCAN over those billion blocks, and it took DAYS to finish. Several “SNAPSHOT TOO OLD” errors occurred, as expected when you need to read a very large table that is being updated frequently by other transactions.

In an attempt to load the tables faster and avoid the errors, the SAP consultants tried to break the operation into smaller processes. SAPSLT broke it into ranges of ROWID, making the resulting queries more efficient and less prone to errors in Oracle. But the tool had an important limitation: when using that kind of parallelism, it simply IGNORED the configured filters and ended up bringing over the entire table.

Then I tried a different approach, using Oracle parallelism. The result was good, but not as good as expected: the bottleneck was now the Query Coordinator, which was not able to serve the consumers efficiently. And, as it was still a single query, the chance of “SNAPSHOT TOO OLD” errors remained.

For this specific case, the manual split by ROWID ranges was more efficient than Oracle parallelism. But unfortunately the SAPSLT tool could not do this job correctly, due to the limitation already explained.

So I started to think about how I could do this split by ROWID manually, using Oracle resources. Of course, I could not use it in that project, due to lack of time to develop and test it properly. But it is still an interesting test case, and could be useful in other situations.

Soon I realized that, to do it easily, I should consider the extents of the table and build the ROWID ranges within each extent. Since a ROWID contains the FILE_ID and BLOCK_ID, if I could build the first ROWID of the extent's first block and the last ROWID of its last block, I would certainly have a ROWID range covering all the rows in that extent.

I browsed the Internet to see if I could find something ready-made, but I couldn't. I did find the following very interesting blog post, which helped me at some points in building my solution:

https://hemantoracledba.blogspot.com.br/2013/06/getting-rowids-present-in-block.html
I thank the blog author for the post 😉 .

After some hours of study and tests, I built the solution that I present here.
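
The heart of it is DBMS_ROWID.ROWID_CREATE, used to fabricate the first and last possible ROWID of each extent. In isolation the idea looks like the sketch below, which is essentially the innermost query of the full script presented later (the &owner and &object_name substitution variables are the same ones the script defines):

SQL> select e.extent_id,
            dbms_rowid.rowid_create(1, o.data_object_id, e.file_id, e.block_id, 0) min_rowid,
            dbms_rowid.rowid_create(1, o.data_object_id, e.file_id, e.block_id + e.blocks - 1, 32767) max_rowid
     from   dba_objects o, dba_extents e
     where  e.owner = o.owner and e.segment_name = o.object_name
     and    o.owner = '&owner' and o.object_name = '&object_name';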

IMPORTANT: The query is not prepared to work with partitioned tables, nor was it exhaustively tested for every possible situation. With some additional effort it could surely be adapted to work with partitioned tables. Use with caution 🙂 .

If you find any inconsistency in the results, or any other issue with this work, comments are always welcome.

The query receives as parameters the OWNER, the OBJECT_NAME and the number of CHUNKS. It determines the size of each chunk in blocks, but the split always keeps whole extents in each chunk.

If the table has irregular extent sizes, the chunks may end up with slightly different sizes. A limitation of this approach is that the number of chunks must always be lower than or equal to the number of extents in the table.

It's important to highlight that I don't validate the ROWIDs I'm building, and this doesn't make any difference. The strategy consists of building the possible ROWIDs of what would be row 1 in the first block of the extent, and the last row in its last block. Even if these specific ROWIDs don't exist, they serve my purpose.

Speaking of the last row in a block, I'm using 32767 as that maximum number. I've found some other blog posts saying this is the real limit, although there is no reference to it in the official documentation. Anyway, I think it's almost impossible for a single block to have room for more than 32 thousand rows, right? And if that ever happens, we could just change the number in the query.

Please see the query and a usage example below.

Creating a tablespace with UNIFORM SIZE:

This step is to force my extents to have the same size and allow a better distribution of the blocks among the chunks.

SQL> create tablespace tsp1 datafile '/tmp/tsp1.dbf' size 50m uniform size 64K;

Tablespace created.

Creating a sample table:

SQL> drop table scott.t;

Table dropped.

SQL> create table scott.t (id_col number, data_col varchar2(15))
tablespace tsp1;

Table created.

SQL> begin
  for i in 1..10 loop
    insert into scott.t
    select rownum+100000*(i-1), 'Row '||to_char(rownum+100000*(i-1))
    from dual connect by level <= 100000;
  end loop;
end;
/

PL/SQL procedure successfully completed.

Verifying the number of extents and blocks of the table:

SQL> select blocks, extents from dba_segments
where owner='SCOTT' and segment_name = 'T';

    BLOCKS    EXTENTS
---------- ----------
      2928        366

Script to break by ROWID ranges:

SQL> define owner=SCOTT
SQL> define object_name=T
SQL> define chunks=10

SQL> select
  row_number() over (order by ychunk) chunk, min_rowid, max_rowid,
  min(extent_id) min_extid, max(extent_id) max_extid, sum(blocks) blocks
from
  (
  select
  ychunk, extent_id, blocks,
  first_value(min_rowid) over (partition by ychunk) min_rowid,
  last_value(max_rowid) over (partition by ychunk) max_rowid
  from
    (
    select least(xchunks, trunc( sum(blocks) over (order by file_id, blkini) / blk_per_chunk ) +1 ) ychunk,
    extent_id, file_id, blkini, blocks, sum(blocks) over (order by file_id, blkini) totblk, blkfin, min_rowid, max_rowid
    from
      (
      select
      o.owner, o.object_name, s.blocks seg_blocks, 
      least(s.extents, &chunks) xchunks, trunc( s.blocks / least(s.extents, &chunks)) blk_per_chunk,
      e.extent_id, e.file_id, e.block_id blkini, 
      e.blocks, e.block_id + e.blocks -1 blkfin, 
      dbms_rowid.rowid_create(1,o.data_object_id,e.file_id,e.block_id,0) min_rowid,
      dbms_rowid.rowid_create(1,o.data_object_id,e.file_id,e.block_id+e.blocks-1,32767) max_rowid 
      from dba_objects o, dba_segments s, dba_extents e
      where e.segment_name = o.object_name and e.owner = o.owner 
      and s.owner = e.owner and s.segment_name = e.segment_name
      and o.object_name = '&object_name' and o.owner = '&owner'
      and s.PARTITION_NAME is null
      order by file_id, blkini
      )
    order by ychunk, file_id, blkini
    )
  )
group by ychunk, min_rowid, max_rowid
order by chunk
/

CHUNK MIN_ROWID          MAX_ROWID          MIN_EXTID MAX_EXTID BLOCKS
----- ------------------ ------------------ --------- --------- ---------
    1 AAASNVAAGAAAAUAAAA AAASNVAAGAAAAYfH//         0        35    288
    2 AAASNVAAGAAAAYgAAA AAASNVAAGAAAAc/H//        36        71    288
    3 AAASNVAAGAAAAdAAAA AAASNVAAGAAAAhnH//        72       108    296
    4 AAASNVAAGAAAAhoAAA AAASNVAAGAAAAmHH//       109       144    288
    5 AAASNVAAGAAAAmIAAA AAASNVAAGAAAAqvH//       145       181    296
    6 AAASNVAAGAAAAqwAAA AAASNVAAGAAAAvPH//       182       217    288
    7 AAASNVAAGAAAAvQAAA AAASNVAAGAAAAz3H//       218       254    296
    8 AAASNVAAGAAAAz4AAA AAASNVAAGAAAA4XH//       255       290    288
    9 AAASNVAAGAAAA4YAAA AAASNVAAGAAAA8/H//       291       327    296
   10 AAASNVAAGAAAA9AAAA AAASNVAAGAAABBvH//       328       365    304

10 rows selected.

As you can see, the number of blocks in each chunk is not exactly equal, because the split honors extent boundaries: the target size is trunc(2928 / 10) = 292 blocks per chunk, but a chunk can only close at the end of an extent (8 blocks each here), so the chunks end up with 288 to 304 blocks. For really large tables, the distribution would be very close to ideal.

Confirming the accuracy of the ROWID ranges:

Let’s first verify that the ROWID ranges produce the expected execution plan:

SQL> set autotrace traceonly explain
SQL> select * from scott.t
where rowid between 'AAASNVAAGAAAAUAAAA' and 'AAASNVAAGAAAAYfH//';

Execution Plan
----------------------------------------------------------
Plan hash value: 280204748

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |   157K|  5240K|   786   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS BY ROWID RANGE| T    |   157K|  5240K|   786   (1)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access(ROWID>='AAASNVAAGAAAAUAAAA' AND ROWID<='AAASNVAAGAAAAYfH//')

Note
-----
 - dynamic sampling used for this statement (level=2)

Note that the plan shows the operation “TABLE ACCESS BY ROWID RANGE“, exactly what I wanted 🙂 .

Now I will build a single query with all the ROWID ranges, to verify that the number of rows read corresponds to the total number of rows in the table:

select count(1) from scott.t
where 
rowid between 'AAASNVAAGAAAAUAAAA' and 'AAASNVAAGAAAAYfH//' or
rowid between 'AAASNVAAGAAAAYgAAA' and 'AAASNVAAGAAAAc/H//' or
rowid between 'AAASNVAAGAAAAdAAAA' and 'AAASNVAAGAAAAhnH//' or
rowid between 'AAASNVAAGAAAAhoAAA' and 'AAASNVAAGAAAAmHH//' or
rowid between 'AAASNVAAGAAAAmIAAA' and 'AAASNVAAGAAAAqvH//' or
rowid between 'AAASNVAAGAAAAqwAAA' and 'AAASNVAAGAAAAvPH//' or
rowid between 'AAASNVAAGAAAAvQAAA' and 'AAASNVAAGAAAAz3H//' or
rowid between 'AAASNVAAGAAAAz4AAA' and 'AAASNVAAGAAAA4XH//' or
rowid between 'AAASNVAAGAAAA4YAAA' and 'AAASNVAAGAAAA8/H//' or
rowid between 'AAASNVAAGAAAA9AAAA' and 'AAASNVAAGAAABBvH//'
/

COUNT(1)
---------
 1000000

1 row selected.

Great!!! All 1 million rows inserted were read through the ROWID ranges, which confirms they are correct!
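
In practice, each chunk would then be processed by its own session or job, each one scanning only its ROWID range. A minimal usage sketch (not part of the original test; the bind values are simply chunk 1 from the listing above):

variable min_rowid varchar2(18)
variable max_rowid varchar2(18)

begin
  -- chunk 1 boundaries, taken from the output of the chunking query
  :min_rowid := 'AAASNVAAGAAAAUAAAA';
  :max_rowid := 'AAASNVAAGAAAAYfH//';
end;
/

select count(1)
from scott.t
where rowid between chartorowid(:min_rowid) and chartorowid(:max_rowid);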

I hope you enjoy 😉 .


Como quebrar manualmente um FULL TABLE SCAN em faixas de ROWID

Português/English

Eu estive em um cliente há alguns dias atrás, em que estavam migrando uma aplicação SAP de Oracle para Hana (o banco de dados em memória da SAP).

Para migrar os dados estavam usando a ferramenta SAPSLT, que permite fazer a carga inicial e depois manter os dados sincronizados até o “cut-over”.

Algumas tabelas eram muito grandes, e isso causou problemas por causa de limitações da ferramenta SAPSLT. A título de exemplo, a maior delas tinha 20 bilhões de linhas e 1 bilhão de blocos, e SEM PARTICIONAMENTO! A query no Oracle tinha que fazer FULL TABLE SCAN naqueles bilhões de blocos, e demorava DIAS para rodar. Tiveram vários erros “SNAPSHOT TOO OLD”, comuns em uma situação em que é necessário fazer uma leitura tão longa em uma tabela sendo atualizada com frequência por outras transações.

Na tentativa de carregar as tabelas mais rápido e evitar os erros, os consultores SAP tentaram alternativas para quebrar a operação em processos menores. O SAPSLT quebrava a operação em faixas de ROWID, o que tornava as queries eficientes no Oracle e menos suscetíveis a erros. O problema é que a ferramenta tinha uma limitação importante: o paralelismo que ela montava IGNORAVA os filtros configurados e acabava trazendo a tabela inteira.

Então comecei a tentar usar paralelismo do Oracle. O resultado foi bom, mas não tanto quanto esperávamos: o gargalo passou a ser o processo coordenador do paralelismo, que não conseguia atender a todos os consumidores de maneira eficiente. E como a query ainda era uma só lendo a tabela inteira, a chance do “SNAPSHOT TOO OLD” permanecia grande.

Para este caso específico, portanto, a quebra manual por ROWID era bem mais eficiente do que o paralelismo do Oracle. Mas infelizmente a ferramenta SAPSLT não conseguia montar essa quebra eficientemente pela limitação já citada acima.

Então, comecei a pensar em como eu poderia fazer essa quebra por ROWIDs manualmente, utilizando recursos do Oracle. Claro que isso não poderia ser usado naquele projeto, pela criticidade e falta de tempo para desenvolver e testar adequadamente. Mas ainda assim era um caso de estudo interessante, e que poderá ser útil em outras situações.

Logo pensei que, para fazer isso de maneira simples, deveria levar em conta as extensões da tabela e montar as faixas de ROWID dentro de cada extensão. Como o ROWID tem na composição o FILE_ID e o BLOCK_ID, se eu conseguisse pegar o primeiro ROWID do primeiro bloco da extensão e o último ROWID do seu último bloco, certamente eu teria uma faixa de ROWIDs que traria todas as linhas daquela extensão.

Pesquisei na Internet para ver se encontrava algo já montado para isto, e não encontrei.
Mas encontrei o link a seguir com um post muito interessante, que me ajudou em alguns pontos para montar esta solução: https://hemantoracledba.blogspot.com.br/2013/06/getting-rowids-present-in-block.html
Agradeço ao escritor pelo post 😉 .

Após algumas horas de estudo e testes, cheguei à solução que apresento abaixo.

IMPORTANTE: A query não está preparada para trabalhar com tabelas particionadas, e nem foi exaustivamente testada em todas as possíveis situações. Com certeza, com algum esforço adicional ela poderá ser adaptada para trabalhar com objetos particionados. Use com parcimônia 🙂 .

Se alguém encontrar alguma inconsistência no resultado ou em qualquer ponto deste trabalho, os comentários serão sempre bem-vindos.

A query construída recebe como parâmetros o OWNER, o nome do OBJETO e a quantidade de PARTES (Chunks). Ela determina o tamanho de cada pedaço por blocos, mas a quebra sempre ocorre de maneira que cada chunk leia extensões inteiras.

Caso a tabela tenha tamanhos de extensão muito irregulares, é possível que os chunks tenham tamanhos um pouco diferentes. Uma limitação desta abordagem é que a quantidade de chunks sempre deve ser menor ou igual à quantidade de extensões da tabela.

É importante destacar que eu não faço qualquer validação da existência ou não dos ROWIDs da faixa que construo. E isso não faz diferença. A estratégia consiste em construir os possíveis ROWIDs do que seria a linha 1 do primeiro bloco da extensão, e a última linha do seu último bloco. Mesmo que estes ROWIDs não existam, eles servem para o meu propósito.

Por falar em última linha do bloco, estou considerando 32767 como o número máximo de linhas possível em um bloco. Encontrei em alguns outros posts que este parece ser o limite, apesar de eu não ter encontrado referência na documentação oficial. De qualquer maneira, imagino ser quase impossível se ter mais do que 32 mil linhas em um bloco Oracle, concorda? E, se tiver mais, bastaria alterar este número na query.

Abaixo coloco a query e um exemplo de utilização.

Criando uma tablespace com UNIFORM SIZE:

Este passo é para fazer com que as extensões da tabela tenham tamanho uniforme e permitam uma melhor distribuição dos blocos pelos chunks.

SQL> create tablespace tsp1 datafile '/tmp/tsp1.dbf' size 50m uniform size 64K;

Tablespace created.

Criando uma tabela para exemplo:

SQL> drop table scott.t;

Table dropped.

SQL> create table scott.t (id_col number, data_col varchar2(15))
tablespace tsp1;

Table created.

SQL> begin
  for i in 1..10 loop
    insert into scott.t
    select rownum+100000*(i-1), 'Row '||to_char(rownum+100000*(i-1))
    from dual connect by level <= 100000;
  end loop;
end;
/

PL/SQL procedure successfully completed.

Verificando a quantidade de extensões e blocos da tabela:

SQL> select blocks, extents from dba_segments
where owner = 'SCOTT' and segment_name = 'T';

    BLOCKS    EXTENTS
---------- ----------
      2928        366

Script para quebra por ROWID:

SQL> define owner=SCOTT
SQL> define object_name=T
SQL> define chunks=10

SQL> select
  row_number() over (order by ychunk) chunk, min_rowid, max_rowid,
  min(extent_id) min_extid, max(extent_id) max_extid, sum(blocks) blocks
from
  (
  select
  ychunk, extent_id, blocks,
  first_value(min_rowid) over (partition by ychunk) min_rowid,
  last_value(max_rowid) over (partition by ychunk) max_rowid
  from
    (
    select least(xchunks, trunc( sum(blocks) over (order by file_id, blkini) / blk_per_chunk ) +1 ) ychunk,
    extent_id, file_id, blkini, blocks, sum(blocks) over (order by file_id, blkini) totblk, blkfin, min_rowid, max_rowid
    from
      (
      select
      o.owner, o.object_name, s.blocks seg_blocks, 
      least(s.extents, &chunks) xchunks, trunc( s.blocks / least(s.extents, &chunks)) blk_per_chunk,
      e.extent_id, e.file_id, e.block_id blkini, 
      e.blocks, e.block_id + e.blocks -1 blkfin, 
      dbms_rowid.rowid_create(1,o.data_object_id,e.file_id,e.block_id,0) min_rowid,
      dbms_rowid.rowid_create(1,o.data_object_id,e.file_id,e.block_id+e.blocks-1,32767) max_rowid 
      from dba_objects o, dba_segments s, dba_extents e
      where e.segment_name = o.object_name and e.owner = o.owner 
      and s.owner = e.owner and s.segment_name = e.segment_name
      and o.object_name = '&object_name' and o.owner = '&owner'
      and s.PARTITION_NAME is null
      order by file_id, blkini
      )
    order by ychunk, file_id, blkini
    )
  )
group by ychunk, min_rowid, max_rowid
order by chunk
/

CHUNK MIN_ROWID          MAX_ROWID          MIN_EXTID MAX_EXTID BLOCKS
----- ------------------ ------------------ --------- --------- ---------
    1 AAASNVAAGAAAAUAAAA AAASNVAAGAAAAYfH//         0        35    288
    2 AAASNVAAGAAAAYgAAA AAASNVAAGAAAAc/H//        36        71    288
    3 AAASNVAAGAAAAdAAAA AAASNVAAGAAAAhnH//        72       108    296
    4 AAASNVAAGAAAAhoAAA AAASNVAAGAAAAmHH//       109       144    288
    5 AAASNVAAGAAAAmIAAA AAASNVAAGAAAAqvH//       145       181    296
    6 AAASNVAAGAAAAqwAAA AAASNVAAGAAAAvPH//       182       217    288
    7 AAASNVAAGAAAAvQAAA AAASNVAAGAAAAz3H//       218       254    296
    8 AAASNVAAGAAAAz4AAA AAASNVAAGAAAA4XH//       255       290    288
    9 AAASNVAAGAAAA4YAAA AAASNVAAGAAAA8/H//       291       327    296
   10 AAASNVAAGAAAA9AAAA AAASNVAAGAAABBvH//       328       365    304

10 rows selected.

Como podem ver, a quantidade de blocos em cada chunk não foi exatamente uniforme, pelo fato de a quebra ser feita em extensões inteiras. Mas, para tabelas realmente grandes, a distribuição seria bem próxima da ideal.

Confirmando a exatidão das faixas de ROWID:

Vamos verificar primeiro se as faixas de ROWID produzem o plano de execução esperado:

SQL> set autotrace traceonly explain
SQL> select * from t where rowid between 'AAASNVAAGAAAAUAAAA' and 'AAASNVAAGAAAAYfH//';

Execution Plan
----------------------------------------------------------
Plan hash value: 280204748

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |   157K|  5240K|   786   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS BY ROWID RANGE| T    |   157K|  5240K|   786   (1)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access(ROWID>='AAASNVAAGAAAAUAAAA' AND ROWID<='AAASNVAAGAAAAYfH//')

Note
-----
 - dynamic sampling used for this statement (level=2)

Veja que o plano mostra a operação “TABLE ACCESS BY ROWID RANGE“, que é exatamente o que eu queria 🙂 .

Abaixo eu construo uma única query com todas as faixas de ROWID criadas para verificar se a quantidade de linhas lidas corresponde ao total de linhas da tabela:

select count(1) from t
where 
rowid between 'AAASNVAAGAAAAUAAAA' and 'AAASNVAAGAAAAYfH//' or
rowid between 'AAASNVAAGAAAAYgAAA' and 'AAASNVAAGAAAAc/H//' or
rowid between 'AAASNVAAGAAAAdAAAA' and 'AAASNVAAGAAAAhnH//' or
rowid between 'AAASNVAAGAAAAhoAAA' and 'AAASNVAAGAAAAmHH//' or
rowid between 'AAASNVAAGAAAAmIAAA' and 'AAASNVAAGAAAAqvH//' or
rowid between 'AAASNVAAGAAAAqwAAA' and 'AAASNVAAGAAAAvPH//' or
rowid between 'AAASNVAAGAAAAvQAAA' and 'AAASNVAAGAAAAz3H//' or
rowid between 'AAASNVAAGAAAAz4AAA' and 'AAASNVAAGAAAA4XH//' or
rowid between 'AAASNVAAGAAAA4YAAA' and 'AAASNVAAGAAAA8/H//' or
rowid between 'AAASNVAAGAAAA9AAAA' and 'AAASNVAAGAAABBvH//'
/

COUNT(1)
---------
 1000000

1 row selected.

Show!!! Todos os 1 milhão de registros inseridos foram lidos pelas faixas de ROWID criadas, o que confirma que as faixas estão corretas!

Espero que curtam 😉 .

Video Tutorial – Install OEL5, Grid and Oracle 11g into Virtualbox

Português

In this video tutorial (my first one ever), I go through the process of installing Oracle Enterprise Linux 5, Oracle Grid Infrastructure for a Standalone Server and Oracle Database 11gR2, using a Virtualbox VM. The whole configuration process is covered, step by step.

Please watch the video here. Your feedback is welcome.

As this is intended mainly for my students, it’s completely in Portuguese. Sorry, English speakers :-). By the way, there are a lot of similar tutorials in English on the net.

Here are the links I mention in the video:

Oracle Linux 5 installation:

http://www.oracle.com/webfolder/technetwork/tutorials/obe/db/11g/r1/prod/install/oel5gainst/oel5gainst.htm

Oracle-validated package install (does all the pre-configuration needed to install Oracle DB):

http://www.oracle.com/technetwork/articles/servers-storage-admin/ginny-dbinstallonlinux-2111493.html

Download Oracle Linux:

https://edelivery.oracle.com

Download Oracle Grid and Database:

https://www.oracle.com , click Download and “see all”

Install Grid and Oracle DB:

https://oracle-base.com/articles/11g/oracle-db-11gr2-rac-installation-on-oel5-using-virtualbox

I hope you enjoy!

Regards.

Tutorial em Vídeo – Instalação do OEL5, Grid e Oracle DB 11gR2 no Virtualbox

Neste tutorial em vídeo (meu primeiro), eu passo pelo processo de instalação do Oracle Enterprise Linux 5, Oracle Grid Infrastructure for a Standalone Server e Oracle Database 11gR2, usando uma VM Virtualbox. Todo o processo de configuração é coberto, passo a passo.

Assista ao vídeo aqui. Seu feedback é bem-vindo.

Como o intuito principal é que sirva para meus alunos, ele está completamente em português. Peço desculpas aos que só entendem inglês :-). De qualquer maneira, existem muitos tutoriais semelhantes na internet.

Aqui estão os links que eu menciono no vídeo:

Instalação do Oracle Linux 5:

http://www.oracle.com/webfolder/technetwork/tutorials/obe/db/11g/r1/prod/install/oel5gainst/oel5gainst.htm

Instalação do pacote oracle-validated (que faz toda a configuração necessária para a instalação do Oracle DB):

http://www.oracle.com/technetwork/articles/servers-storage-admin/ginny-dbinstallonlinux-2111493.html

Download Oracle Linux:

https://edelivery.oracle.com

Download do Oracle Grid e Database:

https://www.oracle.com , clicar em Download e “see all”

Tutorial para instalação do Grid e Oracle DB:

https://oracle-base.com/articles/11g/oracle-db-11gr2-rac-installation-on-oel5-using-virtualbox

Espero que gostem!

Abraços.