How PARANOID are you with data? Or are there DIRTY READs in Oracle???

Some weeks ago, a friend and I were discussing an issue he had seen in some Oracle databases: what looked like lost data, or in other words, sessions seeing UNCOMMITTED data that later disappeared.

Allowing a session to read uncommitted data being changed by another transaction is a transaction isolation level called READ UNCOMMITTED, which is not supported by Oracle. Such a read is known as a DIRTY READ.

Many database scholars argue that a dirty read should never be allowed under any circumstances, and I tend to agree. Oracle seems to agree as well, since by design it does not support this isolation level, unlike some other databases.

So, the idea of Oracle allowing such thing sounds crazy, right? Or maybe a bug?

Well, after some research, my friend found a post by Jonathan Lewis explaining a very rare but possible situation in which these DIRTY READS can take place in Oracle and cause data confusion.

And I was able to reproduce the issue in several different versions, including 23c FREE.

In this post I will explain in detail how to reproduce the issue, and then I will present the solution that will make the title of this post make sense.

Basically, the problem happens when the LGWR process hangs, preventing the COMMIT from completing successfully. By default, Oracle executes the COMMIT steps in memory first, writing the redo entries to the Redo Log Buffer, and only then calls LGWR to write them to the Redo Log Files on disk. In the rare case where LGWR is hung for whatever reason but the instance is still working, another session may end up querying and seeing the changed data even though the COMMIT was never persisted to disk.
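As a quick way to observe this two-stage mechanism, you can watch the redo statistics: "redo entries" counts what server processes write to the Redo Log Buffer in memory, while "redo writes" counts LGWR's writes to disk. The view and statistic names below are standard; this query is just a suggestion for exploring the mechanism, not part of the reproduction itself:

select name, value
  from v$sysstat
 where name in ('redo entries', 'redo size', 'redo writes');

With a healthy LGWR, "redo writes" keeps growing as sessions commit; with LGWR hung, only the in-memory side of the mechanism keeps moving.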

In order to reproduce the behavior, we will follow the steps exactly as Jonathan Lewis shows:

Initial Preparation

To prepare I will just create a table and insert a row of data:

create table t1(n1 number);

Table created.

insert into t1 values(1);

1 row created.

commit;

Commit complete.

Step 1 – Session 1 changes some data

Now let’s change some data:

col con_name for a10
col sid for a10
select SYS_CONTEXT ('userenv','con_name') con_name, SYS_CONTEXT ('userenv','sid') sid from dual;

CON_NAME   SID
---------- ----------
FREEPDB1   51

update t1 set n1 = 2;

1 row updated.

select * from t1;

	    N1
----------
	     2

Step 2 – Session 2 connects (or is already connected)

This is the session that will see the dirty read. It needs to be connected before the LGWR fails, because after that no new session will be able to connect.

col con_name for a10
col sid for a10
select SYS_CONTEXT ('userenv','con_name') con_name, SYS_CONTEXT ('userenv','sid') sid from dual;

CON_NAME   SID
---------- ----------
FREEPDB1   271

Step 3 – LGWR hangs

Let’s find the LGWR PID# and use it with oradebug to force a hang (putting the process in a suspended state for the tests).

In order to use oradebug, I will need a third session connected as SYSDBA at the CDB level:

col con_name for a10
col sid for a10
select SYS_CONTEXT ('userenv','con_name') con_name, SYS_CONTEXT ('userenv','sid') sid from dual;

CON_NAME   SID
---------- ----------
CDB$ROOT   237

select * from session_privs where privilege = 'SYSDBA';

PRIVILEGE
----------------------------------------
SYSDBA

Then, I will identify the LGWR PID:

select pid from v$process where pname='LGWR';

       PID
----------
	    22

And finally I will put the process to sleep:

oradebug setorapid 22
Oracle pid: 22, Unix process pid: 88, NID: 4026532473, image: oracle@bcadfe8ca782 (LGWR)
oradebug suspend
Statement processed.

I won’t show it here, but you can test it yourself: from this moment on, any attempt to open a new session will hang.

Step 4 – Session 1 issues a COMMIT

When the COMMIT is issued, the server process serving that session first works in memory, writing the COMMIT redo entry, and then calls LGWR to write the redo entries from the Redo Log Buffer into the Redo Log File on disk.

select SYS_CONTEXT ('userenv','con_name') con_name, SYS_CONTEXT ('userenv','sid') sid from dual;

CON_NAME   SID
---------- ----------
FREEPDB1   51

commit;
-- THE SESSION WILL HANG

Since the LGWR is hung, the COMMIT does not finish and the session hangs as well. BUT the work in memory is already done, so from the memory perspective the COMMIT has already succeeded.
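If you want to confirm what Session 1 is stuck on, you can check its wait event from the SYSDBA session. A session waiting for LGWR to flush its commit record waits on the well-known "log file sync" event (SID 51 is the one from my test; adjust it for yours):

select sid, event, state
  from v$session
 where sid = 51;

While LGWR is suspended, this should show the session waiting on "log file sync" indefinitely.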

Step 5 – Session 2 issues a SELECT against the data and receives a DIRTY READ in response

select SYS_CONTEXT ('userenv','con_name') con_name, SYS_CONTEXT ('userenv','sid') sid from dual;

CON_NAME   SID
---------- ----------
FREEPDB1   271

select * from t1;

        N1
----------
         2

See that? Even with the COMMIT not confirmed and the redo data not persisted on disk, Session 2 was able to see the modified data, which should NOT be allowed.

So, this is a VERY VERY rare situation that only happened because a background process (the LGWR) stopped working properly and, right after, without the instance failing, one session changed data and another session read it. But although it’s rare, it can happen, as demonstrated here.

Simulating an instance failure

What happens now if the instance fails? Well, since the COMMIT was never persisted on disk, the transaction should be rolled back after the instance comes back up, so theoretically the data change never happened.

Let’s see it in action: in the SYSDBA session, without resuming the LGWR, I will force a restart of the instance:

startup force
ORACLE instance started.

Total System Global Area 1603679416 bytes
Fixed Size                  5313720 bytes
Variable Size             973078528 bytes
Database Buffers          620756992 bytes
Redo Buffers                4530176 bytes
Database mounted.
Database opened.

Now that the instance is up again, I will reconnect as a normal user and check the data:

col con_name for a10
col sid for a10
select SYS_CONTEXT ('userenv','con_name') con_name, SYS_CONTEXT ('userenv','sid') sid from dual;

CON_NAME   SID
---------- ----------
FREEPDB1   33

select * from t1;

        N1
----------
         1

As expected, the transaction was lost and the original value is back.

Now, the solution

So, as I mentioned in the beginning, I was able to reproduce the same behavior in different Oracle versions: 12, 19 and 23.

BUT, in 23c there is a new parameter called paranoid_concurrency_mode that controls this behavior. I don’t know exactly how this parameter works internally, since at this point I could not find any documentation about it.

Its original value is FALSE, and when it is changed to TRUE, the behavior demonstrated here does not happen: in that case, Session 2 also hangs when trying to issue a SELECT against the changed data.

So, this is a solution for the problem, and also a very rare case in which a SELECT can hang in Oracle.

Some important notes here:

• I was only able to make the parameter work by changing it at the CDB level and restarting the instance (it’s not modifiable online). When I changed it at the PDB level, the problem persisted, which makes sense: the LGWR is a background process of the instance, so to change its behavior it’s expected that the parameter be changed at the CDB level.
• I don’t know what the disadvantages of changing this parameter to TRUE are, what else it changes, or even what the performance penalty for the instance would be. Although it’s not a hidden parameter (it doesn’t start with an underscore), since there is no documentation at this point, you should ask My Oracle Support before changing it.
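For reference, the change I tested was along these lines, executed in the CDB root (again: the parameter is undocumented, so treat this as experimental and check with MOS before using it anywhere that matters):

alter system set paranoid_concurrency_mode = true scope = spfile;
shutdown immediate
startup

After the restart, repeating Steps 1 to 5 makes Session 2 hang on the SELECT instead of returning the uncommitted value.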

So that’s it for now.

This is just a case study, so PLEASE do not test or make any change in a Production database without proper testing and, in this particular case, without the OK from MOS.

I hope you liked it.

See you in the next post!

4 thoughts on “How PARANOID are you with data? Or are there DIRTY READs in Oracle???”

1. Frank Wilhoit

One of my employers was a big DB2 shop, and the DBAs forced us to use dirty reads for everything. The applications were poorly indexed, so (we were told) any joins caused a cascade of table locks that in one case reached all the way up to the system catalog, preventing even the nightly backup from running. Again, this is what we were told.

1. Eduardo Claro Post author

Yeah Frank, this is one of the basic things that explains why Oracle is superior to most other databases: the locking mechanism is much better implemented, so Oracle NEVER needs to escalate locks. And when a SELECT is issued against data being modified by another session’s transaction, these other DBs usually block the SELECT statement until the transaction is completed. Oracle NEVER blocks a SELECT; it uses the UNDO to present consistent data even with ongoing transactions. Or, to simplify: the concurrency controls in Oracle are much better.
