Monthly Archives: June 2023

Triggers firing order mistery, and DATA LOSS along the chain

Hello,

Weeks ago we faced an issue related to triggers firing order and some weird data loss that I am going to explain here.

So here is the situation:

  • Two databases created as copies of Production, with same objects (tables, PL/SQL, etc), just some difference in data after the copies.
  • The application has a similar structure of triggers on several tables:
    • One BEFORE INSERT trigger that sets the primary key to the next sequence value if it was not specified.
    • Another BEFORE INSERT trigger that sets other values if they are not specified, like CREATED_ON and CREATED_BY.
    • One AFTER INSERT/UPDATE/DELETE trigger that stores the previous values in a log table.
    • The triggers have no FOLLOWS or PRECEDES clauses to determine the order of firing.

So, the problem that we were called to analyze was that all of the sudden, with no apparent reason, the app started to receive an Oracle error that it could not insert a NULL value in a NOT NULL column:

ORA-01400: cannot insert NULL into ("OWNER"."TABLE"."CREATED_ON")

The interesting fact: in one of the DB copies the error was showing up, but not in the other.

First, I checked to see if the table structure was changed, or constraints, or whatever else I could think of that could have introduced the error. None!

I also compared all these objects between the two databases: table structures and timestamps, constraints, triggers code, etc. No difference!

So, as far as I could check, the databases were the same but the error was showing up only in one of them.

Analyzing the triggers code, it was clear that the value for the column CREATED_ON was being set to the current timestamp in one of the BEFORE INSERT triggers, but for some reason this value was not being carried on until the end of the process.

Then at some point I thought this could be an issue with the order of the triggers firing. Since no particular order was specified (which could be achieved by using the FOLLOWS or PRECEDES clauses), the Oracle documentation that it’s impossible to determine the exact order of firing.

But, since the two triggers we had with concurrent firing did not overlap each other in any way (I mean, one did not undo anything the other was doing), the order should not change the behavior, right? Wrong!!!

So, after lots of tests, I was able to determine that the error was showing up when a specific order of firing was occurring, but not with the opposite order. I was able to proof that by including a FOLLOWS clause in one of the triggers to force a particular order.

Why the order of firing differed from one DB to the other, all of the sudden and without any noticeable change? Well, I still have no idea…

So, this is the first “issue”: the triggers with the same trigger conditions could be triggered (funny uh?) in any order. It is not a de facto issue, since the documentation clearly states it is expected behavior.

Why the order of firing made a difference if one doesn’t undo anything the other does? Wait and you will figure out.

I had a hard time until I was able to create a stand-alone reproducible test case that I could provide to Oracle Support to have them see the issue. I had to had working sessions and record what was happening in our environment, showing all the details, for them to be able to understand exactly what was the issue.

While they continued (and still continue) researching the issue on their side, I also continued to work on that and finally was able to create the reproducible test case that I will present here, so you can also see the problem by yourself.

Note: the issue is reproducible both in Oracle 19.19 as well as Oracle 23 FREE.

First, I will create the tables:

create table maintab (
	id number,
	ts6 timestamp (6),
	ts3 timestamp (3),
	dt date,
	nu number,
	ch char(1),
	vc varchar2(10));

Table created.

create table histtab (
	id number references maintab,
	ts6 timestamp (6) not null,
	ts3 timestamp (3) not null,
	dt date not null,
	nu number not null,
	ch char(1) not null,
	vc varchar2(10) not null);

Table created.

Please note that I am creating a main table and a history table, to reproduce exactly the scenario I had in our real environment. Also note I am creating several columns, one for each data type. You will understand why ahead.

Now I will create the triggers:

create or replace trigger triggerAfter after insert on maintab for each ROW
  BEGIN
    dbms_output.put_line('TRIGGER triggerAfter');
	dbms_output.put_line(':new.ts6: ' || :new.ts6);
	dbms_output.put_line(':new.ts3: ' || :new.ts3);
	dbms_output.put_line(':new.dt: ' || :new.dt);
	dbms_output.put_line(':new.nu: ' || :new.nu);
	dbms_output.put_line(':new.ch: ' || :new.ch);
	dbms_output.put_line(':new.vc: ' || :new.vc);
	dbms_output.put_line('----');
    INSERT
    INTO histtab
      (
        id, ts6, ts3, dt, nu, ch, vc
      )
      VALUES
      (
        :new.id, :new.ts6, :new.ts3, :new.dt, :new.nu, :new.ch, :new.vc
      );
  END;
/

Trigger created.

create or replace trigger before_dont_set_values
before insert on maintab for each row
begin
    dbms_output.put_line('TRIGGER before_dont_set_values');
	dbms_output.put_line(':new.ts6: ' || :new.ts6);
	dbms_output.put_line(':new.ts3: ' || :new.ts3);
	dbms_output.put_line(':new.dt: ' || :new.dt);
	dbms_output.put_line(':new.nu: ' || :new.nu);
	dbms_output.put_line(':new.ch: ' || :new.ch);
	dbms_output.put_line(':new.vc: ' || :new.vc);
	dbms_output.put_line('----');
end;
/

Trigger created.

create or replace trigger before_set_values
before insert on maintab for each row follows before_dont_set_values
begin
	:new.ts6 := systimestamp;
	:new.ts3 := systimestamp;
	:new.dt := sysdate;
	:new.nu := 1;
	:new.ch := '1';
	:new.vc := '1';
    dbms_output.put_line('TRIGGER before_set_values');
	dbms_output.put_line(':new.ts6: ' || :new.ts6);
	dbms_output.put_line(':new.ts3: ' || :new.ts3);
	dbms_output.put_line(':new.dt: ' || :new.dt);
	dbms_output.put_line(':new.nu: ' || :new.nu);
	dbms_output.put_line(':new.ch: ' || :new.ch);
	dbms_output.put_line(':new.vc: ' || :new.vc);
	dbms_output.put_line('----');
end;
/

Trigger created.

Please note that I am forcing a particular order of the BEFORE INSERT triggers firing, first the BEFORE_DONT_SET_VALUES, then the BEFORE_SET_VALUES, by including a FOLLOWS in the second one.

Now, I will issue a simple INSERT that should work, but it doesn’t:

set serverout on

INSERT INTO maintab (id) values (1);

TRIGGER before_dont_set_values
:new.ts6:
:new.ts3:
:new.dt:
:new.nu:
:new.ch:
:new.vc:
----
TRIGGER before_set_values
:new.ts6: 2023-06-27 09:43:17.374968
:new.ts3: 2023-06-27 09:43:17.375
:new.dt: 2023-06-27 09:43:17
:new.nu: 1
:new.ch: 1
:new.vc: 1
----
TRIGGER triggerAfter
:new.ts6:
:new.ts3:
:new.dt: 2023-06-27 09:43:17
:new.nu: 1
:new.ch: 1
:new.vc: 1
----
INSERT INTO maintab (id) values (1)
            *
ERROR at line 1:
ORA-01400: cannot insert NULL into ("CLAROE"."HISTTAB"."TS6")
ORA-06512: at "CLAROE.TRIGGERAFTER", line 10
ORA-04088: error during execution of trigger 'CLAROE.TRIGGERAFTER'

Note that my outputs show that when the first trigger (BEFORE_DONT_SET_VALUES), all the columns have values. I did not provide those values directly in my INSERT statement, they were filled by the trigger code.

But, when the second trigger (BEFORE_SET_VALUES) starts, we can see that the values for the TIMESTAMP data types (columns TS6 and TS3) were simply lost! The other columns kept their values as expected. Since all the columns have a NOT NULL constraint, the error shows up.

If I simply force the opposite order of firing, by putting the FOLLOWS clause in BEFORE_DONT_SET_VALUES instead, then the error does not show up:

create or replace trigger before_set_values
before insert on maintab for each row 
begin
	:new.ts6 := systimestamp;
	:new.ts3 := systimestamp;
	:new.dt := sysdate;
	:new.nu := 1;
	:new.ch := '1';
	:new.vc := '1';
    dbms_output.put_line('TRIGGER before_set_values');
	dbms_output.put_line(':new.ts6: ' || :new.ts6);
	dbms_output.put_line(':new.ts3: ' || :new.ts3);
	dbms_output.put_line(':new.dt: ' || :new.dt);
	dbms_output.put_line(':new.nu: ' || :new.nu);
	dbms_output.put_line(':new.ch: ' || :new.ch);
	dbms_output.put_line(':new.vc: ' || :new.vc);
	dbms_output.put_line('----');
end;
/

Trigger created.

create or replace trigger before_dont_set_values
before insert on maintab for each row follows before_set_values
begin
    dbms_output.put_line('TRIGGER before_dont_set_values');
	dbms_output.put_line(':new.ts6: ' || :new.ts6);
	dbms_output.put_line(':new.ts3: ' || :new.ts3);
	dbms_output.put_line(':new.dt: ' || :new.dt);
	dbms_output.put_line(':new.nu: ' || :new.nu);
	dbms_output.put_line(':new.ch: ' || :new.ch);
	dbms_output.put_line(':new.vc: ' || :new.vc);
	dbms_output.put_line('----');
end;
/

Trigger created.

INSERT INTO maintab (id) values (1);

TRIGGER before_set_values
:new.ts6: 2023-06-27 09:54:24.268542
:new.ts3: 2023-06-27 09:54:24.269
:new.dt: 2023-06-27 09:54:24
:new.nu: 1
:new.ch: 1
:new.vc: 1
----
TRIGGER before_dont_set_values
:new.ts6: 2023-06-27 09:54:24.268542
:new.ts3: 2023-06-27 09:54:24.269
:new.dt: 2023-06-27 09:54:24
:new.nu: 1
:new.ch: 1
:new.vc: 1
----
TRIGGER triggerAfter
:new.ts6: 2023-06-27 09:54:24.268542
:new.ts3: 2023-06-27 09:54:24.269
:new.dt: 2023-06-27 09:54:24
:new.nu: 1
:new.ch: 1
:new.vc: 1
----

1 row created.

You see? So, this is inconsistent as I said in the beginning. The triggers firing order should not cause any difference in this particular case.

Please be aware that this issue is being investigated as a bug, so Oracle will likely come up with a patch for it soon (I hope 🙂 ).

The most frightening thing for me is that in this case the error is a good thing, since we were able to see the issue and troubleshoot it right away. If we did not have the NOT NULL constraint, we would realize at some point in the future that a lot of rows did not have the data we would expect, and along with the data loss, this would be much more difficult to troubleshoot.

That’s it. I will post an update when the patch is available or I have any update on this case.

EDIT 2023/NOV/06: Oracle have just assigned the Bug 35680150 – TRIGGER LOSING :NEW VALUE FOR TIMESTAMP WHEN ‘FOLLOWS’ IS USED for this issue. The link is not public yet, but they said you can request a one-off patch if you need the issue fixed now.

Until there, if you have weird behaviors of column values being lost along the chain of triggers, you now know what may be causing the issue 😉 .

See you next time.

Huge pages, SGA increase and ulimits “not being respected”

Hello, this weekend I was doing something that should be a simple task of increasing the huge pages configuration in order to increase SGA in some of my client’s Production databases.

The process involves just a few steps and is very well described here by Tim Hall, so I won’t re-explain them. Basically, you should (considering you are using Oracle Linux, for other distributions the steps may be slightly different):

  1. Check the current configuration. Backup the files you are going to change (including DB spfile).
  2. Calculate and set the new number of huge pages you need at the OS level, and make it effective.
  3. Calculate and set the memlock user limit (ulimit) at the OS level.
  4. Increase the SGA and (optional) set USE_LARGE_PARGES to ONLY and restart the DB to make these changes effective.

As I said, this should be pretty straightforward, but in our case it wasn’t.

For some reason we didn’t know at that time, even doing all the configurations correctly and using the very same values and steps we’ve used successfully in other servers, it did not work at first.

When we tried to restart the DB with the increased SGA, we were receiving the following error message:

The message shows that Oracle was unable to allocate all the shared memory for the SGA.

Well, to make a long story short, after a lot of research, “ipcs” and restarts of servers to make sure there was no “trash” memory chunks lost, Oracle SR and tries, we determined that the problem was that the ulimit for memlock was not being respected. The value shown when we issued “ulimit -l” had nothing to do with the value we’ve put for memlock in the /etc/security/limits.conf file.

Then we realized that in Doc ID 361468.1 there is the following statement:

“These settings can also be done in a *.conf file under /etc/security/limits.d/ directory. Also if there is a *.conf file in /etc/security/limits.d/ that has settings for memlock, setting it up in /etc/security/limits.conf may not work. Make sure you check the files /etc/security/limits.d/*.conf files.”

So, our problem was simply there were different *conf files in that folder, with different ulimit values set, and this was messing up our configuration.

Interestingly, the same files with the same configurations were in several servers, but it the issues occurred in some servers but not in others. And, if you see the names of the files (related to oracle preinstall configuration), I would guess those files were put into the server during the process of installation of new versions of the DB software.

So, at this point I don’t know for sure what/who created the files and put them there, and what is the order they are validated that made it a problem in some servers but not in others. Date order maybe? Let me know in the comments if you know it.

Anyway, this post is an advisory: make sure you don’t have undesired *conf files in /etc/security/limits.d/ that could mess up your configuration. Just rename them (making sure it won’t break anything) for something not ending with “.conf” and they won’t be used.

That’s it, see you next time.

How I solved a real world problem using a RECURSIVE WITH clause, and a detailed explanation of how this feature works

Hello,

Some days ago one of my fellow colleagues came up with a problem that he was trying to solve using pure SQL, to replace a very costly PL/SQL procedure.

His client has an application that needs to send messages to clients, but these messages need some transformation before being ready for sending.

Basically, they need to replace specific characters in the original message with different strings. For example, they need to replace spaces with ‘%20’, and hyphen with ‘%2D’ and so on. The list of possible replacements is extensive, so they have a table that contains all the possibilities.

The current PL/SQL procedure traverses every message, character by character, searching and replacing when necessary. That’s why the procedure is very expensive. Imagine you have 1000 messages to send, each with 200 characters on average. It would make the procedure loops 1000 x 200 = 200 thousand times, probably querying the table for each iteration.

My friend was able to come up with a better solution replacing the original FOR loops with FORALL and BULK COLLECT. But I decided to insist in providing a solution use solely SQL. As Tom Kyte used to say, if you can do it all using only SQL, do it 🙂 .

After some tries, I was able to come up with the solution by using a recursive WITH clause. As you may guess, a recursive WITH clause is one that calls itself, over and over.

Let me present the test case and the solution, I guess it can be useful for many.

First, I will create the table that contains all the possible string replacements:

create table replacements (id number primary key, from_string varchar2(100), to_string varchar2(100));

Table created.

insert into replacements values (1,' ','%20');

1 row created.

insert into replacements values (2,'-','%2D');

1 row created.

insert into replacements values (3,'º','%BA');

1 row created.

commit;

commit complete.

Then, I will create a table with the messages to be sent:

create table messages (id number primary key, message varchar2(100));

Table created.

insert into messages values (1,'COMPANY: SEU PROTOCOLO DE SAC Nº 123456 FOI RESPONDIDO. POR FAVOR, VERIFICAR SEU E-MAIL.');

1 row created.

insert into messages values (2,'COMPANY: E-MAIL ENVIADO.');

1 row created.

commit;

Commit complete.

And now finally I can present my solution, to go over every message in the MESSAGES table and do all necessary replacements using the strings in the REPLACEMENTS table, with pure SQL, nice and clean, with good performance, using a recursive WITH clause:

col replaced for a120

WITH string_replace_recursive(id, m_id, replaced) as (
	select r.id r_id, m.id m_id, replace(m.message, r.from_string, r.to_string) replaced
	from replacements r, messages m
	where r.id = 1
	UNION ALL
	select o.id, m_id, replace(r.replaced,o.from_string,o.to_string) replaced
	from replacements o, string_replace_recursive r
	where o.id = r.id +1
)
select m_id, replaced
from (
	select id, m_id, max(id) over (partition by m_id) max_id, replaced
	from string_replace_recursive
	--order by id desc fetch first 1 rows only
	)
where id = max_id
;

      M_ID REPLACED
---------- ------------------------------------------------------------------------------------------------------------------------
         1 COMPANY:%20SEU%20PROTOCOLO%20DE%20SAC%20N%BA%20123456%20FOI%20RESPONDIDO.%20POR%20FAVOR,%20VERIFICAR%20SEU%20E%2DMAIL.
         2 COMPANY:%20E%2DMAIL%20ENVIADO.

2 rows selected.

Perfect! This seems to be the result we wanted!

And here is the execution plan used by the query:

select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
Plan hash value: 126274356

------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |              |     4 |  8164 |    23   (5)| 00:00:01 |
|*  1 |  VIEW                                       |              |     4 |  8164 |    23   (5)| 00:00:01 |
|   2 |   WINDOW SORT                               |              |     4 |  8112 |    23   (5)| 00:00:01 |
|   3 |    VIEW                                     |              |     4 |  8112 |    22   (0)| 00:00:01 |
|   4 |     UNION ALL (RECURSIVE WITH) BREADTH FIRST|              |       |       |            |          |
|   5 |      NESTED LOOPS                           |              |     2 |   364 |     4   (0)| 00:00:01 |
|   6 |       TABLE ACCESS BY INDEX ROWID           | REPLACEMENTS |     1 |   117 |     1   (0)| 00:00:01 |
|*  7 |        INDEX UNIQUE SCAN                    | SYS_C008225  |     1 |       |     1   (0)| 00:00:01 |
|   8 |       TABLE ACCESS FULL                     | MESSAGES     |     2 |   130 |     3   (0)| 00:00:01 |
|   9 |      NESTED LOOPS                           |              |     2 |  4290 |    18   (0)| 00:00:01 |
|  10 |       NESTED LOOPS                          |              |     2 |  4290 |    18   (0)| 00:00:01 |
|  11 |        RECURSIVE WITH PUMP                  |              |       |       |            |          |
|* 12 |        INDEX UNIQUE SCAN                    | SYS_C008225  |     1 |       |     0   (0)| 00:00:01 |
|  13 |       TABLE ACCESS BY INDEX ROWID           | REPLACEMENTS |     1 |   117 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ID"="MAX_ID")
   7 - access("R"."ID"=1)
  12 - access("O"."ID"="R"."ID"+1)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - this is an adaptive plan

32 rows selected.

Let me try to explain how this recursive WITH clause works:

The parameters of the WITH clause

WITH string_replace_recursive (id, m_id, replaced)

Note that what makes Oracle to understands that this WITH subquery is a recursive one is the parameters, as well as of course mentioning the subquery name inside itself.

Also note that the parameters must match the columns returned by the WITH subquery. In my case, they are ID, M_ID and REPLACED.

The first query in the UNION ALL

Within the WITH clause, note that I have two queries united with a UNION ALL set clause.

The first one is used to identify the top level of the hierarchy:

	select r.id r_id, m.id m_id, replace(m.message, r.from_string, r.to_string) replaced
	from replacements r, messages m
	where r.id = 1

Yes, I had not mentioned this before, but the recursive WITH clause is used to create an hierarchical query, just like when you use the CONNECT BY operator. One difference that I am exploring here is that with the recursive WITH clause I can access the columns of both the parent and the child records, which is not exactly possible with a CONNECT BY.

My case is not exactly one that requires a hierarchy per se. I just need to create a chain that will make the character replacements over and over, but using the previous replacement as the input for the next.

I just want to take advantage of the construction to produce an expression (the character replacement) in the first row of the hierarchy, and use this output as the input value to do the same with the second row, and then use the output of the second row as the input for the third, and so on until the last row.

So, the “hierarchy” I am building just connects one row of REPLACEMENTS with the next, to create the chain I need.

Back to the first query, it does a cartesian join between REPLACEMENTS and MESSAGES, and filters the starting point of the “hierarchies” as being the the first row of the REPLACEMENTS table (id = 1). Note that I am doing the first character replacement and generating the new string in the column REPLACED.

This first query of the UNION ALL is executed only once during the first pass of the recursivity, because it only returns the top level of the hierarchy. For all the subsequent recursive calls it is ignored.

Every partial result is saved to be appended in the next pass.

The second query in the UNION ALL

Now that the top level was identified by the first query, this second one is executed recursively, each time getting the previous output (the column REPLACED) as an input to produce the chain of string replacements.

UNION ALL
select o.id, m_id, replace(r.replaced,o.from_string,o.to_string) replaced
from replacements o, string_replace_recursive r
where o.id = r.id +1

Note that this query joins the original REPLACEMENTS table with the subquery of the WITH clause to create the hierarchy, by connecting each row with the one with the next ID. Mentioning the subquery of the WITH clause here is what really makes it recursive.

So, the second query goes over the REPLACEMENTS table always getting the next level and adding to the result, until the last one. After all rows have been returned, the recursive calls end and the final result is presented.

IMPORTANT: despite I am explaining the process here as if Oracle was doing it row-by-row, which is not good for performance, actually it is doing this in just one single read of the table!

The final query

Now the WITH clause is done, and all my results are ready to be returned by the main query outside the WITH clause.

But note that the WITH clause ended up generating all the steps of replacement for each message. So, back to my initial explanation of the case, if the REPLACEMENTS table had 200 possible string replacements and MESSAGES had 1000 messages, at this point my result would have 200 thousand rows. But this is not what I want, I just want the final pass for each message, since this final pass has all the replacements in it.

So, my final query is using an Analytical Function (MAX) to persist in the final results only the rows with the maximum value of the ID from REPLACEMENTS for each message, i.e., the final pass :-).

select m_id, replaced
from (
	select id, m_id, max(id) over (partition by m_id) max_id, replaced
	from string_replace_recursive
	--order by id desc fetch first 1 rows only
	)
where id = max_id

I hope my explanation was clear enough for you to understand how it works, and apply in your specific cases when necessary 😉 .

That’s it. I hope you enjoy!

See you next time!