Tag Archives: WITH

How I solved a real world problem using a RECURSIVE WITH clause, and a detailed explanation of how this feature works

Hello,

Some days ago one of my fellow colleagues came up with a problem that he was trying to solve using pure SQL, to replace a very costly PL/SQL procedure.

His client has an application that needs to send messages to clients, but these messages need some transformation before being ready for sending.

Basically, they need to replace specific characters in the original message with different strings. For example, they need to replace spaces with ‘%20’, and hyphen with ‘%2D’ and so on. The list of possible replacements is extensive, so they have a table that contains all the possibilities.

The current PL/SQL procedure traverses every message, character by character, searching and replacing when necessary. That’s why the procedure is very expensive. Imagine you have 1000 messages to send, each with 200 characters on average. It would make the procedure loops 1000 x 200 = 200 thousand times, probably querying the table for each iteration.

My friend was able to come up with a better solution replacing the original FOR loops with FORALL and BULK COLLECT. But I decided to insist in providing a solution use solely SQL. As Tom Kyte used to say, if you can do it all using only SQL, do it 🙂 .

After some tries, I was able to come up with the solution by using a recursive WITH clause. As you may guess, a recursive WITH clause is one that calls itself, over and over.

Let me present the test case and the solution, I guess it can be useful for many.

First, I will create the table that contains all the possible string replacements:

create table replacements (id number primary key, from_string varchar2(100), to_string varchar2(100));

Table created.

insert into replacements values (1,' ','%20');

1 row created.

insert into replacements values (2,'-','%2D');

1 row created.

insert into replacements values (3,'º','%BA');

1 row created.

commit;

commit complete.

Then, I will create a table with the messages to be sent:

create table messages (id number primary key, message varchar2(100));

Table created.

insert into messages values (1,'COMPANY: SEU PROTOCOLO DE SAC Nº 123456 FOI RESPONDIDO. POR FAVOR, VERIFICAR SEU E-MAIL.');

1 row created.

insert into messages values (2,'COMPANY: E-MAIL ENVIADO.');

1 row created.

commit;

Commit complete.

And now finally I can present my solution, to go over every message in the MESSAGES table and do all necessary replacements using the strings in the REPLACEMENTS table, with pure SQL, nice and clean, with good performance, using a recursive WITH clause:

col replaced for a120

WITH string_replace_recursive(id, m_id, replaced) as (
	select r.id r_id, m.id m_id, replace(m.message, r.from_string, r.to_string) replaced
	from replacements r, messages m
	where r.id = 1
	UNION ALL
	select o.id, m_id, replace(r.replaced,o.from_string,o.to_string) replaced
	from replacements o, string_replace_recursive r
	where o.id = r.id +1
)
select m_id, replaced
from (
	select id, m_id, max(id) over (partition by m_id) max_id, replaced
	from string_replace_recursive
	--order by id desc fetch first 1 rows only
	)
where id = max_id
;

      M_ID REPLACED
---------- ------------------------------------------------------------------------------------------------------------------------
         1 COMPANY:%20SEU%20PROTOCOLO%20DE%20SAC%20N%BA%20123456%20FOI%20RESPONDIDO.%20POR%20FAVOR,%20VERIFICAR%20SEU%20E%2DMAIL.
         2 COMPANY:%20E%2DMAIL%20ENVIADO.

2 rows selected.

Perfect! This seems to be the result we wanted!

And here is the execution plan used by the query:

select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
Plan hash value: 126274356

------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |              |     4 |  8164 |    23   (5)| 00:00:01 |
|*  1 |  VIEW                                       |              |     4 |  8164 |    23   (5)| 00:00:01 |
|   2 |   WINDOW SORT                               |              |     4 |  8112 |    23   (5)| 00:00:01 |
|   3 |    VIEW                                     |              |     4 |  8112 |    22   (0)| 00:00:01 |
|   4 |     UNION ALL (RECURSIVE WITH) BREADTH FIRST|              |       |       |            |          |
|   5 |      NESTED LOOPS                           |              |     2 |   364 |     4   (0)| 00:00:01 |
|   6 |       TABLE ACCESS BY INDEX ROWID           | REPLACEMENTS |     1 |   117 |     1   (0)| 00:00:01 |
|*  7 |        INDEX UNIQUE SCAN                    | SYS_C008225  |     1 |       |     1   (0)| 00:00:01 |
|   8 |       TABLE ACCESS FULL                     | MESSAGES     |     2 |   130 |     3   (0)| 00:00:01 |
|   9 |      NESTED LOOPS                           |              |     2 |  4290 |    18   (0)| 00:00:01 |
|  10 |       NESTED LOOPS                          |              |     2 |  4290 |    18   (0)| 00:00:01 |
|  11 |        RECURSIVE WITH PUMP                  |              |       |       |            |          |
|* 12 |        INDEX UNIQUE SCAN                    | SYS_C008225  |     1 |       |     0   (0)| 00:00:01 |
|  13 |       TABLE ACCESS BY INDEX ROWID           | REPLACEMENTS |     1 |   117 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ID"="MAX_ID")
   7 - access("R"."ID"=1)
  12 - access("O"."ID"="R"."ID"+1)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - this is an adaptive plan

32 rows selected.

Let me try to explain how this recursive WITH clause works:

The parameters of the WITH clause

WITH string_replace_recursive (id, m_id, replaced)

Note that what makes Oracle to understands that this WITH subquery is a recursive one is the parameters, as well as of course mentioning the subquery name inside itself.

Also note that the parameters must match the columns returned by the WITH subquery. In my case, they are ID, M_ID and REPLACED.

The first query in the UNION ALL

Within the WITH clause, note that I have two queries united with a UNION ALL set clause.

The first one is used to identify the top level of the hierarchy:

	select r.id r_id, m.id m_id, replace(m.message, r.from_string, r.to_string) replaced
	from replacements r, messages m
	where r.id = 1

Yes, I had not mentioned this before, but the recursive WITH clause is used to create an hierarchical query, just like when you use the CONNECT BY operator. One difference that I am exploring here is that with the recursive WITH clause I can access the columns of both the parent and the child records, which is not exactly possible with a CONNECT BY.

My case is not exactly one that requires a hierarchy per se. I just need to create a chain that will make the character replacements over and over, but using the previous replacement as the input for the next.

I just want to take advantage of the construction to produce an expression (the character replacement) in the first row of the hierarchy, and use this output as the input value to do the same with the second row, and then use the output of the second row as the input for the third, and so on until the last row.

So, the “hierarchy” I am building just connects one row of REPLACEMENTS with the next, to create the chain I need.

Back to the first query, it does a cartesian join between REPLACEMENTS and MESSAGES, and filters the starting point of the “hierarchies” as being the the first row of the REPLACEMENTS table (id = 1). Note that I am doing the first character replacement and generating the new string in the column REPLACED.

This first query of the UNION ALL is executed only once during the first pass of the recursivity, because it only returns the top level of the hierarchy. For all the subsequent recursive calls it is ignored.

Every partial result is saved to be appended in the next pass.

The second query in the UNION ALL

Now that the top level was identified by the first query, this second one is executed recursively, each time getting the previous output (the column REPLACED) as an input to produce the chain of string replacements.

UNION ALL
select o.id, m_id, replace(r.replaced,o.from_string,o.to_string) replaced
from replacements o, string_replace_recursive r
where o.id = r.id +1

Note that this query joins the original REPLACEMENTS table with the subquery of the WITH clause to create the hierarchy, by connecting each row with the one with the next ID. Mentioning the subquery of the WITH clause here is what really makes it recursive.

So, the second query goes over the REPLACEMENTS table always getting the next level and adding to the result, until the last one. After all rows have been returned, the recursive calls end and the final result is presented.

IMPORTANT: despite I am explaining the process here as if Oracle was doing it row-by-row, which is not good for performance, actually it is doing this in just one single read of the table!

The final query

Now the WITH clause is done, and all my results are ready to be returned by the main query outside the WITH clause.

But note that the WITH clause ended up generating all the steps of replacement for each message. So, back to my initial explanation of the case, if the REPLACEMENTS table had 200 possible string replacements and MESSAGES had 1000 messages, at this point my result would have 200 thousand rows. But this is not what I want, I just want the final pass for each message, since this final pass has all the replacements in it.

So, my final query is using an Analytical Function (MAX) to persist in the final results only the rows with the maximum value of the ID from REPLACEMENTS for each message, i.e., the final pass :-).

select m_id, replaced
from (
	select id, m_id, max(id) over (partition by m_id) max_id, replaced
	from string_replace_recursive
	--order by id desc fetch first 1 rows only
	)
where id = max_id

I hope my explanation was clear enough for you to understand how it works, and apply in your specific cases when necessary 😉 .

That’s it. I hope you enjoy!

See you next time!