Sunday 15 February 2015

Adverse SQR performance as a result of the Oracle setting _unnest_subquery=FALSE

As I mentioned in an earlier post that the performance tuning recommendation from Oracle for PeopleSoft customers requires setting the init.ora parameter _unnest_subquery to FALSE.

While this is probably a panacea to many a performance issues seen in the system, it is not the case in some occasions. Just like an adverse reaction of a drug that is meant to cure one disease ends up you having another disease; in the case of my customer's PeopleSoft Campus Solution system it caused the custom attendance calculation process to see a sudden and huge spike in the time take to complete (from ~40 odd minutes to 7.5 hours).

Diagnosis gone wrong:
The issue was dealt by many a consultants and system admins over the course of a year without any tangible difference in performance and then I happened to join the team and thus all and sundry were eager to have their hands off this issue. And so I got to know that all cliched solutions were tried out by this time.

As I expected, the issue was misdiagnosed or rather no proper diagnosis was done. The very mention of performance issue and the guys went about lining up the usual suspects (cliched performance fixes) - 
  • Gathering stats on the tables
  • Flushing shared pool
  • altering session parameters
  • rebuilding indexes
  • putting hints and doing trial and error
When nothing worked, perhaps they just clung on to hope that the problem would disappear just as it miraculously appeared out of nowhere.

What I did differently:
The first thing I did was to do a real-time monitoring of the SQL's as they were being executed by the SQR while the process took long time to run. I used the below SQL to do that - 
 SELECT S.username,   
     s.sid,   
     s.inst_id,   
     s.serial#,   
     s.osuser,   
     t.sql_id,   
     sql_text   
 FROM  v$sqltext_with_newlines t,   
     gv$session s   
 WHERE t.address = s.sql_address   
     AND t.hash_value = s.sql_hash_value   
     AND s.status = 'ACTIVE'   
     AND s.username <> 'SYSTEM'   
     AND osuser <> :1   
 ORDER BY s.sid,   
      t.piece   

From this I noticed a couple of SQL's that were always seen running and running long. Both SQL's had an "exists" clause and when I extracted the explain plan I found that the plan for both the SQL's were significantly costlier as shown below.
SQL1 (Before):
 SELECT Count(*)   
 FROM  ps_attend_tmplt C   
 WHERE C.institution = :1   
     AND C.strm = :2   
     AND C.class_attend_dt <= :3   
     AND EXISTS (SELECT 'x'   
           FROM  ps_stdnt_enrl_l_vw C1   
           WHERE C1.emplid = :4   
              AND C1.crse_id = :5   
              AND C1.strm = :2   
              AND C1.class_nbr = C.class_nbr   
              AND C1.stdnt_enrl_status = 'E')   

SQL2 (Before):
 SELECT Count(*)   
 FROM  ps_attend_tmplt A   
 WHERE A.institution = :1   
     AND A.strm = :2   
     AND EXISTS (SELECT 'x'   
           FROM  ps_stdnt_enrl_l_vw A1   
           WHERE A.institution = A1.institution   
              AND a.strm = a1.strm   
              AND a.class_nbr = a1.class_nbr   
              AND A1.emplid = :3   
              AND A1.crse_id = :4   
              AND A1.strm = :2   
              AND A1.class_nbr = A.class_nbr   
              AND A1.stdnt_enrl_status = 'E')   

Note: I did get a much better plan when the session parameter _unnest_subquery is TRUE. However there was no going back on that one. It had to remain FALSE. So the only option was to rewrite the SQL's to perform better without altering the program logic.

Now that it was clear that the plan was too costly I rewrote the SQL's to do a equi-join to get the row count rather than using exists clause as shown below. I was probably lucky that the SQL with an equi-join would suffice in this situation. A more straightforward and easier solution was to use nested begin-select statement within the SQR program, which I thought of using if equi-join is not an option.

With the revised SQL, the plan prepared by Oracle was way better than before.

SQL1 (after):
 SELECT Count(1)   
 FROM  ps_attend_tmplt C,   
     ps_stdnt_enrl_l_vw C1   
 WHERE C.institution = :1   
     AND C.strm = :2   
     AND C.class_attend_dt <= :3   
     AND C.institution = C1.institution   
     AND C.strm = C1.strm   
     AND C.class_nbr = C1.class_nbr   
     AND C1.emplid = :4   
     AND C1.crse_id = :5   
     AND C1.strm = :2   
     AND C1.class_nbr = C.class_nbr   
     AND C1.stdnt_enrl_status = 'E'   

SQL2 (After):
 SELECT Count(*)   
 FROM  ps_attend_tmplt A,   
     ps_stdnt_enrl_l_vw A1   
 WHERE A.institution = :1   
     AND A.strm = :2   
     AND A.institution = A1.institution   
     AND A.strm = A1.strm   
     AND A.class_nbr = A1.class_nbr   
     AND A1.emplid = :3   
     AND A1.crse_id = :4   
     AND A1.strm = :2   
     AND A1.class_nbr = A.class_nbr   
     AND A1.stdnt_enrl_status = 'E'   

Thus I was able to bring back the run time performance from 7.5 hours+ to the usual 30 to 40 minutes. What did you do in a similar situation or is there a better way out? Leave a comment below. Thanks!!

No comments:

Post a Comment