Tuesday, October 20, 2020

Fetching Session Trace Files as CLOBs


I'm often asked to help developers and analysts debug problematic SQL, and usually it comes down to tracing sessions running that SQL. A SQL (or, aka session) trace is simply a request to the Oracle server to divulge any wait events, resource stats, and execution plans for the SQL and PL/SQL statements running under that session during the trace. As DBAs know, these traces are stored in flat files on the database server out of reach of anyone but system and database administrators.

For development and testing databases, developers and analysts need as much information as possible to work out their SQL and PL/SQL so that production deployments of their code don't incur any unwelcome performance problems on release day. Besides, DBAs who also manage production databases will prioritize issues and requests there before fetching some trace file off the test database server for every other random business analyst out there with execute privileges on DBMS_MONITOR. Scrum milestones be damned, production comes first, and some DBAs view anything but as a distant second.

So, why not let users run a database query that returns their trace files with TKPROF applied to it? (Briefly for those who don't know, TKPROF is a small program provided by Oracle that sits on the database server and renders trace files more readable by aggregating waits and resource stats.) Please note that this is not recommended for use in production unless it is strictly for a select few savvy individuals who use it sparingly over encrypted database connections. Trace files can fill up server filesystems quickly so a prudent approach should be considered for putting this code in production. Also, traces can contain bind data even if you're not intending to track it. Bind data can include PII and PHI in the form of literals in ad-hoc queries or hard-coded into views and PL/SQL objects. So, if your database connection isn't encrypted and your queries could contain PII/PHI, don't use this at all!

Before I dive into an explanation of the PL/SQL code, we have to set up O/S permissions and ownership for a few files on the database server since we will be using TKPROF there as an external procedure. The following Linux O/S commands assume that oracle is the user that owns the Oracle installation and oinstall is the group that owns the Oracle Home:

$ sudo chown root:oinstall $ORACLE_HOME/rdbms/admin/externaljob.ora
$ sudo chmod 640 $ORACLE_HOME/rdbms/admin/externaljob.ora
$ sudo chown root:oinstall $ORACLE_HOME/bin/extjob
$ sudo chown root:oinstall $ORACLE_HOME/bin/extjobo
$ sudo chmod 4750 $ORACLE_HOME/bin/extjob
$ sudo chmod 4750 $ORACLE_HOME/bin/extjobo

The next step is to make sure that externaljob.ora has the run_user and run_group variables correctly set to the oracle user and oinstall group respectively. Oracle ships this file with "nobody" set for both. The following is what I've edited in my externaljob.ora:
run_user = oracle
run_group = oinstall

Now, on to the database privileges. It's important that you create a dedicated user account specifically for this code as it will be able to execute command-line utilities as oracle on the database server. In our case, it will be TKPROF which isn't as dangerous as, say, rm -rf *.  This account needs the system privileges CREATE SESSION, RESOURCE, CREATE JOB, and CREATE EXTERNAL JOB. It also needs SELECT privilege granted on SYS.V_$MYSTAT, SYS.V_$SESSION, and SYS.V_$PROCESS as well as the EXECUTE privilege on DBMS_SCHEDULER. Then, a directory object called TRACE_DIR needs to be created for your trace filesystem directory (check the value for "Diag Trace" in V$DIAG_INFO). Make sure this account has the READ privilege to the TRACE_DIR directory object. 

Finally, the code. There is one PL/SQL stored procedure that contains all the logic and one PL/SQL function which is a wrapper for the stored procedure so that it can be called from a query. The stored procedure gets the invoker's session ID from which its process ID is determined. Trace files are templated as "database name_ora_process ID.trc" so we know which trace file to look for on the server for our particular session. Then, we create the job with the proper arguments including the output file which is simply the trace filename with a txt extension instead of trc. We give the job 90 seconds to run before timing out, but this is only to prevent an infinite loop as it only takes a few seconds for most traces to be processed by TKPROF. Once the job completes, we pull the file into a CLOB and return it as an out variable. In a separate out variable, I return the size of the CLOB on successful completions; otherwise, it is used to report the error for failures. Remember to grant to your users the EXECUTE privilege on the PL/SQL function only as well as DBMS_MONITOR if you haven't already done so.

One note about the PL/SQL wrapper function is the use of PRAGMA AUTONOMOUS_TRANSACTION. If you do not use it, Oracle will error on the use of DDL in a referenced object, in this case DBMS_SCHEDULER. 

CREATE OR REPLACE PROCEDURE prcGetMyTrace (outRtrnMsg OUT VARCHAR2, outTraceClob OUT CLOB)
AS
 lTimestamp        VARCHAR2(14);
 lCurrentTime      DATE;
 lProcessID        NUMBER;
 lSessionID        NUMBER;
 lTraceState       VARCHAR2(8);

 lOutputBFile      BFILE;
 lDoffset          INTEGER := 1;
 lSoffset          INTEGER := 1;
 lLangContext      INTEGER := 0;
 lWarning          INTEGER := 0;

 lJobName          VARCHAR2(128);
 lJobStatus        VARCHAR2(30);
 lJobCompleted     BOOLEAN := FALSE;
 lJobFailReason    VARCHAR2(128);

 cTKPROF           CONSTANT VARCHAR2(43) := '/opt/oracle/product/19c/dbhome_1/bin/tkprof';
 cTraceDir         CONSTANT VARCHAR2(45) := '/opt/oracle/diag/rdbms/orclcdb/ORCLCDB/trace/';
 lTraceFilename    VARCHAR2(64);
 lOutputFilename   VARCHAR2(64);

BEGIN 
 -- the timestamp will be used with the session ID and process ID to create a unique job name
 SELECT TO_CHAR(SYSTIMESTAMP, 'YYMMDDHH24MISSFF2')
 INTO lTimestamp
 FROM DUAL; 

 -- besides being part of the unique job name, the session ID will be used to grab the process ID
 -- which is part of the filename of the trace file we want to process with TKPROF and return to the user
 SELECT DISTINCT SID
 INTO lSessionID
 FROM SYS.V_$MYSTAT;

 -- obviously SQL tracing has to have been previously enabled by the user or else we have to fail here
 SELECT SQL_TRACE
 INTO lTraceState
 FROM SYS.V_$SESSION
 WHERE SID=lSessionID;
 
 IF lTraceState <> 'ENABLED' THEN
  RAISE_APPLICATION_ERROR(-20001, 'SQL Tracing is not enabled for your session. Use DBMS_MONITOR to enable tracing.'); 
 END IF;
 
 -- grabbing the process ID here
 SELECT SPID
 INTO lProcessID 
 FROM SYS.V_$PROCESS 
 WHERE ADDR IN (SELECT PADDR FROM SYS.V_$SESSION WHERE SID=lSessionID);

 -- assigning the name variables for the unique job and filenames
 lJobName := 'TKPROF_' || TO_CHAR(lProcessID) || '_' || lTimestamp;
 lTraceFilename :=  'ORCLCDB_ora_' || TO_CHAR(lProcessID) || '.trc';
 lOutputFilename :=  'ORCLCDB_ora_' || TO_CHAR(lProcessID) || '.txt';

 -- create the TKPROF job as an O/S executable with 4 arguments
 DBMS_SCHEDULER.CREATE_JOB (
   job_name               => lJobName,
   job_type               => 'EXECUTABLE',
   job_action             => cTKPROF,
   number_of_arguments    => 4,
   enabled                => FALSE
   );

 -- arg 1: the full path to the trace file
 DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => lJobName,
   argument_position       => 1,
   argument_value          => cTraceDir || lTraceFilename);

 -- arg 2: the full path to the output file
 DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => lJobName,
   argument_position       => 2,
   argument_value          => cTraceDir || lOutputFilename);

 -- arg 3: set sys=no 
 DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => lJobName,
   argument_position       => 3,
   argument_value          => 'sys=no');

 -- arg 4: set waits=yes
 DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => lJobName,
   argument_position       => 4,
   argument_value          => 'waits=yes');

 -- run the job
 DBMS_SCHEDULER.ENABLE (name => lJobName);

 lCurrentTime := SYSDATE;

 -- we will wait up to 90 seconds for the job to complete 
 WHILE NOT lJobCompleted AND lCurrentTime < SYSDATE + 1.5/1440
 LOOP 
  BEGIN

   SELECT STATUS
   INTO lJobStatus
   FROM USER_SCHEDULER_JOB_RUN_DETAILS
   WHERE JOB_NAME=lJobName;

  EXCEPTION WHEN NO_DATA_FOUND THEN -- may happen in the first few seconds of this execution
   NULL;
  END;

  IF lJobStatus = 'SUCCEEDED' THEN
   lJobCompleted := TRUE;

  ELSIF lJobStatus = 'FAILED' THEN
   -- grab the failure reason and return as an error
   SELECT ADDITIONAL_INFO
   INTO lJobFailReason
   FROM USER_SCHEDULER_JOB_RUN_DETAILS
   WHERE JOB_NAME=lJobName;

   RAISE_APPLICATION_ERROR(-20001, 'Job ' || lJobName || ' failed: ' || SUBSTR(lJobFailReason, 1, 128));
  END IF; 

 END LOOP;

 -- return a timeout error if the job hasn't completed within 90 seconds
 IF NOT lJobCompleted THEN  
  RAISE_APPLICATION_ERROR(-20001, 'Job Timeout: Job ' || lJobName || ' is taking longer than 90 seconds to execute, exiting.');
 END IF;

 -- locate the output file and open it
 lOutputBFile := BFILENAME('TRACE_DIR', lOutputFilename);

 DBMS_LOB.OPEN(lOutputBFile, DBMS_LOB.LOB_READONLY);   

 -- initalize the output clob variable
 DBMS_LOB.CREATETEMPORARY (
   lob_loc => outTraceClob,
   cache   => false);

 -- load the clob with the output file contents
 DBMS_LOB.LOADCLOBFROMFILE (
   dest_lob      => outTraceClob, 
   src_bfile     => lOutputBFile, 
   amount        => DBMS_LOB.LOBMAXSIZE, 
   bfile_csid    => 0,
   warning       => lWarning,
   lang_context  => lLangContext,
   dest_offset   => lDoffset, 
   src_offset    => lSoffset);

 -- grab the length of the output file as filler for this variable,
 -- probably unnecessary but at least it acts as a poorman's checksum :-)
 outRtrnMsg := lOutputFilename || ': length is ' || DBMS_LOB.GETLENGTH(outTraceClob);

EXCEPTION WHEN OTHERS THEN 
 outRtrnMsg := SQLERRM;
END;


CREATE OR REPLACE FUNCTION fncGetMyTrace RETURN CLOB
IS
 PRAGMA AUTONOMOUS_TRANSACTION;
 
 lRtrnMsg   VARCHAR2(4000);
 lTraceClob CLOB;
BEGIN 
 prcGetMyTrace(lRtrnMsg, lTraceClob);
 
 IF lTraceClob IS NULL THEN
  RAISE_APPLICATION_ERROR(-20001, lRtrnMsg);
 ELSE
  RETURN lTraceClob;
 END IF;

END;



No comments:

Post a Comment