I'm often asked to help developers and analysts debug problematic SQL, and usually it comes down to tracing sessions running that SQL. A SQL (or, aka session) trace is simply a request to the Oracle server to divulge any wait events, resource stats, and execution plans for the SQL and PL/SQL statements running under that session during the trace. As DBAs know, these traces are stored in flat files on the database server out of reach of anyone but system and database administrators.
For development and testing databases, developers and analysts need as much information as possible to work out their SQL and PL/SQL so that production deployments of their code don't incur any unwelcome performance problems on release day. Besides, DBAs who also manage production databases will prioritize issues and requests there before fetching some trace file off the test database server for every other random business analyst out there with execute privileges on DBMS_MONITOR. Scrum milestones be damned, production comes first, and some DBAs view anything but as a distant second.
So, why not let users run a database query that returns their trace files with TKPROF applied to it? (Briefly for those who don't know, TKPROF is a small program provided by Oracle that sits on the database server and renders trace files more readable by aggregating waits and resource stats.) Please note that this is not recommended for use in production unless it is strictly for a select few savvy individuals who use it sparingly over encrypted database connections. Trace files can fill up server filesystems quickly so a prudent approach should be considered for putting this code in production. Also, traces can contain bind data even if you're not intending to track it. Bind data can include PII and PHI in the form of literals in ad-hoc queries or hard-coded into views and PL/SQL objects. So, if your database connection isn't encrypted and your queries could contain PII/PHI, don't use this at all!
Before I dive into an explanation of the PL/SQL code, we have to set up O/S permissions and ownership for a few files on the database server since we will be using TKPROF there as an external procedure. The following Linux O/S commands assume that oracle is the user that owns the Oracle installation and oinstall is the group that owns the Oracle Home:
Before I dive into an explanation of the PL/SQL code, we have to set up O/S permissions and ownership for a few files on the database server since we will be using TKPROF there as an external procedure. The following Linux O/S commands assume that oracle is the user that owns the Oracle installation and oinstall is the group that owns the Oracle Home:
$ sudo chown root:oinstall $ORACLE_HOME/rdbms/admin/externaljob.ora
$ sudo chmod 640 $ORACLE_HOME/rdbms/admin/externaljob.ora
$ sudo chown root:oinstall $ORACLE_HOME/bin/extjob
$ sudo chown root:oinstall $ORACLE_HOME/bin/extjobo
$ sudo chmod 4750 $ORACLE_HOME/bin/extjob
$ sudo chmod 4750 $ORACLE_HOME/bin/extjobo
The next step is to make sure that externaljob.ora has the run_user and run_group variables correctly set to the oracle user and oinstall group respectively. Oracle ships this file with "nobody" set for both. The following is what I've edited in my externaljob.ora:
run_user = oracle
run_group = oinstall
Now, on to the database privileges. It's important that you create a dedicated user account specifically for this code as it will be able to execute command-line utilities as oracle on the database server. In our case, it will be TKPROF which isn't as dangerous as, say, rm -rf *. This account needs the system privileges CREATE SESSION, RESOURCE, CREATE JOB, and CREATE EXTERNAL JOB. It also needs SELECT privilege granted on SYS.V_$MYSTAT, SYS.V_$SESSION, and SYS.V_$PROCESS as well as the EXECUTE privilege on DBMS_SCHEDULER. Then, a directory object called TRACE_DIR needs to be created for your trace filesystem directory (check the value for "Diag Trace" in V$DIAG_INFO). Make sure this account has the READ privilege to the TRACE_DIR directory object.
Finally, the code. There is one PL/SQL stored procedure that contains all the logic and one PL/SQL function which is a wrapper for the stored procedure so that it can be called from a query. The stored procedure gets the invoker's session ID from which its process ID is determined. Trace files are templated as "database name_ora_process ID.trc" so we know which trace file to look for on the server for our particular session. Then, we create the job with the proper arguments including the output file which is simply the trace filename with a txt extension instead of trc. We give the job 90 seconds to run before timing out, but this is only to prevent an infinite loop as it only takes a few seconds for most traces to be processed by TKPROF. Once the job completes, we pull the file into a CLOB and return it as an out variable. In a separate out variable, I return the size of the CLOB on successful completions; otherwise, it is used to report the error for failures. Remember to grant to your users the EXECUTE privilege on the PL/SQL function only as well as DBMS_MONITOR if you haven't already done so.
One note about the PL/SQL wrapper function is the use of PRAGMA AUTONOMOUS_TRANSACTION. If you do not use it, Oracle will error on the use of DDL in a referenced object, in this case DBMS_SCHEDULER.
Finally, the code. There is one PL/SQL stored procedure that contains all the logic and one PL/SQL function which is a wrapper for the stored procedure so that it can be called from a query. The stored procedure gets the invoker's session ID from which its process ID is determined. Trace files are templated as "database name_ora_process ID.trc" so we know which trace file to look for on the server for our particular session. Then, we create the job with the proper arguments including the output file which is simply the trace filename with a txt extension instead of trc. We give the job 90 seconds to run before timing out, but this is only to prevent an infinite loop as it only takes a few seconds for most traces to be processed by TKPROF. Once the job completes, we pull the file into a CLOB and return it as an out variable. In a separate out variable, I return the size of the CLOB on successful completions; otherwise, it is used to report the error for failures. Remember to grant to your users the EXECUTE privilege on the PL/SQL function only as well as DBMS_MONITOR if you haven't already done so.
One note about the PL/SQL wrapper function is the use of PRAGMA AUTONOMOUS_TRANSACTION. If you do not use it, Oracle will error on the use of DDL in a referenced object, in this case DBMS_SCHEDULER.
CREATE OR REPLACE PROCEDURE prcGetMyTrace (outRtrnMsg OUT VARCHAR2, outTraceClob OUT CLOB)
AS
lTimestamp VARCHAR2(14);
lCurrentTime DATE;
lProcessID NUMBER;
lSessionID NUMBER;
lTraceState VARCHAR2(8);
lOutputBFile BFILE;
lDoffset INTEGER := 1;
lSoffset INTEGER := 1;
lLangContext INTEGER := 0;
lWarning INTEGER := 0;
lJobName VARCHAR2(128);
lJobStatus VARCHAR2(30);
lJobCompleted BOOLEAN := FALSE;
lJobFailReason VARCHAR2(128);
cTKPROF CONSTANT VARCHAR2(43) := '/opt/oracle/product/19c/dbhome_1/bin/tkprof';
cTraceDir CONSTANT VARCHAR2(45) := '/opt/oracle/diag/rdbms/orclcdb/ORCLCDB/trace/';
lTraceFilename VARCHAR2(64);
lOutputFilename VARCHAR2(64);
BEGIN
-- the timestamp will be used with the session ID and process ID to create a unique job name
SELECT TO_CHAR(SYSTIMESTAMP, 'YYMMDDHH24MISSFF2')
INTO lTimestamp
FROM DUAL;
-- besides being part of the unique job name, the session ID will be used to grab the process ID
-- which is part of the filename of the trace file we want to process with TKPROF and return to the user
SELECT DISTINCT SID
INTO lSessionID
FROM SYS.V_$MYSTAT;
-- obviously SQL tracing has to have been previously enabled by the user or else we have to fail here
SELECT SQL_TRACE
INTO lTraceState
FROM SYS.V_$SESSION
WHERE SID=lSessionID;
IF lTraceState <> 'ENABLED' THEN
RAISE_APPLICATION_ERROR(-20001, 'SQL Tracing is not enabled for your session. Use DBMS_MONITOR to enable tracing.');
END IF;
-- grabbing the process ID here
SELECT SPID
INTO lProcessID
FROM SYS.V_$PROCESS
WHERE ADDR IN (SELECT PADDR FROM SYS.V_$SESSION WHERE SID=lSessionID);
-- assigning the name variables for the unique job and filenames
lJobName := 'TKPROF_' || TO_CHAR(lProcessID) || '_' || lTimestamp;
lTraceFilename := 'ORCLCDB_ora_' || TO_CHAR(lProcessID) || '.trc';
lOutputFilename := 'ORCLCDB_ora_' || TO_CHAR(lProcessID) || '.txt';
-- create the TKPROF job as an O/S executable with 4 arguments
DBMS_SCHEDULER.CREATE_JOB (
job_name => lJobName,
job_type => 'EXECUTABLE',
job_action => cTKPROF,
number_of_arguments => 4,
enabled => FALSE
);
-- arg 1: the full path to the trace file
DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
job_name => lJobName,
argument_position => 1,
argument_value => cTraceDir || lTraceFilename);
-- arg 2: the full path to the output file
DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
job_name => lJobName,
argument_position => 2,
argument_value => cTraceDir || lOutputFilename);
-- arg 3: set sys=no
DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
job_name => lJobName,
argument_position => 3,
argument_value => 'sys=no');
-- arg 4: set waits=yes
DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
job_name => lJobName,
argument_position => 4,
argument_value => 'waits=yes');
-- run the job
DBMS_SCHEDULER.ENABLE (name => lJobName);
lCurrentTime := SYSDATE;
-- we will wait up to 90 seconds for the job to complete
WHILE NOT lJobCompleted AND lCurrentTime < SYSDATE + 1.5/1440
LOOP
BEGIN
SELECT STATUS
INTO lJobStatus
FROM USER_SCHEDULER_JOB_RUN_DETAILS
WHERE JOB_NAME=lJobName;
EXCEPTION WHEN NO_DATA_FOUND THEN -- may happen in the first few seconds of this execution
NULL;
END;
IF lJobStatus = 'SUCCEEDED' THEN
lJobCompleted := TRUE;
ELSIF lJobStatus = 'FAILED' THEN
-- grab the failure reason and return as an error
SELECT ADDITIONAL_INFO
INTO lJobFailReason
FROM USER_SCHEDULER_JOB_RUN_DETAILS
WHERE JOB_NAME=lJobName;
RAISE_APPLICATION_ERROR(-20001, 'Job ' || lJobName || ' failed: ' || SUBSTR(lJobFailReason, 1, 128));
END IF;
END LOOP;
-- return a timeout error if the job hasn't completed within 90 seconds
IF NOT lJobCompleted THEN
RAISE_APPLICATION_ERROR(-20001, 'Job Timeout: Job ' || lJobName || ' is taking longer than 90 seconds to execute, exiting.');
END IF;
-- locate the output file and open it
lOutputBFile := BFILENAME('TRACE_DIR', lOutputFilename);
DBMS_LOB.OPEN(lOutputBFile, DBMS_LOB.LOB_READONLY);
-- initalize the output clob variable
DBMS_LOB.CREATETEMPORARY (
lob_loc => outTraceClob,
cache => false);
-- load the clob with the output file contents
DBMS_LOB.LOADCLOBFROMFILE (
dest_lob => outTraceClob,
src_bfile => lOutputBFile,
amount => DBMS_LOB.LOBMAXSIZE,
bfile_csid => 0,
warning => lWarning,
lang_context => lLangContext,
dest_offset => lDoffset,
src_offset => lSoffset);
-- grab the length of the output file as filler for this variable,
-- probably unnecessary but at least it acts as a poorman's checksum :-)
outRtrnMsg := lOutputFilename || ': length is ' || DBMS_LOB.GETLENGTH(outTraceClob);
EXCEPTION WHEN OTHERS THEN
outRtrnMsg := SQLERRM;
END;
CREATE OR REPLACE FUNCTION fncGetMyTrace RETURN CLOB
IS
PRAGMA AUTONOMOUS_TRANSACTION;
lRtrnMsg VARCHAR2(4000);
lTraceClob CLOB;
BEGIN
prcGetMyTrace(lRtrnMsg, lTraceClob);
IF lTraceClob IS NULL THEN
RAISE_APPLICATION_ERROR(-20001, lRtrnMsg);
ELSE
RETURN lTraceClob;
END IF;
END;
No comments:
Post a Comment