Audit file pileup in $GRID_HOME/rdbms/audit

One of the items highlighted by the dashboard is where filesystem utilization is greater than 90% on any server where an Oracle instance is found to be running (There are various other servers maintained within the same OEM estate, so this filter was implemented to focus on the relevant hosts for the DBA team)

Here is how the interactive report for this particular item looked.

When building the initial revisions of the various items, I felt it important to get some context for greater insights and built in the historical indicators to compare the current value to the values of 7 and 30 days prior.

What the above image shows is that for some_host_01, there has been a gradual increase in the utilization for the /opt/apps filesystem over the past 30 days, which happens to be the mount point used for the Oracle software binaries.

That called for a logon to the host in question where, after some digging around, the following was discovered.

grid@some_host_01:/opt/apps/oracle/grid/12.2.0/rdbms> du -sh *
99M     admin
12G     audit
12K     demo
12K     doc
168K    install
7.2M    jlib
38M     lib
1.1M    log
70M     mesg
144K    public
464K    utl
45M     xml
grid@some_host_01:/opt/apps/oracle/grid/12.2.0/rdbms/audit> ls -1f | wc -l

grid@some_host_01:/opt/apps/oracle/grid/12.2.0/rdbms/audit> df -h .
Filesystem            Size  Used Avail Use% Mounted on
                       50G   44G  3.6G  93% /opt/apps

Well that wasn’t expected! This specific directory filling up with audit files and consuming wads of space is a known issue documented in various MOS notes [Manage Audit File Directory Growth with cron (Doc ID 1298957.1) being the most prominent], to the extent that scripts have been deployed and implemented via cron jobs on all our systems to manage it.

What we missed here was that after a GI upgrade from to, the shell script was not updated to reflect the new $GRID_HOME. Admittedly it was a bad idea on our part to hardcode the $GRID_HOME path in the first place and that’s another positive that’s come out of this, in that we’ve revised the script to be more flexible going forward. By the way, I’m including the line below as we’ve found it to be one of the more efficient methods to remove a large number of files on a Linux system.

grid@some_host_01:/opt/apps/oracle/grid/12.2.0/rdbms/audit> find . -name "*.aud" -type f -delete
grid@some_host_01:/opt/apps/oracle/grid/12.2.0/rdbms/audit> df -h .
Filesystem            Size  Used Avail Use% Mounted on
                       50G   32G   15G  68% /opt/apps

In closing, it’s yet another lesson to not take those “simple” administrative tasks for granted and a great win for the APEX Oracle Database Dashboard!


P.S. For those of you interested, here is the SQL that queries the OEM repository to produce the report as above. As with anything you find on the internet, please take care when executing this within your own environment

	, mc.key_value 
	, ROUND( 100 - mc.value, 2 ) current_val 
	, hist.day7 prev_week_val
	, hist.day30 prev_month_val
	, mc.collection_timestamp
FROM sysman.mgmt$metric_current mc, (
		, MAX( CASE WHEN prop.property_name = 'orcl_gtp_contact' THEN prop.property_value END ) contact
		, MAX( CASE WHEN prop.property_name = 'orcl_gtp_lifecycle_status' THEN prop.property_value END ) lifecycle
	FROM sysman.mgmt$target_properties prop
	GROUP BY target_guid 
) prop, ( 
	SELECT DISTINCT host_name FROM sysman.mgmt$target db WHERE target_type IN ( 'rac_database', 'oracle_database' ) 
) db, (
		, key_value
		, SUM( case when rollup_timestamp = trunc(sysdate)-7 then average else null end ) day7
		, SUM( case when rollup_timestamp = trunc(sysdate)-30 then average else null end ) day30
			, md.key_value
			, ROUND( 100 - md.average, 2 ) average
			, md.rollup_timestamp
		FROM sysman.mgmt$metric_daily md
		WHERE md.target_type = 'host' 
		AND md.metric_name = 'Filesystems' 
		AND md.metric_column = 'pctAvailable' 
		AND md.rollup_timestamp BETWEEN TRUNC(sysdate-30) AND TRUNC(sysdate - 7)  
	) GROUP BY target_guid, key_value 
) hist
WHERE mc.target_type = 'host' 
AND mc.metric_name = 'Filesystems' 
AND mc.metric_column = 'pctAvailable' 
AND mc.key_value != '/proc'
AND ROUND( 100 - mc.value, 2 ) > 90
AND mc.target_name = db.host_name
AND mc.target_guid = prop.target_guid
AND mc.target_guid = hist.target_guid
AND mc.key_value = hist.key_value
AND prop.lifecycle IN ('MissionCritical','Production')
ORDER BY 1, 2, 4 ;

Leave a Comment