Lessons learnt working with Oracle support

This one can be a frustrating, hair-pulling and (sometimes) time-consuming exercise! While I wish I could have provided you all with a silver bullet, I’m afraid it doesn’t exist. I’d like instead to share some of the lessons learnt while engaging with Oracle support, which will hopefully be of benefit to you as well.

In one particular example, our team was informed of several core dump messages being written to the OS messages file. These were a cause for concern, since this machine hosted the entire OEM environment and also served as a media server for backups.

Apr 10 05:51:04 <insert_machine_name> abrt[15241]: Saved core dump of pid 14607 (/u01/app/oracle/agent/agent_13. to /var/spool/abrt/ccpp-2017-04-10-05:50:31-14607 (33535582208 bytes) 

In $AGENT_INST/sysman/emd directory 
-rw-------. 1 oracle oinstall 33535582208 Apr 10 05:51 core.14607 
-rw-r-----. 1 oracle oinstall 2016 Apr 10 05:50 hs_err_pid14607.log 

Contents of the hs_err_pid*.log:
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007faee7191000, 12288, 0) failed; error='Cannot allocate memory' (errno=12) 
# There is insufficient memory for the Java Runtime Environment to continue. 
# Native memory allocation (malloc) failed to allocate 12288 bytes for committing reserved memory. 
# An error report file with more information is saved as: 
# /u01/app/oracle/agent/agent_inst/sysman/emd/hs_err_pid14607.log 
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00007faee7393000, 12288, 0) failed; error='Cannot allocate memory' (errno=12) 
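The errno=12 (ENOMEM) lines above mean the JVM could not commit native memory, and a log like this can contain many of them. As a quick sketch (my own helper, not part of any Oracle tooling), the failed allocations can be pulled out of such a log for a summary:

```python
import re

# Pattern for JVM native-memory failure lines of the form seen above:
#   os::commit_memory(0x00007faee7191000, 12288, 0) failed;
#   error='Cannot allocate memory' (errno=12)
COMMIT_FAIL = re.compile(
    r"os::commit_memory\((0x[0-9a-fA-F]+),\s*(\d+),\s*\d+\)\s*failed;"
    r".*errno=(\d+)"
)

def failed_commits(log_text):
    """Return (address, bytes_requested, errno) for each failed commit."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in COMMIT_FAIL.finditer(log_text)
    ]
```

Feeding it the warning line above yields `('0x00007faee7191000', 12288, 12)`, confirming each failure was a small (12 KB) request, i.e. the machine was genuinely out of memory rather than the JVM asking for something outrageous.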

The initial investigations didn’t yield much information, so we logged an SR with support to identify the root cause.

*I had been considering placing some of the contents of the actual SR in here, but have decided against doing so*

Unfortunately, we got a generic “Contact your system admin team to find out why memory is being depleted”. Needless to say, I wasn’t happy! But instead of responding in the moment, I let it sit for a day or two and, once I had channelled my inner peace, responded in a way that engaged the support analyst to help me pinpoint the problem rather than shrug me off.

Once the understanding was clear, we made good progress! To such an extent that we identified the ZFS plugin as the cause of the agent core dumps. The plugin issue was then handed off to a different SR, but for all intents and purposes this one was closed (the ZFS plugin was undeployed from the agent in the interim, after which no further core dumps occurred).

Many of the Oracle gurus I really look up to have always emphasized properly understanding the problem before doing any form of troubleshooting. That’s the first thing to get across.

Next up is to provide as much relevant information as possible. Don’t upload tons of information for the sake of it, but these days the SR logging procedure prompts you for various bits of information in order to speed up resolution. So if a Remote Diagnostic Agent (RDA) output is required…upload it. Do they need Trace File Analyzer (TFA) information? Provide it (in fact, look at upgrading to TFA 12.2 or higher, as it includes OSWatcher in the toolset).

You can also look at the contents of MOS note SRDC – Service Request Data Collection Catalog: All Products – Database – Exadata – EBS – Fusion – GBUs – JDE – Middleware – Peoplesoft – Siebel – Sun Systems (Doc ID 1987484.2). It provides a wealth of information, with all the steps to follow when gathering diagnostics for the particular problem you’re experiencing.

Hope this helps.
