Where have my #Oracle Parallel Query slaves gone?

Recently I started looking at where we could implement HCC for one of our customers who is running on a SuperCluster.

After skimming through the dictionary, I narrowed down a candidate list, which was then vetted by one of the application users. Given my past experience with HCC in the telco space, I figured it would be a relatively straightforward exercise. Why is it that, sometimes, it never is!

In my test bed, I explicitly declared a few parallel_ parameters which I had bedded down in past testing. In particular, I set the following:

alter session set parallel_force_local=true ;

This restricts PQ activity to the local instance. Should the data require sorting (which would benefit compression and the efficacy of Exadata storage indexes), sort extents allocated to the instance would be reused.

alter session set parallel_degree_limit=32;

This works in conjunction with Automatic DOP to help limit the number of PQ slaves assigned to queries. The value of 32 is based on past testing and real-world experience.

alter session set parallel_degree_policy='LIMITED' ;

This enables Automatic DOP in its simplest guise, assuming the other prerequisites are met.

Yet with each test that I ran (I did about 4 iterations), I could see that my CTAS (Create Table As Select) command was consistently downgraded! #startscratchinghead
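For context, each test run looked something like the following. The table and source names here are illustrative only, not the customer's actual objects:

SQL> alter session set parallel_force_local=true ;
SQL> alter session set parallel_degree_limit=32 ;
SQL> alter session set parallel_degree_policy='LIMITED' ;
SQL> create table t_hcc_test
  2  compress for query high
  3  parallel
  4  as select * from source_table ;

Note that with parallel_degree_policy at LIMITED, Automatic DOP only applies to statements touching objects decorated with default parallelism (the bare PARALLEL clause above), which is why the degree limit is relevant to this test.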

Over to my good friend…My Oracle Support. After some trial and error, I found a note that provided instructions on how to enable tracing of parallel execution, using this command…

SQL> alter session set "_px_trace"="compilation","execution","messaging";

That’s one of the things I really love about the #Oracle database, in many places you can drill down and get a feel for what and why things are happening under the covers. Extended SQL trace is another wonderful example of this.
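If you're following along at home, the resulting trace file is easy to locate with a quick query against v$diag_info (standard 11g-onwards behaviour, nothing specific to this environment):

SQL> select value
  2  from   v$diag_info
  3  where  name = 'Default Trace File' ;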

From the resultant trace file, I found a key piece of information which cleared things up quickly enough

        Getting instance info for default group

        kxfplist status:2 #insts:2

        2017-02-27 14:41:05.433941*:PX_Messaging:kxfp.c@18569:kxfpiinfo(): 

        inst [cpus:mxslv:#pg]

        1    [1   :128  :1  ]

        2    [1   :128  :1  ]

        instance_id:1, cpu_count:1 #pg:1

        instance_id:2, cpu_count:1 #pg:1

          total cpu_count:2 #pg:2

The system believes there are only two CPUs across the cluster! It turns out the previous incumbents had the cpu_count parameter explicitly set, and as we were tasked with taking the databases over in a “lift and shift” fashion, no parameter changes were implemented.

SYS> show parameter cpu_

NAME                                 TYPE                             VALUE
------------------------------------ -------------------------------- ------------------------------
cpu_count                            integer                          2
resource_manager_cpu_allocation      integer                          120

The environment this database runs on has 120 vCPUs per node, so resetting the parameter from the spfile and modifying it dynamically has since resolved the issue.
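In case it helps anyone, the fix was along these lines. cpu_count is dynamically modifiable, so no restart was needed; the value of 120 reflects this particular environment:

SQL> alter system reset cpu_count scope=spfile sid='*' ;
SQL> alter system set cpu_count=120 scope=memory sid='*' ;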

What’s the takeaway? Don’t just look at the parallel_* parameters when trying to diagnose parallel query problems 🙂
