Tuesday, 16 August 2011

RAC Implementation

Hi guys, I would like to share some RAC implementation points.

 
First, the OS should be installed on all our nodes; this will be taken care of by the OS team.

 
For the RAC setup, Oracle provides Clusterware. Oracle Clusterware is the cross-platform cluster software required to run the Real Application Clusters (RAC) option for Oracle Database. It provides the basic clustering services at the operating system level that enable Oracle software to run in clustered mode.

 
Important files and components of Clusterware:

 
  • OCR
  • Voting Disk/Voting File
  • Two raw disks for the OCR and Voting File
  • OCFS to place the OCR/Voting Disk on (optional)
  • ASM [not available at this stage, because the ASM binaries come along with the Oracle binaries and we haven't installed Oracle yet]
  • Oracle Home [where the Oracle binaries will be stored]
If we need to create a database, the storage area should be in place first, so we need to start an ASM instance on all the nodes.
In Oracle 11gR2 we can place the OCR and Voting File in an ASM diskgroup, since the ASM binaries come bundled with the Clusterware binaries as Grid Infrastructure for a cluster (a quick verification sketch follows).
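
For reference, once Clusterware is up you can check where these files live using the utilities shipped in the Clusterware home's bin directory (run as root):

# ocrcheck                      <- shows the OCR location, size and integrity
# crsctl query css votedisk     <- lists the configured voting files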
Network configuration is done by the network team: configuring the Public IP, Private IP and Virtual IP for each node.

 
Configuration of SSH: 

Whenever we create the oracle user on the participating nodes, the group ID and user ID should be identical on all the nodes. The OS (operating system) recognizes groups and users by their numeric IDs, so if the IDs differ across nodes, the cluster installation will fail. Using RSA/DSA key pairs, we configure SSH from /home/oracle/.ssh (inside the user's home directory). Once SSH is configured, the nodes authenticate each other with the exchanged keys, so no password prompts appear between them (user equivalence).

In 11gR2 the installer has an option to configure SSH automatically (RSA/DSA): it gathers the public keys from the nodes and makes them available on all the participating nodes.

e.g. 2 nodes ===> 4 public keys (one RSA and one DSA key per node)
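
For reference, a minimal sketch of doing the same by hand in 10g/11gR1, run as the oracle user on each node (rac2 is a placeholder for the remote node name):

$ ssh-keygen -t rsa       (accept the default file locations)
$ ssh-keygen -t dsa
$ cat ~/.ssh/id_rsa.pub ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ ssh rac2 date           (should print the date with no password prompt)

Append the public keys gathered from every other node into the same authorized_keys file and copy it to all the participating nodes before testing.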

High Level RAC Implementation:

  1. Install the same version of the OS on all the participating nodes.
  2. Create the required groups and user accounts on all the cluster nodes (the group ID and user ID should be identical on all the nodes).
  3. Create the directory structures for CRS_HOME and DB_HOME on all the cluster nodes.
  4. Configure kernel parameters and set the semaphore settings as per the Oracle documentation (see the sketch after this list).
  5. Set shell limits for the oracle user account.
  6. Edit /etc/hosts and specify the public, private and virtual IPs for all the cluster nodes (see the sketch after this list).
  7. Establish the trust relationship and user equivalence by configuring SSH.
  8. Create the required number of partitions on the shared storage for the OCR, Voting File and ASM diskgroups.
  9. If implementing RAC on Linux using OCFS2 and the ASMLib interface, download the corresponding RPMs based on the kernel version of the node's OS.
  10. Ensure the date and time are as close as possible on all the cluster nodes.
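
As an illustration of steps 4 and 6, two minimal sketches. First, typical /etc/sysctl.conf entries; these are the commonly documented 10g starting values, so always confirm them against the install guide for your exact release, and apply them with sysctl -p:

kernel.sem = 250 32000 100 128
kernel.shmmax = 2147483648
kernel.shmall = 2097152
kernel.shmmni = 4096
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000

Second, an /etc/hosts layout for a two-node cluster (all the names and addresses here are made-up placeholders):

# public
192.168.1.101   rac1.corp.com        rac1
192.168.1.102   rac2.corp.com        rac2
# private interconnect
10.0.0.1        rac1-priv.corp.com   rac1-priv
10.0.0.2        rac2-priv.corp.com   rac2-priv
# virtual IPs (same subnet as the public interfaces)
192.168.1.111   rac1-vip.corp.com    rac1-vip
192.168.1.112   rac2-vip.corp.com    rac2-vip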
All the above steps are handled by the OS admin and the network admin in coordination with the Oracle DBA. Here the DBA's work starts:
  1. Install Clusterware on the first node. If cluvfy fails at the end of the installation, it is considered a bug in 10g; apply the workaround from Oracle Metalink (a cluvfy sketch follows this list).
  2. Configure the listener.
  3. Configure the ASM instance and create the diskgroups needed (see the example after this list).
  4. Create the RAC database in the shared storage area.
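
For step 1, a sketch of re-running the verification by hand after the install (run as the oracle user; the node names are placeholders):

$ cluvfy stage -post crsinst -n rac1,rac2

And for step 3, a minimal example of creating a diskgroup once the ASM instance is started (the raw device paths are placeholders):

SQL> CREATE DISKGROUP DATA NORMAL REDUNDANCY
  2  DISK '/dev/raw/raw3', '/dev/raw/raw4';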

Sunday, 14 August 2011

ASM Instance Creation(RAC)

We have a two-node setup:
RAC1.corp.com--->node1
RAC2.corp.com--->node2

All done; the network is pinging across the nodes.

We decided to go with a database running on Oracle's logical volume manager, ASM.

First, to set this up we need to create an ASM instance. While creating the ASM instance we received an error which was really frustrating to solve; later we found out that it had occurred because of a human error.


The message box clearly asked us to execute this script:

# localconfig delete
# localconfig add



# /u01/app/oracle/product/11.1.0/db_1/bin/localconfig add
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Configuration for local CSS has been initialized

Cleaning up Network socket directories
Setting up Network socket directories
Adding to inittab
Startup will be queued to init within 30 seconds.
Checking the status of new Oracle init process...
Expecting the CRS daemons to be up within 600 seconds.
Giving up: Oracle CSS stack appears NOT to be running.
Oracle CSS service would not start as installed
Automatic Storage Management(ASM) cannot be used until Oracle CSS service is started



When this "Giving up" error occurred, there were two options we could go for: try the localconfig add (or a reset) again, or contact Oracle Support.

We browsed through many documents on Oracle Metalink, but could not find one that suited our case.

Our team decided to raise a Service Request, but I told them: let us first check from the beginning what we did and how we did it; then we can raise the SR.

When we started rechecking all the steps, I recollected that we had got an error at the end of the Clusterware installation.



I was sure that we had executed this command, because I did it on my own [risk reduction]. I had told the team to execute the mandatory step root.sh a few minutes before we got this error, but in the heat of the moment they forgot it. Some team members were denying it, but later we came to the conclusion that we had not executed root.sh.

Workaround


When we ran root.sh from /crs_home/bin, it worked fine!!

Then we ran:

# localconfig delete
# localconfig add

Now the services evmd and cssd are up and running.
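
A quick sketch of how to confirm this (ps is always safe; crsctl check cssd is release-dependent, so treat it as an assumption):

# ps -ef | grep -E 'ocssd|evmd' | grep -v grep
# ./crsctl check cssd     (from the same bin directory as localconfig)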

Cheers!!





OCFS2 was not mounting (ocfs2_hb_ctl error)

OCFS2 ---> Oracle Cluster File System; it was not mounting. We have a two-node RAC:

RAC1.corp.com----> first node
RAC2.corp.com----> second node

In this configuration OCFS2 was used to store the OCR and Voting Disk. Since OCFS2 is a separate mount point, it was failing to mount because of some error (a sketch of the OCFS2 cluster configuration file follows).
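
For context, the OCFS2 cluster layout (node names, interconnect IPs and port) lives in /etc/ocfs2/cluster.conf on every node. A minimal sketch for a two-node setup like this one (the IPs and names are placeholders; the file is whitespace-sensitive, with each attribute indented by a tab):

node:
	ip_port = 7777
	ip_address = 10.0.0.1
	number = 0
	name = rac1
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 10.0.0.2
	number = 1
	name = rac2
	cluster = ocfs2

cluster:
	node_count = 2
	name = ocfs2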

When we typed df -h, all the mount points were shown except /ocfs, which was storing the OCR and Voting Disk. Without these two files we cannot run the databases; they are very crucial for RAC systems.

When we tried to mount /ocfs manually,

# mount -a -t ocfs2

we got the ocfs2_hb_ctl error mentioned in the title.




Workaround:


First I checked the network connections between the nodes...


When I pinged node2 it was not responding; the packets were actually getting routed to the VIP of the local node. We then found it was because of a wrong network device name; after changing it, the network was working fine. After that,

we stopped OCFS2:
# ./o2cb stop ocfs2     (from /etc/init.d)

then started OCFS2:
# /etc/init.d/o2cb start

then mounted the OCFS2 volume:

# mount -t ocfs2 -o datavolume,nointr /dev/sdb1 /ocfs

it worked!!!
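
To make this mount survive reboots, the usual approach is an /etc/fstab entry; a sketch using the same device and mount point as above (_netdev delays the mount until networking and the o2cb stack are up):

/dev/sdb1   /ocfs   ocfs2   _netdev,datavolume,nointr   0 0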