
NSD Analysis

OK, so here goes: a lengthy post for the admins amongst us on NSD analysis, an area I feel I know
quite well… however, as you’ll remember from my last post, this is based on publicly available
information.

NSD (Notes System Diagnostic) is the name given to the software bundled with Domino that takes a snapshot
of what the Domino system is doing. The tool produces text files with enormous amounts of
information, and it can be run manually or will run automatically during a crash. Interested? (It may
seem a bit dull, but the information here could save you a lot of time!)

Memory

Without memory, nothing works. In the operating system, memory is divided into Kernel and User
Address Space. The Kernel looks after the OS, the hardware drivers and communication with the
hardware. The User Address Space is where our applications run, and this includes Domino.

So when Domino crashes, it happens in the User Address Space… which means Domino won’t
directly cause a blue screen of death! However, Domino may, for example, be trying to read from or
write to an area of disk, and because the kernel handles all disk access, that could cause a kernel
memory error.

As we know, Domino is made up of a number of individual processes (nserver, nreplica, nrouter etc.).
Each of these processes does its own little bit to make up the server. Each process is doing a
number of tasks at any one time; these are called threads. Within each thread there is a specific
set of individual actions, and these are called function calls.

Crashes (in a paragraph!)

Yeah, yeah, blah, blah, so what does this mean for me? Well, Domino is a fairly complex beast. Now
and again a thread will try to use some memory that is reserved or in use by another process. This
is a memory exception, and at this point everything gets a bit messy. A panic is recorded in the
thread, Domino freezes everything it is doing and the nsd task runs. nsd captures a snapshot of the
environment as it was immediately before the crash and stores the important results in
\data\ibm_technical_support on whichever client or server crashed.
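If you want to grab the most recent crash output quickly, here’s a minimal Python sketch that lists the newest log files in that folder. It assumes you pass in the Domino data directory, that the folder name matches ibm_technical_support in any letter case, and that NSD output files start with “nsd” and end in “.log”. The function name newest_nsd_logs is my own, purely for illustration.

```python
from pathlib import Path


def newest_nsd_logs(data_dir, prefix="nsd", limit=5):
    """Return up to `limit` NSD log paths from the support folder, newest first.

    Assumptions (not confirmed by this post): the folder is called
    ibm_technical_support in some letter case, and NSD files start with
    `prefix` and end in ".log".
    """
    data_dir = Path(data_dir)
    # Match the folder name case-insensitively: Windows ignores case,
    # other platforms may not.
    support_dirs = [p for p in data_dir.iterdir()
                    if p.is_dir() and p.name.lower() == "ibm_technical_support"]
    if not support_dirs:
        return []
    logs = [p for p in support_dirs[0].iterdir()
            if p.is_file()
            and p.name.lower().startswith(prefix.lower())
            and p.name.lower().endswith(".log")]
    # Most recently modified first, so the latest crash comes to the top.
    logs.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return logs[:limit]
```

For example, calling newest_nsd_logs with your server’s data directory would return up to five matching paths, newest first, or an empty list if the folder isn’t there.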

Hangs

Hangs are a different beast and I’ll not go into them much here. The easiest way to recognise a hang
is to watch, in real time, the memory allocated to each Domino or Notes process. Remember from
earlier that each process is made up of a number of threads. New threads are constantly starting and
old threads are constantly stopping, so for each process you should see the memory allocated to it
changing over time. If the memory allocated to a process doesn’t change, then you possibly have a
hang. A server can recover from a hang, and a hung process may or may not prevent user sessions on
the server. To troubleshoot a hang, run nsd 3 times at 5 minute intervals and then engage IBM Support
to help resolve the issue.
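The “is the memory moving?” check can be sketched in a few lines of Python. This assumes you have already collected a handful of memory readings per process (from Task Manager or whatever monitoring you use; how you gather them is up to you), and possibly_hung is a hypothetical helper name. A flat reading only makes a process a candidate hang, not a confirmed one.

```python
def possibly_hung(samples, min_samples=3):
    """Return the names of processes whose memory never changed.

    `samples` maps a process name to a list of memory readings taken over
    time. Processes with fewer than `min_samples` readings are skipped,
    since a couple of identical readings prove very little.
    """
    suspects = []
    for name, readings in samples.items():
        if len(readings) < min_samples:
            continue
        # Every reading identical means the memory is not moving at all.
        if len(set(readings)) == 1:
            suspects.append(name)
    return sorted(suspects)
```

For example, possibly_hung({"nserver": [512, 512, 512], "nrouter": [80, 84, 81]}) returns ["nserver"] (the readings are made up).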

Running NSD Manually

The important thing to remember when running NSD manually is that by default it will kill all the Domino
processes… so if you want to run it without killing them, check out the available switches by running
nsd -?. The usual advice is to run nsd -detach, as that leaves the processes alone after running.
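Putting the hang advice together with -detach, here’s a rough Python sketch that runs nsd three times, five minutes apart. The only switch it uses is the -detach one mentioned above; it assumes nsd can be found on the PATH of the account running the script (you may need the full path to the executable in your Domino program directory). collect_hang_nsds is a hypothetical name, and the run and sleep parameters exist only so the loop can be tested without a server.

```python
import subprocess
import time


def collect_hang_nsds(runs=3, interval_s=300, command=("nsd", "-detach"),
                      run=subprocess.run, sleep=time.sleep):
    """Run NSD `runs` times, `interval_s` seconds apart.

    Assumes `nsd` is on the PATH; pass a full path in `command` if not.
    `-detach` is used so the Domino processes are left running afterwards.
    """
    results = []
    for i in range(runs):
        # check=True raises CalledProcessError if nsd exits with an error.
        results.append(run(list(command), check=True))
        # No need to wait after the last run.
        if i < runs - 1:
            sleep(interval_s)
    return results
```

Calling collect_hang_nsds() with no arguments blocks for roughly ten minutes in total, so run it from a console you can leave open.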

The file

Well, the files produced always follow a common naming convention:

type_platform_systemname_date@time.log
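If you’re sorting through a folder of these, the convention can be pulled apart with a regular expression. This is a minimal sketch that assumes type and platform contain no underscores and that the date is made up of digits (possibly separated by underscores); the post doesn’t give the exact date and time format, so a system name ending in “_<digits>” would be mis-split. parse_nsd_filename is a hypothetical helper.

```python
import re

# type_platform_systemname_date@time.log, as described in the post.
# Assumptions: no underscores in type or platform; the date is digits
# and underscores only.
NSD_NAME = re.compile(
    r"^(?P<type>[^_]+)_(?P<platform>[^_]+)_(?P<system>.+?)_"
    r"(?P<date>[\d_]+)@(?P<time>[^.]+)\.log$"
)


def parse_nsd_filename(name):
    """Split an NSD log file name into its parts, or return None."""
    match = NSD_NAME.match(name)
    return match.groupdict() if match else None
```

With a made-up name such as nsd_W32I_SERVER01_2008_02_19@10_33_06.log this gives type “nsd”, platform “W32I”, system “SERVER01”, date “2008_02_19” and time “10_33_06”.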

Each platform has its own format, and so as not to make this post a record length I’m going to stick to
Wintel (Windows on Intel).

Sections in Wintel NSD’s

The first section is the header, with system information, a list of each Domino instance and a list of the
processes running in it. You’ll see some strange entries of the form “Found X processes, matched Y”. If Y
is one less than X, then provided you are running Domino as a Windows service, don’t worry! nsd
examines all processes from nserver down, but nservice is the parent of nserver. nsd sees that nservice is
running but also sees that it isn’t running under nserver, so it reports, for example, found 22 matched 21.
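That rule of thumb is easy to encode. The sketch below assumes only that the two numbers follow the words “found” and “matched”, as in the examples above; the exact wording of the header line may differ between versions, and check_process_count is a hypothetical helper.

```python
import re

# Only assumes the counts follow "found" and "matched"; case and the words
# in between (e.g. "processes,") are ignored.
FOUND_MATCHED = re.compile(r"found\s+(\d+)\D*?matched\s+(\d+)", re.IGNORECASE)


def check_process_count(header_line, runs_as_service):
    """Classify a 'found X ... matched Y' line using the rule of thumb."""
    m = FOUND_MATCHED.search(header_line)
    if not m:
        return "no found/matched counts in line"
    found, matched = int(m.group(1)), int(m.group(2))
    if found == matched:
        return "all processes matched"
    if runs_as_service and matched == found - 1:
        # The extra process is nservice, which sits above nserver.
        return "expected: nservice is outside the nserver tree"
    return "investigate: %d process(es) unmatched" % (found - matched)
```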

Next we have the process table, which shows all processes on the server. Processes nsd
recognises as Domino are marked with “->”. The position of “[” denotes parent and child status, with
indented entries being children. You’ll see nsd as a child of whichever process crashed.
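The “indent means child” idea is the classic way of turning an indented listing into a tree: keep a stack of ancestors and pop anything that isn’t further left than the current row. This is a sketch only; the row layout it assumes (one “[” per row, a name between “[” and “]”, “->” somewhere before the “[” for Domino processes) is hypothetical, so check it against a real NSD before relying on it. build_process_tree is my own name.

```python
def build_process_tree(lines):
    """Return (label, parent_label, is_domino) for each process-table row.

    Hypothetical row layout: each row has one "[", a deeper "[" column
    means a child of the nearest shallower row above it, and Domino
    processes carry "->" before the "[".
    """
    rows = []
    stack = []  # (column of "[", label) for the current chain of ancestors
    for line in lines:
        col = line.find("[")
        if col < 0:
            continue  # not a process row
        end = line.find("]", col)
        label = line[col + 1:end if end >= 0 else None].strip()
        # Drop ancestors that are not strictly to the left of this row.
        while stack and stack[-1][0] >= col:
            stack.pop()
        parent = stack[-1][1] if stack else None
        rows.append((label, parent, "->" in line[:col]))
        stack.append((col, label))
    return rows
```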

OK, so this section helps you build a picture of what was running on the server.

Below this section there is a dump of each process, showing which files the process was using, and then,
importantly, a dump of each thread. For the thread that caused the crash, the label changes
from “thread” to “FATAL THREAD”. Once you have looked through the process table, the best option is to search
for “FATAL”.
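NSD files are big, so rather than scrolling, a few lines of Python will give you the line number of every “FATAL” hit. This is a simple sketch; find_fatal is a hypothetical name, and the search is case-sensitive to match the capitalised “FATAL” in the dump. Undecodable bytes are replaced rather than raising, since I’m assuming you won’t know the file’s encoding for sure.

```python
def find_fatal(path):
    """Return (line_number, line) pairs for every line containing "FATAL".

    The file is read line by line, with undecodable bytes replaced, so a
    large NSD with odd characters doesn't stop the search.
    """
    hits = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for number, line in enumerate(f, start=1):
            if "FATAL" in line:
                hits.append((number, line.rstrip("\n")))
    return hits
```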

Fatal Thread

So once you’ve searched for fatal you may see something like this:

### FATAL THREAD 39/83 [ nSERVER:07c0: 2764]


### FP=0743f548, PC=60197cf3, SP=0743ebd0, stksize=2424
Exception code: c0000005 (ACCESS_VIOLATION)
############################################################
@[ 1] 0x60197cf3 nnotes._Panic@4+483 (7430016,496dae76,0,496dace8)
@[ 2] 0x600018a4 nnotes._OSBBlockAddr@8+148 (1153f38,2000000,743f608,1)
@[ 3] 0x6000bd92 nnotes._CollectionNavigate@24+610 (0,743fc74,f,0)
@[ 4] 0x600626cc nnotes._ReadEntries@68+2860 (4c5440e8,4cfb8dba,800f,1)
@[ 5] 0x600b9f6f nnotes._NIFReadEntriesExt@72+351 (0,4cfb8dba,800f,1)
@[ 6] 0x10032d40 nserverl._ServerReadEntries@8+1424 (0,8d0c0035,4b64b5bc,4ae46dd6)
@[ 7] 0x100191fc nserverl._DbServer@8+2284 (41b0383,cb740064,0,23696f8)
@[ 8] 0x1002b8c8 nserverl._WorkThreadTask@8+1576 (4711d68,0,3,563fb10)
@[ 9] 0x100016cb nserverl._Scheduler@4+763 (0,563fb10,0,10ec334)
@[10] 0x6011e5e4 nnotes._ThreadWrapper@4+212 (0,10ec334,563fb10,0)
[11] 0x77e887dd KERNEL32.GetModuleFileNameA+465

So what does all this mean? Well, the header block is fairly obvious. Lines [1] through [11] are the
function calls that the thread performed, in sequence. On Wintel, [1] is the call closest to
the crash and [11] the call furthest from it. So the server went through 11, 10, 9, 8, … 2, then
crashed, and 1 shows the panic.

So what does each line mean? The @ sign means nsd has annotated the line and recognised it as
a Domino function (line [11], in the Windows KERNEL32 module, has no @). The 0x value is the code address
for that line; note that line [1]’s address matches the PC (program counter) in the header. The bit
before the full stop is the module, i.e. the DLL the code lives in (nnotes, nserverl etc.). The bit after
the full stop and before the @ sign is the function call.

So here the function calls are _ThreadWrapper, _Scheduler, _WorkThreadTask etc.
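To pull the module and function names out of a dump like the one above without retyping them, here’s a minimal Python sketch. The regular expression is based only on the Wintel sample in this post (frame number in brackets, an optional @, a 0x address, then module.function), so treat it as an assumption for other platforms or versions; parse_frames is a hypothetical helper name.

```python
import re

# Based on the Wintel sample lines, e.g.
#   @[ 3] 0x6000bd92 nnotes._CollectionNavigate@24+610 (0,743fc74,f,0)
#   [11] 0x77e887dd KERNEL32.GetModuleFileNameA+465
# Other platforms lay their stacks out differently.
FRAME = re.compile(
    r"^(?P<annotated>@)?\[\s*(?P<frame>\d+)\]\s+0x(?P<address>[0-9a-fA-F]+)\s+"
    r"(?P<module>[^.\s]+)\.(?P<function>[A-Za-z_]\w*)"
)


def parse_frames(lines):
    """Return one dict per stack frame line; other lines are ignored."""
    frames = []
    for line in lines:
        m = FRAME.match(line.strip())
        if m:
            frames.append({
                "frame": int(m.group("frame")),
                "address": int(m.group("address"), 16),
                "module": m.group("module"),
                # The name stops before the "@N" or "+offset" suffix.
                "function": m.group("function"),
                "annotated": m.group("annotated") is not None,
            })
    return frames
```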

Call Stack

Listing all these functions we get the call stack.

_Panic
_OSBBlockAddr
_CollectionNavigate
_ReadEntries
_NIFReadEntriesExt
_ServerReadEntries
_DbServer
_WorkThreadTask
_Scheduler
_ThreadWrapper

Finding the fault


Now is the point where you have some data that can be searched in the IBM Knowledge Base.

My only tip here is that, for a good search, you should strip off the leading underscore and also add * to the
beginning and end of each search term. Take 2 items from the call stack list and search for them in turn,
i.e. search for 11 and 10, then 10 and 9, and so on. You need to compare your call stack with any call
stacks listed in the knowledge base.
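Building those search strings by hand gets tedious, so here’s a small Python sketch. It reads the tip as wrapping each function name in * wildcards after stripping the leading underscore (my interpretation), and produces adjacent pairs starting from the frame furthest from the crash. search_pairs is a hypothetical helper; how the knowledge base combines two terms is not something this post specifies.

```python
def search_pairs(functions):
    """Build search strings from a call stack, two adjacent calls at a time.

    `functions` is in NSD order: frame 1 (the panic) first. Leading
    underscores are stripped, each name is wrapped in * wildcards, and the
    pairs run from the outermost frame towards the crash.
    """
    terms = ["*" + name.lstrip("_") + "*" for name in functions]
    outer_first = list(reversed(terms))
    return [outer_first[i] + " " + outer_first[i + 1]
            for i in range(len(outer_first) - 1)]
```

For example, search_pairs(["_Panic", "_OSBBlockAddr", "_CollectionNavigate"]) returns ["*CollectionNavigate* *OSBBlockAddr*", "*OSBBlockAddr* *Panic*"].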

Reference material

• UNIX NSD Analysis : http://www-1.ibm.com/support/docview.wss?rs=0&uid=swg27003396

• Nash!com presentation : http://www.nashcom.de/nshweb/pages/lotusphere.htm

• Redbooks technote : http://www.redbooks.ibm.com/abstracts/tips0053.html?Open

• LDD Article : http://www-128.ibm.com/developerworks/lotus/library/domino-server-crashes/

REMEMBER IBM ARE THE EXPERTS

As a footnote, please remember that locked in a deep vault somewhere in IBM is a team of people who
spend all day, every day, looking at NSDs (and even having fun). They are the experts. If you need to
examine an NSD, I’d recommend logging a call with IBM before you start. While you are waiting for
them to get back to you, have a go at resolving the NSD yourself.
