Oracle/Sun Solaris - Fault Manager - Memory and CPU
Description
This connector parses the output of the fmadm faulty command to detect faulty memory modules.
This connector requires the Enterprise edition of MetricsHub.
Target
Typical platform: Sun/Oracle Servers
Operating system: Oracle Solaris
Prerequisites
Leverages: Sun Solaris system commands (fmadm)
Technology and protocols: Commands
This connector requires advanced privileges on the managed host for the command below:
/usr/sbin/fmadm
This connector therefore needs to run as root, or you need to configure a privilege-escalation mechanism such as sudo on the managed host so that the monitoring account can run the command listed above.
Sample /etc/sudoers entry that allows the metricshub account to run the above command as root:
metricshub ALL=(root) NOPASSWD: /usr/sbin/fmadm
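One way to confirm that the sudoers entry is effective is to run the command non-interactively as the monitoring account. The following is a minimal sketch, not part of MetricsHub: the helper names sudo_check_command and can_run_without_password are hypothetical, and it assumes Python 3 is available on the managed host and that fmadm faulty exits with status 0 when it runs successfully.

```python
import subprocess

FMADM = "/usr/sbin/fmadm"  # path taken from the sudoers sample above


def sudo_check_command(binary=FMADM):
    """Build the command line used for the check (hypothetical helper).

    The -n option makes sudo fail instead of prompting when a password
    would be required, so a missing NOPASSWD rule shows up as an error.
    """
    return ["sudo", "-n", binary, "faulty"]


def can_run_without_password(binary=FMADM, timeout=30):
    """Return True if the command runs through sudo and exits with status 0."""
    try:
        result = subprocess.run(
            sudo_check_command(binary),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # sudo is not installed, or the command did not finish in time
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Run this as the monitoring account (metricshub in the sample above).
    print("sudo rule effective:", can_run_without_password())
```

The command construction is kept in its own function so it can be inspected without executing anything on the host.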
Examples
CLI
metricshub HOSTNAME -t solaris -c +SunFmadm --ssh -u USER --sudo-command-list /usr/sbin/fmadm
metricshub.yaml
resourceGroups:
  <RESOURCE_GROUP>:
    resources:
      <HOSTNAME-ID>:
        attributes:
          host.name: <HOSTNAME> # Change with actual host name
          host.type: solaris
        connectors: [ +SunFmadm ] # Optional, to load only this connector
        protocols:
          ssh:
            username: <USERNAME> # Change with actual credentials
            password: <PASSWORD> # Encrypted using metricshub-encrypt
            useSudo: true
            useSudoCommands: [ "/usr/sbin/fmadm" ]
Connector Activation Criteria
The Oracle/Sun Solaris - Fault Manager - Memory and CPU connector is automatically activated, and its status is reported as OK, if all the criteria below are met:
- The device type must be one of: SunOS, Solaris
- The command below succeeds on the monitored host:
  - Command: /bin/uname -r
  - Output contains: 5\.1[0-9] (regex)
- The command below succeeds on the monitored host:
  - Command: /usr/sbin/fmadm faulty;/usr/bin/echo errorlevel $?
  - Output contains: ^errorlevel 0$ (regex)
- The command below succeeds on the monitored host:
  - Command: /usr/sbin/fmadm config | grep cpumem
  - Output contains: active (regex)
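
The three command-based criteria above can be checked by hand against captured command output. The sketch below is illustrative only: the function name check_activation and the sample strings are made up, and it assumes the patterns are applied as unanchored searches, with ^ and $ matching at line boundaries. MetricsHub's actual matching options may differ.

```python
import re

# Patterns copied from the activation criteria above.
UNAME_PATTERN = re.compile(r"5\.1[0-9]")
# Assumption: ^ and $ match per line, since the echo output is the last line.
FAULTY_PATTERN = re.compile(r"^errorlevel 0$", re.MULTILINE)
CONFIG_PATTERN = re.compile(r"active")


def check_activation(uname_output, faulty_output, config_output):
    """Return a per-criterion result for the three command outputs.

    uname_output:  output of /bin/uname -r
    faulty_output: output of /usr/sbin/fmadm faulty;/usr/bin/echo errorlevel $?
    config_output: output of /usr/sbin/fmadm config | grep cpumem
    """
    return {
        "uname": bool(UNAME_PATTERN.search(uname_output)),
        "fmadm_faulty": bool(FAULTY_PATTERN.search(faulty_output)),
        "fmadm_config": bool(CONFIG_PATTERN.search(config_output)),
    }


if __name__ == "__main__":
    # Made-up sample outputs, for illustration only.
    results = check_activation(
        "5.11\n",
        "errorlevel 0\n",
        "cpumem-diagnosis 1.7 active CPU/Memory Diagnosis\n",
    )
    for name, ok in results.items():
        print(name, "OK" if ok else "FAILED")
```

Returning one result per criterion, rather than a single boolean, makes it easier to see which check prevents the connector from activating.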
Metrics
| Type | Collected Metrics | Specific Attributes |
|---|---|---|
| memory | hw.memory.limit<br>hw.status{hw.type="memory", state="degraded\|failed\|ok"}<br>hw.status{hw.type="memory", state="present"} | hw.parent.type<br>id<br>name |
| other_device | hw.status{hw.type="other_device", state="degraded\|failed\|ok"}<br>hw.status{hw.type="other_device", state="present"} | device_type<br>hw.parent.type<br>id<br>name |