Monitoring Apache Yarn
HertzBeat monitors Apache Yarn by collecting metrics from the ResourceManager, covering cluster nodes, the JVM, queues, and runtime information.
Protocol Used: HTTP
Pre-monitoring Actions
Obtain the HTTP port of the Apache Yarn ResourceManager web UI. It is set by the `yarn.resourcemanager.webapp.address` property in yarn-site.xml (default port: 8088).
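If you have access to the ResourceManager's yarn-site.xml, the port can also be read programmatically. The following is a minimal sketch, not part of HertzBeat: the function name `rm_webapp_port` is hypothetical, it only handles a plain `host:port` value (HA-specific keys such as `...address.rm1` are ignored), and it falls back to Hadoop's default port 8088 when the property is absent.

```python
import xml.etree.ElementTree as ET

# Hadoop's default ResourceManager web UI port when the property is not set.
DEFAULT_RM_WEBAPP_PORT = 8088


def rm_webapp_port(yarn_site_xml: str) -> int:
    """Return the port from yarn.resourcemanager.webapp.address (hypothetical helper).

    Assumes the value has the form "host:port"; HA variants are not handled.
    """
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name == "yarn.resourcemanager.webapp.address":
            value = prop.findtext("value", default="").strip()
            # Keep only the text after the last colon, e.g. "0.0.0.0:8088" -> 8088.
            _, _, port = value.rpartition(":")
            return int(port)
    return DEFAULT_RM_WEBAPP_PORT
```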
Configuration Parameters
| Parameter Name | Parameter Description |
|---|---|
| Target Host | IP address, IPv6 address, or domain name of the monitored host, without a protocol prefix (such as http:// or https://). |
| Port | HTTP port of the Apache Yarn ResourceManager web UI. Default: 8088. |
| Query Timeout | Timeout for requests to Apache Yarn, in milliseconds. Default: 6000 ms. |
| Metrics Interval | Interval between metric collections, in seconds. Minimum: 30 seconds. |
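To illustrate how the Target Host, Port, and Query Timeout parameters combine into an HTTP request, here is a minimal sketch. It assumes the metrics are read from Hadoop's standard `/jmx` JSON servlet with a `qry` bean filter; whether HertzBeat issues exactly these requests is an assumption, and the helper names `jmx_url` and `fetch_beans` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed JMX bean name for the ResourceManager's ClusterMetrics.
CLUSTER_METRICS_QUERY = "Hadoop:service=ResourceManager,name=ClusterMetrics"


def jmx_url(host: str, port: int = 8088, bean_query: str = CLUSTER_METRICS_QUERY) -> str:
    """Build a /jmx URL from the Target Host and Port parameters (hypothetical helper)."""
    if "://" in host:
        # Target Host must be given without a protocol prefix.
        raise ValueError("Target Host must not include a protocol prefix")
    if ":" in host and not host.startswith("["):
        # Bare IPv6 literals must be bracketed inside a URL.
        host = f"[{host}]"
    return f"http://{host}:{port}/jmx?qry={quote(bean_query, safe=':,=')}"


def fetch_beans(host: str, port: int = 8088,
                bean_query: str = CLUSTER_METRICS_QUERY,
                timeout_ms: int = 6000) -> list:
    """GET the /jmx endpoint and return its "beans" list."""
    # Query Timeout is given in milliseconds; urllib expects seconds.
    with urllib.request.urlopen(jmx_url(host, port, bean_query),
                                timeout=timeout_ms / 1000) as resp:
        return json.load(resp)["beans"]
```

Only `fetch_beans` touches the network; `jmx_url` is pure and easy to check in isolation.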
Collected Metrics
Metric Set: ClusterMetrics
| Metric Name | Unit | Metric Description |
|---|---|---|
| NumActiveNMs | | Number of currently active NodeManagers |
| NumDecommissionedNMs | | Number of decommissioned NodeManagers |
| NumDecommissioningNMs | | Number of NodeManagers currently being decommissioned |
| NumLostNMs | | Number of lost NodeManagers in the cluster |
| NumUnhealthyNMs | | Number of unhealthy NodeManagers in the cluster |
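As a sketch of how these fields can be pulled out of a `/jmx` JSON response, assuming the ClusterMetrics bean is named `Hadoop:service=ResourceManager,name=ClusterMetrics`: the helpers `cluster_metrics` and `nodes_needing_attention` are hypothetical, and treating lost plus unhealthy NodeManagers as the alerting signal is an illustrative choice, not HertzBeat's built-in rule.

```python
import json

CLUSTER_BEAN = "Hadoop:service=ResourceManager,name=ClusterMetrics"  # assumed bean name
CLUSTER_FIELDS = (
    "NumActiveNMs",
    "NumDecommissionedNMs",
    "NumDecommissioningNMs",
    "NumLostNMs",
    "NumUnhealthyNMs",
)


def cluster_metrics(jmx_json: str) -> dict:
    """Extract the ClusterMetrics fields from a /jmx JSON document (hypothetical helper)."""
    for bean in json.loads(jmx_json)["beans"]:
        if bean.get("name") == CLUSTER_BEAN:
            # Missing fields default to 0 so the result always has every key.
            return {field: int(bean.get(field, 0)) for field in CLUSTER_FIELDS}
    raise KeyError(f"bean not found: {CLUSTER_BEAN}")


def nodes_needing_attention(metrics: dict) -> int:
    """Count lost plus unhealthy NodeManagers (illustrative alert signal)."""
    return metrics["NumLostNMs"] + metrics["NumUnhealthyNMs"]
```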
Metric Set: JvmMetrics
| Metric Name | Unit | Metric Description |
|---|---|---|
| MemNonHeapCommittedM | MB | Non-heap memory currently committed by the JVM |
| MemNonHeapMaxM | MB | Maximum non-heap memory available to the JVM |
| MemNonHeapUsedM | MB | Non-heap memory currently used by the JVM |
| MemHeapCommittedM | MB | Heap memory currently committed by the JVM |
| MemHeapMaxM | MB | Maximum heap memory available to the JVM |
| MemHeapUsedM | MB | Heap memory currently used by the JVM |
| GcTimeMillis | ms | Total time spent in JVM garbage collection |
| GcCount | | Total number of JVM garbage collections |
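Two derived values are often more useful for alerting than the raw JvmMetrics: heap usage as a fraction of the maximum, and the average GC pause between two collection cycles. The sketch below assumes GcCount and GcTimeMillis are cumulative counters, so it works on the difference between two samples; the function names are hypothetical. It also guards against a non-positive maximum, since the JVM may report the maximum as undefined.

```python
from typing import Optional


def heap_usage_ratio(used_m: float, max_m: float) -> Optional[float]:
    """MemHeapUsedM / MemHeapMaxM, or None when the maximum is undefined (<= 0)."""
    if max_m <= 0:
        return None
    return used_m / max_m


def avg_gc_pause_ms(prev: dict, curr: dict) -> Optional[float]:
    """Average milliseconds per GC between two samples of cumulative counters.

    Returns None when no collections happened or the counters went backwards
    (for example after a ResourceManager restart).
    """
    d_count = curr["GcCount"] - prev["GcCount"]
    d_time = curr["GcTimeMillis"] - prev["GcTimeMillis"]
    if d_count <= 0:
        return None
    return d_time / d_count
```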
Metric Set: QueueMetrics
| Metric Name | Unit | Metric Description |
|---|---|---|
| queue | | Queue name |
| AllocatedVCores | | Number of allocated (in-use) virtual cores |
| ReservedVCores | | Number of reserved virtual cores |
| AvailableVCores | | Number of available (unallocated) virtual cores |
| PendingVCores | | Number of virtual cores pending allocation |
| AllocatedMB | MB | Allocated (in-use) memory |
| AvailableMB | MB | Available (unallocated) memory |
| PendingMB | MB | Memory pending allocation |
| ReservedMB | MB | Reserved memory |
| AllocatedContainers | | Number of allocated (in-use) containers |
| PendingContainers | | Number of containers pending allocation |
| ReservedContainers | | Number of reserved containers |
| AggregateContainersAllocated | | Cumulative number of containers allocated |
| AggregateContainersReleased | | Cumulative number of containers released |
| AppsCompleted | | Number of completed applications |
| AppsKilled | | Number of killed applications |
| AppsFailed | | Number of failed applications |
| AppsPending | | Number of pending applications |
| AppsRunning | | Number of running applications |
| AppsSubmitted | | Number of submitted applications |
| running_0 | | Number of applications running for less than 60 minutes |
| running_60 | | Number of applications running for 60 to 300 minutes |
| running_300 | | Number of applications running for 300 to 1440 minutes |
| running_1440 | | Number of applications running for more than 1440 minutes |
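The running_* metrics split applications into run-time buckets bounded at 60, 300, and 1440 minutes. The sketch below maps an elapsed time to its bucket name and also approximates queue memory utilization as AllocatedMB / (AllocatedMB + AvailableMB). Both helpers are hypothetical; treating each lower bound as inclusive, and the utilization formula itself, are assumptions for illustration rather than Yarn's documented definitions.

```python
import bisect
from typing import Optional

# Bucket boundaries in minutes, following the running_* rows of the QueueMetrics table.
RUNTIME_BOUNDS_MIN = (60, 300, 1440)
RUNTIME_BUCKETS = ("running_0", "running_60", "running_300", "running_1440")


def runtime_bucket(elapsed_minutes: float) -> str:
    """Map an application's elapsed run time to a running_* name.

    Assumes lower bounds are inclusive: exactly 60 minutes falls in running_60.
    """
    return RUNTIME_BUCKETS[bisect.bisect_right(RUNTIME_BOUNDS_MIN, elapsed_minutes)]


def memory_utilization(allocated_mb: float, available_mb: float) -> Optional[float]:
    """Approximate queue memory utilization; None when the queue reports no memory."""
    total = allocated_mb + available_mb
    return allocated_mb / total if total > 0 else None
```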
Metric Set: runtime
| Metric Name | Unit | Metric Description |
|---|---|---|
| StartTime | ms | JVM startup timestamp (milliseconds since the Unix epoch) |
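Assuming StartTime follows the Java `RuntimeMXBean` convention of milliseconds since the Unix epoch, it can be turned into a readable UTC time or an uptime figure. A minimal sketch with hypothetical helper names:

```python
from datetime import datetime, timezone


def start_time_utc(start_time_ms: int) -> datetime:
    """Convert an epoch-milliseconds StartTime to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(start_time_ms / 1000, tz=timezone.utc)


def uptime_seconds(start_time_ms: int, now_ms: int) -> float:
    """Seconds elapsed between StartTime and a reference time, both in epoch ms."""
    return (now_ms - start_time_ms) / 1000
```

A sudden drop in uptime between two collections is a simple way to spot a ResourceManager restart.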