Wednesday, July 15, 2015

Ambari Metrics

Ambari Metrics System ("AMS") is a system for collecting, aggregating, and serving Hadoop and system metrics in Ambari-managed clusters.

-> It was introduced with Ambari 2.0.0.

AMS: The built-in metrics collection system for Ambari.
Metrics Collector: The standalone server that collects metrics, aggregates metrics, and serves metrics from the Hadoop service sinks and the Metrics Monitor.
Metrics Monitor: Installed on each host in the cluster to collect system-level metrics and forward them to the Metrics Collector.
Metrics Hadoop Sinks: Plug into the various Hadoop component sinks to send Hadoop metrics to the Metrics Collector.

The Metrics Collector is a daemon that receives data from registered publishers (the Monitors and Sinks). The Collector itself is built using Hadoop technologies such as HBase, Phoenix, and ATS. The Collector can store data on the local filesystem (referred to as "embedded mode") or use an external HDFS (referred to as "distributed mode").

-------------------------------------------------------------
Note: Restarting the Metrics Collector and Metrics Monitor services can fix some cache issues if the services have not been restarted for more than 30 to 45 days.

Basic commands to troubleshoot the issues:
top
netstat -ntupl | grep 39025
/etc/ambari-metrics-collector/conf
grep -i heapsize *
ams-env.sh:export AMS_COLLECTOR_HEAPSIZE=2048m (we changed it from 1024m to 2048m)
metrics_collector_heapsize & hbase_master_heapsize --> increased from 1024m to 2048m
jstack -l 31823
pstack
pstack 31823
Metrics Collector pid dir:
cd /var/run/ambari-metrics-collector/
ls -alrt
cat *pid
cat ambari-metrics-collector.pid
18856
netstat -ntupl | grep 18856
A restart will fix most issues.
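
The PID lookup steps above can be folded into one small helper. This is only a sketch: the function name collector_pid is made up for illustration, the default directory is the PID dir shown above, and the liveness check assumes you run it as root or as the user that owns the collector process.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PID stored in the collector PID file,
# but only if a process with that PID is still running.
# Pass a different PID directory as $1 to override the default.
collector_pid() {
    local pid_dir="${1:-/var/run/ambari-metrics-collector}"
    local pid_file="$pid_dir/ambari-metrics-collector.pid"
    local pid

    [ -r "$pid_file" ] || { echo "no pid file at $pid_file" >&2; return 1; }
    pid="$(tr -d '[:space:]' < "$pid_file")"

    # kill -0 sends no signal; it only tests whether the process exists
    # and whether the current user is allowed to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "$pid"
    else
        echo "stale pid file: process $pid is not running" >&2
        return 1
    fi
}

# Example use (commented out so the snippet has no side effects):
# pid="$(collector_pid)" && netstat -ntupl | grep "$pid"
```

If the function reports a stale PID file, the collector is down even though the file is still there, which is a common reason for the netstat check to come back empty.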

Metrics Collector installed on 17 server
Metrics Monitor is installed on all the nodes
Metrics Service operation mode --distributed (storing metrics in HDFS, hbase.rootdir=hdfs://abc01/amshbase)
Metrics service checkpoint delay --60 sec
hbase.cluster.distributed --true
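
As a quick sanity check, the operation mode can be guessed from the hbase.rootdir value alone. The sketch below is a heuristic, not an Ambari tool: the function name ams_mode_from_rootdir is hypothetical, and it assumes embedded mode uses a file:// URI or a plain local path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess the AMS operation mode from an hbase.rootdir value.
# Heuristic only: an hdfs:// URI means metrics are stored in HDFS ("distributed"),
# a file:// URI or an absolute local path means the local filesystem ("embedded").
ams_mode_from_rootdir() {
    case "$1" in
        hdfs://*)     echo "distributed" ;;
        file://*|/*)  echo "embedded" ;;
        *)            echo "unknown"; return 1 ;;
    esac
}

# Value taken from the settings above.
ams_mode_from_rootdir "hdfs://abc01/amshbase"   # prints: distributed
```

If the guessed mode disagrees with the configured operation mode, the two settings are out of sync and worth fixing before looking further.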

The owner of the hbase.rootdir directory should display as ams:
drwxrwxr-x   - ams            hdfs              0 2015-07-15 06:39 /amshbase
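
The ownership check can be scripted by pulling the owner column out of the listing. This is a minimal sketch: hdfs_ls_owner is a made-up name, and it assumes the usual eight-column layout of hdfs dfs -ls output, as in the line above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hdfs dfs -ls` output on stdin and print the
# owner column (third whitespace-separated field) of the first entry.
# Lines with fewer than 8 fields, such as the "Found N items" header, are skipped.
hdfs_ls_owner() {
    awk 'NF >= 8 { print $3; exit }'
}

# Sample line copied from the listing above.
line='drwxrwxr-x   - ams            hdfs              0 2015-07-15 06:39 /amshbase'
owner="$(printf '%s\n' "$line" | hdfs_ls_owner)"
if [ "$owner" = "ams" ]; then
    echo "owner ok: $owner"
else
    echo "unexpected owner: $owner"
fi
```

On a live cluster the same function can be fed from hdfs dfs -ls -d /amshbase instead of the sample line.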

metrics_collector_heapsize --1024m or 2048m
hbase_master_heapsize --1024m or 2048m
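
To confirm which collector heap size is actually in effect on a host, the value can be read back from ams-env.sh. A sketch under assumptions: collector_heapsize is a hypothetical name, the default path is the conf directory used in the commands earlier, and the file is assumed to contain a plain export line of the form export AMS_COLLECTOR_HEAPSIZE=2048m.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of AMS_COLLECTOR_HEAPSIZE from an
# ams-env.sh file. Pass a different file as $1 to override the default.
collector_heapsize() {
    local env_file="${1:-/etc/ambari-metrics-collector/conf/ams-env.sh}"
    [ -r "$env_file" ] || { echo "cannot read $env_file" >&2; return 1; }
    # Match lines like: export AMS_COLLECTOR_HEAPSIZE=2048m
    # If the variable is exported more than once, the last one wins.
    sed -n 's/^[[:space:]]*export[[:space:]]\{1,\}AMS_COLLECTOR_HEAPSIZE=\([^[:space:]#]*\).*/\1/p' \
        "$env_file" | tail -n 1
}

# Example use (commented out so the snippet has no side effects):
# collector_heapsize
```

If this still prints 1024m after the change in Ambari, the new configuration has probably not been pushed to the host or the service has not been restarted.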

Error:
MetricsPropertyProvider:201 - Error getting timeline metrics. Can not connect to collector, socket error. 

INFO  [main-SendThread(localhost:61181)] ClientCnxn:975 - Opening socket connection to server localhost/127.0.0.1:61181. Will not attempt to authenticate using SASL (unknown error)
WARN  [main-SendThread(localhost:61181)] ClientCnxn:1102 - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused

20191, exception=org.apache.hadoop.hbase.MasterNotRunningException: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
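
To see how often each of the errors above shows up, a log file can be scanned for the three signatures. This is a sketch, not an official tool: count_ams_errors is a made-up name, and which log file holds which message depends on the installation, so the path is always passed in explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each error signature quoted above, print how many
# lines of the given log file contain it.
count_ams_errors() {
    local log_file="$1"
    local pattern
    [ -r "$log_file" ] || { echo "cannot read $log_file" >&2; return 1; }
    for pattern in \
        "Error getting timeline metrics" \
        "java.net.ConnectException: Connection refused" \
        "MasterNotRunningException"; do
        # grep -c prints the number of matching lines (0 if none);
        # -F treats the pattern as a literal string, not a regex.
        printf '%6d  %s\n' "$(grep -cF -- "$pattern" "$log_file")" "$pattern"
    done
}

# Example use (commented out; the path is an assumption about a typical install):
# count_ams_errors /var/log/ambari-server/ambari-server.log
```

A steadily growing count of connection errors after a restart suggests the collector or its embedded ZooKeeper is not coming up, rather than a one-time cache problem.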

