For a user-facing system like Apache Impala, bad performance and downtime can have serious negative impacts on your business. Given the complexity of the system and all the moving parts, troubleshooting can be time-consuming and overwhelming.
In this blog post series, we are going to show how the charts and metrics on Cloudera Manager (CM) can help troubleshoot Impala performance issues. They can also help to monitor the system to predict and prevent future outages.
CM provides a comprehensive suite of time-series and pre-aggregated metrics and charts at varying levels of granularity to ease the pain of diagnosing and troubleshooting CDH. With so many metrics available today, it becomes imperative to know which metrics to look at, and when and how to look at them.
In this blog post, we cover the various CM metrics for monitoring and troubleshooting specific issues with Impala metadata . Observing trends and outliers in these metrics helps identify concerning behavior and implement best practices proactively. The next post will cover metrics pertaining to ImpalaD processes, the roles of coordinators and executors and highlight OS/system hardware-level monitoring. This helps identify possible hotspots and troubleshoot query performance.
To get started with a custom dashboard, go to Charts → Create Dashboard and enter a name for the dashboard. Then either use the default or set the duration you want it to cover. You can then add charts to the dashboard based on the metrics you’d like to view. To learn more about building dashboards, please visithere .
CM also provides the capability to import tsqueries in JSON format—a file for all the below charts can be found here . You are required to replace the entity name placeholders with entity names and/or host IDs. The entity name or host id can be found using any of the charts on the status page of the service component.
The customized dashboard from the tsqueries look similar to this:
Impala caches metadata for speed. The caching mechanism requires loading metadata from persistent stores, like Hive MetaStore, NameNode, and Sentry by CatalogD. This is subsequently compressed and sent to the Statestore to be broadcast to dedicated coordinators. Such a complex system is easily subject to numerous bottlenecks which make it imperative to monitor the key relationships among Impala’s components.
The following diagram shows how the catalog and statestore service interacts with other parts of Impala’s distributed system, both internal and external.
While most metadata operations are lightweight or trivial and thus have little to no impact on performance, there are a number of situations in which metadata operations can negatively affect performance. These “metadata workload anti-patterns,” can negatively affect the performance as data, users, and applications scale up. They may cause scalability snags. This makes it necessary to monitor the metadata growth rate, identify anti-patterns, and take preventative measures to ensure smooth functioning.
Some of the top anti-patterns are listed below:
Longer planning wait time and slow DDL statement execution can be an indication of Impala hitting performance issues as a result of metadata load on the system. To identify proactively , you can monitor and study the Planning Wait Time and Planning Wait Time Percentage visualization, which can be imported from Clusters → Impala → Best Practices and the DDL Run time metric, which can be built using the below tsquery:
SELECT query_duration from IMPALA_QUERIES WHERE service_name = "REPLACE-WITH-IMPALA-SERVICE-NAME" AND query_type = "DDL"
**Max value for Y range in DDL Run time defaults to 100ms, make sure it’s unset.
Note:The planning wait time is for searching and finding DML commands that are waiting for a metadata update. As one might wonder why DML waits for a metadata update isn’t it that metadata is read from cache making it a fairly quick operation? Well, the fact is that a DML statement can trigger a metadata update request under certain situations like service restart or “ INVALIDATE METADATA ” metadata operation run before the DML operation.
These are a few key metrics to identify and troubleshoot metadata specific issues.
The metadata-specific memory footprint can be tracked, using the following metrics. They, in turn, can help track metadata growth over time and understand variations that can help identify anti-patterns.
SELECT mem_rss WHERE entityName = "REPLACE-WITH-CATALOG_HOSTID" AND category = role
SELECT mem_rss WHERE serviceName="REPLACE-WITH-IMPALA-SERVICE-NAME" AND roleType=CATALOGSERVER AND category = ROLE
SELECT impala_catalogserver_jvm_heap_current_usage_bytes WHERE entityName = "<catalogd_host_id>" AND category = role
SELECT impala_catalogserver_jvm_heap_current_usage_bytes WHERE serviceName="REPLACE-WITH-IMPALA-SERVICE-NAME" AND roleType = "CATALOGSERVER" AND category = ROLE
SELECT statestore_total_topic_size_bytes WHERE roleType = STATESTORE AND category = ROLE
The actual metadata topic size after compaction is reflected by StatestoreD topic size metric. StatestoreD metric is very useful for identifying workload patterns. For example, an INVALIDATE METADATA or DROP STATS on a large partitioned table immediately triggers a drop in topic size and easily identifiable while RSS/heap may not have slightest indication of it.
CPU usage on CatalogD and StatestoreD usually stays low. However, CatalogD requires additional processing power to compact and serialize metadata. CatalogD CPU utilization of 20% or more can be concerning and slow down service operations.
SELECT cpu_user_rate / getHostFact(numCores, 1) * 100, cpu_system_rate / getHostFact(numCores, 1) * 100 WHERE roleType = "CATALOGSERVER"
Network throughput on the Statestore is a critical metric to monitor, as it is an important indicator of performance and quality of network connection. The Statestore / catalog network is very vulnerable to the above “anti-patterns.” That, in turn, has a snowball effect on the cluster. Having a large number of hosts act as coordinators can cause unnecessary network overhead, even timeout errors, as each of those hosts communicates with the Statestore daemon for metadata updates.
As Impala requires the propagation of the entire table metadata with each catalog update, frequent metadata operations like REFRESH on large tables increase the host network throughput. Occasional spikes due to service restarts or the impalad service going down can be ignored. It’s highly recommended to colocate the Catalog and Statestore on the same host to reduce network load. They should not be colocated them with other network intensive services such as Namenode.
SELECT total_bytes_receive_rate_across_network_interfaces, total_bytes_transmit_rate_across_network_interfaces WHERE hostName="<STATESTORE_HOST_NAME>"
Note: Catalog server and Statestore are usually co-located on the same node, but should they be on separate nodes, run the above query against the hostname for each.
CatalogD generally makes RPC calls to Namenode to fetch the file block location and file permission information. It is hard to track down the RPC call per service but generally a high RPC load can slow down Impala metadata fetches. As GC latency could drastically impact RPC, it would be prudent to monitor it.
SELECT get_block_locations_rate, get_file_info_rate, get_listing_rate WHERE roleType = "NAMENODE"
SELECT rpc_processing_time_avg_time, rpc_queue_time_avg_time WHERE roleType = "NAMENODE"
SELECT integral(jvm_gc_time_ms_rate) WHERE roleType=NAMENODE AND category = ROLE
Don’t forget to configure the above for both primary and secondary Name Node. Besides the foundational pillars of memory, processing and network consumption, that make up the building blocks of a distributed service such as Impala, checking dependent systems especially the NameNode and HiveMetastore can be helpful. However, detailed interpretation of those above metrics will be out of scope for this blog post. Although, there is no specific key metric to monitor HMS, an overall health check is recommended.
Below are some common scenarios to assess the aforementioned charts to infer possible mitigative measures.
Description:For a specific time period, a few metadata-dependent queries exhibit slowness, and you observe spikes in Catalog RSS memory, Catalog heap usage as well as Statestore topic size .
Description:Workload experiencing metadata propagation delays and you observe spikes StatestoreD/CatalogD Network throughput and slight or no change on Catalog RSS memory and heap usage
Actions:Avoid frequent refresh of large tables and heavy concurrency of DDL operations. Employalternate mechanismfor querying fast data. Ensure Statestored is not co-located with other network intensive services on your cluster. Use of dedicated coordinators can reduce the network load. Correlating with TCP retransmissions and dropped packet errors could help in determining if the performance issue is network-related.
Description:Inconsistent DDL run times and you observe Statestored topic size falls and rise up to the previous state.
Actions: INVALIDATE METADATA usage should be limited.
Description:Statestored topic size drops to the initial state and you observe all queries run after the drop is slow and eventually returns to normal once the topic size is restored
Actions:Avoid full service, and catalog and statestored restarts if not necessary. Avoid global or database-level INVALIDATE METADATA , restrict it to table level and perform itonly when necessary.
Description:Statestored topic size growing at a fast rate associated with high network throughput and Impala query performance deteriorating every day
Actions:Switch to a tool designed to handle rapidly ingested data like Kudu, HBase, etc.
Description:Queries exhibiting slowness and you observe high Catalog CPU usage (>20%)
Actions:Reduce DDL concurrency. Decrease overall memory footprint for catalog update.
|CatalogD RSS memory||CatalogD heap space usage||StatestoreD topic size(zoom in, if necessary)||Catalogd CPU usage||CatalogD/ StatestoreD host network throughput(host-level)||Namenode RPC workload summary(RPC calls per sec)|
|Safe range||< 80% of total process memory allocation||< 80% of total or sudden spike beyond 20 GB||–||< 20%||Few Mbs to 2 Gbs per second||30-40K/s|
|Metadata workload anti-pattern|
|Compute incremental stats on large wide partitioned tables||increases||increases||increases||low||med||n/a|
|Large # of databases, tables, partitions and small files growing at a fast rate||increases drastically||increases drastically||increases drastically||med||med||high|
|Frequently refreshing large tables(table or partition)||stable||stable||stable or increases slightly||low||very high||stable|
|Indiscriminate use of INVALIDATE METADATA commands||stable||stable||drops momentarily and rises again||med||very high||high|
|High number of concurrent DDL operations||increases drastically||increases drastically||increases drastically (> 2 GB)||high (> 20%)||very high||high|
|Catalog or statestore service restarts||drops drastically & rises||drops drastically and rises||drops drastically and rises||medium||high||stable|
|High number of coordinator nodes||n/a||n/a||n/a||n/a||very high||n/a|
In this post, we explored several key Cloudera Manager metrics which monitor and diagnose possible metadata specific performance issues in Apache Impala. When troubleshooting a complex distributed service such as Impala, it is important to establish solid foundation to monitor the critical components and their interaction within the architecture. Understanding the relationship between memory and processing power in the running processes and observing outlier behavior helps us forge a clearer path for diagnostics and drill down to a root cause. Although the Statestore and Catalog daemon are not critical to the actual uptime of the Impala service, they possess invaluable information to ensure the smooth functioning of the service.
Stay tuned for part 2!
Cloudera Manager only provides network throughput metric per host and not per service. Metric can be hard to interpret and correlate if we have other services hosted on the server
Estimating metadata size
Raw size= #tables * 5KB + #partitions * 2kb + cols * 100B + #files * 750B + #file_blocks * 300B
+ 400MB * cols * partitions (for incremental stats)
Compacted size= ~1/10th of Raw size
The metadata catalog update parallelism is limited by num_metadata_loading_threads , which defaults to 16, and lack of throttling mechanism for DDL, heavy concurrency can overload CatalogD and degrade overall performance.
As an alternative to Compute incremental, either switch to compute stats(full) with TABLESAMPLE (CDH 5.15 / Impala 2.12 and higher) or manual stats using alter table or provide external hints in queries using the tables to circumvent the impact of missing stats.
Mansi Maharana is a Senior Solutions Architect at Cloudera.Suhita Goswami is a Solutions Consultant at Cloudera.