High-CPU Thread Detection / Infinite Loop Detection System (ILDS)

Ever had your live JVM go to 100% CPU on one or more cores and knock your application offline?

The cause is often hard to find, and the usual fix is to restart the JVM as soon as possible. If you captured a thread dump before restarting, you now have a list of active threads to go through to work out what was running at the time.

This is a slow and painful process that reduces the uptime of your application and consumes considerable developer time.

Automatic Thread Detection and JSP Code Identification

Since around 2002, the Metawerx monitoring system (ERAI) has detected abnormally high CPU usage in a Java VM and automatically restarted the JVM causing the issue. The system uses multiple thresholds to avoid false positives and to identify the main problems (such as infinite loops). This protects against a single JVM consuming excessive compute resources on a server node and degrading its own performance or that of other JVMs running on the same server cluster.
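The "several high samples in a row" idea can be illustrated with a small Java sketch. This is not ERAI's implementation, which is not published here. The class name ConsecutiveThresholdDetector and its methods are hypothetical, and the values 88% and 6 are borrowed from the example report further down. The sketch also assumes that "passed" means strictly greater than the threshold.

```java
/** Hypothetical sketch: raise an alert only after N consecutive high-CPU samples. */
final class ConsecutiveThresholdDetector {
    private final double highLevelPercent; // e.g. 88.0, as in the example report
    private final int requiredInARow;      // e.g. 6, as in the example report
    private int timesHighInARow = 0;
    private long totalHigh = 0;

    ConsecutiveThresholdDetector(double highLevelPercent, int requiredInARow) {
        if (requiredInARow < 1) {
            throw new IllegalArgumentException("requiredInARow must be at least 1");
        }
        this.highLevelPercent = highLevelPercent;
        this.requiredInARow = requiredInARow;
    }

    /** Records one CPU sample and returns true once the streak is long enough. */
    boolean record(double cpuPercent) {
        if (cpuPercent > highLevelPercent) { // assumption: strictly above the level
            timesHighInARow++;
            totalHigh++;
        } else {
            timesHighInARow = 0; // a single normal sample resets the streak
        }
        return timesHighInARow >= requiredInARow;
    }

    int timesHighInARow() { return timesHighInARow; }

    long totalHigh() { return totalHigh; }

    public static void main(String[] args) {
        ConsecutiveThresholdDetector detector = new ConsecutiveThresholdDetector(88.0, 6);
        double[] samples = {96, 95, 40, 96, 97, 96, 95, 96, 96};
        for (double sample : samples) {
            boolean alert = detector.record(sample);
            System.out.printf("cpu=%.0f%% highInARow=%d alert=%b%n",
                    sample, detector.timesHighInARow(), alert);
        }
    }
}
```

Requiring a streak rather than a single reading means a short burst of legitimate work, such as a large report or a garbage collection, does not trigger a restart.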

Our monitoring systems will attempt to identify the specific thread in your JVM which is consuming excessive CPU and report the full stack trace of the thread by email.
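For readers who want to try something similar themselves, the following Java sketch finds the busiest thread from inside a running JVM using the standard java.lang.management API. This page does not describe how ERAI collects its data, so treat this as a different, in-process technique shown for illustration only. The class name HottestThreadReporter, its method names and the 500 ms sampling window are assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical sketch: find the thread that used the most CPU during a short window. */
final class HottestThreadReporter {

    /** Returns the id of the busiest live thread, or -1 if CPU timing is unavailable. */
    static long findHottestThreadId(ThreadMXBean threads, long windowMillis) {
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return -1;
        }
        long[] ids = threads.getAllThreadIds();
        long[] before = new long[ids.length];
        for (int i = 0; i < ids.length; i++) {
            before[i] = threads.getThreadCpuTime(ids[i]); // nanoseconds, -1 if not alive
        }
        try {
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the flag, use the shorter window
        }
        long bestId = -1;
        long bestDelta = -1;
        for (int i = 0; i < ids.length; i++) {
            long after = threads.getThreadCpuTime(ids[i]);
            if (before[i] < 0 || after < 0) {
                continue; // thread ended during the window
            }
            long delta = after - before[i];
            if (delta > bestDelta) {
                bestDelta = delta;
                bestId = ids[i];
            }
        }
        return bestId;
    }

    /** Formats the thread's state and full stack trace as plain text. */
    static String describe(ThreadMXBean threads, long id) {
        if (id <= 0) {
            return "No thread CPU data available\n";
        }
        ThreadInfo info = threads.getThreadInfo(id, Integer.MAX_VALUE);
        if (info == null) {
            return "Thread " + id + " is no longer alive\n";
        }
        StringBuilder sb = new StringBuilder();
        sb.append("Thread ").append(id).append(" \"").append(info.getThreadName())
          .append("\": (state = ").append(info.getThreadState()).append(")\n");
        for (StackTraceElement frame : info.getStackTrace()) {
            sb.append(" - ").append(frame).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Demo only: a daemon thread that spins until interrupted.
        Thread spinner = new Thread(() -> {
            long x = 0;
            while (!Thread.currentThread().isInterrupted()) {
                x++;
            }
        }, "demo-spinner");
        spinner.setDaemon(true);
        spinner.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long id = findHottestThreadId(threads, 500);
        System.out.print(describe(threads, id));
        spinner.interrupt();
    }
}
```

An in-process check like this only helps while the JVM can still schedule the monitoring thread, which is one reason an external monitor is useful when a JVM is saturated.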

In addition, if the infinite loop is found in a JSP file, the source lines of the compiled JSP are also automatically identified.
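The source-line lookup can be pictured as taking the line number from a stack frame and printing a few lines either side of it from the generated servlet source. The Java sketch below shows that idea only. The class name SourceExcerpt and its around method are hypothetical, and the sample lines are copied from the example report further down. In practice the lines would be read from the generated .java file, for example with java.nio.file.Files.readAllLines.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: print the source lines surrounding a stack frame's line number. */
final class SourceExcerpt {

    /** Formats lines [line - before, line + after] (1-based, clamped) as "N: text". */
    static String around(List<String> sourceLines, int line, int before, int after) {
        if (line < 1 || line > sourceLines.size()) {
            throw new IllegalArgumentException("line " + line + " is outside the source");
        }
        int from = Math.max(1, line - before);
        int to = Math.min(sourceLines.size(), line + after);
        StringBuilder sb = new StringBuilder();
        for (int n = from; n <= to; n++) {
            sb.append(n).append(": ").append(sourceLines.get(n - 1)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the generated servlet source: 66 filler lines, then the
        // five lines shown in the example report.
        List<String> source = new ArrayList<>();
        for (int n = 1; n <= 66; n++) {
            source.add("// generated code (placeholder)");
        }
        source.add("    double i = 0.1;");
        source.add("    int count = 0;");
        source.add("    while(count < 20) {");
        source.add("        i = i * 12315.12512;");
        source.add("    }");

        // A frame as it might appear in a thread dump (values from the example report).
        StackTraceElement frame =
                new StackTraceElement("org.apache.jsp.cpu_jsp", "_jspService", "cpu_jsp.java", 69);
        System.out.print(around(source, frame.getLineNumber(), 2, 2));
    }
}
```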

Alert Email

The High-CPU notification alert contains four sections:

  • High-CPU notification
  • Stack Trace of affected thread
  • JSP identification and source dump (when caused by a JSP file as in the case below)
  • System call summary over 1 second (for filing JRE bug reports)

Example Report

ALERT: Process usage has passed CPU threshold of 88% for 6 or more tests in a row and will be restarted.
Service Name: Tomcat 1072 (neale2012) Service
Process CPU Usage: 96% (100% = 1x3Ghz core @ 100%)
CPU High Level: 88%
CPU High Threshold: 6 times in a row
Times high in a row: 7
Total high since ERAI startup: 7
Condition: High CPU over an extended period

- If this level of CPU usage is expected for your application,
please contact support and ask for the thresholds to be changed.

Metawerx Analysis
==================

- This report shows information about the thread with the current highest CPU usage, if it is available.
- Please note that in some cases this may not be the thread that caused the high CPU alert.
- Where possible, the currently executing lines of Java will also be displayed.

Thread 7851: (state = BLOCKED)
 - org.apache.jsp.cpu_jsp._jspService(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=96, line=72 (Compiled frame)
 - org.apache.jasper.runtime.HttpJspBase.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=3, line=70 (Interpreted frame)
 - javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=30, line=728 (Interpreted frame)
 - org.apache.jasper.servlet.JspServletWrapper.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, boolean) @bci=440, line=432 (Interpreted frame)
 - org.apache.jasper.servlet.JspServlet.serviceJspFile(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse, java.lang.String, boolean) @bci=112, line=390 (Interpreted frame)
 - org.apache.jasper.servlet.JspServlet.service(javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) @bci=345, line=334 (Interpreted frame)
 - javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=30, line=728 (Interpreted frame)
 - org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=446, line=305 (Interpreted frame)
 - org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse) @bci=101, line=210 (Interpreted frame)
 ...

The full source for [cpu.jsp] is at [/org/apache/jsp/cpu_jsp.java]
Line [69] in method [_jspService] was executing at the time of the thread dump.
Source code lines [67-71]:

67:     double i = 0.1;
68:     int count = 0;
69:     while(count < 20) {
70:         i = i * 12315.12512;
71:     }

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 85.80    0.688601         162      4260       213 futex
 12.96    0.104006       52003         2         2 restart_syscall
  1.24    0.009956           8      1323           sched_yield
  0.00    0.000000           0       159           mprotect
------ ----------- ----------- --------- --------- ----------------
100.00    0.802563                  5744       215 total
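In the report above, source lines 67-71 show why the thread never finishes: the loop condition tests count, but nothing inside the loop changes it. A corrected version of the same loop, wrapped in a hypothetical BoundedLoop class so that it can be run on its own, might look like this:

```java
/** Hypothetical wrapper around the loop from the example report, with the missing increment. */
final class BoundedLoop {

    /** Runs the loop from the report 20 times and returns the final value of i. */
    static double run() {
        double i = 0.1;
        int count = 0;
        while (count < 20) {
            i = i * 12315.12512;
            count++; // the increment missing from the reported code
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```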

How much does it cost?

This system is included automatically in our JVM monitoring, which is free of charge on all Private JVMs.

What are the benefits?

  • Increased application uptime
  • Increased performance for the affected JVM and all other processes on the same cluster
  • Decreased debugging time

(c) Metawerx 1997-2024 - All rights reserved