从源码角度看 CPU 相关日志

简介

(所有源码与注释都能够在YogiAi/CPULoad找到)

安卓系统中,普通开发者常常遇到的是ANR(Application Not Responding)问题,即应用主线程没有相应。根本的原因在于安卓框架特殊设定,将专门做UI相关的、用户能够敏锐察觉到的操作放在了一个专门的线程中,即主线程。一旦这个线程在规定的时间内没有完成某个任务,如广播onReceive,Activity跳转,或者bindApplication,那么框架就会产生所谓的ANR,弹出对话框,让用户选择继续等待或者杀死没有完成任务的应用进程。

老生常谈的说法是开发者在主线程中执行了耗时操作导致了任务执行时间太久,这样的问题通常很好定位与解决,往往打个bugreport这个ANR的root cause就原形毕露了,再不济我们也能够通过LOG定位到耗时点。

今天我们讲的是另一种常见的情况,这种情况往往是由于CPU硬件设备的落后、底层CPU调控策略不当导致的。这种问题很恼人,明明按照常理绝对不会出现耗时的地方因为它也能够出现ANR和卡顿,给用户带来极其糟糕的体验。

这篇文章我将先给出一个示例,通过ANR日志反应当前的系统状况,然后从源码角度看安卓framework是如何打出这些LOG的。

示例

以下是我截取的一段LOG,系统频繁的打出BIND_APPLICATION的耗时日志,这些日志出现的十分频繁,平均每两秒就出现一次,调用bindApplication有时甚至能够超过10s+,

这种情况是十分严重的,当系统要启动一个前台广播时,就需要10s内完成这个任务,否则就会出现ANR。如果启动的这个前台广播要运行在一个没有启动的进程中,那么在启动广播之前就要开启一个进程,然后调用bindApplication以触发Application.onCreate。这期间会先将BIND_APPLICATION、RECEIVER依次enqueue到ActivityThread$H主线程队列中,如果BIND_APPLICATION的处理时间过长,将会间接的导致RECEIER的任务没有得到处理,最终导致ANR。同样的原理,这种情况甚至会导致Input的任务没有及时得到处理,最终导致用户可察觉的卡顿。

  • 08-28 20:35:58.737 4635 4635 I tag_activity_manager: [0,com.android.providers.calendar,110,3120]
  • 08-28 20:35:58.757 4653 4653 I tag_activity_manager: [0,com.xiaomi.metoknlp,110,3073]
  • 08-28 20:35:58.863 4601 4601 I tag_activity_manager: [0,android.process.acore,110,3392]
  • 08-28 20:36:00.320 5040 5040 I tag_activity_manager: [0,com.lbe.security.miui,110,3045]
  • 08-28 20:36:00.911 4233 4233 I tag_activity_manager: [0,com.miui.securitycenter.remote,110,8653]
  • 08-28 20:36:03.254 4808 4808 I tag_activity_manager: [0,com.android.phone,110,7059]
  • 08-28 20:36:05.538 5246 5246 I tag_activity_manager: [0,com.xiaomi.market,110,3406]
  • 08-28 20:36:09.006 5153 5153 I tag_activity_manager: [0,com.miui.klo.bugreport,110,10166]
  • 08-28 20:36:09.070 5118 5118 I tag_activity_manager: [0,com.android.settings,110,10680]
  • 08-28 20:36:11.259 5570 5570 I tag_activity_manager: [0,com.miui.core,110,4895]

ActivityManagerService通过Binder call调用到应用的ActivityThread方法,然后将任务enqueue到处理主线程队列中

  • // bind call 调用到这个方法
  • public final void bindApplication(String processName, ApplicationInfo appInfo,
  • List<ProviderInfo> providers, ComponentName instrumentationName,
  • ProfilerInfo profilerInfo, Bundle instrumentationArgs,
  • IInstrumentationWatcher instrumentationWatcher,
  • IUiAutomationConnection instrumentationUiConnection, int debugMode,
  • boolean enableOpenGlTrace, boolean isRestrictedBackupMode, boolean persistent,
  • Configuration config, CompatibilityInfo compatInfo, Map<String, IBinder> services,
  • Bundle coreSettings) {
  • ...
  • // 封装AppBindData对象
  • AppBindData data = new AppBindData();
  • ...
  • sendMessage(H.BIND_APPLICATION, data);
  • }
  • ...
  • private class H extends Handler {
  • ...
  • public static final int BIND_APPLICATION = 110;
  • ...
  • public void handleMessage(Message msg) {
  • if (DEBUG_MESSAGES) Slog.v(TAG, ">>> handling: " + codeToString(msg.what));
  • switch (msg.what) {
  • ...
  • case BIND_APPLICATION:
  • // 在主线程队列中进程执行
  • Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "bindApplication");
  • AppBindData data = (AppBindData)msg.obj;
  • handleBindApplication(data);
  • Trace.traceEnd(Trace.TRACE_TAG_ACTIVITY_MANAGER);
  • break;
  • ...
  • }
  • ...

上面的例子中,系统很快不出所料的出现了ANR,并且这个问题不是由于在APP中做了耗时操作,而是因为系统CPU负载过高导致的。以下贴出CPU负载日志:

  • // bindApplication卡了主线程57s+
  • Running message is { when=-57s68ms what=110 obj=AppBindData{appInfo=ApplicationInfo{be09873 com.android.settings}} target=android.app.ActivityThread$H planTime=1504652580856 dispatchTime=1504652580897 finishTime=0 }
  • Message 0: { when=-57s35ms what=140 arg1=5 target=android.app.ActivityThread$H planTime=1504652580890 dispatchTime=0 finishTime=0 }
  • // 分析了一下发生异常时的系统状态
  • 08-28 20:36:13.000 2692 2759 E ActivityManager: ANR in com.android.settings
  • 08-28 20:36:13.000 2692 2759 E ActivityManager: PID: 5118
  • // 从左到右,分别是最近1分钟,5分钟,15分钟的CPU负载,超过11就是负载过度
  • 08-28 20:36:13.000 2692 2759 E ActivityManager: Load: 20.12 / 13.05 / 6.96
  • // 发生ANR时,CPU的使用情况
  • 08-28 20:36:13.000 2692 2759 E ActivityManager: CPU usage from 2967ms to -4440ms ago:  
  • // systemserver过于繁忙
  • 08-28 20:36:13.000 2692 2759 E ActivityManager: 73% 2692/system_server: 57% user + 15% kernel / faults: 16217 minor 11 major
  • 08-28 20:36:13.000 2692 2759 E ActivityManager: 61% 4840/com.miui.home: 55% user + 5.4% kernel / faults: 26648 minor 17 major
  • 08-28 20:36:13.000 2692 2759 E ActivityManager: 19% 330/mediaserver: 17% user + 2.1% kernel / faults: 5180 minor 18 major
  • 08-28 20:36:13.000 2692 2759 E ActivityManager: 18% 4096/com.android.systemui: 14% user + 4% kernel / faults: 12965 minor 30 major
  • ...

当然,证明系统导致ANR不仅仅需要CPU Load日志,同时也需要排除当前应用是否有耗时操作、耗时binder call的调用、是否等待锁等等情况。排除之后,就可以判断确认当前问题是由于CPU资源稀缺,导致应用执行bindApplication没有拿到足够的时间片,导致任务没有及时的完成,最终间接的导致队列排名靠后的广播或服务ANR。

深入源码

AMS.appNotResponding

  • private static final String TAG = TAG_WITH_CLASS_NAME ? "ActivityManagerService" : TAG_AM;
  • final void appNotResponding(ProcessRecord app, ActivityRecord activity,
  • ActivityRecord parent, boolean aboveSystem, final String annotation) {
  • ...
  • long anrTime = SystemClock.uptimeMillis();
  • // 如果开启了CPU监听,将会先更新CPU的使用情况
  • if (MONITOR_CPU_USAGE) {
  • updateCpuStatsNow();
  • }
  • ...
  • // 新建一个用户跟踪CPU使用情况的对象
  • // 将会打印所以线程的CPU使用状况
  • final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
  • ...
  • String cpuInfo = null;
  • if (MONITOR_CPU_USAGE) {
  • updateCpuStatsNow();
  • synchronized (mProcessCpuTracker) {
  • cpuInfo = mProcessCpuTracker.printCurrentState(anrTime);
  • }
  • info.append(processCpuTracker.printCurrentLoad());
  • info.append(cpuInfo);
  • }
  • info.append(processCpuTracker.printCurrentState(anrTime));
  • // 直接将LOG打印出来
  • Slog.e(TAG, info.toString());
  • ...
  • }

AMS.updateCpuStatsNow

  • void updateCpuStatsNow() {
  • synchronized (mProcessCpuTracker) {
  • mProcessCpuMutexFree.set(false);
  • final long now = SystemClock.uptimeMillis();
  • boolean haveNewCpuStats = false;
  • // 最少每5秒更新CPU的数据
  • if (MONITOR_CPU_USAGE &&
  • mLastCpuTime.get() < (now - MONITOR_CPU_MIN_TIME)) {
  • mLastCpuTime.set(now);
  • mProcessCpuTracker.update();
  • if (mProcessCpuTracker.hasGoodLastStats()) {
  • haveNewCpuStats = true;
  • //Slog.i(TAG, mProcessCpu.printCurrentState());
  • //Slog.i(TAG, "Total CPU usage: "
  • // + mProcessCpu.getTotalCpuPercent() + "%");
  • // Slog the cpu usage if the property is set.
  • if ("true".equals(SystemProperties.get("events.cpu"))) {
  • // 用户态时间
  • int user = mProcessCpuTracker.getLastUserTime();
  • // 系统态时间
  • int system = mProcessCpuTracker.getLastSystemTime();
  • // IO等待时间
  • int iowait = mProcessCpuTracker.getLastIoWaitTime();
  • // 硬中断时间
  • int irq = mProcessCpuTracker.getLastIrqTime();
  • // 软中断时间
  • int softIrq = mProcessCpuTracker.getLastSoftIrqTime();
  • // 闲置时间
  • int idle = mProcessCpuTracker.getLastIdleTime();
  • int total = user + system + iowait + irq + softIrq + idle;
  • if (total == 0) total = 1;
  • // 输出百分比
  • EventLog.writeEvent(EventLogTags.CPU,
  • ((user + system + iowait + irq + softIrq) * 100) / total,
  • (user * 100) / total,
  • (system * 100) / total,
  • (iowait * 100) / total,
  • (irq * 100) / total,
  • (softIrq * 100) / total);
  • }
  • }
  • }
  • // 各类CPU时间归类与更新
  • final BatteryStatsImpl bstats = mBatteryStatsService.getActiveStatistics();
  • synchronized (bstats) {
  • synchronized (mPidsSelfLocked) {
  • if (haveNewCpuStats) {
  • if (bstats.startAddingCpuLocked()) {
  • int totalUTime = 0;
  • int totalSTime = 0;
  • // 遍历所有ProcessCpuTracker
  • final int N = mProcessCpuTracker.countStats();
  • for (int i = 0; i < N; i++) {
  • ProcessCpuTracker.Stats st = mProcessCpuTracker.getStats(i);
  • if (!st.working) {
  • continue;
  • }
  • // ProcessRecord的CPU时间更新
  • ProcessRecord pr = mPidsSelfLocked.get(st.pid);
  • totalUTime += st.rel_utime;
  • totalSTime += st.rel_stime;
  • if (pr != null) {
  • BatteryStatsImpl.Uid.Proc ps = pr.curProcBatteryStats;
  • if (ps == null || !ps.isActive()) {
  • pr.curProcBatteryStats = ps = bstats.getProcessStatsLocked(
  • pr.info.uid, pr.processName);
  • }
  • ps.addCpuTimeLocked(st.rel_utime, st.rel_stime);
  • pr.curCpuTime += st.rel_utime + st.rel_stime;
  • } else {
  • BatteryStatsImpl.Uid.Proc ps = st.batteryStats;
  • if (ps == null || !ps.isActive()) {
  • st.batteryStats = ps = bstats.getProcessStatsLocked(
  • bstats.mapUid(st.uid), st.name);
  • }
  • ps.addCpuTimeLocked(st.rel_utime, st.rel_stime);
  • }
  • }
  • // 将数据更新到BatteryStatsImpl
  • final int userTime = mProcessCpuTracker.getLastUserTime();
  • final int systemTime = mProcessCpuTracker.getLastSystemTime();
  • final int iowaitTime = mProcessCpuTracker.getLastIoWaitTime();
  • final int irqTime = mProcessCpuTracker.getLastIrqTime();
  • final int softIrqTime = mProcessCpuTracker.getLastSoftIrqTime();
  • final int idleTime = mProcessCpuTracker.getLastIdleTime();
  • bstats.finishAddingCpuLocked(totalUTime, totalSTime, userTime,
  • systemTime, iowaitTime, irqTime, softIrqTime, idleTime);
  • }
  • }
  • }
  • // 每30分钟写入电池数据
  • if (mLastWriteTime < (now - BATTERY_STATS_TIME)) {
  • mLastWriteTime = now;
  • mBatteryStatsService.scheduleWriteToDisk();
  • }
  • }
  • }
  • }

BatteryStatsImpl.finishAddingCpuLocked

  • public void finishAddingCpuLocked(int totalUTime, int totalSTime, int statUserTime,
  • int statSystemTime, int statIOWaitTime, int statIrqTime,
  • int statSoftIrqTime, int statIdleTime) {
  • if (DEBUG) Slog.d(TAG, "Adding cpu: tuser=" + totalUTime + " tsys=" + totalSTime
  • + " user=" + statUserTime + " sys=" + statSystemTime
  • + " io=" + statIOWaitTime + " irq=" + statIrqTime
  • + " sirq=" + statSoftIrqTime + " idle=" + statIdleTime);
  • mCurStepCpuUserTime += totalUTime;
  • mCurStepCpuSystemTime += totalSTime;
  • mCurStepStatUserTime += statUserTime;
  • mCurStepStatSystemTime += statSystemTime;
  • mCurStepStatIOWaitTime += statIOWaitTime;
  • mCurStepStatIrqTime += statIrqTime;
  • mCurStepStatSoftIrqTime += statSoftIrqTime;
  • mCurStepStatIdleTime += statIdleTime;
  • }

ProcessCpuTracker.update

update方法主要是读取/proc/stat与/proc/loadavg文件的数据来更新当前的CPU时间,其中CPU负载接口onLoadChange在LoadAverageService中有使用,用于展示一个动态的View在界面,便于查看CPU的实时数据。

具体关于这两个文件,我会在最后列出这两个节点文件的实例数据并作出简单的解析。

关于/proc目录,它其实是一个虚拟目录,其子目录与子文件也都是虚拟的,并不占用实际的存储空间,它允许动态的读取出系统的实时信息。

```java
public void update() {
if (DEBUG) Slog.v(TAG, "Update: " + this);

  • final long nowUptime = SystemClock.uptimeMillis();
  • final long nowRealtime = SystemClock.elapsedRealtime();
  • // 复用size=7的LONG数组
  • final long[] sysCpu = mSystemCpuData;
  • // 读取/proc/stat文件
  • if (Process.readProcFile("/proc/stat", SYSTEM_CPU_FORMAT,
  • null, sysCpu, null)) {
  • // Total user time is user + nice time.
  • final long usertime = (sysCpu[0]+sysCpu[1]) * mJiffyMillis;
  • // Total system time is simply system time.
  • final long systemtime = sysCpu[2] * mJiffyMillis;
  • // Total idle time is simply idle time.
  • final long idletime = sysCpu[3] * mJiffyMillis;
  • // Total irq time is iowait + irq + softirq time.
  • final long iowaittime = sysCpu[4] * mJiffyMillis;
  • final long irqtime = sysCpu[5] * mJiffyMillis;
  • final long softirqtime = sysCpu[6] * mJiffyMillis;
  • // This code is trying to avoid issues with idle time going backwards,
  • // but currently it gets into situations where it triggers most of the time. :(
  • if (true || (usertime >= mBaseUserTime && systemtime >= mBaseSystemTime
  • && iowaittime >= mBaseIoWaitTime && irqtime >= mBaseIrqTime
  • && softirqtime >= mBaseSoftIrqTime && idletime >= mBaseIdleTime)) {
  • mRelUserTime = (int)(usertime - mBaseUserTime);
  • mRelSystemTime = (int)(systemtime - mBaseSystemTime);
  • mRelIoWaitTime = (int)(iowaittime - mBaseIoWaitTime);
  • mRelIrqTime = (int)(irqtime - mBaseIrqTime);
  • mRelSoftIrqTime = (int)(softirqtime - mBaseSoftIrqTime);
  • mRelIdleTime = (int)(idletime - mBaseIdleTime);
  • mRelStatsAreGood = true;

...

  • mBaseUserTime = usertime;
  • mBaseSystemTime = systemtime;
  • mBaseIoWaitTime = iowaittime;
  • mBaseIrqTime = irqtime;
  • mBaseSoftIrqTime = softirqtime;
  • mBaseIdleTime = idletime;
  • } else {
  • mRelUserTime = 0;
  • mRelSystemTime = 0;
  • mRelIoWaitTime = 0;
  • mRelIrqTime = 0;
  • mRelSoftIrqTime = 0;
  • mRelIdleTime = 0;
  • mRelStatsAreGood = false;
  • Slog.w(TAG, "/proc/stats has gone backwards; skipping CPU update");
  • return;
  • }
  • }
  • mLastSampleTime = mCurrentSampleTime;
  • mCurrentSampleTime = nowUptime;
  • mLastSampleRealTime = mCurrentSampleRealTime;
  • mCurrentSampleRealTime = nowRealtime;
  • final StrictMode.ThreadPolicy savedPolicy = StrictMode.allowThreadDiskReads();
  • try {
  • // 收集/proc文件节点信息
  • mCurPids = collectStats("/proc", -1, mFirst, mCurPids, mProcStats);
  • } finally {
  • StrictMode.setThreadPolicy(savedPolicy);
  • }
  • final float[] loadAverages = mLoadAverageData;
  • // 读取/proc/loadavg文件信息
  • // 即最新1分钟,5分钟,15分钟的CPU负载
  • if (Process.readProcFile("/proc/loadavg", LOAD_AVERAGE_FORMAT,
top Created with Sketch.