Android 7.0 ActivityManagerService (10): App Crash Handling

In this post, we walk through the main flow involved when AMS handles an app crash.

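I. Installing the exception handler
On Android, after an application process is forked, an uncaught-exception handler is installed for the VM.
At runtime, if any thread throws an exception that nothing catches,
that exception is ultimately handed to the uncaught-exception handler.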

Let's first look at the Android N code that installs this handler.
The path starts in runSelectLoop in ZygoteInit.java:

private static void runSelectLoop(String abiList) throws MethodAndArgsCaller {
    ...........
    while (true) {
        ..........
        for (int i = pollFds.length - 1; i >= 0; --i) {
            ..........
            if (i == 0) {
                // Once zygote's server socket receives a message, a ZygoteConnection is established
                ZygoteConnection newPeer = acceptCommandPeer(abiList);
                peers.add(newPeer);
                fds.add(newPeer.getFileDesciptor());
            } else {
                // Once established, a ZygoteConnection calls its own runOnce when a message arrives
                boolean done = peers.get(i).runOnce();
                .............
            }
        }
    }
}

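As we know, once zygote starts it creates a server socket in its own process, dedicated to receiving process-creation requests.
As the code above shows, when such a request arrives, zygote creates a ZygoteConnection and calls its runOnce function: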

boolean runOnce() throws ZygoteInit.MethodAndArgsCaller {
    ...............
    try {
        ...........
        // Fork the child process
        pid = Zygote.forkAndSpecialize(.......);
    } catch (ErrnoException ex) {
        ............
    } catch (IllegalArgumentException ex) {
        ...........
    } catch (ZygoteSecurityException ex) {
        ..........
    }

    try {
        if (pid == 0) {
            ........
            // pid == 0 means the fork succeeded and we are in the child process; continue setup there
            handleChildProc(parsedArgs, descriptors, childPipeFd, newStderr);
            ........
        } else {
            ...........
        }
    } finally {
        ..........
    }
}

Let's follow handleChildProc:

private void handleChildProc(......) {
    ............
    if (parsedArgs.invokeWith != null) {
        ..........
    } else {
        // Enter zygoteInit in RuntimeInit
        RuntimeInit.zygoteInit(parsedArgs.targetSdkVersion,
                parsedArgs.remainingArgs, null /* classLoader */);
    }
}

Continuing along the flow, here is zygoteInit in RuntimeInit:

public static final void zygoteInit(int targetSdkVersion, String[] argv, ClassLoader classLoader)
        throws ZygoteInit.MethodAndArgsCaller {
    ............
    // Follow commonInit
    commonInit();
    ............
}

private static final void commonInit() {
    ...........
    /* set default handler; this applies to all threads in the VM */
    // This is the call we were looking for
    Thread.setDefaultUncaughtExceptionHandler(new UncaughtHandler());
    ...........
}

As the code above shows, after a process is forked, the UncaughtHandler exception handler is installed during the process's commonInit stage.
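The mechanism that commonInit relies on is plain Java and can be tried outside Android. The following is a minimal sketch, not AOSP code: the class name DefaultHandlerDemo and the thread name are made up for illustration. It installs a process-wide default handler, lets a worker thread die with an uncaught exception, and shows that the handler receives it.

```java
import java.util.concurrent.atomic.AtomicReference;

final class DefaultHandlerDemo {
    /**
     * Installs a default uncaught-exception handler, runs a thread that throws,
     * and returns what the handler observed.
     */
    static String runCrashingThread() {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        // The default handler applies to every thread that has no handler of its own.
        Thread.setDefaultUncaughtExceptionHandler(
                (thread, error) -> seen.set(thread.getName() + ": " + error.getMessage()));
        try {
            Thread worker = new Thread(() -> {
                throw new IllegalStateException("boom");
            }, "worker-1");
            worker.start();
            // The handler runs on the dying thread, so it has finished once join() returns.
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Restore whatever handler was installed before the demo.
            Thread.setDefaultUncaughtExceptionHandler(previous);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(runCrashingThread());
    }
}
```

On Android the handler installed here is the framework's UncaughtHandler rather than a lambda, but the dispatch path from the dying thread to the handler is the same.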

II. The exception handling flow
1. Exception handling in UncaughtHandler
Next, let's see how UncaughtHandler handles an uncaught exception.

private static class UncaughtHandler implements Thread.UncaughtExceptionHandler {
    public void uncaughtException(Thread t, Throwable e) {
        try {
            // Don't re-enter -- avoid infinite loops if crash-reporting crashes.
            if (mCrashing) return;
            mCrashing = true;

            if (mApplicationObject == null) {
                Clog_e(TAG, "*** FATAL EXCEPTION IN SYSTEM PROCESS: " + t.getName(), e);
            } else {
                // Log the crash information for the process
                .............
            }
            .............
            // Bring up crash dialog, wait for it to be dismissed
            // Call into AMS to handle the crash
            ActivityManagerNative.getDefault().handleApplicationCrash(
                    mApplicationObject, new ApplicationErrorReport.CrashInfo(e));
        } catch (Throwable t2) {
            if (t2 instanceof DeadObjectException) {
                // System process is dead; ignore
            } else {
                try {
                    Clog_e(TAG, "Error reporting crash", t2);
                } catch (Throwable t3) {
                    // Even Clog_e() fails!  Oh well.
                }
            }
        } finally {
            // Try everything to make sure this process goes away.
            // At the end of crash handling, kill the process
            Process.killProcess(Process.myPid());
            // ...and exit
            System.exit(10);
        }
    }
}

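Judging from the code, UncaughtHandler's handling flow is fairly clear. It basically does the following:
1. Log the crash;
2. Call the AMS interface to do some processing;
3. Kill the crashed process.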
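The three steps above, together with the re-entry guard, can be sketched in plain Java. This is an illustration under assumptions, not the framework code: CrashReporter is a hypothetical stand-in for the AMS call, and the terminator Runnable stands in for Process.killProcess plus System.exit so that the sketch can run without ending the JVM.

```java
import java.util.ArrayList;
import java.util.List;

final class GuardedCrashHandler implements Thread.UncaughtExceptionHandler {
    /** Hypothetical stand-in for the remote call into AMS. */
    interface CrashReporter {
        void report(Throwable error) throws Exception;
    }

    private final CrashReporter reporter;
    private final Runnable terminator; // stand-in for killProcess + exit
    private final List<String> log = new ArrayList<>();
    private volatile boolean crashing;

    GuardedCrashHandler(CrashReporter reporter, Runnable terminator) {
        this.reporter = reporter;
        this.terminator = terminator;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            // Do not re-enter if crash reporting itself crashes.
            if (crashing) return; // the finally block below still runs
            crashing = true;
            log.add("FATAL EXCEPTION: " + t.getName()); // step 1: log
            reporter.report(e);                          // step 2: report
        } catch (Throwable t2) {
            log.add("Error reporting crash");
        } finally {
            terminator.run();                            // step 3: always terminate
        }
    }

    List<String> log() {
        return log;
    }
}
```

Because termination sits in the finally block, it happens whether reporting succeeds, fails, or is skipped by the re-entry guard, which matches the structure of the handler shown above.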

The most important part is how AMS handles the crash, so let's follow that code next.

2. Exception handling in AMS

public void handleApplicationCrash(IBinder app, ApplicationErrorReport.CrashInfo crashInfo) {
    // Look up the record of the crashed app
    ProcessRecord r = findAppProcess(app, "Crash");
    final String processName = app == null ? "system_server"
            : (r == null ? "unknown" : r.processName);

    // Hand off to handleApplicationCrashInner for further processing
    handleApplicationCrashInner("crash", r, processName, crashInfo);
}

void handleApplicationCrashInner(String eventType, ProcessRecord r, String processName,
        ApplicationErrorReport.CrashInfo crashInfo) {
    ...............
    //Write a description of an error (crash, WTF, ANR) to the drop box.
    // i.e. record the crash in the drop box
    addErrorToDropBox(eventType, r, processName, null, null, null, null, null, crashInfo);

    // Call crashApplication on the AppErrors helper object (mAppErrors)
    mAppErrors.crashApplication(r, crashInfo);
}

Let's follow crashApplication in the AppErrors class:

/**
* Bring up the "unexpected error" dialog box for a crashing app.
* Deal with edge cases (intercepts from instrumented applications,
* ActivityController, error intent receivers, that sort of thing).
*/
void crashApplication(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo) {
    final long origId = Binder.clearCallingIdentity();
    try {
        // The actual work is done in crashApplicationInner
        crashApplicationInner(r, crashInfo);
    } finally {
        Binder.restoreCallingIdentity(origId);
    }
}

The actual work here is done by crashApplicationInner.

void crashApplicationInner(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo) {
    long timeMillis = System.currentTimeMillis();

    // Extract the relevant details from the crashInfo passed over by the app process
    String shortMsg = crashInfo.exceptionClassName;
    String longMsg = crashInfo.exceptionMessage;
    String stackTrace = crashInfo.stackTrace;
    ................

    AppErrorResult result = new AppErrorResult();
    TaskRecord task;
    synchronized (mService) {
        /**
        * If crash is handled by instance of {@link android.app.IActivityController},
        * finish now and don't show the app error dialog.
        */
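        // Notify the observer so it can handle the crash.
        // If an observer exists and handles the crash, the error dialog is not shown.
        // For example, a Monkey test can be set up to stop as soon as a crash is detected.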
        if (handleAppCrashInActivityController(r, crashInfo, shortMsg, longMsg, stackTrace,
                timeMillis)) {
            return;
        }

        /**
        * If this process was running instrumentation, finish now - it will be handled in
        * {@link ActivityManagerService#handleAppDiedLocked}.
        */
        if (r != null && r.instrumentationClass != null) {
            return;
        }

        // Log crash in battery stats.
        if (r != null) {
            mService.mBatteryStatsService.noteProcessCrash(r.processName, r.uid);
        }

        AppErrorDialog.Data data = new AppErrorDialog.Data();
        data.result = result;
        data.proc = r;

        // If we can't identify the process or it's already exceeded its crash quota,
        // quit right away without showing a crash dialog.
        // Call makeAppCrashingLocked; if it returns false, no further handling is needed
        if (r == null || !makeAppCrashingLocked(r, shortMsg, longMsg, stackTrace, data)) {
            return;
        }

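        // Send SHOW_ERROR_UI_MSG to mUiHandler, which pops up a dialog telling the user that a process crashed.
        // The user can pick options such as "close" or "close and report".
        // Most vendors have probably customized this UI.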
        Message msg = Message.obtain();
        msg.what = ActivityManagerService.SHOW_ERROR_UI_MSG;

        task = data.task;
        msg.obj = data;
        mService.mUiHandler.sendMessage(msg);
    }

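    // AppErrorResult.get() blocks until the user has dealt with the dialog.
    // Note that two threads are involved here:
    // crashApplicationInner runs on the thread serving the Binder call,
    // while the dialog runs on the AMS UI thread.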
    int res = result.get();

    Intent appErrorIntent = null;

    // From here on, act on the choice the user made in the dialog
    ...................
    // If the dialog times out or is cancelled, treat that as a force quit of the crashed process
    if (res == AppErrorDialog.TIMEOUT || res == AppErrorDialog.CANCEL) {
        res = AppErrorDialog.FORCE_QUIT;
    }

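    // Act on the value of res
    synchronized (mService) {
        // The user chose not to be shown errors for this app again
        if (res == AppErrorDialog.MUTE) {
            // Add the app to mAppsNotReportingCrashes so that its crashes are no longer reported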
            stopReportingCrashesLocked(r);
        }

        // The user chose to restart
        if (res == AppErrorDialog.RESTART) {
            mService.removeProcessLocked(r, false, true, "crash");
            if (task != null) {
                try {
                    // Try to restart the app from its recents entry
                    mService.startActivityFromRecents(task.taskId,
                            ActivityOptions.makeBasic().toBundle());
                } catch (IllegalArgumentException e) {
                    // Hmm, that didn't work, app might have crashed before creating a
                    // recents entry. Let's see if we have a safe-to-restart intent.
                    if (task.intent.getCategories().contains(
                            Intent.CATEGORY_LAUNCHER)) {
                        // Restart it another way
                        mService.startActivityInPackage(............);
                    }
                }
            }
        }

        // The user chose force quit
        if (res == AppErrorDialog.FORCE_QUIT) {
            long orig = Binder.clearCallingIdentity();
            try {
                // Kill it with fire!
                // handleAppCrashLocked mainly finishes the activities and updates oom_adj
                mService.mStackSupervisor.handleAppCrashLocked(r);

                if (!r.persistent) {
                    // Not a persistent app, so kill it here
                    mService.removeProcessLocked(r, false, false, "crash");
                    mService.mStackSupervisor.resumeFocusedStackTopActivityLocked();
                }
            } finally {
                Binder.restoreCallingIdentity(orig);
            }
        }

        // The user chose force quit and report
        if (res == AppErrorDialog.FORCE_QUIT_AND_REPORT) {
            // This builds the error report and an Intent used to launch the report UI
            appErrorIntent = createAppErrorIntentLocked(r, timeMillis, crashInfo);
        }

        if (r != null && !r.isolated && res != AppErrorDialog.RESTART) {
            // XXX Can't keep track of crash time for isolated processes,
            // since they don't have a persistent identity.
            // Record the crash time
            mProcessCrashTimes.put(r.info.processName, r.uid,
                    SystemClock.uptimeMillis());
        }
    }

    if (appErrorIntent != null) {
        try {
            // If force quit and report was chosen, the report UI is launched here
            mContext.startActivityAsUser(appErrorIntent, new UserHandle(r.userId));
        } catch (ActivityNotFoundException e) {
            ..............
        }
    }
}

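Overall, the way AMS handles a crash is quite clear:
1. First, record the crash information in the drop box;
2. If there is an ActivityController that can handle the app crash, hand the crash to it;
otherwise, pop up the crash dialog and let the user choose what to do next.
3. Depending on the user's choice, AMS may restart the app, force stop it, or launch the report UI.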
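The hand-off between the Binder thread and the UI thread in step 2 is a classic blocking-result pattern. Below is a minimal plain-Java sketch of that idea, assuming nothing about the real AppErrorResult beyond what the comments above describe: BlockingResult, CrashDialogDemo, and the integer constants are hypothetical names and values chosen for illustration.

```java
final class BlockingResult {
    private boolean hasResult;
    private int result;

    /** Called by the "UI" thread once the user has made a choice. */
    synchronized void set(int value) {
        hasResult = true;
        result = value;
        notifyAll();
    }

    /** Called by the requesting thread; blocks until set() has been called. */
    synchronized int get() {
        boolean interrupted = false;
        while (!hasResult) {
            try {
                wait();
            } catch (InterruptedException e) {
                interrupted = true; // keep waiting, restore the flag afterwards
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
        return result;
    }
}

final class CrashDialogDemo {
    // Hypothetical result codes; the real values live in AppErrorDialog.
    static final int FORCE_QUIT = 1;
    static final int TIMEOUT = 2;
    static final int CANCEL = 3;
    static final int RESTART = 4;

    /** A timeout or a cancel is treated the same as a force quit. */
    static int normalize(int res) {
        return (res == TIMEOUT || res == CANCEL) ? FORCE_QUIT : res;
    }

    /** Simulates showing the dialog on another thread and waiting for the answer. */
    static int askUser(int simulatedChoice) {
        BlockingResult result = new BlockingResult();
        new Thread(() -> result.set(simulatedChoice), "ui").start();
        return normalize(result.get());
    }

    public static void main(String[] args) {
        System.out.println(askUser(TIMEOUT)); // the normalized choice
    }
}
```

The requesting thread holds no lock while it waits apart from the result object's own monitor, which mirrors why the real code calls get() outside the synchronized (mService) block.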

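Note, however, that before bringing up the dialog the flow above first calls makeAppCrashingLocked.
If that function returns false, the rest of the flow does not continue.
Let's look at what this function actually does.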

private boolean makeAppCrashingLocked(ProcessRecord app,
        String shortMsg, String longMsg, String stackTrace, AppErrorDialog.Data data) {
    app.crashing = true;

    // Simply builds an object that contains all of the error information
    app.crashingReport = generateProcessError(app,
            ActivityManager.ProcessErrorStateInfo.CRASHED, null, shortMsg, longMsg, stackTrace);

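    // As mentioned earlier, the system can launch a crash report UI through an Intent.
    // startAppProblemLocked finds the ComponentName of that report UI in the system.
    // In addition, if the crashed app happens to be processing an ordered broadcast,
    // startAppProblemLocked stops the app's handling of it so that later receivers are not held up,
    // i.e. subsequent receivers skip the crashed app and carry on with the ordered broadcast.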
    startAppProblemLocked(app);

    // Stop "freezing" the screen
    app.stopFreezingAllLocked();

    // Do some follow-up processing.
    // Judging from the code, this returns true unless the app has crashed repeatedly within 1 minute
    return handleAppCrashLocked(app, "force-crash" /*reason*/, shortMsg, longMsg, stackTrace,
            data);
}

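From the code above, makeAppCrashingLocked has two main jobs:
1. Find the ComponentName of the crash report UI;
2. Keep a process that crashes repeatedly within a short time from bringing up the dialog over and over.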
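The second job can be sketched as a small throttle keyed by process name and uid. This is a simplified illustration, not the logic of handleAppCrashLocked: the CrashThrottle class and its method are hypothetical, the one-minute interval is taken from the comment in the code above, and unlike the real flow the sketch records the crash time at check time.

```java
import java.util.HashMap;
import java.util.Map;

final class CrashThrottle {
    /** Assumed interval: one minute, per the comment in makeAppCrashingLocked above. */
    static final long MIN_CRASH_INTERVAL_MS = 60_000;

    private final Map<String, Long> lastCrashTimes = new HashMap<>();

    /**
     * Returns true if a crash dialog should be shown for this crash,
     * false if the same process already crashed less than a minute ago.
     */
    boolean shouldShowDialog(String processName, int uid, long nowMs) {
        String key = processName + "/" + uid;
        Long last = lastCrashTimes.get(key);
        lastCrashTimes.put(key, nowMs); // remember this crash for the next check
        return last == null || nowMs >= last + MIN_CRASH_INTERVAL_MS;
    }

    public static void main(String[] args) {
        CrashThrottle throttle = new CrashThrottle();
        System.out.println(throttle.shouldShowDialog("com.example.app", 10001, 0));
        System.out.println(throttle.shouldShowDialog("com.example.app", 10001, 10_000));
    }
}
```

Keying on both process name and uid follows the way mProcessCrashTimes is written to in crashApplicationInner, where both values are passed to put.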

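III. Follow-up cleanup
From the flow above, we know that a crashed process is eventually killed,
at which point AMS still has cleanup work to do.

Let's first recall part of the flow by which a process registers with AMS after it starts:

// After the process starts, its ActivityThread attaches to AMS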
private final boolean attachApplicationLocked(IApplicationThread thread,
        int pid) {
    ............
    ProcessRecord app;
    if (pid != MY_PID && pid >= 0) {
        synchronized (mPidsSelfLocked) {
            app = mPidsSelfLocked.get(pid);
        }
    } else {
        app = null;
    }
    ............
    final String processName = app.processName;
    try {
        // Create a death ("obituary") recipient
        AppDeathRecipient adr = new AppDeathRecipient(
                app, pid, thread);
        thread.asBinder().linkToDeath(adr, 0);
        app.deathRecipient = adr;
    } catch (RemoteException e) {
        app.resetPackageList(mProcessStats);
        startProcessLocked(app, "link fail", processName);
        return false;
    }
    ................
}

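As the code above shows, when a process registers with AMS, AMS links a death ("obituary") recipient to the process's Binder.
So when the crashed process is killed, binderDied in AppDeathRecipient is called back: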

@Override
public void binderDied() {
    ..........
    synchronized(ActivityManagerService.this) {
        appDiedLocked(mApp, mPid, mAppThread, true);
    }
}

As the code shows, once the death notification is received, the work ultimately goes to AMS's appDiedLocked:

final void appDiedLocked(ProcessRecord app, int pid, IApplicationThread thread,
        boolean fromBinderDied) {
    // First check if this ProcessRecord is actually active for the pid.
    synchronized (mPidsSelfLocked) {
        ProcessRecord curProc = mPidsSelfLocked.get(pid);
        if (curProc != app) {
            ...........
            return;
        }
    }
    .............
    if (!app.killed) {
        if (!fromBinderDied) {
            Process.killProcessQuiet(pid);
        }
        killProcessGroup(app.uid, pid);
        app.killed = true;
    }
    // Everything above is defensive code for robustness

    if (app.pid == pid && app.thread != null &&
            app.thread.asBinder() == thread.asBinder()) {
        // The process was started normally (not under instrumentation), so memory adjustment is needed
        boolean doLowMem = app.instrumentationClass == null;
        boolean doOomAdj = doLowMem;

        if (!app.killedByAm) {
            ............
            mAllowLowerMemLevel = true;
        } else {
            mAllowLowerMemLevel = false;
            doLowMem = false;
        }
        ..............
        // handleAppDiedLocked does the actual work
        handleAppDiedLocked(app, false, true);

        if (doOomAdj) {
            // Recompute the processes' oom_adj
            updateOomAdjLocked();
        }
        if (doLowMem) {
            // If necessary, prompt processes in the system to reclaim memory
            doLowMemReportIfNeededLocked(app);
        }
    }.........
    ..........
}

The important call inside appDiedLocked is handleAppDiedLocked:

private final void handleAppDiedLocked(ProcessRecord app,
        boolean restarting, boolean allowRestart) {
    int pid = app.pid;
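    // Wrap up the process's Services, ContentProviders, BroadcastReceivers, and so on.
    // This function is long but its purpose is clear, so it is not expanded here.
    // The important points: 1. For a bound Service in the crashed process, the links between the service and its clients are cleaned up;
    // in addition, a client of the service whose importance is too low is killed directly.
    // 2. When cleaning up ContentProviders, removeDyingProviderLocked may kill the client processes (for stable ContentProviders).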
    boolean kept = cleanUpApplicationRecordLocked(app, restarting, allowRestart, -1);
    if (!kept && !restarting) {
        // Neither kept nor restarting, so remove it from the LRU list
        removeLruProcessLocked(app);
        if (pid > 0) {
            ProcessList.remove(pid);
        }
    }

    ..................

    // Remove this application's activities from active lists.
    // Wrap up Activity-related state
    boolean hasVisibleActivities = mStackSupervisor.handleAppDiedLocked(app);

    app.activities.clear();

    if (app.instrumentationClass != null) {
        ..............
    }

    if (!restarting && hasVisibleActivities
            && !mStackSupervisor.resumeFocusedStackTopActivityLocked()) {
        // If there was nothing to resume, and we are not already restarting this process, but
        // there is a visible activity that is hosted by the process...  then make sure all
        // visible activities are running, taking care of restarting this process.
        // Judging from the comment, if the only visible Activities live in the crashed process, AMS still tries to restart that process
        mStackSupervisor.ensureActivitiesVisibleLocked(null, 0, !PRESERVE_WINDOWS);
    }
}

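cleanUpApplicationRecordLocked in the code above is not analyzed in depth here.
The only tricky part of it is cleaning up bound Services and ContentProviders,
because both kinds of component have to take their client processes into account.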
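The death-notification mechanism that drives this cleanup can be simulated in plain Java. The sketch below is only an analogy under stated assumptions: FakeBinder is a hypothetical stand-in and not android.os.IBinder (the real linkToDeath takes a flags argument and throws RemoteException), and ProcessRegistry is a made-up name for the part of AMS that tracks processes by pid.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a Binder object owned by an app process. */
final class FakeBinder {
    interface DeathRecipient {
        void binderDied();
    }

    private final List<DeathRecipient> recipients = new ArrayList<>();
    private boolean alive = true;

    void linkToDeath(DeathRecipient recipient) {
        if (!alive) {
            throw new IllegalStateException("binder already dead");
        }
        recipients.add(recipient);
    }

    /** Simulates the hosting process being killed. */
    void die() {
        if (!alive) return;
        alive = false;
        List<DeathRecipient> toNotify = new ArrayList<>(recipients);
        recipients.clear();
        for (DeathRecipient r : toNotify) {
            r.binderDied();
        }
    }
}

/** Tracks attached processes and cleans up when one of them dies. */
final class ProcessRegistry {
    private final Map<Integer, String> processesByPid = new HashMap<>();

    void attach(int pid, String processName, FakeBinder appThread) {
        processesByPid.put(pid, processName);
        // Register for the death notification at attach time.
        appThread.linkToDeath(() -> onProcessDied(pid, processName));
    }

    private void onProcessDied(int pid, String processName) {
        // Ignore stale notifications: clean up only if pid still maps to this record.
        if (processName.equals(processesByPid.get(pid))) {
            processesByPid.remove(pid);
        }
    }

    boolean isTracked(int pid) {
        return processesByPid.containsKey(pid);
    }
}
```

The stale-notification check plays the same role as the curProc != app comparison at the top of appDiedLocked.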

IV. Summary

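Overall, the handling of a process crash in Android basically follows the flow shown in the figure above.
The flow is relatively simple; the only tricky part is probably the cleanup
that AMS performs after the process ends.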

Author: Gaugamela, posted 2017/1/14 15:17:49
