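How Traces Work in the ART Virtual Machine

This article analyzes how the ART virtual machine produces traces. The relevant source code is all under /art/runtime: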

  • /art/runtime/
  • - signal_catcher.cc
  • - runtime.cc
  • - intern_table.cc
  • - thread_list.cc
  • - java_vm_ext.cc
  • - class_linker.cc
  • - gc/heap.cc

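1. Overview

Android 6.0 uses the ART virtual machine, and every Java process runs on top of it. When an application hits an ANR (Application Not Responding), one of the final steps is sending SIGQUIT to the target process. On conventional Linux, the default action for this signal is to terminate the program and dump core. An Android Java process, however, runs on the virtual machine: ART catches SIGQUIT and writes the corresponding traces to the file /data/anr/traces.txt.

You can also obtain the traces of a specific process with a single command, for example for the process with pid=888:

  • adb shell kill -3 888 // any target pid can be given

After the command runs, the traces are saved to /data/anr/traces.txt, as shown below: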

  • // [see Section 2.2]
  • ----- pid 888 at 2016-11-11 22:22:22 -----
  • Cmd line: system_server
  • ABI: arm
  • Build type: optimized
  • // [see Section 3.1]
  • Zygote loaded classes=4113 post zygote classes=3239
  • // [see Section 3.2]
  • Intern table: 57550 strong; 9315 weak
  • // 16 native libraries loaded in total [see Section 3.3]
  • JNI: CheckJNI is off; globals=2418 (plus 115 weak)
  • Libraries: /system/lib/libandroid.so /system/lib/libandroid_servers.so /system/lib/libaudioeffect_jni.so /system/lib/libcompiler_rt.so /system/lib/libjavacrypto.so /system/lib/libjnigraphics.so /system/lib/libmedia_jni.so /system/lib/librs_jni.so /system/lib/libsechook.so /system/lib/libshell_jni.so /system/lib/libsoundpool.so /system/lib/libwebviewchromium_loader.so /system/lib/libwifi-service.so /vendor/lib/libalarmservice_jni.so /vendor/lib/liblocationservice.so libjavacore.so (16)
  • // 40MB of heap allocated, 29MB of it in use, 307772 objects allocated in total [see Section 3.4]
  • Heap: 27% free, 29MB/40MB; 307772 objects
  • ... // GC-related information omitted
  • // The process has 99 threads in total [see Section 3.5]
  • DALVIK THREADS (99):
  • // Main thread call stack [see Section 3.6]
  • "main" prio=5 tid=1 Native
  • | group="main" sCount=1 dsCount=0 obj=0x75bd9fb0 self=0x5573d4f770
  • | sysTid=12078 nice=-2 cgrp=default sched=0/0 handle=0x7fa75fafe8
  • | state=S schedstat=( 5907843636 827600677 5112 ) utm=453 stm=137 core=0 HZ=100
  • | stack=0x7fd64ef000-0x7fd64f1000 stackSize=8MB
  • | held mutexes=
  • // Kernel stack [see Section 3.6.2]
  • kernel: __switch_to+0x70/0x7c
  • kernel: SyS_epoll_wait+0x2a0/0x324
  • kernel: SyS_epoll_pwait+0xa4/0x120
  • kernel: cpu_switch_to+0x48/0x4c
  • native: #00 pc 0000000000069be4 /system/lib64/libc.so (__epoll_pwait+8)
  • native: #01 pc 000000000001cca4 /system/lib64/libc.so (epoll_pwait+32)
  • native: #02 pc 000000000001ad74 /system/lib64/libutils.so (_ZN7android6Looper9pollInnerEi+144)
  • native: #03 pc 000000000001b154 /system/lib64/libutils.so (_ZN7android6Looper8pollOnceEiPiS1_PPv+80)
  • native: #04 pc 00000000000d4bc0 /system/lib64/libandroid_runtime.so (_ZN7android18NativeMessageQueue8pollOnceEP7_JNIEnvP8_jobjecti+48)
  • native: #05 pc 000000000000082c /data/dalvik-cache/arm64/system@framework@boot.oat (Java_android_os_MessageQueue_nativePollOnce__JI+144)
  • at android.os.MessageQueue.nativePollOnce(Native method)
  • at android.os.MessageQueue.next(MessageQueue.java:323)
  • at android.os.Looper.loop(Looper.java:135)
  • at com.android.server.SystemServer.run(SystemServer.java:290)
  • at com.android.server.SystemServer.main(SystemServer.java:175)
  • at java.lang.reflect.Method.invoke!(Native method)
  • at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:738)
  • at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:628)
  • "Binder_1" prio=5 tid=8 Native
  • | group="main" sCount=1 dsCount=0 obj=0x12c610a0 self=0x5573e5c750
  • | sysTid=12092 nice=0 cgrp=default sched=0/0 handle=0x7fa2743450
  • | state=S schedstat=( 796240075 863170759 3586 ) utm=50 stm=29 core=1 HZ=100
  • | stack=0x7fa2647000-0x7fa2649000 stackSize=1013KB
  • | held mutexes=
  • kernel: __switch_to+0x70/0x7c
  • kernel: binder_thread_read+0xd78/0xeb0
  • kernel: binder_ioctl_write_read+0x178/0x24c
  • kernel: binder_ioctl+0x2b0/0x5e0
  • kernel: do_vfs_ioctl+0x4a4/0x578
  • kernel: SyS_ioctl+0x5c/0x88
  • kernel: cpu_switch_to+0x48/0x4c
  • native: #00 pc 0000000000069cd0 /system/lib64/libc.so (__ioctl+4)
  • native: #01 pc 0000000000073cf4 /system/lib64/libc.so (ioctl+100)
  • native: #02 pc 000000000002d6e8 /system/lib64/libbinder.so (_ZN7android14IPCThreadState14talkWithDriverEb+164)
  • native: #03 pc 000000000002df3c /system/lib64/libbinder.so (_ZN7android14IPCThreadState20getAndExecuteCommandEv+24)
  • native: #04 pc 000000000002e114 /system/lib64/libbinder.so (_ZN7android14IPCThreadState14joinThreadPoolEb+124)
  • native: #05 pc 0000000000036c38 /system/lib64/libbinder.so (???)
  • native: #06 pc 000000000001579c /system/lib64/libutils.so (_ZN7android6Thread11_threadLoopEPv+208)
  • native: #07 pc 0000000000090598 /system/lib64/libandroid_runtime.so (_ZN7android14AndroidRuntime15javaThreadShellEPv+96)
  • native: #08 pc 0000000000014fec /system/lib64/libutils.so (???)
  • native: #09 pc 0000000000067754 /system/lib64/libc.so (_ZL15__pthread_startPv+52)
  • native: #10 pc 000000000001c644 /system/lib64/libc.so (__start_thread+16)
  • (no managed stack frames)
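  • ... // the remaining N threads are omitted here

The rest of this article looks at how the target process handles the signal from the virtual machine's point of view, and points out the method that produces each key line of the output.

2. ART Signal Catching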

2.1 SignalCatcher

[-> signal_catcher.cc]

  • void* SignalCatcher::Run(void* arg) {
  •   SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  •   Runtime* runtime = Runtime::Current();
  •   Thread* self = Thread::Current();
  •   // The current thread must not be in the Runnable state here
  •   DCHECK_NE(self->GetState(), kRunnable);
  •   {
  •     MutexLock mu(self, signal_catcher->lock_);
  •     signal_catcher->thread_ = self;
  •     signal_catcher->cond_.Broadcast(self);
  •   }
  •   // Set up the signals to handle
  •   SignalSet signals;
  •   signals.Add(SIGQUIT); // signal 3
  •   signals.Add(SIGUSR1); // signal 10
  •   while (true) {
  •     int signal_number = signal_catcher->WaitForSignal(self, signals);
  •     if (signal_catcher->ShouldHalt()) {
  •       runtime->DetachCurrentThread();
  •       return nullptr;
  •     }
  •     switch (signal_number) {
  •       case SIGQUIT:
  •         // Signal 3 received [see Section 2.2]
  •         signal_catcher->HandleSigQuit();
  •         break;
  •       case SIGUSR1:
  •         signal_catcher->HandleSigUsr1();
  •         break;
  •       default:
  •         LOG(ERROR) << "Unexpected signal %d" << signal_number;
  •         break;
  •     }
  •   }
  • }
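
The loop above follows the usual POSIX pattern of a dedicated thread that waits for signals synchronously instead of installing asynchronous handlers. The following is a minimal sketch of that pattern, assuming plain `pthread_sigmask` and `sigwait`. `MiniSignalCatcher`, `Send`, and the two counters are hypothetical names invented for illustration and are not part of ART; the real `WaitForSignal` and handlers do considerably more.

```cpp
#include <atomic>
#include <pthread.h>
#include <signal.h>
#include <thread>

// Hypothetical stand-in for illustration; this is not ART's SignalCatcher.
class MiniSignalCatcher {
 public:
  MiniSignalCatcher() {
    sigemptyset(&signals_);
    sigaddset(&signals_, SIGQUIT);
    sigaddset(&signals_, SIGUSR1);
    // Block both signals in the creating thread. The worker inherits this
    // mask, so the signals stay pending until sigwait() collects them.
    pthread_sigmask(SIG_BLOCK, &signals_, nullptr);
    worker_ = std::thread([this] { Run(); });
  }

  ~MiniSignalCatcher() {
    // Set the halt flag first, then wake the worker out of sigwait().
    halt_.store(true);
    Send(SIGQUIT);
    worker_.join();
  }

  // Delivers a signal directly to the catcher thread.
  void Send(int signal_number) {
    pthread_kill(worker_.native_handle(), signal_number);
  }

  int quit_count() const { return quit_count_.load(); }
  int usr1_count() const { return usr1_count_.load(); }

 private:
  void Run() {
    while (true) {
      int signal_number = 0;
      if (sigwait(&signals_, &signal_number) != 0) continue;
      if (halt_.load()) return;  // same role as the ShouldHalt() check
      switch (signal_number) {
        case SIGQUIT: ++quit_count_; break;  // a VM would dump traces here
        case SIGUSR1: ++usr1_count_; break;
        default: break;
      }
    }
  }

  sigset_t signals_;
  std::atomic<bool> halt_{false};
  std::atomic<int> quit_count_{0};
  std::atomic<int> usr1_count_{0};
  std::thread worker_;
};
```

Because the signals are blocked before the worker thread is created, a signal sent before the worker reaches `sigwait` is not lost: it stays pending and is returned by the next `sigwait` call. Standard signals do not queue, though, so several identical signals sent in quick succession may be reported only once.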

2.2 SignalCatcher::HandleSigQuit

[-> signal_catcher.cc]

  • void SignalCatcher::HandleSigQuit() {
  •   Runtime* runtime = Runtime::Current();
  •   std::ostringstream os;
  •   os << "\n" << "----- pid " << getpid() << " at " << GetIsoDate() << " -----\n";
  •   DumpCmdLine(os);
  •   std::string fingerprint = runtime->GetFingerprint();
  •   os << "Build fingerprint: '" << (fingerprint.empty() ? "unknown" : fingerprint) << "'\n";
  •   os << "ABI: '" << GetInstructionSetString(runtime->GetInstructionSet()) << "'\n";
  •   os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "\n";
  •   // [see Section 2.3]
  •   runtime->DumpForSigQuit(os);
  •   os << "----- end " << getpid() << " -----\n";
  •   // [see Section 3.7]
  •   Output(os.str());
  • }
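
The header and footer that frame each trace block are built by streaming into an in-memory `std::ostringstream` and writing the result out once at the end. Below is a small sketch of that layout under stated assumptions: `BuildTraceHeader` and `BuildTraceFooter` are hypothetical helpers, the timestamp is passed in as a ready-made string rather than produced by `GetIsoDate()`, and the fingerprint line is left out for brevity.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reproduces the layout of the lines that open a trace
// block. All values are supplied by the caller instead of read from the VM.
std::string BuildTraceHeader(int pid, const std::string& timestamp,
                             const std::string& cmd_line,
                             const std::string& abi, bool is_debug_build) {
  std::ostringstream os;
  os << "\n" << "----- pid " << pid << " at " << timestamp << " -----\n";
  os << "Cmd line: " << cmd_line << "\n";
  os << "ABI: '" << abi << "'\n";
  os << "Build type: " << (is_debug_build ? "debug" : "optimized") << "\n";
  return os.str();
}

// Hypothetical helper: the closing marker for the same pid.
std::string BuildTraceFooter(int pid) {
  std::ostringstream os;
  os << "----- end " << pid << " -----\n";
  return os.str();
}
```

Collecting everything in memory first means the file is written in one piece, so a tool reading it can look for the matching "pid N" and "end N" markers to delimit one dump.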

2.3 Runtime::DumpForSigQuit

[-> runtime.cc]

  • void Runtime::DumpForSigQuit(std::ostream& os) {
  •   GetClassLinker()->DumpForSigQuit(os); // [see Section 3.1]
  •   GetInternTable()->DumpForSigQuit(os); // [see Section 3.2]
  •   GetJavaVM()->DumpForSigQuit(os); // [see Section 3.3]
  •   GetHeap()->DumpForSigQuit(os); // [see Section 3.4]
  •   TrackedAllocators::Dump(os);
  •   os << "\n";
  •   thread_list_->DumpForSigQuit(os); // [see Section 3.5]
  •   BaseMutex::DumpAll(os);
  • }

3. Trace Information

3.1 ClassLinker

[-> class_linker.cc]

  • void ClassLinker::DumpForSigQuit(std::ostream& os) {
  •   Thread* self = Thread::Current();
  •   if (dex_cache_image_class_lookup_required_) {
  •     ScopedObjectAccess soa(self);
  •     MoveImageClassesToClassTable();
  •   }
  •   ReaderMutexLock mu(self, *Locks::classlinker_classes_lock_);
  •   os << "Zygote loaded classes=" << pre_zygote_class_table_.Size() << " post zygote classes="
  •      << class_table_.Size() << "\n";
  • }

3.2 InternTable

[-> intern_table.cc]

  • void InternTable::DumpForSigQuit(std::ostream& os) const {
  •   os << "Intern table: " << StrongSize() << " strong; " << WeakSize() << " weak\n";
  • }

3.3 JavaVMExt

[-> java_vm_ext.cc]

  • void JavaVMExt::DumpForSigQuit(std::ostream& os) {
  •   os << "JNI: CheckJNI is " << (check_jni_ ? "on" : "off");
  •   if (force_copy_) {
  •     os << " (with forcecopy)";
  •   }
  •   Thread* self = Thread::Current();
  •   {
  •     ReaderMutexLock mu(self, globals_lock_);
  •     os << "; globals=" << globals_.Capacity();
  •   }
  •   {
  •     MutexLock mu(self, weak_globals_lock_);
  •     if (weak_globals_.Capacity() > 0) {
  •       os << " (plus " << weak_globals_.Capacity() << " weak)";
  •     }
  •   }
  •   os << '\n';
  •   {
  •     MutexLock mu(self, *Locks::jni_libraries_lock_);
  •     os << "Libraries: " << Dumpable<Libraries>(*libraries_) << " (" << libraries_->size() << ")\n";
  •   }
  • }

3.4 Heap

[-> heap.cc]

  • void Heap::DumpForSigQuit(std::ostream& os) {
  •   os << "Heap: " << GetPercentFree() << "% free, " << PrettySize(GetBytesAllocated()) << "/"
  •      << PrettySize(GetTotalMemory()) << "; " << GetObjectsAllocated() << " objects\n";
  •   DumpGcPerformanceInfo(os); // prints a large amount of GC-related information
  • }

DumpGcPerformanceInfo() prints a large number of fields, so it is skipped here and will be covered in a separate article.
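
To make the "Heap:" line easier to read, here is a small sketch of how such a line can be assembled from raw byte counts. It rests on assumptions: `PercentFree`, `WholeMegabytes`, and `FormatHeapLine` are hypothetical helpers, the percentage is simply the truncated share of unallocated bytes, and sizes are always printed in whole megabytes. ART's own `GetPercentFree()` and `PrettySize()` may compute and round differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Integer percentage of the heap that is not allocated (truncating).
std::size_t PercentFree(std::uint64_t bytes_allocated, std::uint64_t total_bytes) {
  if (total_bytes == 0 || bytes_allocated >= total_bytes) return 0;
  return static_cast<std::size_t>(100 * (total_bytes - bytes_allocated) / total_bytes);
}

// Simplified size formatter: whole megabytes only (an assumption of this sketch).
std::string WholeMegabytes(std::uint64_t bytes) {
  return std::to_string(bytes / (1024 * 1024)) + "MB";
}

// Assembles a line with the same fields as the "Heap:" line in the trace.
std::string FormatHeapLine(std::uint64_t bytes_allocated, std::uint64_t total_bytes,
                           std::uint64_t objects_allocated) {
  std::ostringstream os;
  os << "Heap: " << PercentFree(bytes_allocated, total_bytes) << "% free, "
     << WholeMegabytes(bytes_allocated) << "/" << WholeMegabytes(total_bytes) << "; "
     << objects_allocated << " objects\n";
  return os.str();
}
```

With 29MB allocated out of 40MB, the free share is 11/40 = 27.5%, which truncates to 27. That agrees with the "27% free, 29MB/40MB" sample line, although the sizes in the sample are themselves rounded, so the agreement is only approximate.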

3.5 ThreadList

[-> thread_list.cc]

  • void ThreadList::DumpForSigQuit(std::ostream& os) {
  •   {
  •     ScopedObjectAccess soa(Thread::Current());
  •     if (suspend_all_historam_.SampleSize() > 0) {
  •       Histogram<uint64_t>::CumulativeData data;
  •       suspend_all_historam_.CreateHistogram(&data);
  •       suspend_all_historam_.PrintConfidenceIntervals(os, 0.99, data); // Dump time to suspend.
  •     }
  •   }
  •   Dump(os); // [see Section 3.5.1]
  •   DumpUnattachedThreads(os); // [see Section 3.5.2]
  • }

3.5.1 Dump

[-> thread_list.cc]

  • void ThreadList::Dump(std::ostream& os) {
  •   {
  •     MutexLock mu(Thread::Current(), *Locks::thread_list_lock_);
  •     // Print the number of threads in the current process
  •     os << "DALVIK THREADS (" << list_.size() << "):\n";
  •   }
  •   DumpCheckpoint checkpoint(&os);
  •   // Run the checkpoint on the threads
  •   size_t threads_running_checkpoint = RunCheckpoint(&checkpoint);
  •   if (threads_running_checkpoint != 0) {
  •     checkpoint.WaitForThreadsToRunThroughCheckpoint(threads_running_checkpoint);
  •   }
  • }

A line such as "DALVIK THREADS (25)" means that the virtual machine currently has 25 threads. The RunCheckpoint call here is also important, because it involves the suspend state of the threads.
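
The general shape of "ask every thread to run a piece of code, then wait until all of them have done so" can be modeled with a mutex, a counter, and two condition variables. The sketch below is a heavily simplified model and not ART's implementation: `MiniThreadList` and its members are hypothetical, the workers only react while they are idle, and the handling of suspended versus runnable threads is ignored entirely.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical model of a checkpoint: every worker runs the same callback
// once, and the requester blocks until the last worker has finished.
class MiniThreadList {
 public:
  explicit MiniThreadList(int thread_count) {
    for (int id = 0; id < thread_count; ++id) {
      workers_.emplace_back([this, id] { WorkerLoop(id); });
    }
  }

  ~MiniThreadList() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    request_cv_.notify_all();
    for (std::thread& worker : workers_) worker.join();
  }

  // Requests the checkpoint and waits for all workers to pass through it.
  void RunCheckpoint(std::function<void(int)> checkpoint) {
    std::unique_lock<std::mutex> lock(mu_);
    checkpoint_ = std::move(checkpoint);
    pending_ = workers_.size();
    ++generation_;
    request_cv_.notify_all();
    done_cv_.wait(lock, [this] { return pending_ == 0; });
  }

 private:
  void WorkerLoop(int id) {
    std::size_t seen_generation = 0;
    std::unique_lock<std::mutex> lock(mu_);
    while (true) {
      request_cv_.wait(lock, [&] { return stop_ || generation_ != seen_generation; });
      if (stop_) return;
      seen_generation = generation_;
      checkpoint_(id);  // runs under the lock, so callbacks never interleave
      if (--pending_ == 0) done_cv_.notify_one();
    }
  }

  std::mutex mu_;
  std::condition_variable request_cv_;
  std::condition_variable done_cv_;
  std::function<void(int)> checkpoint_;
  std::size_t pending_ = 0;
  std::size_t generation_ = 0;
  bool stop_ = false;
  std::vector<std::thread> workers_;
};
```

Running the callback while holding the lock keeps the output of different threads from interleaving, at the cost of serializing the dump. The generation counter lets a worker that starts late still notice a request that was issued before it began waiting.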

3.5.2 DumpUnattachedThreads

[-> thread_list.cc]
