Troubleshooting process for online problems
2022-08-05 06:13:00 【sick caterpillar】
Troubleshooting
- This post sorts out troubleshooting approaches for the common kinds of online problems.
Business problems
- Most online problems are caused by business logic. When most requests in the online environment succeed and only some requests, or a single user, run into problems, how do we troubleshoot them?
- A microservice system generally has a distributed tracing system and an ELK log system, so we can locate the point of failure through the monitoring platform:
- Collect the exception logs
- From there, log tracing gives us the affected user's request information:
- Use the watch command of Arthas to observe the failing interface, take the corresponding parameters from the logs, and replay the online user's request with the invoke command of Dubbo to reproduce the problem and then fix it
Non-business problems
- Arthas is a good tool for locating problems online, and it is easy to install
- When troubleshooting non-business problems, first check the machine's core resources: CPU, memory, threads, and so on.
- The dashboard command shows this information for the current service and refreshes it every few seconds.
- Thread area: thread id, name, state, CPU usage, whether it is a daemon thread, and so on
- Memory information: heap memory, Eden space, Survivor space, old generation, method area
- Machine information
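To make the dashboard fields concrete, here is a minimal sketch that reads the same kinds of values from inside a JVM through the standard java.lang.management API. It is not how Arthas itself is implemented; the class name JvmSnapshot and the output format are made up for illustration, and ThreadInfo.isDaemon() needs Java 9 or later.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Arthas: collects thread and memory-pool
// figures similar to the ones the dashboard command displays.
final class JvmSnapshot {

    /** One line per live thread: id, name, state, CPU time in ms, daemon flag. */
    static List<String> threadLines() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuTimeAvailable =
                threads.isThreadCpuTimeSupported() && threads.isThreadCpuTimeEnabled();
        List<String> lines = new ArrayList<>();
        for (long id : threads.getAllThreadIds()) {
            ThreadInfo info = threads.getThreadInfo(id);
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            // getThreadCpuTime returns nanoseconds, or -1 if the thread is gone
            long cpuNanos = cpuTimeAvailable ? threads.getThreadCpuTime(id) : -1L;
            String cpu = cpuNanos < 0 ? "n/a" : (cpuNanos / 1_000_000) + "ms";
            lines.add(String.format("id=%d name=%s state=%s cpu=%s daemon=%b",
                    info.getThreadId(), info.getThreadName(),
                    info.getThreadState(), cpu, info.isDaemon()));
        }
        return lines;
    }

    /** One line per memory pool (Eden, Survivor, old generation, Metaspace, ...). */
    static List<String> memoryPoolLines() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() is -1 when the pool has no defined maximum
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / 1024) + "KB";
            lines.add(String.format("%s (%s) used=%dKB max=%s",
                    pool.getName(), pool.getType(), usage.getUsed() / 1024, max));
        }
        return lines;
    }

    public static void main(String[] args) {
        threadLines().forEach(System.out::println);
        memoryPoolLines().forEach(System.out::println);
    }
}
```

The pool names depend on the garbage collector in use, so the Eden, Survivor and old generation entries will be labeled differently from one JVM configuration to another.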
From the dashboard output above we can get the id of the thread we care about.
We can then query that thread's execution stack with thread thread_id, without even taking a thread dump.
There is also jad, which decompiles a class online so that its source can be inspected directly, which is convenient for troubleshooting.
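The stack lookup by thread id can also be sketched in plain Java with ThreadMXBean.getThreadInfo(id, maxDepth). This only illustrates the idea from inside the same JVM and is not the Arthas implementation; the class name ThreadStackLookup and the output layout are assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// Hypothetical helper: prints the stack of one thread, looked up by its id.
final class ThreadStackLookup {

    /** Returns the name, state and up to maxDepth stack frames of the given thread. */
    static String stackOf(long threadId, int maxDepth) {
        ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(threadId, maxDepth);
        if (info == null) {
            return "no live thread with id " + threadId;
        }
        StringBuilder out = new StringBuilder();
        out.append('"').append(info.getThreadName()).append('"')
           .append(" id=").append(info.getThreadId())
           .append(" state=").append(info.getThreadState())
           .append('\n');
        for (StackTraceElement frame : info.getStackTrace()) {
            out.append("    at ").append(frame).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Pass a thread id on the command line, for example one read from a thread listing.
        long id = args.length > 0 ? Long.parseLong(args[0]) : 1L;
        System.out.print(stackOf(id, 20));
    }
}
```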
However, during most online incidents there is no time for this kind of investigation: on a production system there is little time to locate the problem online.
I proceed as follows:
- Restart the problematic machines one by one to see whether that fixes the problem
- Meanwhile, run jmap -dump on the last machine to be restarted, before restarting it, to save a dump of the Java heap
- If the restarts do not restore service, roll back to the previous version to keep online business running normally
- Copy the saved dump file to a local machine
- Open the dump file with VisualVM (bundled with JDK 8 and earlier, a separate download for later JDKs)
- VisualVM shows, through its visual interface, the classes recorded in the dump, the instances of each class, and their contents, so the state at the time of the incident can be analyzed offline and the problem fixed once the cause is found.
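If attaching jmap from outside is not convenient, a HotSpot JVM can also write the same kind of heap dump from inside the process through HotSpotDiagnosticMXBean. The sketch below is an alternative to the jmap step, not what the steps above use; it assumes a HotSpot-based JDK, and the class name HeapDumper and the file name service.hprof are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes a heap dump of the current JVM to a file
// that VisualVM (or another .hprof reader) can open later.
final class HeapDumper {

    /**
     * Dumps the heap of this JVM to target. The file must not exist yet,
     * and recent JDKs require the name to end in ".hprof".
     * liveOnly = true dumps only reachable objects.
     */
    static Path dumpHeap(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        diagnostics.dumpHeap(target.toAbsolutePath().toString(), liveOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dumpHeap(dir.resolve("service.hprof"), true);
        System.out.println("heap dump written to " + file + " (" + Files.size(file) + " bytes)");
    }
}
```

In a real service this call would have to be exposed through something like an admin endpoint or JMX, and writing the dump pauses the application, so it fits best on a machine that is about to be restarted anyway.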