排查DataGrip的Mongo驱动执行慢的问题

准备工作

启动应用程序

xx.xxxx.xxxxx.utils.Application     : Started Application in 10.313 seconds (JVM running for 10.995)

启动arthas

java -jar arthas-boot.jar

输入我们应用程序的编号然后回车，我这里是输入1号，显示下边界面就是输入成功

  ,---.  ,------. ,--------.,--.  ,--.  ,---.   ,---.
 /  O  \ |  .--. ''--.  .--'|  '--'  | /  O  \ '   .-'
|  .-.  ||  '--'.'   |  |   |  .--.  ||  .-.  |`.  `-.
|  | |  ||  |\  \    |  |   |  |  |  ||  | |  |.-'    |
`--' `--'`--' '--'   `--'   `--'  `--'`--' `--'`-----'

wiki       https://arthas.aliyun.com/doc
tutorials  https://arthas.aliyun.com/doc/arthas-tutorials.html
version    3.6.7
main_class
pid        12704
time       2023-02-02 20:21:54

我们可以直接命令行交互，也可以通过web页面交互

在浏览器输入 http://127.0.0.1:8563/ 进入web页面交互

排查思路

目前是在执行 orderExecute 这个方法非常慢

com.xxxx.xxx.utils.service.impl.MongoServiceImpl#orderExecute

我们使用arthas的 trace 命令来对这个函数进行分析排查

我们写一个trace命令来追踪这些方法

trace -E com.xxxx.xxx.utils.service.impl.MongoServiceImpl orderExecute

postman请求业务接口

先执行一次接口，让类加载一下

找到耗时的方法

执行我们上边写的trace命令

trace -E com.xxxx.xxx.utils.service.impl.MongoServiceImpl orderExecute
Press Q or Ctrl+C to abort.
Affect(class count: 2 , method count: 2) cost in 302 ms, listenerId: 6

然后多次执行我们的业务接口

可以发现这个方法占执行时间的百分之97-99（由于打了断点，正常情况下这个业务接口要4-7s左右）可以确定是这个方法在整个流程中执行时间较久

我们继续跟踪这个方法

–skipJDKMethod false 参数可以跟踪包含jdk内的方法

trace -E com.xxx.xxx.utils.service.impl.MongoServiceImpl execute --skipJDKMethod false

之后就继续跟踪耗时较久的方法

到这一步，大概就能看出来耗时出在哪里了。
我们去看一下对应的源码

查看源码

    private MongoShell shell; 
    private Future<MongoShell> shellFuture;

public synchronized MongoShell getShell(@NotNull MongoConnection connection) throws SQLException {
        if (this.shell == null) {
            try {
                //这里获取shell的耗时比较久
                this.shell = (MongoShell)this.shellFuture.get();
                this.shell.setClient(connection.getService().getMongoClient());
                this.shell.eval("use " + connection.getSchema());
            } catch (ExecutionException | InterruptedException var3) {
                throw new SQLException(var3);
            }
        }

        return this.shell;
    }

发现在获取MongoShell的时候会比较慢。最高可能达到4s左右的时间。

class MongoShell(client: MongoClient? = null, engine: Engine? = null) {
    private val context = MongoShellContext(engine)
    private val wrapper = ValueWrapper(context)
    private val converter = MongoShellConverter(context, wrapper)
    private val evaluator = MongoShellEvaluator(client, context, converter, wrapper)
    private val consoleLog = ConsoleLogSupport(context, converter)

    fun setClient(client: MongoClient): Unit = evaluator.setClient(client)

    fun eval(@Language("js") script: String): MongoShellResult<*> {
        ....
        ....
    }

    fun close() = evaluator.close()
}

  private Future<MongoShell> submitNewShell() {
    return executorService.submit(() -> {
        //这里会每次都new 一个MongoShell
      MongoShell shell = new MongoShell(null, engine);
      shell.eval("'initial warm up'");
      return shell;
    });
  }

大概看了下，这个mongoShell是node打包出来的一个用来执行mongo命令的引擎，按理说这个对象可以一直缓存在内存中使用，只需要保持执行的MongoClient不一样即可。

修改源码

    public synchronized MongoShell getShell(@NotNull MongoConnection connection) {
        if (shell == null) {
            try {
                //获取MongoShell
                shell = MongoShellFactory.getMongoShell();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            //设置MongoClient
            shell.setClient(connection.getService().getMongoClient());
            shell.eval("use " + connection.getSchema());
        }
        return shell;
    }

把原来每次都new一个MongoShell改为从Factory获取。

public class MongoShellFactory {

    private static Lock lock = new ReentrantLock();
    private static Condition awaitCondition = lock.newCondition();
    private static int defaultSize = 10;
    private static int notUseListSize = 0;
    private static int useListSize = 0;
    private static Engine engine = Engine.create("js");
    private static List<MongoShell> notUseList = new ArrayList<>();
    private static List<MongoShell> useList = new ArrayList<>();
	
    //初始化一定数量的MongoShell
    public static void initPool(int maxSize){
        lock.lock();
        try {
            defaultSize = maxSize;
            for (int i = 0; i < defaultSize; i++) {
                notUseList.add(new MongoShell(null,engine));
            }
        }finally {
            notUseListSize = notUseList.size();
            lock.unlock();
        }
    }
	//获取未被正在使用的MongoShell
    public static MongoShell getMongoShell() throws InterruptedException {
        lock.lock();
        try {
            int diffSize = notUseListSize + useListSize - defaultSize;
            if (diffSize > 0){
                for (int i = 0; i < diffSize; i++) {
                    notUseList.add(new MongoShell(null,engine));
                    notUseListSize++;
                }
            }
            if (notUseList.isEmpty()){
                //wait
                awaitCondition.await();
            }
            MongoShell mongoShell = notUseList.remove(0);
            useList.add(mongoShell);
            return mongoShell;
        }finally {
            useListSize++;
            notUseListSize--;
            lock.unlock();
        }
    }
	//归还正在使用的MongoShell
    public static void close(MongoShell mongoShell){
        lock.lock();
        try {
            notUseList.add(mongoShell);
            awaitCondition.signal();
            for (int i = 0; i < useList.size(); i++) {
                MongoShell useMongoShell = useList.get(i);
                if (useMongoShell.toString().equals(mongoShell.toString())){
                    useList.remove(i);
                }
            }
        }finally {
            notUseListSize++;
            useListSize--;
            lock.unlock();
        }
    }

}

打包验证

我们再次追踪 orderExecute接口，可以发现，原先的 execute接口由 5-7s 降到了平均的 1s 左右。

这这个平均 1s 是 js引擎本地执行mongo命令的时间。

知足常乐

知足常乐

排查DataGrip的Mongo驱动执行慢的问题

排查DataGrip的Mongo驱动执行慢的问题

准备工作

启动应用程序

启动arthas

排查思路

postman请求业务接口

找到耗时的方法

查看源码

修改源码

打包验证