【PgSQL】定位函数中最耗时的SQL

一、问题背景#

对于 pg 中某函数执行时间超过 30 分钟，查看慢在其中哪一段 SQL。但是如果直接通过 pg_stat_activity 查询，只能看到上层执行函数的语句，而不像 oracle 和 SqlServer 能看到当前在执行什么。

create table test(id int);
INSERT INTO test(id) VALUES (generate_series(1, 10000));
 
create table test2(id int);
INSERT INTO test2(id) VALUES (generate_series(1, 10000));

create or replace function test_f() returns bigint
as $$
	select count(*) from test;
	select count(*) from test2;
	select count(*) from test,test2;
$$ language sql;

执行函数

test_db=# \timing
Timing is on.
test_db=# select test_f();
  test_f
-----------
 100000000
(1 row)
Time: 16203.883 ms

查看运行情况

-- PG >= 9.6
select pid,usename,substring(query from 0 for 50),now()-query_start as time,wait_event_type,wait_event from pg_stat_activity where state = 'active';


  pid   | usename  |                     substring                     |      time       | wait_event_type | wait_event
--------+----------+---------------------------------------------------+-----------------+-----------------+------------
 395096 | postgres | select test_f();                                  | 00:00:01.197204 |                 |

二、方法汇总及对比#

方法	直接分析函数代码	raise notice 打标记	auto_explain 设置 pg_nested_statements=true	pg_stat_statements.trask=all
优点	直截了当，适合函数代码非常简单的场景。	1. 直截了当 2. 可以在程序运行时定位慢 SQL	1. 可以显示函数中每个 SQL 的执行计划和执行时间。 2. 可以在全局或会话级开启	1. 可以显示函数中每个 sQL 的统计数据，间接分析慢 SQL 2. 可以在全局或会话级开启。
缺点	对于复杂函数，基本不可行。	对于复杂函数，需要打的标记非常多，不太方便	1. 对于高负载、函数多的库，基本不能在全局开启 2. 会话级开启需要手动执行，不能在程序运行时分析 3. 对于有 dml 操作的函数，不建议执行或者需开启事务执行后凤滚。	1. 对于高负载、函数多的库，基本不能在全局开启 2. 对于 SQL 执行多的库，比 auto_explain 更难定位 3. 会话级开启需要手动执行，不能在程序运行时分析 4. 对于有 dml 操作的函数，不建议执行或者需开启事务执行后回滚，

附上之前记录的 Oracle 和 SQL Server 的定位方法：

Oracle 存储过程定位慢 SQL 方法_oracle 慢 sql_Hehuyi_In 的博客 - CSDN 博客

SQL Server 如何找到存储过程中最耗时的部分_Hehuyi_In 的博客 - CSDN 博客

直接分析函数代码#

\sf 函数名 可以查看函数代码，这适用于函数非常简单的场景

\sf test_f

如果函数中 SQL 很长，输出可能会错行，不方便分析，可以用 psql 将其导出成文本。

psql -c "\sf test_f" > test_f.sql

raise notice 打标记#

注意 raise 要用 plpgsql 语言，像上面用 sql 会报错，为方便测试稍微做点改造。

CREATE OR REPLACE FUNCTION test_f_2() RETURNS integer AS $$
declare num int;
BEGIN
	raise notice 'notice 1, start at: %',clock_timestamp();
	insert into test select count(*) from test;
	raise notice 'notice 2, start at: %',clock_timestamp();
	select count(*) into num from test,test2;
	raise notice 'notice 3, start at: %',clock_timestamp();
	insert into test2 select count(*) from test2;
	raise notice 'notice end, end at: %',clock_timestamp();
	return num;
END;
$$ LANGUAGE plpgsql;

select test_f_2();

auto_explain 显示每个 SQL 执行计划#

对于业务量大的库，不适合全局抓取 SQL，可以在客户端开启。
设置以下参数，核心是 log_nested_statements = true，展示函数内所有 SQL 执行计划。

扩展 auto_explain 的安装，详见：[[6. Extensions-PG#auto_explain]]

LOAD 'auto_explain';
set client_min_messages='log';
set auto_explain.log_min_duration = 0;
set auto_explain.log_analyze = true;
set auto_explain.log_verbose = true;
set auto_explain.log_buffers = true;
set auto_explain.log_nested_statements = true;

test_db=# select test_f();
-- SQL 1 执行时间
LOG:  duration: 0.984 ms  plan:
Query Text:
select count(*) from test;
select count(*) from test2;
select count(*) from test,test2;

Aggregate  (cost=170.00..170.01 rows=1 width=8) (actual time=0.983..0.983 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=45
  ->  Seq Scan on public.test  (cost=0.00..145.00 rows=10000 width=0) (actual time=0.003..0.568 rows=10002 loops=1)
        Output: id
        Buffers: shared hit=45

-- SQL 2 执行时间
LOG:  duration: 1.114 ms  plan:
Query Text:
select count(*) from test;
select count(*) from test2;
select count(*) from test,test2;

...

-- 总的执行时间
LOG:  duration: 11760.851 ms  plan:
Query Text: select test_f();
Result  (cost=0.00..0.26 rows=1 width=4) (actual time=11760.840..11760.841 rows=1 loops=1)
  Output: test_f()
  Buffers: shared hit=232 dirtied=1
  test_f
-----------
 100040004
(1 row)

设置 pg_stat_statements.track=all#

基本上也只适合客户端开启，或者在测试环境 reset 后单独跑，比 auto_explain 更难找
postgresql.conf

shared_preload_libraries = 'pg_stat_statements'

compute_query_id = on
pg_stat_statements.max = 10000
pg_stat_statements.track = all

set pg_stat_statements.track="all";
select test_f();

select userid::regrole,total_exec_time,query from pg_stat_statements order by total_exec_time desc;

搬运自：
PostgreSQL 定位函数中最耗时的 SQL