【PgSQL】定位函數中最耗時的SQL

一、問題背景#

對於 pg 中某函數執行時間超過 30 分鐘，查看慢在其中哪一段 SQL。但是如果直接通過 pg_stat_activity 查詢，只能看到上層執行函數的語句，而不像 oracle 和 SqlServer 能看到當前在執行什麼。

create table test(id int);
INSERT INTO test(id) VALUES (generate_series(1, 10000));
 
create table test2(id int);
INSERT INTO test2(id) VALUES (generate_series(1, 10000));

create or replace function test_f() returns bigint
as $$
	select count(*) from test;
	select count(*) from test2;
	select count(*) from test,test2;
$$ language sql;

執行函數

test_db=# \timing
Timing is on.
test_db=# select test_f();
  test_f
-----------
 100000000
(1 row)
Time: 16203.883 ms

查看運行情況

-- PG >= 9.6
select pid,usename,substring(query from 0 for 50),now()-query_start as time,wait_event_type,wait_event from pg_stat_activity where state = 'active';


  pid   | usename  |                     substring                     |      time       | wait_event_type | wait_event
--------+----------+---------------------------------------------------+-----------------+-----------------+------------
 395096 | postgres | select test_f();                                  | 00:00:01.197204 |                 |

二、方法彙總及對比#

方法	直接分析函數代碼	raise notice 打標記	auto_explain 設置 pg_nested_statements=true	pg_stat_statements.trask=all
優點	直截了當，適合函數代碼非常簡單的場景。	1. 直截了當 2. 可以在程序運行時定位慢 SQL	1. 可以顯示函數中每個 SQL 的執行計劃和執行時間。 2. 可以在全局或會話級開啟	1. 可以顯示函數中每個 sQL 的統計數據，間接分析慢 SQL 2. 可以在全局或會話級開啟。
缺點	對於複雜函數，基本不可行。	對於複雜函數，需要打的標記非常多，不太方便	1. 對於高負載、函數多的庫，基本不能在全局開啟 2. 會話級開啟需要手動執行，不能在程序運行時分析 3. 對於有 dml 操作的函數，不建議執行或者需開啟事務執行後回滾。	1. 對於高負載、函數多的庫，基本不能在全局開啟 2. 對於 SQL 執行多的庫，比 auto_explain 更難定位 3. 會話級開啟需要手動執行，不能在程序運行時分析 4. 對於有 dml 操作的函數，不建議執行或者需開啟事務執行後回滾，

附上之前記錄的 Oracle 和 SQL Server 的定位方法：

Oracle 存儲過程定位慢 SQL 方法_oracle 慢 sql_Hehuyi_In 的博客 - CSDN 博客

SQL Server 如何找到存儲過程中最耗時的部分_Hehuyi_In 的博客 - CSDN 博客

直接分析函數代碼#

\sf 函數名 可以查看函數代碼，這適用於函數非常簡單的場景

\sf test_f

如果函數中 SQL 很長，輸出可能會錯行，不方便分析，可以用 psql 將其導出成文本。

psql -c "\sf test_f" > test_f.sql

raise notice 打標記#

注意 raise 要用 plpgsql 語言，像上面用 sql 會報錯，為方便測試稍微做點改造。

CREATE OR REPLACE FUNCTION test_f_2() RETURNS integer AS $$
declare num int;
BEGIN
	raise notice 'notice 1, start at: %',clock_timestamp();
	insert into test select count(*) from test;
	raise notice 'notice 2, start at: %',clock_timestamp();
	select count(*) into num from test,test2;
	raise notice 'notice 3, start at: %',clock_timestamp();
	insert into test2 select count(*) from test2;
	raise notice 'notice end, end at: %',clock_timestamp();
	return num;
END;
$$ LANGUAGE plpgsql;

select test_f_2();

auto_explain 顯示每個 SQL 執行計劃#

對於業務量大的庫，不適合全局抓取 SQL，可以在客戶端開啟。
設置以下參數，核心是 log_nested_statements = true，展示函數內所有 SQL 執行計劃。

擴展 auto_explain 的安裝，詳見：[[6. Extensions-PG#auto_explain]]

LOAD 'auto_explain';
set client_min_messages='log';
set auto_explain.log_min_duration = 0;
set auto_explain.log_analyze = true;
set auto_explain.log_verbose = true;
set auto_explain.log_buffers = true;
set auto_explain.log_nested_statements = true;

test_db=# select test_f();
-- SQL 1 執行時間
LOG:  duration: 0.984 ms  plan:
Query Text:
select count(*) from test;
select count(*) from test2;
select count(*) from test,test2;

Aggregate  (cost=170.00..170.01 rows=1 width=8) (actual time=0.983..0.983 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=45
  ->  Seq Scan on public.test  (cost=0.00..145.00 rows=10000 width=0) (actual time=0.003..0.568 rows=10002 loops=1)
        Output: id
        Buffers: shared hit=45

-- SQL 2 執行時間
LOG:  duration: 1.114 ms  plan:
Query Text:
select count(*) from test;
select count(*) from test2;
select count(*) from test,test2;

...

-- 總的執行時間
LOG:  duration: 11760.851 ms  plan:
Query Text: select test_f();
Result  (cost=0.00..0.26 rows=1 width=4) (actual time=11760.840..11760.841 rows=1 loops=1)
  Output: test_f()
  Buffers: shared hit=232 dirtied=1
  test_f
-----------
 100040004
(1 row)

設置 pg_stat_statements.track=all#

基本上也只適合客戶端開啟，或者在測試環境 reset 後單獨跑，比 auto_explain 更難找
postgresql.conf

shared_preload_libraries = 'pg_stat_statements'

compute_query_id = on
pg_stat_statements.max = 10000
pg_stat_statements.track = all

set pg_stat_statements.track="all";
select test_f();

select userid::regrole,total_exec_time,query from pg_stat_statements order by total_exec_time desc;

搬運自：
PostgreSQL 定位函數中最耗時的 SQL