Table of Contents
1 问题
1.1 背景
在线上准备用fpm的access log 统计不同api的耗时和cpu使用情况,但是发现access log里显示的内容 不准确
通过定位,发现是使用了 fastcgi_finish_request 导致的这个问题
1.2 复现
基于 php-5.5.20 版本
在fpm里启用access log,格式设置为:
access.format = "%R - %u %t %T %s %{mili}d %{kilo}M %C%%"
1.2.1 code
<?php
    echo "hi\n";
    fastcgi_finish_request();
1.2.2 期待结果
127.0.0.1 - 13/Jan/2015:12:30:07 +0800 13/Jan/2015:12:30:07 +0800 200 0.913 256 0.00%
1.2.3 实际结果
127.0.0.1 - 01/Jan/1970:08:00:00 +0800 13/Jan/2015:12:30:07 +0800 200 3896627338.811 256 0.00%
1.2.4 测试结果说明
至少有两个内容不符合预期
- 收到请求的时间,即%t: server time the request was received
 - 处理时间,即%d: time taken to serve the request
 
2 定位
源码之下无秘密,so,看源码吧
2.1 打印access.log相关逻辑
在fpm的主函数里,当处理完一个请求的时候,就先调用fpm_request_end标记请求已经完成,并计算各种打印信息,最后调用fpm_log_write打印access.log
相关代码如下:
int main(int argc, char *argv[])
{
    //.............
    fcgi_init_request(&request, fcgi_fd);
     zend_first_try {
        while (fcgi_accept_request(&request) >= 0) {
            //.................
            fpm_request_executing();
            php_execute_script(&file_handle TSRMLS_CC);
            //............
            fpm_request_end(TSRMLS_C);
            fpm_log_write(NULL TSRMLS_CC);
            //.............
        }
    //.............
}
int fpm_log_write(char *log_format TSRMLS_DC)
{
    //..................
    s = log_format;
    while (*s != '\0') {
        //................
        switch (*s) {
            case 't': /* time */
                if (!test) {
                    time_t *t;
                    if (*s == 't') {
                        t = &proc.accepted_epoch; //we need proc.accepted_epoch here
                                                  //but when we use fastcgi_finish_request, we will clean the value of proc.accepted_epoch before call fpm_log_write
                    }
                    //................
                }
                format[0] = '\0';
                break;
            case 'u':
            //......................
        }
    }
    return 0;
}
2.2 fpm_request_finished相关逻辑
当php代码中走到 fastcgi_finish_request(),就会调用关闭与client的socket,同时调用fpm_request_finished设置进程的状态,但是这里却有一个问题
在fpm_request_finished里就把proc->accepted_epoch && proc->accepted清掉,导致后续打印日志的时候,就不符合预期了
php代码的调用逻辑如下:
main -> php_execute_script -> fastcgi_finish_request -> fcgi_close -> fpm_request_finished -> fpm_request_end -> fpm_log_write
相关代码:
PHP_FUNCTION(fastcgi_finish_request)
{
    //.................
    if (request->fd >= 0) {
        //.......................
        fcgi_close(request, 0, 0);
        RETURN_TRUE;
    }
    //................
}
void fcgi_close(fcgi_request *req, int force, int destroy)
{
      //.............
    if ((force || !req->keep) && req->fd >= 0) {
        //.............
        req->fd = -1;
        fpm_request_finished();
    }
}
void fpm_request_finished()
{
    //.....................
    proc->request_stage = FPM_REQUEST_FINISHED;
    proc->tv = now;
    memset(&proc->accepted, 0, sizeof(proc->accepted));//~~~~~~~~~~~~~~~~~~~~~~~~~~~
    proc->accepted_epoch = 0; //~~~~~~~~~~~~~~~~~~~~~~~~~~
    fpm_scoreboard_proc_release(proc);
}
2.3 结论
代码中使用fastcgi_finish_request的时候,会调用fpm_request_finished,但是在fpm_request_finished里会把统计时候用的参数清零,导致后续打印日志的时候就不符合预期了
3 解决
3.2 源码上解决
说实话,我没有想到一个比较好的解决办法
只是把fpm_request_finished里清空统计参数的那两句话注释了
虽然这个也能worker,但是感觉怪怪的,感觉在留坑
相关修改代码diff:
@@ -221,8 +221,6 @@ void fpm_request_finished() /* {{{ */
        proc->request_stage = FPM_REQUEST_FINISHED;
        proc->tv = now;
    -   memset(&proc->accepted, 0, sizeof(proc->accepted));
    -   proc->accepted_epoch = 0;
        fpm_scoreboard_proc_release(proc);
     }
 /* }}} */
并向官方提了一个patch, 目前已经被merge ,第二个被php官方AC的patch:
https://github.com/php/php-src/pull/995