1   Problem

1.1   Background

Warnings like the following show up frequently in the production FPM log:

[20-Mar-2015 15:15:52] WARNING: The maximum number of processes has been reached. Please review your configuration and consider raising 'process.max'
[20-Mar-2015 15:15:53] WARNING: The maximum number of processes has been reached. Please review your configuration and consider raising 'process.max'
[20-Mar-2015 15:16:04] WARNING: The maximum number of processes has been reached. Please review your configuration and consider raising 'process.max'
[20-Mar-2015 15:16:19] WARNING: The maximum number of processes has been reached. Please review your configuration and consider raising 'process.max'
[20-Mar-2015 15:16:37] WARNING: The maximum number of processes has been reached. Please review your configuration and consider raising 'process.max'
[20-Mar-2015 15:16:42] WARNING: The maximum number of processes has been reached. Please review your configuration and consider raising 'process.max'
[20-Mar-2015 15:16:52] WARNING: The maximum number of processes has been reached. Please review your configuration and consider raising 'process.max'
[20-Mar-2015 15:17:24] WARNING: The maximum number of processes has been reached. Please review your configuration and consider raising 'process.max'
[20-Mar-2015 15:17:36] WARNING: The maximum number of processes has been reached. Please review your configuration and consider raising 'process.max'

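1.2   How FPM is configured

There is a single pool with 200 child processes, managed statically (that is, pm = static).

In the global configuration, process.max = 200. This setting caps the total number of child processes across all pools.

2   Locating the cause

2.1   Source code

Searching the source for the log message leads to the following fragment: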

// sapi/fpm/fpm/fpm_children.c
int fpm_children_make(struct fpm_worker_pool_s* wp, int in_event_loop, int nb_to_spawn, int is_debug)
{
    //.....................
    if (wp->config->pm == PM_STYLE_DYNAMIC) {
        //max = .....
    } else if (wp->config->pm == PM_STYLE_ONDEMAND) {
        //max = .....
    } else { // PM_STYLE_STATIC
        //max = .....
    }

    while (fpm_pctl_can_spawn_children() && wp->running_children < max &&
            (fpm_global_config.process_max < 1 || fpm_globals.running_children < fpm_global_config.process_max)) {
        //............
        pid = fork();
        //............
    }

    // The warning seen in the log is emitted here:
    if (!warned && fpm_global_config.process_max > 0 && fpm_globals.running_children >= fpm_global_config.process_max) {
        warned = 1;
        zlog(ZLOG_WARNING, "The maximum number of processes has been reached. Please review your configuration and consider raising 'process.max'");
    }

    return 1;
}

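2.2   Analysis

2.2.1   Precondition for reproducing

Once a child process has served pm.max_requests requests, it exits. The master process then forks a replacement, which takes it into fpm_children_make.

2.2.2   Analysis

Production uses the static process manager, so the number of child processes is fixed, and the FPM configuration sets process.max == pm.max_children.

When a single child exits at a given moment, the code shows that after the fork fpm_globals.running_children is back at pm.max_children, while fpm_global_config.process_max holds the value of process.max.

At that point the two values are equal, so execution falls into the branch that logs the warning.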
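The reasoning above can be traced with a small simulation. This is only a sketch: struct sim_state and both functions are hypothetical names, not part of FPM, and the loop leaves out fpm_pctl_can_spawn_children() and the warned flag. Because there is only one pool, a single counter stands in for both wp->running_children and fpm_globals.running_children.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the FPM state discussed above. */
struct sim_state {
    int process_max;      /* global process.max (0 means no global limit) */
    int pool_max;         /* pm.max_children of the single static pool */
    int running_children; /* one pool, so pool count == global count */
};

/* Mirrors the spawn loop followed by the original warning check.
 * Returns true when the warning would be logged. */
static bool respawn_and_check_warning(struct sim_state *s)
{
    assert(s->pool_max > 0);

    /* Spawn until the pool limit or the global limit is reached. */
    while (s->running_children < s->pool_max &&
           (s->process_max < 1 || s->running_children < s->process_max)) {
        s->running_children++; /* stands in for fork() */
    }

    /* Original check: it only looks at the global counter. */
    return s->process_max > 0 && s->running_children >= s->process_max;
}

/* One worker exits after pm.max_requests, then the master respawns it. */
static bool simulate_one_worker_exit(int process_max, int pool_max)
{
    struct sim_state s = { process_max, pool_max, pool_max - 1 };
    return respawn_and_check_warning(&s);
}
```

Under these assumptions, simulate_one_worker_exit(200, 200) reports that the warning fires on every respawn, whereas simulate_one_worker_exit(400, 200) and simulate_one_worker_exit(0, 200) do not.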

The FPM configuration file describes process.max as follows:

The maximum number of processes FPM will fork. This has been design to control
the global number of processes when using dynamic PM within a lot of pools.
Use it with caution.

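In other words, the setting mainly exists to cap the total number of child processes across all pools, and it is aimed at setups that use the dynamic PM.

2.2.3   Conclusion

With a single static pool and process.max == pm.max_children, as in production here, this warning should not really be logged. It is just noise.

3   Fix

Before logging, the code should check for this case: if the pool has already reached its own maximum number of children, the warning is skipped. The source change is as follows: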

if (!warned && fpm_global_config.process_max > 0 && fpm_globals.running_children >= fpm_global_config.process_max) {
    if (wp->running_children < max) { // added: warn only if this pool is still below its own limit
        warned = 1;
        zlog(ZLOG_WARNING, "The maximum number of processes has been reached. Please review your configuration and consider raising 'process.max'");
    }
}
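
To compare the two conditions side by side, the checks can be pulled out into pure functions. This is a sketch under stated assumptions: warn_original and warn_patched are hypothetical helper names, their parameters stand in for fpm_global_config.process_max, fpm_globals.running_children, wp->running_children and max, and the warned flag is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Original condition: only the global counter is compared to process.max. */
static bool warn_original(int process_max, int global_running)
{
    return process_max > 0 && global_running >= process_max;
}

/* Patched condition: additionally require that this pool is still below
 * its own limit, i.e. that process.max is what actually blocked the spawn. */
static bool warn_patched(int process_max, int global_running,
                         int pool_running, int pool_max)
{
    /* A pool can never have more children than FPM has in total. */
    assert(pool_running <= global_running);

    return warn_original(process_max, global_running) &&
           pool_running < pool_max;
}
```

For the single static pool described here (all four values equal to 200), warn_original returns true and warn_patched returns false. For a pool holding 100 of 200 running children with its own limit at 150, both return true, so the warning still appears when process.max really is the limiting factor.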