A few notes on Feldman-Cousins¶

Let’s try to do the same contours with the previous example within Feldman-Cousins approach. The only thing that’s needed in addition to the statistic map produced in the previous example is the distribution of the statistic with respect to fluctuations of the theoretical expectation on each point of the grid. To construct such a map, we need to employ ToyMC – just simple generator of fake experimental data from average prediction. Currently two such ToyMCs are implemented namely CovariatedToyMC which samples from Gaussian according to the covariance matrix and PoissonToyMC which samples Poisson distribution just by using prediction as the average. More is to be implemented for real like analysis, for example to sample nuissance parameters from some distribution.

We can try to employ both of them. To do that we just need to supply --toymc to the analysis command, which will just replace the real data with the corresponding ToyMC generator. As the argument to --toymc you may pass covariance or poisson correspondingly.

Then it’s also required to adjust the scan command. We need to add here --toymc too, but the argument means the name of toymc object (the same as of the analysis). Also we need the --toymc-type grid – this instructs the scan to generate different samples on each point of the grid (the other option is --toymc-type static is useful for example for coverage check or CLs, but it needs some debugging). Another difference is that we need to provide two minimizers – one for used for the minimum with the parameter value fixed on grid, the other is the global minimization. Finally, samples count to generate is specified with --samples (we choose 10000).

Here is the full command for one-dimensional (BackgroundRate uncertainty is taken into account in the covariance matrix) scan for CovarianceToyMC:

python ./gna ns --define peak.BackgroundRate central=14 relsigma=0.3 \
          -- gaussianpeak --name peak --nbins 10 \
          -- script scripts/gaussianpeak_data \
          -- dataset --name pulls --pull peak.BackgroundRate \
          -- analysis --name first_analysis --dataset peak_fakedata \
          --observables peak/spectrum --parameters peak.BackgroundRate --toymc covariance \
          -- chi2 first_analysis_chi2 first_analysis \
          -- minimizer ongrid_minimizer minuit first_analysis_chi2 \
          -- minimizer global_minimizer minuit first_analysis_chi2 peak.Mu \
          -- scan --output /tmp/peak_fc_covariance.hdf5 --grid peak.Mu 0 150 0.5 \
          --toymc first_analysis --toymc-type grid --samples 10000 \
          --minimizer ongrid_minimizer --minimizer global_minimizer

Give it some time to finish its work. Then check it with hdf-java or h5dump and you’ll see the similar structure as for normal chi-square map, but with a lot (10000) of entries in each point corresponding to different random samples.

The last step before the plotting is to generate so-called fcmap which is basically just sorted list of \(\Delta \chi^2\) values for quantile computation. The command is very simple:

python ./gna fcmap --fcscan /tmp/peak_fc_covariance.hdf5 --output /tmp/peak_fcmap_covariance.hdf5

Assuming you have /tmp/peak_scan_1d_covariance.hdf5 from the previous part, let’s run the plotting command:

python ./gna ns --define peak.BackgroundRate central=14 relsigma=0.3 \
          -- gaussianpeak --name peak --nbins 10 \
          -- script scripts/gaussianpeak_data \
          -- analysis --name first_analysis --dataset peak_fakedata \
          --observables peak/spectrum --parameters peak.BackgroundRate \
          -- chi2 first_analysis_chi2 first_analysis \
          -- minimizer first_analysis_minimizer minuit first_analysis_chi2 peak.Mu \
          -- contour --chi2 /tmp/peak_scan_1d_covariance.hdf5 --plot chi2ci 1s 2s \
          --minimizer first_analysis_minimizer --fc /tmp/peak_fcmap_covariance.hdf5 \
          --plot fc 1s 2s --show

In comparison to the previous plotting, we have just added --fc with the path of the fcmap and options to --plot fc. You should see nice picture – two curves almost match (up to fluctuations). That’s natural, because our case is very good – no constraints on parameter, everything is Gaussian-distributed. Try the same excercise with PoissonToyMC (just change --toymc covariance to --toymc poisson in the analysis and the filenames) and you’ll see completely different picture – it seems like just a lower limit can be obtained. No gurantee about the correctness of the conclusion though.

Another interesting thing is to see how limits on the fitting parameter influences the contour. To do that, you need to use so called minimizer spec – a short description about parameters in YAML format. It’s passed to the minimizer command with -s option. To limit peak.Mu in [0; 100] bounds, modify all minimizers with peak.Mu in the following way:

-- minimizer -s '{peak.Mu: {limits: [0, 100]}}' global_minimizer minuit \
first_analysis_chi2 peak.Mu

Don’t forget to check the details (and look for other possible options) in gna.minimizers.spec.

To make two-dimensional Feldman-Cousins contours you can use all the same machinery, just specify the correct parameters in minimizers and corresponding grids. But to save some computation time, you also can try to produce fcmap only in region where you expect your contour to be, for example by selecting the points of the standard chi-squared contour plus some margin. It can be done by saving the interesting points list to a file and giving it to the scan as input with the --points argument. The points list may be produced by the contour. For example:

contour --chi2 /tmp/peak_scan.hdf5 --plot chi2ci 1s 2s --minimizer \
first_analysis_minimizer  --show --points chi2ci 1s --savepoints /tmp/peak_points.hdf5

The given command will save to the file /tmp/peak_points.hdf5 the points around 1 sigma contour (check it in hdf-java). The width of band around the contour may be controlled by two additional numbers to --points – inside and outside width. They have no definite meaning, just larger value means wider band in the corresponding direction. The default values are 0.05 0.05. When the file is generated, just pass its path to the scan instead of all the --grid-s.

Finally, you will definitely want to run the Feldman-Cousins scanning on a cluster utilizing a lot of CPU cores. You can efficiently split the tasks by the points utilizing the --pointsrange argument to the scan. It takes up to three integer, which are interpreted as python indexing or slice: [a], [a:b] or [a:b:c] on the linearized list of all points in file, given by --points. Alternatively, you can specify interesting points directly by given several points path in form --pointspath path1 path2 ..., where path is just numerical values of parameters (in the same order as in the points file). The points file is still required. Finally, you can split by samples number. After you’ll get a lot of splitted scan results, you should merge them into one file before processing with fcmap by using fcmerge:

python ./gna fcmerge --fcscan fcfile1 fcfile2 ... --output mergedfile

where fcfile-s are partial scan outputs and mergedfile is the final output to be passed to fcmap.