A few notes on Feldman-Cousins
=================================

Let's try to do the same contours with the previous example within
Feldman-Cousins approach. The only thing that's needed in addition to
the statistic map produced in the previous example is the distribution
of the statistic with respect to fluctuations of the theoretical
expectation on each point of the grid. To construct such a map, we
need to employ ToyMC -- just simple generator of fake experimental
data from average prediction. Currently two such ToyMCs are
implemented namely ``CovariatedToyMC`` which samples from Gaussian
according to the covariance matrix and ``PoissonToyMC`` which
samples Poisson distribution just by using prediction as the
average. More is to be implemented for real like analysis, for
example to sample nuissance parameters from some distribution.

We can try to employ both of them. To do that we just need to supply
``--toymc`` to the ``analysis`` command, which will just replace the
real data with the corresponding ToyMC generator. As the argument to
``--toymc`` you may pass ``covariance`` or ``poisson``
correspondingly.

Then it's also required to adjust the ``scan`` command. We need to
add here ``--toymc`` too, but the argument means the name of ``toymc``
object (the same as of the ``analysis``). Also we need the
``--toymc-type grid`` -- this instructs the ``scan`` to generate
different samples on each point of the grid (the other option is
``--toymc-type static`` is useful for example for coverage check or
CLs, but it needs some debugging). Another difference is that we need
to provide two minimizers -- one for used for the minimum with the
parameter value fixed on grid, the other is the global
minimization. Finally, samples count to generate is specified with
``--samples`` (we choose 10000).

Here is the full command for one-dimensional (``BackgroundRate``
uncertainty is taken into account in the covariance matrix) scan for
CovarianceToyMC:

.. code-block:: bash

  python ./gna ns --define peak.BackgroundRate central=14 relsigma=0.3 \
            -- gaussianpeak --name peak --nbins 10 \
            -- script scripts/gaussianpeak_data \
            -- dataset --name pulls --pull peak.BackgroundRate \
            -- analysis --name first_analysis --dataset peak_fakedata \
            --observables peak/spectrum --parameters peak.BackgroundRate --toymc covariance \
            -- chi2 first_analysis_chi2 first_analysis \
            -- minimizer ongrid_minimizer minuit first_analysis_chi2 \
            -- minimizer global_minimizer minuit first_analysis_chi2 peak.Mu \
            -- scan --output /tmp/peak_fc_covariance.hdf5 --grid peak.Mu 0 150 0.5 \
            --toymc first_analysis --toymc-type grid --samples 10000 \
            --minimizer ongrid_minimizer --minimizer global_minimizer

Give it some time to finish its work. Then check it with ``hdf-java``
or ``h5dump`` and you'll see the similar structure as for normal
chi-square map, but with a lot (10000) of entries in each point
corresponding to different random samples.

The last step before the plotting is to generate so-called ``fcmap``
which is basically just sorted list of :math:`\Delta \chi^2` values
for quantile computation. The command is very simple:

.. code-block:: bash

  python ./gna fcmap --fcscan /tmp/peak_fc_covariance.hdf5 --output /tmp/peak_fcmap_covariance.hdf5

Assuming you have ``/tmp/peak_scan_1d_covariance.hdf5`` from the
previous part, let's run the plotting command:

.. code-block:: bash

  python ./gna ns --define peak.BackgroundRate central=14 relsigma=0.3 \
            -- gaussianpeak --name peak --nbins 10 \
            -- script scripts/gaussianpeak_data \
            -- analysis --name first_analysis --dataset peak_fakedata \
            --observables peak/spectrum --parameters peak.BackgroundRate \
            -- chi2 first_analysis_chi2 first_analysis \
            -- minimizer first_analysis_minimizer minuit first_analysis_chi2 peak.Mu \
            -- contour --chi2 /tmp/peak_scan_1d_covariance.hdf5 --plot chi2ci 1s 2s \
            --minimizer first_analysis_minimizer --fc /tmp/peak_fcmap_covariance.hdf5 \
            --plot fc 1s 2s --show

In comparison to the previous plotting, we have just added ``--fc``
with the path of the fcmap and options to ``--plot fc``. You should
see nice picture -- two curves almost match (up to
fluctuations). That's natural, because our case is very good -- no
constraints on parameter, everything is Gaussian-distributed. Try the
same excercise with ``PoissonToyMC`` (just change ``--toymc
covariance`` to  ``--toymc poisson`` in the ``analysis`` and the
filenames) and you'll see completely different picture -- it seems
like just a lower limit can be obtained. No gurantee about the
correctness of the conclusion though.

Another interesting thing is to see how limits on the fitting
parameter influences the contour. To do that, you need to use so
called minimizer spec -- a short description about parameters in YAML
format. It's passed to the ``minimizer`` command with ``-s``
option. To limit ``peak.Mu`` in ``[0; 100]`` bounds, modify all
minimizers with ``peak.Mu`` in the following way:
  
.. code-block:: bash

  -- minimizer -s '{peak.Mu: {limits: [0, 100]}}' global_minimizer minuit \
  first_analysis_chi2 peak.Mu

Don't forget to check the details (and look for other possible
options) in ``gna.minimizers.spec``.

To make two-dimensional Feldman-Cousins contours you can use all the
same machinery, just specify the correct parameters in minimizers and
corresponding grids. But to save some computation time, you also can
try to produce fcmap only in region where you expect your contour to
be, for example by selecting the points of the standard chi-squared
contour plus some margin. It can be done by saving the interesting
points list to a file and giving it to the ``scan`` as input with the
``--points`` argument. The points list may be produced by the
``contour``. For example:

.. code-block:: bash

  contour --chi2 /tmp/peak_scan.hdf5 --plot chi2ci 1s 2s --minimizer \
  first_analysis_minimizer  --show --points chi2ci 1s --savepoints /tmp/peak_points.hdf5

The given command will save to the file ``/tmp/peak_points.hdf5`` the
points around 1 sigma contour (check it in ``hdf-java``). The width of
band around the contour may be controlled by two additional numbers
to ``--points`` -- inside and outside width. They have no definite
meaning, just larger value means wider band in the corresponding
direction. The default values are ``0.05 0.05``. When the file is
generated, just pass its path to the ``scan`` instead of all the
``--grid``-s.

Finally, you will definitely want to run the Feldman-Cousins scanning
on a cluster utilizing a lot of CPU cores. You can efficiently split
the tasks by the points utilizing the ``--pointsrange`` argument to
the ``scan``. It takes up to three integer, which are interpreted as
python indexing or slice: ``[a]``, ``[a:b]`` or ``[a:b:c]`` on the
linearized list of all points in file, given by
``--points``. Alternatively, you can specify interesting points
directly by given several points path in form ``--pointspath path1
path2 ...``, where path is just numerical values of parameters (in the
same order as in the points file). The points file is still
required. Finally, you can split by samples number. After you'll get a
lot of splitted scan results, you should merge them into one file
before processing with ``fcmap`` by using ``fcmerge``:

.. code-block:: bash
   
  python ./gna fcmerge --fcscan fcfile1 fcfile2 ... --output mergedfile

where ``fcfile``-s are partial scan outputs and ``mergedfile`` is the
final output to be passed to ``fcmap``.