3. Simple transformations

Here we review several simple transformations. "Simple" means that they depend on no variables and have no inputs. There are currently three such transformations, Points, Histogram and Histogram2d. They let the user initialize input data (arrays or histograms) in the form of transformation outputs.

Note

Points, Histogram and Histogram2d objects may also be constructed from the ROOT objects TH1, TH2 and TMatrix. See Importing data from ROOT files for an example.

3.1. Points

The Points transformation represents a 1d or 2d array as a transformation output. A Points instance is created from a numpy array passed to its constructor:

01_points.py

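import numpy as np
# Assumes the GNA constructors module is importable (provides C.Points)
import gna.constructors as C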
# Create numpy array
narray = np.arange(12).reshape(3,4)
# Create a points instance with data, stored in `narray`
parray = C.Points(narray)

# Print the structure of GNAObject parray
parray.print()
print()

# Print list of transformations
print('Transformations:', list(parray.transformations.keys()))

# Print list of outputs
print('Outputs:', list(parray.points.outputs.keys()))
print()

# Access the output `points` of transformation `points` of the object `parray`
print('Output:', parray.points.points)
# Access and print relevant DataType
print('DataType:', parray.points.points.datatype())
# Access the actual data
print('Data:\n', parray.points.points.data())

The code produces the following output:

[obj] Points: 1 transformation(s)
     0 [trans] points: 0 input(s), 1 output(s)
         0 [out] points: array 2d, shape 3x4, size  12

Transformations: ['points']
Outputs: ['points']

Output: [out] points: array 2d, shape 3x4, size  12
DataType: array 2d, shape 3x4, size  12
Data:
 [[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]

Let us now go through the code in more detail. First, we prepare a 2-dimensional array on the Python side:

narray = np.arange(12).reshape(3,4)

To use this data in a computational chain, it must be provided by a transformation. For arrays, this is the Points transformation. We initialize it from the numpy array with the Points constructor from the constructors module [1].

parray = C.Points(narray)

Here parray is a GNAObject. We may now print information about its transformations, inputs and outputs:

parray.print()

[obj] Points: 1 transformation(s)
     0 [trans] points: 0 input(s), 1 output(s)
         0 [out] points: array 2d, shape 3x4, size  12

As can be seen from the output, the Points instance has a single transformation called points with a single output, also called points. As shown in the Introduction, a transformation may be accessed by its name as an attribute of the object, object.transformation_name:

t = parray.points
print(t)

[trans] points: 0 input(s), 1 output(s)

The short way to access its output is similar: object.transformation_name.output_name. In our case it reads:

output = parray.points.points
print(output)

[out] points: array 2d, shape 3x4, size  12

There is also a longer, but in some cases more readable, way of accessing the same output:

output = parray.transformations['points'].outputs['points']
print(output)

[out] points: array 2d, shape 3x4, size  12

Here we take the dictionary of transformations, request the transformation points, take the dictionary of its outputs and request the output points.
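Both paths lead to the same output object. The toy model below is a hypothetical sketch, not GNA code. ToyObject and ToyTransformation are invented names, and the model assumes only that attribute access forwards to a name-keyed dictionary of transformations, and likewise for outputs. That assumption is enough to show why parray.points.points and parray.transformations['points'].outputs['points'] are interchangeable.

```python
class ToyTransformation:
    """Hypothetical stand-in for a transformation (not the real GNA class)."""

    def __init__(self, name, output_names):
        self.name = name
        # Outputs are kept in a name-keyed dictionary; strings stand in for outputs
        self.outputs = {out: "[out] " + out for out in output_names}

    def __getattr__(self, attr):
        # Called only when normal attribute lookup fails: forward to the outputs
        outputs = self.__dict__.get("outputs", {})
        if attr in outputs:
            return outputs[attr]
        raise AttributeError(attr)


class ToyObject:
    """Hypothetical stand-in for a GNAObject holding named transformations."""

    def __init__(self, transformations):
        self.transformations = {t.name: t for t in transformations}

    def __getattr__(self, attr):
        # Forward unknown attributes to the transformations dictionary
        transformations = self.__dict__.get("transformations", {})
        if attr in transformations:
            return transformations[attr]
        raise AttributeError(attr)


parray = ToyObject([ToyTransformation("points", ["points"])])

short_way = parray.points.points
long_way = parray.transformations["points"].outputs["points"]
print(short_way)               # [out] points
print(short_way is long_way)   # True: both paths reach the same object
```

The short form is convenient interactively. The dictionary form also works when the names are only known at run time as strings.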

Now that we can access the transformation output, we may request the data it holds:

arr = parray.points.points.data()
print(arr)
print('shape:', arr.shape)

[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
shape: (3, 4)

The data() method triggers the transformation function, which does the calculation, and then returns a numpy view on the result stored in parray.points.points. Calling data() a second time does one of the following:

If no calculation is required, it returns the same view on the cached data.

If some of the prerequisites of the output have changed, the transformation function is called again to update the result, and a view on the updated data is returned.

The status of the transformation may be checked via its taintflag:

print(bool(parray.points.tainted()))

If the method returns False, the call to data() returns the cached data without triggering the transformation function. If it returns True, the call to data() executes the transformation function and then returns a view on the updated data [2].
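The caching rule can be illustrated with a small pure-Python model. This is a hypothetical sketch, not the GNA implementation. The class ToySource, its set_values() method and the ncalls counter are invented for illustration. Only the names tainted() and data() mirror the methods described above.

```python
import numpy as np


class ToySource:
    """Hypothetical model of a transformation whose result is cached until tainted."""

    def __init__(self, values):
        self._values = np.asarray(values, dtype=float)
        self._buffer = np.zeros_like(self._values)  # output storage, allocated once
        self._tainted = True                        # nothing computed yet
        self.ncalls = 0                             # how many times the function ran

    def tainted(self):
        return self._tainted

    def set_values(self, values):
        # Changing a prerequisite marks the result as outdated.
        # The new values must have the same shape: this toy never reallocates.
        self._values = np.asarray(values, dtype=float)
        self._tainted = True

    def _function(self):
        # The "transformation function": fill the output buffer in place
        self._buffer[:] = self._values
        self.ncalls += 1

    def data(self):
        # Recompute only if tainted, then return the same buffer every time
        if self._tainted:
            self._function()
            self._tainted = False
        return self._buffer


src = ToySource(np.arange(3))
print(src.tainted())      # True
a = src.data()            # the function runs
b = src.data()            # cached: the function is not called again
print(src.ncalls, a is b) # 1 True
src.set_values([10, 20, 30])
print(src.tainted())      # True
src.data()                # recomputed in place
print(src.ncalls, a)      # 2 [10. 20. 30.]
```

Because the buffer is updated in place, the array a obtained earlier already shows the new values after the recomputation.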

The term view here means that if the data are modified by the transformation, the arr variable reflects the updated data. At the same time, accessing arr does not trigger the calculation itself; only data() does.

If the user needs a fixed snapshot of the data, the copy() method of the returned array should be used:

arr = parray.points.points.data().copy()
print(arr)
print('shape:', arr.shape)
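The difference between a view and a copy is standard numpy behaviour and can be checked without GNA. In the sketch below, a plain numpy array buf stands in for the output buffer exposed by data(). It is an assumption made for illustration. Modifying buf in place, as a recalculation would, is visible through a view but not through a copy.

```python
import numpy as np

# Stand-in for the output buffer owned by a transformation
buf = np.arange(12, dtype=float).reshape(3, 4)

view = buf.view()     # shares memory with buf, like the array returned by data()
frozen = buf.copy()   # independent snapshot, like data().copy()

# Simulate a recalculation that overwrites the buffer in place
buf *= 2

print(view[0])                        # [0. 2. 4. 6.]  follows the update
print(frozen[0])                      # [0. 1. 2. 3.]  keeps the old values
print(np.shares_memory(view, buf))    # True
print(np.shares_memory(frozen, buf))  # False
```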

There is also the datatype() method, which returns a DataType instance holding information on the array dimensions:

dt = parray.points.points.datatype()
print(dt)

We have now defined a transformation holding the data. Its output may be connected to other transformations' inputs to build a computational chain (see Sum and product: transformations with inputs). Note that this way of accessing transformations and their inputs and outputs is universal and applies to any GNAObject.

3.2. Histogram

The Histogram transformation stores 1-dimensional histogrammed data. It is very similar to the 1d version of Points, with one difference: its DataType also stores the bin edges.
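A 1d histogram with N bins needs N+1 edges, where bin i lies between edges[i] and edges[i+1]. The plain-numpy sketch below is independent of GNA. It checks this relation for the edges used in 02_hist.py and derives bin widths and centers from them. The width matches the "width 0.5" reported in the DataType printout.

```python
import numpy as np

nbins = 12
edges = np.linspace(1.0, 7.0, nbins + 1)   # same edges as in 02_hist.py
data = np.arange(nbins)**2 * np.arange(nbins)[::-1]**2

# One more edge than there are bins
assert len(edges) == len(data) + 1

widths = np.diff(edges)                    # all equal to 0.5 for this uniform binning
centers = 0.5 * (edges[:-1] + edges[1:])   # midpoint of each bin

print(widths[0], centers[:3])              # 0.5 [1.25 1.75 2.25]
```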

02_hist.py

import numpy as np
# Create numpy array for data points
nbins = 12
narray = np.arange(nbins)**2 * np.arange(nbins)[::-1]**2
# Create numpy array for bin edges
edges  = np.linspace(1.0, 7.0, nbins+1)

# Create a histogram instance with data, stored in `narray`
# and edges, stored in `edges`
hist = C.Histogram(edges, narray)
hist.print()
print()

# Access the output `hist` of transformation `hist` of the object `hist`
print('Output:', hist.hist.hist)
# Access and print relevant DataType
datatype = hist.hist.hist.datatype()
print('DataType:', datatype)
print('Bin edges:', list(datatype.edges))
# Access the actual data
print('Data:', hist.hist.hist.data())

The workflow for a histogram is very similar to that for an array. The object has a single transformation hist with a single output hist.

The main difference is that the DataType of the histogram now has the bin edges defined. On line 19 of 02_hist.py, the C++ vector datatype.edges is accessed and converted to a Python list.

The code produces the following output:

[obj] Histogram: 1 transformation(s)
     0 [trans] hist: 0 input(s), 1 output(s)
         0 [out] hist: hist,  12 bins, edges 1.0->7.0, width 0.5

Output: [out] hist: hist,  12 bins, edges 1.0->7.0, width 0.5
DataType: hist,  12 bins, edges 1.0->7.0, width 0.5
Bin edges: [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]
Data: [  0. 100. 324. 576. 784. 900. 900. 784. 576. 324. 100.   0.]
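The printed data can be reproduced with numpy alone, which is a useful sanity check of the script. Element k of narray equals k²(11 − k)². The histogram is therefore symmetric and peaks at 25 · 36 = 900 in the two middle bins. GNA prints the values as floating-point numbers; numpy keeps them as integers here.

```python
import numpy as np

nbins = 12
i = np.arange(nbins)
data = i**2 * i[::-1]**2        # i[::-1][k] == nbins - 1 - k

# Same values written out explicitly: k^2 * (11 - k)^2
explicit = np.array([k**2 * (nbins - 1 - k)**2 for k in range(nbins)])
assert np.array_equal(data, explicit)

print(data.tolist())
# [0, 100, 324, 576, 784, 900, 900, 784, 576, 324, 100, 0]
```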

3.3. Histogram2d

Histogram2d is the 2-dimensional version of a histogram. It holds a 2-dimensional array, and its DataType has two sets of bin edges.
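For comparison, numpy's own np.histogram2d uses the same layout as the example that follows. The result has shape (len(edgesx) - 1, len(edgesy) - 1), the first axis runs over X, and the edges of each axis are kept separately. The sketch below is plain numpy, and its random sample points are invented for illustration. The per-axis edges are collected into a list indexed by axis, much like edgesNd[0] and edgesNd[1] on a GNA DataType.

```python
import numpy as np

nbinsx, nbinsy = 12, 8
edgesx = np.linspace(0, nbinsx, nbinsx + 1)
edgesy = np.linspace(0, nbinsy, nbinsy + 1)

# Illustrative sample points (not from the tutorial), all inside the binning range
rng = np.random.default_rng(0)
x = rng.uniform(0, nbinsx, size=1000)
y = rng.uniform(0, nbinsy, size=1000)

counts, ex, ey = np.histogram2d(x, y, bins=[edgesx, edgesy])
edges_nd = [ex, ey]             # per-axis edges, indexed by axis

print(counts.shape)                                  # (12, 8): first index is X
print(len(edges_nd[0]) - 1, len(edges_nd[1]) - 1)    # 12 8
print(counts.sum())                                  # 1000.0
```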

03_hist2d.py

import numpy as np
# Create numpy arrays for bin edges
nbinsx, nbinsy = 12, 8
edgesx = np.linspace(0, nbinsx, nbinsx+1)
edgesy = np.linspace(0, nbinsy, nbinsy+1)
# Create fake data array
narray = np.arange(nbinsx*nbinsy).reshape(nbinsx, nbinsy)
narray = narray**2 * narray[::-1,::-1]**2

# Create a histogram instance with data, stored in `narray`
# and edges, stored in `edgesx` and `edgesy`
hist = C.Histogram2d(edgesx, edgesy, narray)
hist.print()
print()

# Access the output `hist` of transformation `hist` of the object `hist`
print('Output:', hist.hist.hist)
# Access and print relevant DataType
datatype = hist.hist.hist.datatype()
print('DataType:', datatype)
print('Bin edges (X):', list(datatype.edgesNd[0]))
print('Bin edges (Y):', list(datatype.edgesNd[1]))
# Access the actual data
print('Data:', hist.hist.hist.data())

Again, the general workflow is very similar. With multiple axes, the bin edges are accessed by axis index via the edgesNd member of the DataType: see lines 21 and 22 of 03_hist2d.py.

The code produces the following output:

[obj] Histogram2d: 1 transformation(s)
     0 [trans] hist: 0 input(s), 1 output(s)
         0 [out] hist: hist2d, 12x8=96 bins, edges 0.0->12.0 and 0.0->8.0

Output: [out] hist: hist2d, 12x8=96 bins, edges 0.0->12.0 and 0.0->8.0
DataType: hist2d, 12x8=96 bins, edges 0.0->12.0 and 0.0->8.0
Bin edges (X): [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
Bin edges (Y): [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
Data: [[      0.    8836.   34596.   76176.  132496.  202500.  285156.  379456.]
 [ 484416.  599076.  722500.  853776.  992016. 1136356. 1285956. 1440000.]
 [1597696. 1758276. 1920996. 2085136. 2250000. 2414916. 2579236. 2742336.]
 [2903616. 3062500. 3218436. 3370896. 3519376. 3663396. 3802500. 3936256.]
 [4064256. 4186116. 4301476. 4410000. 4511376. 4605316. 4691556. 4769856.]
 [4840000. 4901796. 4955076. 4999696. 5035536. 5062500. 5080516. 5089536.]
 [5089536. 5080516. 5062500. 5035536. 4999696. 4955076. 4901796. 4840000.]
 [4769856. 4691556. 4605316. 4511376. 4410000. 4301476. 4186116. 4064256.]
 [3936256. 3802500. 3663396. 3519376. 3370896. 3218436. 3062500. 2903616.]
 [2742336. 2579236. 2414916. 2250000. 2085136. 1920996. 1758276. 1597696.]
 [1440000. 1285956. 1136356.  992016.  853776.  722500.  599076.  484416.]
 [ 379456.  285156.  202500.  132496.   76176.   34596.    8836.       0.]]
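The 2d data can be checked the same way with plain numpy. Let n = 8i + j be the flat index of bin [i, j]. Flipping both axes maps n to 95 − n, so each element equals n²(95 − n)². The array is therefore unchanged by a flip of both axes, and it peaks at 47² · 48² = 5089536, as in the printed output.

```python
import numpy as np

nbinsx, nbinsy = 12, 8
narray = np.arange(nbinsx * nbinsy).reshape(nbinsx, nbinsy)
narray = narray**2 * narray[::-1, ::-1]**2

# Element [i, j] is n^2 * (95 - n)^2 with n = 8*i + j: symmetric under a double flip
assert np.array_equal(narray, narray[::-1, ::-1])

print(narray[0, :3].tolist())   # [0, 8836, 34596]
print(narray.max())             # 5089536
```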