Workflow Description
To fit the carpenter stages into the proposed workflow, they must satisfy the following requirements.
Stage | Requirements
---|---
data import | Translate the data config into a data mapping; the mapping should be dictionary-like and provide n-D array access
data mapping | copy-on-write; accommodates new data; GPU compatible; aliases → Index; filtering → Mask (see the sketch after this table)
Index | Callable (order?); uproot path: dir1/tree1/var1; alias: dir1.tree1.var1; expression: dir1__dot__tree1__dot__var1 (or a simplified version)
Mask(expression) | Callable, mergeable; merge via binary AND / OR: (mask1 \| mask2) → mask3, (mask1 & mask2) → mask4
Operations(config) | Callable, mergeable; types: Define → new data; Cutflow → creates Masks + a cutflow; Binning → creates Hists, tables; DataOut → creates ntuples/CSV/binary output
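
A minimal sketch of how these requirements could fit together, assuming hypothetical `DataMapping`, `Index`, and `Mask` classes (the names and signatures are illustrative, not fixed by the table): the mapping is dictionary-like, updates are copy-on-write, Index normalises the three path spellings, and masks merge with binary `&` / `|`.

```python
import numpy as np


class Index:
    """Callable lookup for one variable (hypothetical name and shape).

    Accepts any of the three spellings from the table:
    uproot path  dir1/tree1/var1
    alias        dir1.tree1.var1
    expression   dir1__dot__tree1__dot__var1
    """

    def __init__(self, path):
        # Normalise every spelling to the uproot-style path.
        self.path = path.replace("__dot__", ".").replace(".", "/")

    def __call__(self, mapping):
        return mapping[self.path]


class Mask:
    """Callable, mergeable event mask; merged with binary AND / OR."""

    def __init__(self, values):
        self.values = np.asarray(values, dtype=bool)

    def __call__(self, mapping):
        # The mask is precomputed here; a fuller version would evaluate
        # an expression against the mapping.
        return self.values

    def __and__(self, other):
        return Mask(self.values & other.values)   # (mask1 & mask2) → mask4

    def __or__(self, other):
        return Mask(self.values | other.values)   # (mask1 | mask2) → mask3


class DataMapping:
    """Dictionary-like, copy-on-write store of n-D arrays (sketch only)."""

    def __init__(self, arrays):
        self._arrays = dict(arrays)

    def __getitem__(self, key):
        return self._arrays[key]

    def update(self, **new_entries):
        # Copy-on-write: return a new mapping instead of mutating in place.
        merged = dict(self._arrays)
        merged.update(new_entries)
        return DataMapping(merged)

    def filter(self, mask):
        # Filtering goes through a Mask applied to every array.
        keep = mask(self)
        return DataMapping({k: v[keep] for k, v in self._arrays.items()})


# Usage: resolve an alias, compose masks, and filter the mapping.
data = DataMapping({"dir1/tree1/var1": np.array([1.0, 2.0, 3.0, 4.0])})
var1 = Index("dir1.tree1.var1")(data)
mask3 = Mask(var1 > 1.5) | Mask(var1 < 1.0)
mask4 = Mask(var1 > 1.5) & Mask(var1 < 3.5)
selected = data.filter(mask3 & mask4)
```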
Define stage: independent new entries → data.update
Cutflow stage: merging counts across files and datasets
Binning stage: merge bin entries (preserve datasets?)
→ each type of merge needs its own rules (sketched below)
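
To illustrate that each merge type needs its own rules, here is a rough sketch with hypothetical merge helpers, one per operation type; the container shapes (cut-count dicts, per-dataset histogram dicts) are assumptions, not a fixed interface.

```python
import numpy as np


def merge_define(left, right):
    # Define: independent new entries per chunk (data.update), nothing to merge.
    return None


def merge_cutflow(left, right):
    # Cutflow: sum event counts per cut across files and datasets.
    return {cut: left.get(cut, 0) + right.get(cut, 0)
            for cut in set(left) | set(right)}


def merge_binning(left, right, preserve_datasets=True):
    # Binning: add bin contents; optionally keep each dataset separate.
    if preserve_datasets:
        merged = {ds: hist.copy() for ds, hist in left.items()}
        for ds, hist in right.items():
            merged[ds] = merged.get(ds, np.zeros_like(hist)) + hist
        return merged
    return sum(left.values()) + sum(right.values())


# Usage: merging cutflow counts from two input files.
counts = merge_cutflow({"trigger": 120, "pt_cut": 80},
                       {"trigger": 95, "pt_cut": 60})
# counts == {"trigger": 215, "pt_cut": 140}
```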
For optimization, we need to be able to replicate stages across inputs - if applicable.
Stage multiplexing will replicate a stage definition across inputs (previous stages, data import).
Stages that merge data will typically have different rules for multiplexing (none, reduce by N).
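
One possible shape for stage multiplexing, using a hypothetical `multiplex` helper: it deep-copies a stage definition once per input, which is the behaviour non-merging stages need; stages that merge data would substitute their own rule (no replication, or reducing groups of N inputs).

```python
import copy


def multiplex(stage, inputs):
    # Replicate one stage definition across inputs (previous stages, data import).
    # Merging stages would override this with their own rule.
    return {name: copy.deepcopy(stage) for name in inputs}


# A Define-like stage replicated across three input datasets.
replicas = multiplex({"define": {"new_var": "var1 * 2"}},
                     ["data", "mc_signal", "mc_bkg"])
```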