Workflow Description
To fit the carpenter stages into the proposed workflow, they must satisfy the following requirements.
| Stage | Requirements |
|---|---|
| data import | Translate data config into a data mapping; the mapping should be dictionary-like and provide n-D array access |
| data mapping | copy-on-write; accommodate new data; GPU compatible; aliases → Index; filtering → Mask (sketched below) |
| Index | Callable (order?); uproot path: dir1/tree1/var1; alias: dir1.tree1.var1; expression: dir1__dot__tree1__dot__var1 (or use a simplified version) |
| Mask(expression) | Callable, Mergeable; merge via binary AND, OR: (mask1 \| mask2) → mask3, (mask1 & mask2) → mask4 (sketched below) |
| Operations(config) | Callable, Mergeable; Types: Define → new data; Cutflow → creates Masks + cutflow; Binning → creates Hists, tables; DataOut → creates ntuples/CSV/binary format |
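
A minimal sketch of the dictionary-like data mapping described above. The class and method names (`DataMapping`, `update`) and the alias handling are assumptions for illustration, not the actual carpenter API, and plain numpy arrays stand in for whatever (possibly GPU-backed) array type is used:

```python
import numpy as np


class DataMapping:
    """Dictionary-like, n-D array access, copy-on-write when new data arrives."""

    def __init__(self, arrays, aliases=None):
        self._arrays = dict(arrays)           # name -> n-D array
        self._aliases = dict(aliases or {})   # alias -> stored name (Index-like lookup)

    def __getitem__(self, key):
        # Resolve aliases, e.g. "dir1.tree1.var1" -> "dir1/tree1/var1"
        return self._arrays[self._aliases.get(key, key)]

    def update(self, new_arrays):
        # Copy-on-write: return a new mapping so earlier stages keep an
        # unchanged view; the underlying arrays themselves are shared.
        merged = dict(self._arrays)
        merged.update(new_arrays)
        return DataMapping(merged, self._aliases)


# Toy usage: a Define-like stage adds a derived column without mutating `events`
events = DataMapping({"dir1/tree1/var1": np.array([1.0, 2.0, 3.0])},
                     aliases={"dir1.tree1.var1": "dir1/tree1/var1"})
extended = events.update({"var1_sq": events["dir1.tree1.var1"] ** 2})
```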
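The Mask requirement (callable, mergeable via binary AND/OR) could look roughly like the sketch below; the expression is assumed to have already been compiled into a callable over the data mapping, which side-steps the string/alias parsing question:

```python
import numpy as np


class Mask:
    """Callable selection that can be merged with & and |."""

    def __init__(self, expression):
        self.expression = expression  # callable: data -> boolean array

    def __call__(self, data):
        return self.expression(data)

    def __or__(self, other):
        # (mask1 | mask2) -> mask3: events passing either selection
        return Mask(lambda data: self(data) | other(data))

    def __and__(self, other):
        # (mask1 & mask2) -> mask4: events passing both selections
        return Mask(lambda data: self(data) & other(data))


# Toy usage against a plain dict of numpy arrays
data = {"pt": np.array([10.0, 40.0, 55.0]), "eta": np.array([0.1, 2.8, 1.0])}
high_pt = Mask(lambda d: d["pt"] > 30)
central = Mask(lambda d: np.abs(d["eta"]) < 2.5)
passed = (high_pt & central)(data)   # -> array([False, False, True])
```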
- Define stage: independent new entries → data.update
- Cutflow stage: merging counts across files and datasets
- Binning stage: merge bin entries (preserve datasets?)
→ each type of merge needs its own rules
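
As a concrete illustration of why each type needs its own rules, a hedged sketch (function names and result formats are assumptions, not the carpenter interface): Define output needs no merging at all, Cutflow counts are summed across files/datasets, and Binning results are merged by adding bin contents with identical edges.

```python
from collections import Counter

import numpy as np


def merge_cutflow(counts_a, counts_b):
    # Cutflow stage: counts for the same cut are summed across files/datasets
    return dict(Counter(counts_a) + Counter(counts_b))


def merge_binned(hist_a, hist_b):
    # Binning stage: add bin contents; only valid if the bin edges match
    contents_a, edges_a = hist_a
    contents_b, edges_b = hist_b
    assert np.array_equal(edges_a, edges_b), "cannot merge histograms with different binning"
    return contents_a + contents_b, edges_a


# Toy usage: partial results produced from two input files
cutflow = merge_cutflow({"all": 100, "pt>30": 40}, {"all": 80, "pt>30": 25})
histogram = merge_binned(np.histogram([1, 2, 2], bins=3, range=(0, 3)),
                         np.histogram([0, 1, 2], bins=3, range=(0, 3)))
```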
For optimization, we need to be able to replicate stages across inputs, where applicable.
Stage multiplexing will replicate a stage definition across inputs (previous stages, data import).
Stages that merge data will typically have different rules for multiplexing (none, reduce by N).
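
A rough sketch of what this could look like if a stage is treated as a callable over one input's data mapping; the reduce-by-N rule groups partial results before applying the stage's own merge rule. All names here are illustrative, not part of the proposed API.

```python
def multiplex(stage, inputs):
    """Replicate one stage definition across inputs (files, datasets, previous stages)."""
    return [stage(data) for data in inputs]


def reduce_by_n(partials, n, merge):
    """Fold partial results in groups of n using a stage-specific merge rule;
    a 'none' rule would simply pass the partials through unchanged."""
    reduced = []
    for start in range(0, len(partials), n):
        group = partials[start:start + n]
        merged = group[0]
        for item in group[1:]:
            merged = merge(merged, item)   # e.g. merge_cutflow from the sketch above
        reduced.append(merged)
    return reduced
```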

