notna.io · Business Intelligence

ETL pipelines & task orchestration, re-imagined.

Two .NET 9 tools, purpose-built for high-throughput data movement and parallel process scheduling — from Progress OpenEdge sources to SQL Server targets, and beyond.

.NET 9 · Spectre.Console · Dapper · Plugin architecture · LPT scheduling · Multicast streaming
  • 6 DB engines supported
  • 1→N multicast targets
  • LPT scheduling algorithm
  • Zero user interaction needed
Why notna.io

Three generations of production evidence

notna.io presents V3 — the third generation of an ETL and orchestration platform born inside a hospital information system serving over 2,000 employees. V1 entered clinical production and ran for 8 years, surfacing the real operational constraints that textbook ETL tools ignore: connection pooling exhaustion on legacy ODBC drivers, silent data quality failures with no visibility, manual scheduling bottlenecks, and the cost of reading the same source multiple times to feed multiple targets.

V2 addressed these incrementally. V3 was redesigned from scratch — multicast streaming, statistical anomaly detection, LPT wall-clock scheduling, plugin architecture — with every architectural decision rooted in 8 years of production evidence. Nothing in V3 is theoretical.

V1
First generation — entered hospital production. 8 years of real clinical data, real failures, real lessons.
V2
Incremental iteration — connection pooling fixed, scheduling improved, multicast introduced.
V3 (current)
Ground-up redesign. Plugin architecture, LPT scheduling, anomaly detection, multicast streaming. Open source.
Applications

Two tools, one pipeline

Each app is independently deployable and operable via CLI. Together they cover the full ETL orchestration lifecycle.

EHC.Bi.Db.Loader
EHC.Bi.Db.Loader.exe assembly load--object -a opale -b sa_opale -t PATIENT
Multicast · ETL · Anomaly-aware · Incremental
Command lifecycle — how it all connects
Step 1 · create: provision the target schema
Step 2 · profile: build the anomaly profile
Step 3 · load: stream data to the target(s)
# Introspects source schema → creates target table(s) — no DDL scripting needed
EHC.Bi.Db.Loader.exe assembly create--object -a opale -b sa_opale -t PATIENT
✔ Target table created — ready for profiling and loading

A modular, cross-database ETL console application that transfers data from one source to one or more targets simultaneously. It is designed to run autonomously, launched from a scheduled task, an SSIS Execute Process Task, or the Orchestrator, with zero user interaction once configured.

  • Multicast streaming — one source read, N target writes in parallel via in-memory channels. Loading into three targets reads the source once, not three times.
  • Incremental loading — automatic watermark-based incremental loads, configured once via a single row in dbo.incremental_watermark; no CLI changes needed.
  • Anomaly detection — two-phase statistical column profiling. Profiles are built offline and applied per batch at load time, with no extra database round-trips.
  • Plugin architecture — database drivers loaded at runtime via reflection. Extend to any engine without recompiling the core.
  • 8 CLI commands — load--object, load--sql, create--object, create--sql, recover--load, profile--object, profile--sql, check.
  • Schema & truncate tokens: @schema=, @truncate and @postprocess, passed inline in the CLI argument.
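The plugin bullet describes drivers resolved at runtime via reflection. The Python sketch below is only an analogue of that idea, not the Loader's .NET code. Here, importlib.import_module plus getattr stand in for assembly loading and reflection. The 'module:ClassName' spec string, the DbDriver protocol and its read_batches/bulk_write methods are hypothetical names chosen for illustration.

```python
import importlib
from typing import Iterator, Protocol, runtime_checkable


@runtime_checkable
class DbDriver(Protocol):
    """Hypothetical driver contract; the real Loader plugin interface is not documented here."""

    def read_batches(self, query: str, batch_size: int) -> Iterator[list]: ...

    def bulk_write(self, table: str, rows: list) -> int: ...


def load_driver(spec: str) -> DbDriver:
    """Resolve 'module:ClassName' at runtime and instantiate it; the core never imports drivers statically."""
    module_name, _, class_name = spec.partition(":")
    if not module_name or not class_name:
        raise ValueError(f"expected 'module:ClassName', got {spec!r}")
    module = importlib.import_module(module_name)  # analogue of loading a plugin assembly
    driver_cls = getattr(module, class_name)        # analogue of a reflective type lookup by name
    driver = driver_cls()
    if not isinstance(driver, DbDriver):            # checks that the methods exist, not their signatures
        raise TypeError(f"{spec} does not satisfy the DbDriver protocol")
    return driver
```

Adding an engine then means shipping a new module and naming it in configuration. The core module stays untouched.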
Anomaly detection (in development)

A two-phase statistical profiling engine. Phase 1: assembly profile--object scans a source column offline, builds a numeric or categorical signature (min/max/mean/stddev, coverage threshold, value distribution), and persists it to the Tools database. Phase 2: profiles are loaded once at startup and applied per batch during streaming — any row deviating from the signature triggers a warning without aborting the load. Zero extra DB round-trips during data movement.
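A minimal Python sketch of the two phases follows. It rests on three assumptions. A numeric profile flags values more than z_limit standard deviations from the profiled mean. A categorical profile flags values never seen during profiling. The coverage threshold is the profiled non-null share scaled by a hypothetical coverage_factor. These rules and every name below are illustrative, not the engine's actual signature format.

```python
import statistics
from collections import Counter
from dataclasses import dataclass


@dataclass
class NumericProfile:
    minimum: float       # min/max kept in the signature for reporting
    maximum: float
    mean: float
    stddev: float
    min_coverage: float  # lowest acceptable share of non-null values in a batch


@dataclass
class CategoricalProfile:
    distribution: dict   # value -> relative frequency observed while profiling
    min_coverage: float


def build_profile(values: list, coverage_factor: float = 0.9):
    """Phase 1 (offline): derive a numeric or categorical signature from a column sample."""
    present = [v for v in values if v is not None]
    coverage = len(present) / len(values) if values else 0.0
    threshold = coverage * coverage_factor  # hypothetical tolerance below the profiled coverage
    if present and all(isinstance(v, (int, float)) for v in present):
        return NumericProfile(min(present), max(present), statistics.fmean(present),
                              statistics.pstdev(present), threshold)
    counts = Counter(present)
    return CategoricalProfile({v: n / len(present) for v, n in counts.items()}, threshold)


def check_batch(batch: list, profile, z_limit: float = 3.0) -> list:
    """Phase 2 (load time): collect warnings for one in-memory batch instead of aborting the load."""
    warnings = []
    present = [v for v in batch if v is not None]
    if batch and len(present) / len(batch) < profile.min_coverage:
        warnings.append(("low_coverage", len(present) / len(batch)))
    for i, v in enumerate(batch):
        if v is None:
            continue
        if isinstance(profile, NumericProfile):
            if abs(v - profile.mean) > z_limit * profile.stddev:
                warnings.append(("numeric_outlier", i, v))
        elif v not in profile.distribution:
            warnings.append(("unseen_category", i, v))
    return warnings
```

check_batch only reads the profile already held in memory, so the streaming pass makes no database calls for it. Deviations come back as warnings, and the load continues.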

In-memory data transformation (next milestone)

Transformation rules applied inline to the batch already in memory during the streaming pass — type coercions, column renames, value mappings, derived columns — before the data is written to any target. No staging table, no second pass, no additional DB connection. Transformation cost is absorbed into the streaming time that already exists.
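The rule format is not published yet. This hypothetical Python sketch covers the four rule kinds named above, applied to a batch of rows (dicts) in a single pass. Renames run first, so the later rules can refer to target-side column names. TransformRules and its fields are illustrative, not the Loader's API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class TransformRules:
    """Hypothetical rule set covering renames, coercions, value mappings and derived columns."""
    renames: dict[str, str] = field(default_factory=dict)                     # old -> new column name
    coercions: dict[str, Callable[[Any], Any]] = field(default_factory=dict)  # column -> type converter
    value_maps: dict[str, dict] = field(default_factory=dict)                 # column -> {old value: new value}
    derived: dict[str, Callable[[Row], Any]] = field(default_factory=dict)    # new column -> function of the row


def transform_batch(batch: list[Row], rules: TransformRules) -> list[Row]:
    """Apply all rules to an in-memory batch in one pass, before it is handed to any target."""
    result = []
    for row in batch:
        # Renames first, so the remaining rules use the target-side column names.
        out = {rules.renames.get(col, col): value for col, value in row.items()}
        for col, convert in rules.coercions.items():
            if out.get(col) is not None:
                out[col] = convert(out[col])
        for col, mapping in rules.value_maps.items():
            if col in out:
                out[col] = mapping.get(out[col], out[col])  # unmapped values pass through unchanged
        for col, derive in rules.derived.items():
            out[col] = derive(out)
        result.append(out)
    return result
```

Consider a hypothetical PAT_SEX column renamed to sex, with '1'/'2' mapped to 'M'/'F'. Both changes happen on the batch already in memory, so no target ever sees the raw codes.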

Supported engines
SQL Server
Oracle
MySQL
PostgreSQL
Progress OpenEdge
SQLite
Example — incremental load
# Insert once in the Tools DB:
INSERT INTO dbo.incremental_watermark (source, object_name, watermark_column, watermark_type)
VALUES ('opale', 'PATIENT', 'LAST_MODIFIED,CREATED', 'DateTime');
# Then just run (no --where needed):
EHC.Bi.Db.Loader.exe assembly load--object -a opale -b sa_opale -t PATIENT
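The page does not spell out how the watermark row becomes a query. The Python sketch below follows one plausible reading. The comma-separated watermark_column lists columns that are each compared with the stored watermark, and a row qualifies if any of them is newer. '?' placeholders follow ODBC style. The new watermark is the MAX value seen across those columns. The function names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Optional


def parse_columns(watermark_column: str) -> list[str]:
    """'LAST_MODIFIED,CREATED' -> ['LAST_MODIFIED', 'CREATED']."""
    return [c.strip() for c in watermark_column.split(",") if c.strip()]


def build_incremental_filter(watermark_column: str, last: Optional[datetime]) -> tuple[str, list]:
    """Predicate for the next read: a row qualifies if ANY watermark column is newer (assumption)."""
    columns = parse_columns(watermark_column)
    if last is None:  # no watermark stored yet: full load
        return "1=1", []
    predicate = " OR ".join(f"{c} > ?" for c in columns)  # names come from the trusted Tools DB row
    return f"({predicate})", [last] * len(columns)


def next_watermark(batches: Iterable[list[dict]], watermark_column: str,
                   current: Optional[datetime]) -> Optional[datetime]:
    """MAX value observed across all watermark columns while the batches stream through."""
    columns = parse_columns(watermark_column)
    best = current
    for batch in batches:
        for row in batch:
            for c in columns:
                value = row.get(c)
                if value is not None and (best is None or value > best):
                    best = value
    return best
```

Write the value returned by next_watermark back to dbo.incremental_watermark only after every target has committed. A failed load is then simply re-read on the next run.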
EHC.Bi.Apps.Orchestrator
EHC.Bi.Apps.Orchestrator.exe orchestrator run --tasks opale_daily.json --batch {guid} --threads 4
Wall-clock optimiser · LPT (Longest Processing Time) · Retry-capable

A generic parallel task orchestrator whose primary goal is to minimise total wall-clock time across a batch of heterogeneous tasks. It dispatches any command-line process, fully decoupled from what the command does, and focuses entirely on scheduling each process's lifecycle for maximum throughput.

Wall-clock compression

A 24-task batch totalling 16 hours of sequential work is spread across 4 threads and balanced automatically by LPT (Longest Processing Time) bin-packing. This pulls wall-clock time down toward the hard floor of 16 h ÷ 4 threads = 4 hours. The orchestrator derives the thread count from the coefficient of variation (CV) of the task weights, so no manual tuning is needed.
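The classical bounds for scheduling independent tasks with processing times p_j on m identical threads are shown below. The first is a floor that no scheduler can beat: total work divided by thread count, or the single longest task. The second is Graham's worst-case guarantee for LPT relative to the optimal makespan C*.

```latex
C_{\max} \;\ge\; \max\!\Bigl(\tfrac{1}{m}\textstyle\sum_{j=1}^{n} p_j,\ \max_{j} p_j\Bigr),
\qquad
C_{\max}^{\mathrm{LPT}} \;\le\; \Bigl(\tfrac{4}{3} - \tfrac{1}{3m}\Bigr)\, C_{\max}^{*}
```

The guarantee assumes the weights match the actual runtimes, which is why accurate median estimates matter.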

  • LPT (Longest Processing Time) scheduling — tasks sorted by weight descending, distributed via greedy bin-packing. Task weights are median runtimes derived from 7 years of production history, making estimates reliable for known objects. New objects are seeded with the batch average and self-correct to a stable median after ~7 days of runs. Queue re-sorts dynamically when actual runtime deviates >50% from estimate.
  • Resource domain gating — a HardFail on any task immediately blocks the remaining tasks in its resourceIdentifier group. Other domains continue uninterrupted.
  • Retry policy — configurable exit codes trigger a retry after a wait. Exit code -2 (process failed to start) always HardFails, regardless of retry settings.
  • Batch recovery — rerun with --recovery and the same GUID: completed tasks are skipped, failed tasks are re-queued.
  • Any runtime — .exe, python, node, php, any executable. The task file just needs a command string and a weight estimate.
  • --showconsole — opens a live console window per spawned process for interactive diagnostics, no recompile needed.
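Here is a minimal Python sketch of the weighting and LPT steps from the first bullet. It assumes weights are plain numbers of seconds and that an object with no history receives the average of the known weights in the batch. It models the static plan only and omits the dynamic re-sort on a >50% deviation. All function names are hypothetical.

```python
import heapq
import statistics
from typing import Optional


def task_weight(history_seconds: list[float]) -> Optional[float]:
    """Weight = median of past runtimes; None for an object with no history yet."""
    return statistics.median(history_seconds) if history_seconds else None


def seed_missing(weights: list[Optional[float]]) -> list[float]:
    """New objects (no history) get the batch average of the known weights."""
    known = [w for w in weights if w is not None]
    average = statistics.fmean(known) if known else 1.0  # hypothetical fallback when nothing is known
    return [average if w is None else w for w in weights]


def lpt_schedule(weights: list[float], threads: int) -> tuple[list[list[int]], float]:
    """Longest Processing Time first: each task, heaviest first, goes to the least-loaded thread."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    heap = [(0.0, t) for t in range(threads)]  # (current load, thread id); already a valid heap
    bins: list[list[int]] = [[] for _ in range(threads)]
    for i in order:
        load, t = heapq.heappop(heap)
        bins[t].append(i)
        heapq.heappush(heap, (load + weights[i], t))
    makespan = max(load for load, _ in heap)
    return bins, makespan
```

Take the classic weights 5, 5, 4, 4, 3, 3, 3 on three threads. lpt_schedule returns a makespan of 11, while the optimum is 9 (5+4, 5+4, 3+3+3). That is exactly Graham's worst case, (4/3 - 1/9) × 9 = 11.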
Task file — two-section format
{
  "resources": [
    { "resourceIdentifier": "multicast", "workingDirectory": "C:\\Tools\\Loader" }
  ],
  "tasks": [
    {
      "objectCode": "pre",
      "projectCode": "OPALE",
      "resourceIdentifier": "multicast",
      "objectCommand": "Loader.exe assembly load--object -a opale -b sa_opale -t pre",
      "elapsedTime": 5011
    }
  ]
}
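To make the two sections concrete, here is a hypothetical Python counterpart of OrchestratorTaskFile. It parses the sample task file and resolves each task's working directory through its resourceIdentifier. Field names come from the sample. The Task class and the unknown-resource check are assumptions, and elapsedTime is kept as a float because its units are not stated.

```python
import json
from dataclasses import dataclass


@dataclass
class Task:
    object_code: str
    project_code: str
    resource: str
    command: str
    elapsed_time: float      # weight estimate; units not stated on this page
    working_directory: str   # resolved from the resources section


def load_task_file(text: str) -> list[Task]:
    """Parse the two-section task file and attach each task's working directory."""
    doc = json.loads(text)
    directories = {r["resourceIdentifier"]: r["workingDirectory"] for r in doc.get("resources", [])}
    tasks = []
    for t in doc.get("tasks", []):
        resource = t["resourceIdentifier"]
        if resource not in directories:
            raise ValueError(f"task {t['objectCode']!r} references unknown resource {resource!r}")
        tasks.append(Task(t["objectCode"], t["projectCode"], resource,
                          t["objectCommand"], float(t["elapsedTime"]), directories[resource]))
    return tasks
```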
Exit code contract
0 → Success
1 (or --retrycodes) → Retry, up to --maxretry
Other non-zero → HardFail, gate domain
-2 → HardFail (process couldn't start)
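A minimal Python sketch of this contract plus domain gating follows. Two points are assumptions, not documented behaviour. First, a retryable code that has used up --maxretry ends as Fail, one of the four RetryPolicy outcomes named in the dispatch loop. Second, only HardFail closes a domain. ResourceDomainMonitor borrows the page's name, but its record/is_open methods are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "Success"
    RETRY = "Retry"
    FAIL = "Fail"
    HARD_FAIL = "HardFail"


PROCESS_START_FAILED = -2


def classify(exit_code: int, retries_done: int, *, max_retry: int,
             retry_codes: frozenset = frozenset({1})) -> Outcome:
    """Map one process exit code to an outcome, following the exit code contract."""
    if exit_code == 0:
        return Outcome.SUCCESS
    if exit_code == PROCESS_START_FAILED:
        return Outcome.HARD_FAIL  # -2 is never retried, whatever retry_codes says
    if exit_code in retry_codes:
        # Assumption: once --maxretry is used up, the task ends as Fail rather than HardFail.
        return Outcome.RETRY if retries_done < max_retry else Outcome.FAIL
    return Outcome.HARD_FAIL      # any other non-zero code


class ResourceDomainMonitor:
    """Closes a resourceIdentifier domain after a HardFail; other domains stay open."""

    def __init__(self) -> None:
        self._closed: set[str] = set()

    def record(self, resource: str, outcome: Outcome) -> None:
        if outcome is Outcome.HARD_FAIL:
            self._closed.add(resource)

    def is_open(self, resource: str) -> bool:
        return resource not in self._closed
```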
Architecture

How they work together

The Orchestrator is the conductor. The Loader is a musician. Any number of musicians can play in parallel.

Loader — data flow
01 Source read: one connection to Progress/Oracle/MySQL; batched IDataReader stream.
02 Pipeline: transformation rules and anomaly detection applied inline, per batch.
03 Channel fan-out: an in-memory Channel<T> distributes each batch to N target writers.
04 Parallel writes: each target consumes independently via SqlBulkCopy or the provider's bulk API.
05 Watermark commit: the MAX observed value is written to dbo.incremental_watermark on success.
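Channel<T> is .NET-specific, so the sketch below uses Python's asyncio.Queue as a stand-in to show the fan-out shape. The source is iterated once, and each batch goes onto one bounded queue per target. A full queue blocks the reader, which gives back-pressure from the slowest target. The multicast function and the SENTINEL marker are hypothetical. Error handling and cancellation are omitted.

```python
import asyncio
from typing import Awaitable, Callable, Iterable

SENTINEL = object()  # marks the end of the stream for each writer


async def multicast(batches: Iterable[list],
                    writers: list[Callable[[list], Awaitable[None]]],
                    capacity: int = 4) -> None:
    """Read the source once and hand every batch to N independent writers."""
    queues = [asyncio.Queue(maxsize=capacity) for _ in writers]

    async def consume(queue: asyncio.Queue, write) -> None:
        while True:
            batch = await queue.get()
            if batch is SENTINEL:
                return
            await write(batch)  # writers share the same batch object and must not mutate it

    consumers = [asyncio.create_task(consume(q, w)) for q, w in zip(queues, writers)]
    for batch in batches:       # single pass over the source
        for q in queues:
            await q.put(batch)  # waits while a slow target's buffer is full (back-pressure)
    for q in queues:
        await q.put(SENTINEL)
    await asyncio.gather(*consumers)
```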
Orchestrator — dispatch loop
01 Task file load: JSON parsed into OrchestratorTaskFile; the resources section maps identifiers to directories.
02 LPT scheduling: CV-based thread count; greedy bin-packing into N bins by median ElapsedTime weight. New objects are seeded with the batch average.
03 Conductor dispatch: a single conductor dequeues the heaviest pending task and assigns it to an idle ThreadWorker.
04 Process spawn: ThreadWorker calls SplitCommand() to parse exe + args, then Process.Start(), under a timeout guard.
05 Exit code evaluation: RetryPolicy maps the result to Success / Retry / Fail / HardFail; ResourceDomainMonitor gates the domain on HardFail.
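Steps 04 and 05 can be approximated in Python with subprocess.run. This is a sketch, not the ThreadWorker code. shlex.split applies POSIX quoting rules, whereas a Windows SplitCommand() would need Windows rules. Any OSError at launch maps to -2, per the exit code contract. TIMED_OUT = -3 is a hypothetical code chosen for the timeout guard.

```python
import shlex
import subprocess
from typing import Optional

PROCESS_START_FAILED = -2   # from the exit code contract
TIMED_OUT = -3              # hypothetical code for the timeout guard (not in the documented contract)


def split_command(command: str) -> list[str]:
    """Stand-in for SplitCommand(): exe + args, honouring quotes (POSIX rules)."""
    return shlex.split(command)


def run_task(command: str, working_directory: Optional[str] = None,
             timeout_s: Optional[float] = None) -> int:
    """Spawn one task, wait under a timeout guard, and return the exit code to classify."""
    argv = split_command(command)
    if not argv:
        return PROCESS_START_FAILED
    try:
        completed = subprocess.run(argv, cwd=working_directory, timeout=timeout_s)
    except OSError:                    # missing executable, bad working directory, no permission
        return PROCESS_START_FAILED
    except subprocess.TimeoutExpired:  # subprocess.run kills the child before re-raising
        return TIMED_OUT
    return completed.returncode
```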
Runtime: .NET 9 (both applications; console executables, Windows & Linux)
CLI framework: Spectre.Console (colour logging, progress, command parsing)
Data access: Dapper (stored-procedure calls to the Tools SQL Server database)
Persistence: SQL Server (Tools DB: watermarks, batch logs, task logs, anomaly profiles)