Structured Data Roadmap
This document outlines the roadmap for implementing first-class support for structured data and system interaction within the Endo shell. It builds upon the F# syntax extensions, particularly records and union types.
Vision
The goal is to evolve Endo from a shell that primarily deals with streams of text to one that can natively understand and manipulate streams of structured objects. This enables more powerful, reliable, and developer-friendly automation, akin to PowerShell but with a modern, functional approach.
Instead of parsing text with awk, sed, and grep:
```
# Instead of: ps aux | grep 'endo' | awk '{print $2}'
ps |> filter (_.command == "endo") |> map _.pid
```
Phase 6.1: Core Infrastructure -- Complete
CoreVM Support for Records and Unions
Discriminated unions and record types are fully implemented in the CoreVM, supporting efficient creation, manipulation, and reference counting.
Structured Command Interface
An internal C++ interface, StructuredCommand, lets commands declare that they produce structured data and advertise the type of their output. Platform-specific details are abstracted behind the ProcessProvider interface, so the same commands work across platforms.
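To illustrate the idea, here is a minimal Python analogue of a command that advertises its output record type and gets its data from an injected provider. The names StructuredCommand, ProcessProvider, and ProcessInfo come from this document. The methods (output_type, run, list_processes) and the PsCommand and FakeProvider classes are hypothetical. The real interface is internal C++.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class ProcessInfo:
    pid: int
    ppid: int
    user: str
    cpu: float    # CPU usage percentage
    mem: int      # memory usage in KB
    command: str


class ProcessProvider(ABC):
    """Platform-specific source of process data (hypothetical method name)."""

    @abstractmethod
    def list_processes(self) -> list[ProcessInfo]: ...


class StructuredCommand(ABC):
    """A command that declares the record type of its output stream."""

    @property
    @abstractmethod
    def output_type(self) -> type: ...

    @abstractmethod
    def run(self, args: list[str]) -> Iterator[object]: ...


class PsCommand(StructuredCommand):
    def __init__(self, provider: ProcessProvider) -> None:
        self._provider = provider  # chosen per platform by the caller

    @property
    def output_type(self) -> type:
        return ProcessInfo

    def run(self, args: list[str]) -> Iterator[ProcessInfo]:
        yield from self._provider.list_processes()


class FakeProvider(ProcessProvider):
    """Stand-in provider with fixed data, e.g. for tests."""

    def list_processes(self) -> list[ProcessInfo]:
        return [
            ProcessInfo(1, 0, "root", 0.0, 1024, "init"),
            ProcessInfo(42, 1, "me", 1.5, 2048, "endo"),
        ]
```

PsCommand depends only on the abstract provider. A port to another platform would supply a different ProcessProvider without touching the command.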
Structured Data Wrapper
Four data source commands for ad-hoc structured parsing:
- `open-json` / `open-csv`: read from files
- `from-json` / `from-csv`: read from pipe input
These support inline record type definitions and named type references:
```
# Inline type definition
open-json "file.json" as { name: string; age: int }

# Named type reference
type Person = { name: string; age: int }
open-csv "people.csv" as Person

# Pipe-based source
curl api/users | from-json as { name: string; id: int } |> map _.name
```
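Here is a minimal sketch of what from-json and from-csv might do with an inline record type. It assumes a schema is a mapping from field name to type, that missing or mistyped fields are errors, and that extra fields are dropped. These semantics and the function names are assumptions for illustration.

```python
from __future__ import annotations

import csv
import io
import json
from typing import Any


def check_record(record: dict[str, Any], schema: dict[str, type]) -> dict[str, Any]:
    """Validate one record against an inline type such as { name: string; age: int }."""
    out = {}
    for field, ty in schema.items():
        if field not in record:
            raise ValueError(f"missing field {field!r}")
        value = record[field]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, ty) or (ty is int and isinstance(value, bool)):
            raise TypeError(f"field {field!r}: expected {ty.__name__}, got {type(value).__name__}")
        out[field] = value
    return out  # fields not in the schema are dropped (an assumption)


def from_json(text: str, schema: dict[str, type]) -> list[dict[str, Any]]:
    data = json.loads(text)
    rows = data if isinstance(data, list) else [data]  # accept one object or an array
    return [check_record(row, schema) for row in rows]


def from_csv(text: str, schema: dict[str, type]) -> list[dict[str, Any]]:
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # CSV cells are strings, so convert each declared column before checking it.
        converted = {f: ty(row[f]) for f, ty in schema.items() if f in row}
        rows.append(check_record(converted, schema))
    return rows
```

With these helpers, the `map _.name` step of the pipe example corresponds to a comprehension such as `[r["name"] for r in from_json(body, {"name": str, "id": int})]`.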
Note: Output Recognition Files (Phase 6.3a) automate this for pipeline contexts by declaratively defining how to parse command output. Explicit wrappers such as from-json remain available for ad-hoc use.
Phase 6.2: Built-in Structured Commands -- Complete
ls
Returns a stream of FileInfo records.
| Field | Type | Description |
|---|---|---|
| name | string | File name |
| size | int | File size in bytes |
| mode | FileMode | File permissions |
| mtime | int | Modification time (epoch seconds) |
| isDir | bool | Whether the entry is a directory |
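For illustration, here is a Python sketch that builds FileInfo-shaped records from a directory listing with `os.scandir`. It stores `mode` as the raw permission bits from `stat.S_IMODE`, which is an assumption; the document's FileMode type is likely richer.

```python
from __future__ import annotations

import os
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    name: str
    size: int     # bytes
    mode: int     # permission bits only (stand-in for FileMode)
    mtime: int    # epoch seconds
    isDir: bool   # field name kept as in the table above


def ls(path: str = ".") -> list[FileInfo]:
    entries = []
    with os.scandir(path) as it:
        for entry in it:
            st = entry.stat(follow_symlinks=False)
            entries.append(FileInfo(
                name=entry.name,
                size=st.st_size,
                mode=stat.S_IMODE(st.st_mode),
                mtime=int(st.st_mtime),
                isDir=entry.is_dir(follow_symlinks=False),
            ))
    # scandir order is arbitrary, so sort by name for stable output.
    return sorted(entries, key=lambda f: f.name)
```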
ps
Returns a stream of ProcessInfo records.
| Field | Type | Description |
|---|---|---|
| pid | int | Process ID |
| ppid | int | Parent process ID |
| user | string | Owner |
| cpu | float | CPU usage percentage |
| mem | int | Memory usage in KB |
| command | string | Command name |
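One way a ProcessProvider backend might fill ProcessInfo records on Linux is to parse `ps` output with the header row suppressed. This is a hypothetical approach, not necessarily how Endo does it. It assumes procps `ps -A -o pid= -o ppid= -o user= -o pcpu= -o rss= -o comm=`, where `rss` is reported in KB.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessInfo:
    pid: int
    ppid: int
    user: str
    cpu: float    # CPU usage percentage
    mem: int      # resident set size in KB
    command: str


def parse_ps(text: str) -> list[ProcessInfo]:
    """Parse header-less output of `ps -A -o pid= -o ppid= -o user= -o pcpu= -o rss= -o comm=`."""
    procs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The command is the last column and may contain spaces, so split at most 5 times.
        pid, ppid, user, cpu, rss, command = line.split(None, 5)
        procs.append(ProcessInfo(int(pid), int(ppid), user, float(cpu), int(rss), command))
    return procs
```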
jobs
Returns a stream of JobInfo records.
| Field | Type | Description |
|---|---|---|
| id | int | Job number |
| state | string | Job state (Running, Stopped, etc.) |
| command | string | Command string |
| pid | int | Process ID |
Phase 6.3: Extensible Command Discovery
Phase 6.3a: Output Recognition Files -- Complete
Declarative YAML definitions teach Endo how to parse external command output without modifying the tools themselves.
- YAML definition file format with `json` and `fields` parser types
- Variant matching by command arguments, with a priority system
- `command_to_run` override to redirect commands to their structured-output flags
- Definition file search paths: `~/.config/endo/definitions/`, system-wide, and bundled
- Pipeline integration via the `StructuredPipelineSourceExprAST` node
- Bundled definitions for `docker ps`, `docker images`, `git log`, and `git status`
```yaml
command: "docker"
variants:
  - name: "ps-json"
    matches:
      - ["ps"]
      - ["ps", "-a"]
    priority: 10
    command_to_run: "docker ps --format json {args}"
    parser:
      type: "json"
      format: "lines"
```
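Here is a minimal sketch of how a definition like this could be applied, in three steps:

1. Pick the highest-priority variant whose `matches` list contains the invocation's arguments.
2. Expand `command_to_run`.
3. Parse `format: "lines"` output as one JSON document per line, which is what `docker ps --format json` emits.

Two choices here are assumptions: exact-match semantics, and treating `{args}` as the arguments after the subcommand. The Structured Output Recognition specification is authoritative. Variant, select_variant, expand_command, and parse_json_lines are hypothetical names.

```python
from __future__ import annotations

import json
import shlex
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    matches: list[list[str]]   # argument lists this variant applies to
    priority: int
    command_to_run: str        # may contain the {args} placeholder
    parser_format: str         # "lines": one JSON document per output line


def select_variant(variants: list[Variant], args: list[str]) -> Variant | None:
    # Assumption: a pattern matches only when it equals the argument list exactly.
    candidates = [v for v in variants if args in v.matches]
    # Highest priority wins; None means no definition applies (fall back to text).
    return max(candidates, key=lambda v: v.priority, default=None)


def expand_command(variant: Variant, args: list[str]) -> list[str]:
    # Assumption: {args} stands for the arguments after the subcommand (args[0]).
    rest = " ".join(shlex.quote(a) for a in args[1:])
    return shlex.split(variant.command_to_run.replace("{args}", rest))


def parse_json_lines(output: str) -> list[dict]:
    # format: "lines" -> every non-blank line is a separate JSON object.
    return [json.loads(line) for line in output.splitlines() if line.strip()]
```

Under these assumptions, `docker ps -a` selects `ps-json` and runs `docker ps --format json -a`.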
See Structured Output Recognition for the full specification.
Phase 6.3b: Self-Describing Commands -- Planned
For new commands that opt into structured output:
- Discovery via a `my-command --endo-schema` convention
- Libraries for C++, Rust, Go, and Python to simplify writing structured commands
- Schema caching for performance
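Since this phase is planned, the following is only one possible shape for the `--endo-schema` convention and the schema cache. The schema document format, command_main, probe_schema, SchemaCache, and keying the cache on executable path plus modification time are all hypothetical choices that the roadmap does not specify.

```python
from __future__ import annotations

import json
import os
import subprocess
from typing import Callable

# Hypothetical schema document a command prints when asked for --endo-schema.
SCHEMA = {"record": "Greeting", "fields": {"name": "string", "count": "int"}}


def command_main(argv: list[str], emit: Callable[[str], None] = print) -> None:
    """Command side: answer the discovery probe, otherwise emit JSON records."""
    if "--endo-schema" in argv:
        emit(json.dumps(SCHEMA))
        return
    emit(json.dumps({"name": "endo", "count": 1}))  # one record per line


def probe_schema(path: str) -> str:
    """Shell side: run `<path> --endo-schema` and return its stdout."""
    result = subprocess.run([path, "--endo-schema"], capture_output=True, text=True, check=True)
    return result.stdout


class SchemaCache:
    """Remember schemas per executable, re-probing only when the file changes."""

    def __init__(self, fetch: Callable[[str], str] = probe_schema) -> None:
        self._fetch = fetch
        self._cache: dict[tuple[str, float], dict] = {}

    def get(self, path: str) -> dict:
        key = (path, os.stat(path).st_mtime)  # a rebuilt binary gets a new key
        if key not in self._cache:
            self._cache[key] = json.loads(self._fetch(path))
        return self._cache[key]
```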
Phase 6.4: Structured Data Pipeline Integration -- Complete
Record-Aware List Operations
Standard F# higher-order functions work directly with record-typed lists. No special-purpose verbs are needed:
| Verb Equivalent | F# Function | Example |
|---|---|---|
| where | filter | `filter (_.name == "endo")` |
| select | map | `map _.pid` |
| sort-by | sortBy | `sortBy _.cpu` |
| group-by | groupBy | `groupBy _.user` |
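The operations in the table map onto ordinary list processing. This Python sketch mirrors them on a few sample process records; the Proc record and the data are made up for illustration. The hand-written group_by assumes Endo's groupBy keeps F# `List.groupBy` semantics: it returns (key, items) pairs in first-seen key order. It also assumes sortBy is stable, as F# `List.sortBy` and Python's `sorted` both are.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    pid: int
    user: str
    cpu: float
    command: str


procs = [
    Proc(101, "root", 0.5, "sshd"),
    Proc(202, "ada", 12.0, "endo"),
    Proc(303, "ada", 3.0, "endo"),
    Proc(404, "bob", 7.5, "vim"),
]

# filter (_.command == "endo") |> map _.pid
endo_pids = [p.pid for p in procs if p.command == "endo"]

# sortBy _.cpu (ascending and stable)
by_cpu = sorted(procs, key=lambda p: p.cpu)


def group_by(key, items):
    # dict preserves insertion order, so keys come out in first-seen order.
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return list(groups.items())


# groupBy _.user
by_user = group_by(lambda p: p.user, procs)
```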
Placeholder Lambda Sugar
Parser-level sugar for concise field access:
- `_.field` desugars to `fun __x -> __x.field`
- `_.field == value` desugars to `fun __x -> __x.field == value`
- `_ + 1` desugars to `fun __x -> __x + 1`
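The desugaring can be sketched as a small tree rewrite. If an argument expression contains a placeholder, replace each `_` with a fresh parameter and wrap the result in a lambda. The AST classes here are hypothetical stand-ins for Endo's parser nodes. The sketch assumes that every `_` in one expression refers to the same argument and that nested lambdas are left alone.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Hole:                 # the `_` placeholder
    pass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lit:
    value: Any


@dataclass(frozen=True)
class Field:                # target.name
    target: Expr
    name: str


@dataclass(frozen=True)
class BinOp:                # left op right
    op: str
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Lambda:               # fun param -> body
    param: str
    body: Expr


Expr = Union[Hole, Var, Lit, Field, BinOp, Lambda]
PARAM = "__x"  # parameter name taken from the examples above


def has_hole(e: Expr) -> bool:
    if isinstance(e, Hole):
        return True
    if isinstance(e, Field):
        return has_hole(e.target)
    if isinstance(e, BinOp):
        return has_hole(e.left) or has_hole(e.right)
    return False  # Var, Lit, and nested Lambdas (which keep their own scope)


def fill(e: Expr) -> Expr:
    if isinstance(e, Hole):
        return Var(PARAM)
    if isinstance(e, Field):
        return Field(fill(e.target), e.name)
    if isinstance(e, BinOp):
        return BinOp(e.op, fill(e.left), fill(e.right))
    return e


def desugar(e: Expr) -> Expr:
    """Turn an argument like `_.field == value` into `fun __x -> __x.field == value`."""
    return Lambda(PARAM, fill(e)) if has_hole(e) else e
```

The fixed name `__x` follows the document's examples. A production parser would also need to ensure it cannot clash with user variables.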
Table Rendering
Lists of records are automatically rendered as tables when displayed:
- Auto-detect column widths with terminal-width-aware shrinking
- Three styles: Bordered (Unicode box-drawing), Compact, Plain
- Auto-style selection: Bordered with color for terminals, Plain for pipes
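Here is a sketch of terminal-aware width fitting. It assumes a simple policy: repeatedly shrink the widest column, then truncate cells with an ellipsis. Endo's actual algorithm may differ. The terminal-aware parts use `shutil.get_terminal_size` and `stream.isatty()` from the Python standard library. column_widths, render_plain, choose_style, and terminal_columns are hypothetical names, and only the Plain style is rendered.

```python
from __future__ import annotations

import shutil
import sys


def column_widths(headers: list[str], rows: list[list[str]], max_width: int,
                  sep: int = 2, min_col: int = 3) -> list[int]:
    # Natural width of each column: its longest header or cell.
    widths = [max([len(h)] + [len(r[i]) for r in rows]) for i, h in enumerate(headers)]

    def total() -> int:
        return sum(widths) + sep * (len(widths) - 1)

    # Shrink the currently widest column one character at a time until the row fits.
    while widths and total() > max_width:
        widest = max(range(len(widths)), key=lambda i: widths[i])
        if widths[widest] <= min_col:
            break  # every column is at its minimum; let the terminal wrap
        widths[widest] -= 1
    return widths


def fit(cell: str, width: int) -> str:
    # Truncate with an ellipsis when the cell is wider than its column.
    return cell if len(cell) <= width else cell[: width - 1] + "…"


def render_plain(headers: list[str], rows: list[list[str]], max_width: int) -> str:
    widths = column_widths(headers, rows, max_width)
    lines = []
    for row in [headers] + rows:
        cells = [fit(c, w).ljust(w) for c, w in zip(row, widths)]
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)


def choose_style(stream=sys.stdout) -> str:
    # Bordered (with color) for interactive terminals, Plain when output is piped.
    return "bordered" if stream.isatty() else "plain"


def terminal_columns() -> int:
    return shutil.get_terminal_size(fallback=(80, 24)).columns
```

Typical use would be `render_plain(headers, rows, terminal_columns())` when `choose_style()` returns `"plain"`.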
Suggested Future Structured Commands
Candidates for built-in structured output:
| Command | Fields |
|---|---|
| df | filesystem, size, used, available, mountpoint |
| netstat / ss | proto, localAddress, localPort, peerAddress, peerPort, state, pid |
| git-log | sha, author, email, date, message |
| docker-ps | id, image, status, ports, names |
| ip-addr | interface, address, netmask, family |
| history | index, timestamp, command |
| env | name, value |
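Some of these candidates are small. For example, env maps directly onto a list of records. Here is a sketch with a hypothetical EnvVar record, sorted by name for stable output.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvVar:
    name: str
    value: str


def env(environ: Optional[Mapping[str, str]] = None) -> list[EnvVar]:
    # Default to the current process environment; sort by name for stable output.
    source = os.environ if environ is None else environ
    return sorted((EnvVar(k, v) for k, v in source.items()), key=lambda e: e.name)
```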