map1

MAP v1 — Specification v1.1
Canon: MAP1 (v1)
Status: v1.1 (frozen)

This version is v1.0 plus the BOOLEAN and INTEGER types. v1.0 shipped with four types (STRING, BYTES, LIST, MAP), mapped JSON booleans to the STRING values “true”/“false”, and rejected all JSON numbers. Community feedback identified two problems:

(a) Boolean-string collision: true and “true” produced the same MID.
(b) Rejecting all numbers was too restrictive for common use cases.

v1.1 adds BOOLEAN (tag 0x05) and INTEGER (tag 0x06) to the canonical model and updates the JSON-STRICT adapter accordingly. No changes to CANON_HDR, existing MCF tags (0x01-0x04), key ordering, projection semantics, MID format, or error precedence.

All v1.0 MIDs that did not involve JSON booleans remain identical under v1.1. MIDs involving JSON booleans will change because booleans now encode as type 0x05 instead of STRING.

============================================================

  0. OVERVIEW

MAP v1 defines:

1) A canonical model (STRING, BYTES, LIST, MAP, BOOLEAN, INTEGER).
2) A canonical binary encoding (MCF) for that model.
3) A canonical byte stream (CANON_BYTES) = CANON_HDR || MCF(root_value).
4) A deterministic identifier (MID) = sha256(CANON_BYTES) with “map1:” prefix.
5) Projection rules (FULL, FULL-MINIMUM, BIND) to derive a stable identity surface.

MAP is identity-only. MAP does not grant authority, does not assert safety, and does not interpret semantics.

============================================================

  1. TERMINOLOGY

Descriptor

Projection Mode

CANON_HDR

MCF

CANON_BYTES

MID

============================================================

  2. PROJECTION (NORMATIVE)

2.1 Root

BIND root requirement: For BIND projection, the parsed root_value MUST be a MAP. If the parsed root_value is not a MAP, implementations MUST reject deterministically with ERR_SCHEMA.

The input to all MAP v1 identity functions is a descriptor that is modeled as a MAP.

2.2 FULL Projection

FULL projection produces a canonical model value equal to the descriptor MAP.

2.3 BIND Projection Semantics

BIND projection constructs a projected MAP by selecting values from the descriptor using RFC 6901 JSON-Pointer paths.

Normative rules:

(0) Pointer-set rules (normative)

(a) Pointer parsing
Each pointer in pointer_set MUST parse according to RFC 6901. Any pointer parse failure MUST reject with ERR_SCHEMA.

(b) Duplicate pointers (fail-closed)
pointer_set MUST NOT contain duplicate pointers (byte-identical strings). If duplicates are present, reject deterministically with ERR_SCHEMA.

(c) Unmatched pointers (fail-closed, with one exception)
A pointer “matches” if it selects a value in the descriptor under RFC 6901 traversal rules. If no pointers match any value in the descriptor, project() returns an EMPTY MAP (count=0) as specified in rule (3). Otherwise (i.e., at least one pointer matches), if any pointer does not match, reject deterministically with ERR_SCHEMA.

(d) Overlapping pointers (subsumption)
If pointer P1 is a strict path-prefix of pointer P2 (P2 begins with P1 followed by “/”), then P1 subsumes P2; P2 has no additional effect on the projection result.

(e) Empty pointer “” (MAP-root FULL-equivalent)
The empty pointer “” selects the entire MAP root (RFC 6901 whole-document pointer applied to a MAP root). For the purpose of rule (c), the empty pointer “” is a matching pointer (it always selects the MAP root). If pointer_set contains “”, the projection result is FULL-equivalent over the MAP root and “” subsumes all other pointers. This rule does not change the BIND root requirement: non-MAP roots MUST still reject with ERR_SCHEMA.

(1) Omit siblings (mechanical rule)

(2) Minimal enclosing structure

(3) If no pointer paths match

(4) LIST traversal is forbidden (Option 1 — LOCKED)

(5) JSON-Pointer parsing
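The pointer-set rules (0)(a), (0)(b), (0)(d) and (0)(e) above can be sketched as follows. This is a minimal illustration, not a reference implementation: MapError, parse_pointer and effective_pointers are hypothetical names, and the sketch does not cover matching against a descriptor (rule (0)(c)) or rules (1) through (5).

```python
import re


class MapError(Exception):
    """Hypothetical exception type carrying a MAP error code."""

    def __init__(self, code: str, detail: str = "") -> None:
        super().__init__(f"{code}: {detail}" if detail else code)
        self.code = code


# RFC 6901: inside a reference token, '~' is only valid as '~0' or '~1'.
_TOKEN_RE = re.compile(r"(?:[^~]|~[01])*")


def parse_pointer(pointer: str) -> list[str]:
    """Split an RFC 6901 pointer into unescaped reference tokens."""
    if pointer == "":
        return []  # whole-document pointer
    if not pointer.startswith("/"):
        raise MapError("ERR_SCHEMA", "pointer must be empty or start with '/'")
    tokens = []
    for raw in pointer[1:].split("/"):
        if _TOKEN_RE.fullmatch(raw) is None:
            raise MapError("ERR_SCHEMA", "invalid '~' escape")
        # Order matters: decode '~1' before '~0' (RFC 6901, section 4).
        tokens.append(raw.replace("~1", "/").replace("~0", "~"))
    return tokens


def effective_pointers(pointer_set: list[str]) -> list[str]:
    """Validate pointer_set and drop pointers subsumed by a strict prefix."""
    # Rule (b): byte-identical duplicates are rejected.
    if len(set(pointer_set)) != len(pointer_set):
        raise MapError("ERR_SCHEMA", "duplicate pointer")
    # Rule (a): every pointer must parse.
    for pointer in pointer_set:
        parse_pointer(pointer)
    # Rules (d) and (e): P1 subsumes P2 when P2 begins with P1 + "/".
    # The empty pointer therefore subsumes every other valid pointer.
    return [
        p for p in pointer_set
        if not any(q != p and p.startswith(q + "/") for q in pointer_set)
    ]
```

For example, effective_pointers(["/a", "/a/b", "/c"]) keeps "/a" and "/c", while "/ab" is not subsumed by "/a" because the prefix test requires the "/" separator.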

2.4 Underscore Open-Field Discipline (FULL-MINIMUM)

[Reserved for future version — see Appendix D.]

2.5 Absence vs Empty

Absence and empty are canonically distinct:

============================================================

  3. CANONICAL ENCODING (MCF) (NORMATIVE)

3.1 Canonical Model Types

3.2 Type Tags

MCF encodes a single value as:

STRING  : 0x01 || uint32be(byte_len) || utf8_bytes
BYTES   : 0x02 || uint32be(byte_len) || raw_bytes
LIST    : 0x03 || uint32be(count) || value_1 || ... || value_n
MAP     : 0x04 || uint32be(count) || (key_1 || val_1) || ... || (key_n || val_n)
BOOLEAN : 0x05 || payload_byte
INTEGER : 0x06 || int64be(value)
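The layout above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the name mcf_encode is hypothetical, MAP keys are assumed to be written as tagged STRING values, and int64be is assumed to mean big-endian two's complement. BOOLEAN is left out because the payload byte values are not given in the table itself, and the normative limits (MAX_DEPTH and the others) are not enforced.

```python
import struct


def mcf_encode(value) -> bytes:
    """Encode a Python value as MCF (sketch; see assumptions in the text)."""
    # bool is a subclass of int in Python, so it must be checked first.
    if isinstance(value, bool):
        raise NotImplementedError("BOOLEAN payload byte is not covered here")
    if isinstance(value, str):
        data = value.encode("utf-8")  # raises on lone surrogates
        return b"\x01" + struct.pack(">I", len(data)) + data
    if isinstance(value, (bytes, bytearray)):
        return b"\x02" + struct.pack(">I", len(value)) + bytes(value)
    if isinstance(value, list):
        body = b"".join(mcf_encode(item) for item in value)
        return b"\x03" + struct.pack(">I", len(value)) + body
    if isinstance(value, dict):
        if not all(isinstance(key, str) for key in value):
            raise TypeError("MAP keys must be strings")
        # Keys are ordered by their raw UTF-8 bytes (memcmp semantics).
        entries = sorted(
            ((key.encode("utf-8"), val) for key, val in value.items()),
            key=lambda entry: entry[0],
        )
        # ASSUMPTION: each key is emitted as a tagged STRING value.
        body = b"".join(
            b"\x01" + struct.pack(">I", len(key)) + key + mcf_encode(val)
            for key, val in entries
        )
        return b"\x04" + struct.pack(">I", len(entries)) + body
    if isinstance(value, int):
        # ASSUMPTION: signed 64-bit big-endian two's complement.
        # struct.error is raised if the value does not fit in 64 bits.
        return b"\x06" + struct.pack(">q", value)
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

Under these assumptions, mcf_encode("a") yields the six bytes 01 00 00 00 01 61, and an empty MAP yields 04 00 00 00 00.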

BOOLEAN payload_byte:

INTEGER encoding:

Constraints:

3.3 Length/Count Fields (Fork-hardening)

3.4 STRING UTF-8 Rules

3.5 MAP Key Ordering (Critical Fork Surface)

Ordering is unsigned-octet lexicographic compare over raw UTF-8 bytes (memcmp semantics).
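To illustrate why the byte-level definition matters, the following sketch compares ordering over raw UTF-8 bytes with ordering over UTF-16 code units, which is what string comparison yields in some runtimes. The helper names are hypothetical and the two sample keys were chosen only because they sort differently under the two schemes.

```python
def sort_keys_canonical(keys):
    """Sort keys by their raw UTF-8 bytes (unsigned, memcmp-style)."""
    return sorted(keys, key=lambda k: k.encode("utf-8"))


def sort_keys_utf16(keys):
    """Sort keys by UTF-16 code units; shown only for contrast."""
    return sorted(keys, key=lambda k: k.encode("utf-16-be"))


bmp_key = "\ue000"           # UTF-8: EE 80 80      UTF-16: E000
astral_key = "\U00010000"    # UTF-8: F0 90 80 80   UTF-16: D800 DC00

canonical = sort_keys_canonical([astral_key, bmp_key])
utf16 = sort_keys_utf16([astral_key, bmp_key])

# The two schemes disagree on the order of these keys.
print(canonical == [bmp_key, astral_key])  # True
print(utf16 == [astral_key, bmp_key])      # True
```

An implementation whose native string comparison works on UTF-16 code units would therefore need to compare the encoded UTF-8 bytes explicitly to obtain the ordering described here.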

Normative:

Implementation note (non-normative):

3.6 Duplicate Keys

3.7 Fast-Path Validation for Pre-Serialized CANON_BYTES

If an implementation accepts pre-serialized CANON_BYTES (e.g., mid_from_canon_bytes), it MUST:

============================================================

  4. NORMATIVE LIMITS

MAX_CANON_BYTES  = 1,048,576 (1 MiB)   // total CANON_BYTES length
MAX_DEPTH        = 32                  // depth of nested LIST/MAP containers
MAX_MAP_ENTRIES  = 65,535
MAX_LIST_ENTRIES = 65,535

Depth definition (normative):

Size-limit precedence (fork-hardening):

============================================================

  5. CANON_BYTES AND MID (NORMATIVE)

5.1 CANON_HDR

CANON_HDR is exactly 5 bytes: 0x4D 0x41 0x50 0x31 0x00 (“MAP1” + NUL)

5.2 CANON_BYTES

CANON_BYTES = CANON_HDR || MCF(root_value)

5.3 MID

MID = “map1:” || hex_lower(sha256(CANON_BYTES))
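The three definitions above compose directly. A minimal sketch, assuming the MCF bytes of the root value have already been produced elsewhere (the function names canon_bytes and mid_from_mcf are hypothetical, and no MCF validation or size limit is applied here):

```python
import hashlib

# “MAP1” followed by a NUL byte: 0x4D 0x41 0x50 0x31 0x00
CANON_HDR = b"MAP1\x00"


def canon_bytes(mcf_root: bytes) -> bytes:
    """CANON_BYTES = CANON_HDR || MCF(root_value)."""
    return CANON_HDR + mcf_root


def mid_from_mcf(mcf_root: bytes) -> str:
    """MID = "map1:" || hex_lower(sha256(CANON_BYTES))."""
    digest = hashlib.sha256(canon_bytes(mcf_root)).hexdigest()  # lowercase hex
    return "map1:" + digest


# MCF of an empty MAP: tag 0x04 followed by a zero uint32be count.
empty_map_mcf = b"\x04\x00\x00\x00\x00"
example_mid = mid_from_mcf(empty_map_mcf)
```

The resulting MID is always 69 characters long: the 5-character prefix plus 64 lowercase hexadecimal digits.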

============================================================

  6. ERRORS (NORMATIVE)

6.1 Error Codes

ERR_CANON_HDR   - invalid header
ERR_CANON_MCF   - malformed MCF (parse failure, trailing bytes, truncated)
ERR_SCHEMA      - invalid descriptor shape for the selected mode (e.g., BIND pointer traversal into LIST)
ERR_TYPE        - unsupported type in adapter layer (e.g., JSON null, JSON float)
ERR_UTF8        - invalid UTF-8 or forbidden scalar values (including surrogates)
ERR_DUP_KEY     - duplicate MAP key
ERR_KEY_ORDER   - MAP keys not in required order
ERR_LIMIT_DEPTH - exceeds MAX_DEPTH
ERR_LIMIT_SIZE  - exceeds MAX_CANON_BYTES or other size limits

JSON adapter parse failures (normative):

6.2 Error Code Precedence (Reported)

If multiple violations apply, implementations MUST report the first applicable error in this precedence order:

ERR_CANON_HDR
ERR_CANON_MCF
ERR_SCHEMA
ERR_TYPE
ERR_UTF8
ERR_DUP_KEY
ERR_KEY_ORDER
ERR_LIMIT_DEPTH
ERR_LIMIT_SIZE
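One way to apply this ordering is to collect every violation a validator detects and then report the earliest code in the list. The sketch below assumes that approach; the name reported_error is hypothetical, and the sketch does not model the reported-code or safety-vs-precedence rules that follow.

```python
ERROR_PRECEDENCE = (
    "ERR_CANON_HDR",
    "ERR_CANON_MCF",
    "ERR_SCHEMA",
    "ERR_TYPE",
    "ERR_UTF8",
    "ERR_DUP_KEY",
    "ERR_KEY_ORDER",
    "ERR_LIMIT_DEPTH",
    "ERR_LIMIT_SIZE",
)


def reported_error(violations):
    """Return the highest-precedence code among violations, or None."""
    found = set(violations)
    unknown = found - set(ERROR_PRECEDENCE)
    if unknown:
        raise ValueError(f"unknown error codes: {sorted(unknown)}")
    for code in ERROR_PRECEDENCE:
        if code in found:
            return code
    return None
```

For instance, reported_error(["ERR_LIMIT_SIZE", "ERR_UTF8"]) returns "ERR_UTF8", because ERR_UTF8 appears earlier in the precedence order.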

Reported-code rule (normative):

Safety vs precedence rule (normative; fork-hardening):

Non-normative implementer note:

============================================================

  7. REQUIRED API SURFACE

An implementation MUST provide the following functions (or equivalent behavior), with identical semantics.

7.1 Canonical Bytes

canonical_bytes_full(descriptor_map) -> bytes | ERR_*
canonical_bytes_bind(descriptor_map, pointer_set) -> bytes | ERR_*

Rules:

7.2 MID

mid_full(descriptor_map) -> MID | ERR_*
mid_bind(descriptor_map, pointer_set) -> MID | ERR_*
mid_from_canon_bytes(canon_bytes) -> MID | ERR_*

Rules:

============================================================

  8. JSON ADAPTER (NORMATIVE) — JSON-STRICT PROFILE

MAP v1 defines a single normative JSON ingestion profile: JSON-STRICT.

8.1 Parsing Requirements

8.1.1 BOM Stance (STRICT; fork-hardening)

8.2 Type Mapping (JSON-STRICT)

8.2.1 JSON Number Rules (normative)

JSON numbers are accepted only if ALL of the following conditions are met:

(a) The JSON number token does not contain a decimal point character ‘.’.
(b) The JSON number token does not contain an exponent indicator ‘e’ or ‘E’.
(c) The numeric value, when parsed as a signed integer, is within the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (signed 64-bit).

If any condition is not met, the adapter MUST reject with ERR_TYPE.

Normative examples:

The token-level check (conditions a and b) is intentional. The adapter inspects the raw JSON number token string, not the parsed numeric value. This prevents silent coercion (e.g., 1.0 being treated as integer 1).
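The token-level check can be sketched with Python's standard json module, whose parse_int and parse_float hooks receive the raw number token as a string. The names AdapterTypeError, integer_from_json_token and loads_numbers_strict are hypothetical, and the sketch covers only the number rules above, not the rest of the JSON-STRICT profile.

```python
import json

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


class AdapterTypeError(Exception):
    """Hypothetical exception standing in for ERR_TYPE."""

    code = "ERR_TYPE"


def integer_from_json_token(token: str) -> int:
    """Apply conditions (a), (b) and (c) to a raw JSON number token."""
    # Conditions (a) and (b): inspect the token text, not the parsed value.
    if "." in token or "e" in token or "E" in token:
        raise AdapterTypeError(f"non-integer number token: {token}")
    # The longest in-range token is 20 characters: a sign plus 19 digits.
    if len(token) > 20:
        raise AdapterTypeError("integer out of signed 64-bit range")
    value = int(token)
    # Condition (c): signed 64-bit range.
    if not INT64_MIN <= value <= INT64_MAX:
        raise AdapterTypeError("integer out of signed 64-bit range")
    return value


def _reject(token: str):
    raise AdapterTypeError(f"unsupported number token: {token}")


def loads_numbers_strict(text: str):
    """Parse JSON, applying the number rules through json's hooks."""
    return json.loads(
        text,
        parse_int=integer_from_json_token,
        parse_float=_reject,     # tokens with a fraction or exponent
        parse_constant=_reject,  # NaN, Infinity, -Infinity
    )
```

With these hooks, loads_numbers_strict('{"n": 1}') returns {"n": 1}, while the inputs '[1.0]' and '[1e3]' raise AdapterTypeError instead of being coerced to an integer.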

8.3 Duplicate Object Keys

Duplicate detection is required at the JSON adapter boundary.

Normative:

============================================================

  9. SECURITY & INTEROP NOTES (NON-NORMATIVE)

============================================================

  10. CHANGELOG

v1.1

v1.0

v0.2.6

v0.2.5

v0.2.4

v0.2.2

============================================================
APPENDIX A: INTEGRATION GUIDANCE (NON-NORMATIVE)
============================================================

A1) Identity is not authority

MAP provides stable identity bytes and hashes. It does not decide whether a mutation is allowed. If you interpret a MID as “safe” or “approved,” you MUST bind that meaning to an external authority system.

A2) Trust boundaries SHOULD reconstruct

If a trust boundary accepts a MID or CANON_BYTES from an untrusted party, it SHOULD reconstruct CANON_BYTES from the actual descriptor (and pointers, if BIND) rather than trusting caller-supplied bytes. Otherwise you risk “orphan MIDs” that cannot be reproduced by correct implementations.

A3) Projection context is out-of-band

MID does not encode projection mode. If your system’s meaning differs for FULL vs BIND, you must bind that context out-of-band and must not infer FULL vs BIND from the MID alone.

A4) Underscore discipline is a caller convention

FULL-MINIMUM is a caller-selected convention. If systems in the same identity domain mix FULL and FULL-MINIMUM, they will generate different MIDs for the same descriptor. Treat that as an identity mismatch, not a bug.

A5) Empty BIND projections

BIND projections that produce an empty MAP may be valid at the MAP layer, but many applications SHOULD treat this as an application-layer error (an “empty identity surface” is usually suspicious).

A6) Version prefix stability

MAP’s identity function is versioned by prefix (map1:). The addition of BOOLEAN and INTEGER types in v1.1 does not change the map1: prefix. Future type additions that preserve the canonical framing do not alter the map1: identity class. A new prefix (e.g., map2:) would only be introduced for changes that break canonical encoding compatibility.

============================================================
APPENDIX B: MAP v1 LAYER DIAGRAM (NON-NORMATIVE)
============================================================

Layer 0: Raw input (JSON / internal structures / pre-serialized bytes)
   |
   v
Layer 1: Adapter profile (JSON-STRICT)

============================================================
APPENDIX C — CONFORMANCE SUITE MINIMUM ADD-ON VECTORS (v1.1)
============================================================

These vectors add coverage only; they MUST NOT require rebaselining existing golden outputs (except for vectors involving JSON booleans, which now encode differently under v1.1).

C1. Escape equivalence (key and value)

C2. Lone surrogate reject (adapter boundary)

C3. UTF-16 vs UTF-8 ordering trap (FULL)

C4. Fast-path trailing bytes reject

C5. Depth boundary (32 pass / 33 fail)

C6. RFC 6901 tilde decoding (BIND)

C7. Prefix ordering (raw UTF-8 bytes)

C8. BIND omit-siblings behavior

C9. BIND LIST traversal rejection (Option 1)

C10. Duplicate-after-unescape (adapter boundary)

C11. BOM rejection (adapter boundary)

C12. Reported error precedence under safety limits

C13. BOOLEAN type distinction

C14. INTEGER type distinction

C15. INTEGER boundary

C16. Float rejection

C17. Mixed-type containers

APPENDIX D — FULL-MINIMUM (RESERVED FOR FUTURE VERSION)

The FULL-MINIMUM projection mode is deferred. The following text is included here as a non-normative preview of the intended semantics.

FULL-MINIMUM is a caller-selected convention that strips “open fields” starting with underscore.

Draft rules:

============================================================ END