Honour UN/LOCODE 12-month deletion notice in publish pipeline #37
Fixes the bug where X-marked entries were silently stripped from published artefacts (e.g. JO AQB missing from the publication download). Per UN/LOCODE Manual § 7.1.1, X markers must remain visible in the publication for the notice period and only be physically removed once the date column is at least 12 calendar months older than the current edition.
Pipeline changes:
- apply_transforms preserves X rows on tag releases; only #/¦/| are cleared
- json_generator now consumes transformed entries, so the website JSON reflects auto-bumped dates and X visibility (previously it read the source CSVs directly and bypassed all transforms)
- New unlocode_publisher.retention module + prune_aged_x.py CLI for date-aware physical removal of aged X rows from locodes/
- Publisher safety-net warns when --tag-release sees aged X rows in source
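The date-aware removal rule above can be sketched roughly as follows. This is an illustration only, not the code in unlocode_publisher.retention. The row shape (dicts with `change` and `date` keys), the names `is_aged_x` and `split_aged_x`, the century pivot for two-digit years, and the choice to keep X rows with an empty or malformed date are all assumptions.

```python
from typing import Dict, List, Tuple

NOTICE_MONTHS = 12  # notice period cited from UN/LOCODE Manual § 7.1.1


def _month_index(yymm: str) -> int:
    """Convert 'YYMM' to a running month count (assumed pivot: 80-99 -> 19xx)."""
    yy, month = int(yymm[:2]), int(yymm[2:])
    year = 1900 + yy if yy >= 80 else 2000 + yy
    return year * 12 + (month - 1)


def is_aged_x(row: Dict[str, str], current_yymm: str) -> bool:
    """True when an X row's date is at least NOTICE_MONTHS before the edition."""
    if row.get("change", "").strip() != "X":
        return False
    date = row.get("date", "").strip()
    if len(date) != 4 or not date.isdigit():
        return False  # assumption: undated X rows are kept, never pruned
    return _month_index(current_yymm) - _month_index(date) >= NOTICE_MONTHS


def split_aged_x(
    rows: List[Dict[str, str]], current_yymm: str
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Partition rows into (kept, removed) without mutating the input."""
    kept: List[Dict[str, str]] = []
    removed: List[Dict[str, str]] = []
    for row in rows:
        (removed if is_aged_x(row, current_yymm) else kept).append(row)
    return kept, removed
```

Under these assumptions, an X row dated 2307 is removed for edition 2407 (exactly 12 months), while one dated 2308 (11 months) stays visible.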
Source-side date stamping (apply_change_indicators):
- Excludes 'date' from OTHER_FIELDS so date-only diffs no longer flag a spurious indicator, restoring idempotency
- Clears stale +/#/¦/| indicators on rows that match the previous release exactly (fixes the long-standing leftover-indicator bug)
- Bumps the date column to the current edition's YYMM when setting a fresh indicator on a row whose date is empty or older; dates that are already current or in the future are left unchanged
- Detects newly-added X markers (X not in previous publication) and stamps date so the 12-month notice clock starts at the right edition; preserves continuing-notice X dates across cycles
- Fixes the line 1 'give #!/...' typo that broke import
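The date-stamping rules in this list can be summarised with a small sketch. It is not the apply_change_indicators implementation. The function names `bump_date` and `x_notice_date`, the boolean `x_in_previous` argument, and the century pivot are hypothetical, and it assumes a newly added X follows the same bump rule as any other fresh indicator.

```python
def _month_index(yymm: str) -> int:
    """Convert 'YYMM' to a running month count (assumed pivot: 80-99 -> 19xx)."""
    yy, month = int(yymm[:2]), int(yymm[2:])
    year = 1900 + yy if yy >= 80 else 2000 + yy
    return year * 12 + (month - 1)


def bump_date(existing: str, current_yymm: str) -> str:
    """Date to store when a fresh indicator is set on a row."""
    existing = existing.strip()
    if len(existing) != 4 or not existing.isdigit():
        return current_yymm  # empty or malformed date: stamp the edition
    if _month_index(existing) < _month_index(current_yymm):
        return current_yymm  # older date: bump to the edition
    return existing  # current or future date: leave unchanged


def x_notice_date(existing: str, x_in_previous: bool, current_yymm: str) -> str:
    """Date for an X row, depending on whether X was already published."""
    if x_in_previous:
        return existing  # continuing notice: the clock keeps running
    return bump_date(existing, current_yymm)  # new X: the clock starts now
```

Applying `bump_date` twice gives the same result as applying it once, which matches the idempotency goal described above.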
YYMM helpers (utils.py):
- version_to_yymm, yymm_to_year_month, yymm_diff_months
- PERIOD_TO_MONTH = {1: 1, 2: 7} covers biannual editions only; any period > 2 raises
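A minimal sketch of how these helpers might fit together is shown below. The function names and the PERIOD_TO_MONTH mapping come from this change, but the signatures, the assumed edition format ("2024-2"), the ValueError exception type, and the century pivot for two-digit years are illustrative guesses rather than the contents of utils.py.

```python
from typing import Tuple

PERIOD_TO_MONTH = {1: 1, 2: 7}  # biannual editions: period 1 -> Jan, 2 -> Jul


def version_to_yymm(version: str) -> str:
    """Map an edition such as '2024-2' (assumed format) to 'YYMM'."""
    year_text, period_text = version.split("-")
    period = int(period_text)
    if period not in PERIOD_TO_MONTH:
        raise ValueError(f"unsupported edition period: {period}")
    return f"{int(year_text) % 100:02d}{PERIOD_TO_MONTH[period]:02d}"


def yymm_to_year_month(yymm: str) -> Tuple[int, int]:
    """Expand 'YYMM' to (year, month); assumed pivot: 80-99 -> 19xx."""
    if len(yymm) != 4 or not yymm.isdigit():
        raise ValueError(f"not a YYMM value: {yymm!r}")
    yy, month = int(yymm[:2]), int(yymm[2:])
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {yymm!r}")
    return (1900 + yy if yy >= 80 else 2000 + yy), month


def yymm_diff_months(later: str, earlier: str) -> int:
    """Whole calendar months from `earlier` to `later` (negative if reversed)."""
    later_year, later_month = yymm_to_year_month(later)
    earlier_year, earlier_month = yymm_to_year_month(earlier)
    return (later_year - earlier_year) * 12 + (later_month - earlier_month)
```

With these definitions, the 12-month notice check reduces to comparing `yymm_diff_months(version_to_yymm(edition), row_date)` against 12.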
Validator soft-warn:
- Empty date columns surface in an end-of-run warning summary instead of silently passing; --strict-dates flag for future enforcement
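The soft-warn behaviour could look something like the sketch below. It is not the validator's code: the `check_dates` name, the row keys (`country`, `location`, `date`), the message wording, and the idea that strict mode turns warnings into a non-zero exit code are all assumptions about how --strict-dates might eventually be enforced.

```python
from typing import Dict, Iterable, List, Tuple


def check_dates(
    rows: Iterable[Dict[str, str]], strict: bool = False
) -> Tuple[List[str], int]:
    """Collect empty-date warnings; fail only when strict mode is requested."""
    warnings: List[str] = []
    # start=2 assumes line 1 of the CSV is a header row
    for line_no, row in enumerate(rows, start=2):
        if not row.get("date", "").strip():
            code = f"{row.get('country', '??')} {row.get('location', '???')}"
            warnings.append(f"line {line_no}: empty date for {code}")
    exit_code = 1 if (strict and warnings) else 0
    return warnings, exit_code
```

Returning the warnings instead of printing them immediately lets the caller emit a single end-of-run summary.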
CI orchestration (.gitlab-ci.yml):
- On tag, apply_change_indicators runs against publication/{prev_tag}/csv, then prune_aged_x removes aged X rows
- Combined source mutations are committed back to main as a single bot commit (chore(release): apply change indicators and prune aged X rows) so source stays in sync with what was published
- Requires CI variable CI_PUSH_TOKEN (project access token with write_repository scope); see publishing-a-release.md
manage_locode_changes.remove_x_deletions is kept as a deprecation shim that prints a pointer to prune_aged_x.py (the function removes every X regardless of date, which bypasses the manual's notice period).
Tests: 174 passing (38 utils, 17 retention, 53 publish_transforms, 46 apply_change_indicators, 20 validator).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>