A single-page brief for coding agents and developers integrating OpenSportTaxonomy (OST). Full detail is below; if you only read one thing, read this:
Start here — for agents, do exactly this
1. Fetch this exact URL — the machine-readable source of truth
(catalogue, labels, modifiers, and a version: field), served as YAML:
https://raw.githubusercontent.com/sweatstack/open-sport-taxonomy/main/schema.yaml
If you are baking the data into a build, fetch a pinned snapshot
instead — main drifts as the spec evolves. Tags are
spec/vX.Y.Z; the current spec is 0.10.0:
https://raw.githubusercontent.com/sweatstack/open-sport-taxonomy/spec/v0.10.0/schema.yaml
2. Parse a sport as code(+modifier)* — dotted code
path, then + modifiers sorted alphabetically and de-duplicated. The
string is the identity.
3. Label it by exact match in sports:, else compose
(see §4 — do not hand-derive a label by reversing the code).
4. Python reference implementation:
pip install open-sport-taxonomy
python -c "from open_sport_taxonomy import Sport; print(Sport('cycling.road').label)" # road cycling
The raw GitHub URL above is canonical (and pinnable). This site's own
/schema.yaml and /mappings/ 302-redirect to it as a
convenience, so either works — but pin the GitHub tag if you bake the data.
Platform translation tables:
mappings/.
1 · The canonical string
A sport is a single string. Dots (.) separate a sport from its
disciplines in the hierarchy; plusses (+) attach modifiers
(circumstances that don't change the movement). The string itself is the
identity — not membership of any catalogue.
cycling.road+stationary+virtual
| String | Meaning |
|---|---|
| cycling | cycling (any discipline) |
| cycling.road | road cycling |
| cycling.road+race | road cycling race |
| cycling.road+stationary+virtual | e.g. a Zwift ride |
| xc_skiing.classic+roller | classic roller skiing |
Canonical form: a code followed by zero or more
+modifiers, with modifiers sorted alphabetically and
de-duplicated. Two strings denote the same sport iff their canonical
forms are identical. Modifiers are independent, and absence means
unspecified, never "the opposite": cycling.road says nothing about whether
the ride was stationary, so it does not imply an outdoor ride.
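The canonical-form rule can be sketched in a few lines of Python. This is a minimal illustration, not the open_sport_taxonomy package's API: canonicalize and same_sport are hypothetical helper names, the input is assumed to be already well-formed, and "sorted alphabetically" is assumed to mean Python's default string ordering.

```python
def canonicalize(s: str) -> str:
    """Return the canonical form: the code, then modifiers sorted and de-duplicated.

    Assumes s is already well-formed; uses Python's default string ordering
    as a stand-in for "sorted alphabetically".
    """
    code, *modifiers = s.split("+")
    return "+".join([code, *sorted(set(modifiers))])


def same_sport(a: str, b: str) -> bool:
    """Two strings denote the same sport iff their canonical forms are identical."""
    return canonicalize(a) == canonicalize(b)
```

Note that same_sport("cycling.road", "cycling.road+race") is False: a missing modifier is unspecified, so the bare code is a different (less specific) string, not an equivalent one.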
2 · Well-formed grammar
A well-formed string is a dotted code path followed by zero or more
+ modifiers. Each segment is lowercase letters with single internal
underscores; dots live only inside the code, never after a +.
# segment = [a-z]+(?:_[a-z]+)*
WELL_FORMED = /^[a-z]+(?:_[a-z]+)*(?:\.[a-z]+(?:_[a-z]+)*)*(?:\+[a-z]+(?:_[a-z]+)*)*$/
Accepts xc_skiing.skate+roller, alpine_skiing,
cycling+stationary+virtual, generic. Rejects
+stationary (no code), cycling. (trailing dot),
cycling..road (empty segment), Cycling (uppercase),
cycling__road (doubled underscore),
cycling+stationary.foo (dot after +). Sortedness and
de-duplication are properties of the canonical form, not of this lexical
grammar.
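A minimal Python sketch of the same grammar, assuming only the standard re module. It builds the pattern from the single segment rule and mirrors the accept and reject examples above; is_well_formed is a hypothetical helper name, not part of the open_sport_taxonomy package.

```python
import re

# One segment: lowercase letters with single internal underscores.
SEGMENT = r"[a-z]+(?:_[a-z]+)*"

# Dotted code path, then zero or more "+segment" modifiers (no dots after a "+").
WELL_FORMED = re.compile(rf"{SEGMENT}(?:\.{SEGMENT})*(?:\+{SEGMENT})*")


def is_well_formed(s: str) -> bool:
    # fullmatch anchors both ends; unlike "$", it does not accept a trailing "\n".
    return WELL_FORMED.fullmatch(s) is not None
```

re.fullmatch is used rather than a ^...$ pattern with re.match because Python's $ also matches just before a trailing newline, which would let "cycling\n" through. Unsorted modifiers such as cycling+virtual+stationary still pass, since sortedness belongs to the canonical form rather than the lexical grammar.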
3 · Three validity tiers
Strictly nested: standard ⊆ known-atoms ⊆ well-formed.
| Tier | Holds when | Python |
|---|---|---|
| well-formed | matches the grammar above (modifiers may be unsorted) | Sport.parse(s) succeeds |
| known-atoms | well-formed and the code and every modifier are declared atoms (group-valid) | Sport.uses_known_atoms |
| standard | the exact canonical string is in the catalogue | Sport.is_standard |
Any well-formed string is valid and storable — a client may mint
cycling.road+race even though it is not (yet) a catalogue entry.
The catalogue is a recommended profile over the open string space, not
a whitelist.
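The three tiers can be pictured as successively stricter checks. The sketch below is a hypothetical illustration, not the reference implementation: KNOWN_CODES, KNOWN_MODIFIERS and CATALOGUE are small invented stand-ins for data that would come from schema.yaml, and the spec's "group-valid" condition for known-atoms is deliberately not modelled because its exact rule lives in docs/taxonomy.md.

```python
import re
from typing import Optional

SEGMENT = r"[a-z]+(?:_[a-z]+)*"
WELL_FORMED = re.compile(rf"{SEGMENT}(?:\.{SEGMENT})*(?:\+{SEGMENT})*")

# Hypothetical stand-ins for data read from schema.yaml.
KNOWN_CODES = {"cycling", "cycling.road", "cycling.mountain"}
KNOWN_MODIFIERS = {"race", "stationary", "virtual"}
CATALOGUE = {"cycling", "cycling+stationary", "cycling.road", "cycling.mountain"}


def tier(s: str) -> Optional[str]:
    """Return the strictest tier s reaches, or None if it is not well-formed."""
    if WELL_FORMED.fullmatch(s) is None:
        return None
    code, *mods = s.split("+")
    canonical = "+".join([code, *sorted(set(mods))])
    if canonical in CATALOGUE:
        return "standard"
    # NOTE: the spec's "group-valid" check on modifiers is not modelled here.
    if code in KNOWN_CODES and all(m in KNOWN_MODIFIERS for m in mods):
        return "known-atoms"
    return "well-formed"
```

For example, tier("cycling.road+race") is "known-atoms": every atom is declared, but the string is not a catalogue entry. It is still valid and storable.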
4 · Getting a label
A label is a human display name — presentation only, never identity.
To resolve one, fetch
schema.yaml
and match the canonical string against the sports: list:
sports:
  - sport: cycling
    label: cycling
  - sport: cycling+stationary
    label: indoor cycling
  - sport: cycling.road
    label: road cycling
modifiers:
  - code: stationary
    label: stationary
    group: environment
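In Python, the lists above are easiest to use once indexed by key, so that matching the canonical string is a single exact dictionary lookup. The sketch assumes the file has been parsed with a YAML loader such as PyYAML's yaml.safe_load; the literal below is written out by hand to mirror what that would return for the snippet, and catalogue_label is a hypothetical helper name.

```python
from typing import Optional

# Hand-written mirror of the parsed snippet above (what a YAML loader would return).
schema = {
    "sports": [
        {"sport": "cycling", "label": "cycling"},
        {"sport": "cycling+stationary", "label": "indoor cycling"},
        {"sport": "cycling.road", "label": "road cycling"},
    ],
    "modifiers": [
        {"code": "stationary", "label": "stationary", "group": "environment"},
    ],
}

# Index once: sport string -> entry, modifier code -> entry.
sports = {entry["sport"]: entry for entry in schema["sports"]}
modifiers = {entry["code"]: entry for entry in schema["modifiers"]}


def catalogue_label(canonical: str) -> Optional[str]:
    """Exact-match lookup only; returns None when the string is not catalogued."""
    entry = sports.get(canonical)
    return entry["label"] if entry else None
```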
Labels are looked up, not derived by string manipulation; the only
exception is a code absent from the catalogue (step 3). Reversing or
re-spacing a catalogued code is wrong: cycling.mountain is
"mountain biking", not "mountain cycling", and xc_skiing is "XC
skiing", not "xc skiing". The exact algorithm:
# sports:    dict of sport string -> entry, indexed from the sports: list
# modifiers: dict of modifier code -> entry, indexed from the modifiers: list
def label(s):                          # s is a canonical sport string
    if s in sports:                    # 1. exact catalogue match wins
        return sports[s]["label"]
    code, *mods = s.split("+")         # e.g. "cycling.road", ["race"]
    if code in sports:                 # 2. look up the BARE code's label
        base = sports[code]["label"]   #    (a catalogue lookup, not a transform)
    else:                              # 3. only an UNCATALOGUED code is derived
        base = code.replace(".", " ").replace("_", " ")
    if not mods:
        return base
    return base + " (" + ", ".join(modifiers[m]["label"] for m in mods) + ")"
Worked examples (resolved against the full catalogue in schema.yaml, of which the snippet above is an excerpt):
| String | Path | Label |
|---|---|---|
| cycling+stationary | exact match | indoor cycling |
| cycling.mountain | exact match | mountain biking |
| cycling.road+race | bare code cycling.road + modifier | road cycling (race) |
| parkour (uncatalogued code) | derive from string | parkour |
Note that cycling.road+race is not itself a catalogue entry, so step 1
misses; its bare code cycling.road is, so step 2 supplies "road
cycling". Hand-derivation (step 3) applies only when the code is unknown
to the catalogue.
5 · Translating to / from a platform
Each file in
mappings/
is a bidirectional table between a platform's identifiers and OST strings
(strava.yaml, garmin_fit.yaml,
apple_healthkit.yaml, polar.yaml, suunto.yaml,
wahoo.yaml, garmin_training_api.yaml). Translation is
lossy by design: decode yields the most-specific OST string the
platform actually represents; encode may coarsen. Unmatched
values fall back via each file's fallback block. The normative
encode/decode rules are in
docs/translation.md.
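To make "decode is most specific, encode may coarsen, unmatched values fall back" concrete, here is a toy two-way table in Python. Everything in it is hypothetical: the platform identifiers (RIDE, ROAD_RIDE, INDOOR_RIDE, OTHER) are invented, the fallback values are assumptions standing in for a file's fallback block, and the coarsening order (full string, then bare code, then parent codes) is one plausible choice, not the normative rule in docs/translation.md.

```python
# Hypothetical platform identifiers; a real table comes from a mappings/*.yaml file.
DECODE = {
    "RIDE": "cycling",
    "ROAD_RIDE": "cycling.road",
    "INDOOR_RIDE": "cycling+stationary",
}
FALLBACK_OST = "generic"     # assumed decode-side fallback
FALLBACK_PLATFORM = "OTHER"  # assumed encode-side fallback

# Invert the table for encoding (first platform id wins if two share an OST string).
ENCODE = {}
for platform_id, ost in DECODE.items():
    ENCODE.setdefault(ost, platform_id)


def decode(platform_id: str) -> str:
    """Most specific OST string the platform value represents, else the fallback."""
    return DECODE.get(platform_id, FALLBACK_OST)


def encode(ost: str) -> str:
    """Coarsen until the platform has a value: full string, bare code, parent codes."""
    code, *_modifiers = ost.split("+")
    candidates = [ost, code]
    while "." in code:
        code = code.rsplit(".", 1)[0]  # drop the last dotted segment
        candidates.append(code)
    for candidate in candidates:
        if candidate in ENCODE:
            return ENCODE[candidate]
    return FALLBACK_PLATFORM
```

The round trip shows the lossiness: encode("cycling.road+race") gives ROAD_RIDE, and decoding that returns only cycling.road, because this toy platform cannot represent the race modifier.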
6 · Reference implementation (Python)
pip install open-sport-taxonomy
from open_sport_taxonomy import Sport
from open_sport_taxonomy.platforms import garmin_fit
s = Sport("cycling+stationary")
s.label # "indoor cycling"
s.is_standard # True
s.resolve() # nearest standard sport
garmin_fit.decode(2, 0) # Sport('cycling.road')
garmin_fit.encode(Sport("cycling.road")) # GarminFitCode(sport=2, sub_sport=0)
Full API — storage, matching, sub-sport containment, Pydantic integration — is in
python/README.md.
The package exposes open_sport_taxonomy.version (package release) and
open_sport_taxonomy.taxonomy_version (the spec version it implements).
7 · Pinning a version
The spec is versioned in schema.yaml (version:) and
tagged spec/vX.Y.Z. Sport codes are stable — once published, never
removed, only deprecated. Pin a snapshot via the git tag:
# Latest
https://raw.githubusercontent.com/sweatstack/open-sport-taxonomy/main/schema.yaml
# Pinned to a spec version
https://raw.githubusercontent.com/sweatstack/open-sport-taxonomy/spec/v0.10.0/schema.yaml
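A small Python sketch for building either URL and checking that a fetched snapshot declares the version you pinned. schema_url and declared_version are hypothetical helpers; declared_version assumes the top-level version: line holds a plain scalar such as 0.10.0 (optionally quoted), and reads it with a regex to avoid a YAML dependency.

```python
import re
from typing import Optional

REPO = "https://raw.githubusercontent.com/sweatstack/open-sport-taxonomy"


def schema_url(spec_version: Optional[str] = None) -> str:
    """Latest schema (main) when no version is given, else the spec/vX.Y.Z tag."""
    ref = "main" if spec_version is None else f"spec/v{spec_version}"
    return f"{REPO}/{ref}/schema.yaml"


# Assumes a top-level line like: version: 0.10.0  (quotes optional)
_VERSION_LINE = re.compile(r"""^version:\s*["']?([^"'\s#]+)""", re.MULTILINE)


def declared_version(schema_text: str) -> Optional[str]:
    """Return the value of the top-level version: field, or None if absent."""
    match = _VERSION_LINE.search(schema_text)
    return match.group(1) if match else None
```

In a build step, one would fetch schema_url("0.10.0") (for example with urllib.request.urlopen), decode the body, and fail the build unless declared_version of that text equals "0.10.0".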
Canonical sources
- schema.yaml — catalogue, labels, modifiers, version (machine-readable source of truth)
- docs/taxonomy.md — terminology, the modality rule, validity tiers, operations
- docs/translation.md — language-agnostic encode/decode specification
- docs/reference.md — generated catalogue browser (all codes & modifiers)
- mappings/ — per-platform translation tables
- python/README.md — reference implementation API
- llms.txt — the same pointers in plain text for crawlers
Missing a sport? Open an issue at github.com/sweatstack/open-sport-taxonomy/issues.