OpenSportTaxonomy

The sport vocabulary your app was missing.

A single-page brief for coding agents and developers integrating OpenSportTaxonomy (OST). Full detail is below; if you only read one thing, read this:

Start here — for agents, do exactly this

1. Fetch this exact URL — the machine-readable source of truth (catalogue, labels, modifiers, and a version: field), served as YAML:

https://raw.githubusercontent.com/sweatstack/open-sport-taxonomy/main/schema.yaml

If you are baking the data into a build, fetch a pinned snapshot instead — main drifts as the spec evolves. Tags are spec/vX.Y.Z; the current spec is 0.10.0:

https://raw.githubusercontent.com/sweatstack/open-sport-taxonomy/spec/v0.10.0/schema.yaml

2. Parse a sport as code(+modifier)*: a dotted code path, then + modifiers sorted alphabetically and de-duplicated. The string is the identity.

3. Label it by exact match in sports:, else compose (see §4; do not hand-derive a label by reversing the code).

4. Python reference implementation:

pip install open-sport-taxonomy
python -c "from open_sport_taxonomy import Sport; print(Sport('cycling.road').label)"  # road cycling

The raw GitHub URL above is canonical (and pinnable). This site's own /schema.yaml and /mappings/ 302-redirect to it as a convenience, so either works — but pin the GitHub tag if you bake the data. Platform translation tables: mappings/.

1 · The canonical string

A sport is a single string. Dots (.) separate a sport from its disciplines in the hierarchy; plusses (+) attach modifiers (circumstances that don't change the movement). The string itself is the identity — not membership of any catalogue.

cycling.road+stationary+virtual

String                            Meaning
cycling                           cycling (any discipline)
cycling.road                      road cycling
cycling.road+race                 road cycling race
cycling.road+stationary+virtual   e.g. a Zwift ride
xc_skiing.classic+roller          classic roller skiing

Canonical form: a code followed by zero or more +modifiers, with modifiers sorted alphabetically and de-duplicated. Two strings denote the same sport iff their canonical forms are identical. Modifiers are independent and absence means unspecified, never "the opposite".
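The canonical-form rule above can be sketched in a few lines of Python. This is a minimal illustration, not the package's implementation: the helper names canonicalize and same_sport are hypothetical, the input is assumed to be well-formed already, and "alphabetically" is taken to mean plain string ordering.

```python
def canonicalize(s: str) -> str:
    """Return the canonical form: code, then sorted, de-duplicated +modifiers.

    Assumes `s` is already well-formed; no validation happens here.
    """
    code, *mods = s.split("+")          # dots stay inside the code part
    unique_sorted = sorted(set(mods))   # de-duplicate, then sort
    return "+".join([code, *unique_sorted])


def same_sport(a: str, b: str) -> bool:
    """Two strings denote the same sport iff their canonical forms match."""
    return canonicalize(a) == canonicalize(b)


print(canonicalize("cycling.road+virtual+stationary+virtual"))
# cycling.road+stationary+virtual
```

Canonicalizing before storing or comparing means equality checks reduce to plain string equality.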

2 · Well-formed grammar

A well-formed string is a dotted code path followed by zero or more + modifiers. Each segment is lowercase letters with single internal underscores; dots live only inside the code, never after a +.

# segment = [a-z]+(?:_[a-z]+)*
WELL_FORMED = /^[a-z]+(?:_[a-z]+)*(?:\.[a-z]+(?:_[a-z]+)*)*(?:\+[a-z]+(?:_[a-z]+)*)*$/

Accepts xc_skiing.skate+roller, alpine_skiing, cycling+stationary+virtual, generic. Rejects +stationary (no code), cycling. (trailing dot), cycling..road (empty segment), Cycling (uppercase), cycling__road (doubled underscore), cycling+stationary.foo (dot after +). Sortedness and de-duplication are properties of the canonical form, not of this lexical grammar.
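For Python callers, the same pattern can be expressed with the standard re module. This is a sketch: the name is_well_formed is hypothetical, and fullmatch is used in place of the ^...$ anchors so that a trailing newline is also rejected.

```python
import re

# One segment: lowercase letters with single internal underscores.
SEGMENT = r"[a-z]+(?:_[a-z]+)*"

# Dotted code path, then zero or more +modifier segments.
WELL_FORMED = re.compile(rf"{SEGMENT}(?:\.{SEGMENT})*(?:\+{SEGMENT})*")


def is_well_formed(s: str) -> bool:
    """True when the whole string matches the lexical grammar."""
    return WELL_FORMED.fullmatch(s) is not None


accepted = ["xc_skiing.skate+roller", "alpine_skiing",
            "cycling+stationary+virtual", "generic"]
rejected = ["+stationary", "cycling.", "cycling..road", "Cycling",
            "cycling__road", "cycling+stationary.foo"]

print(all(is_well_formed(s) for s in accepted))      # True
print(any(is_well_formed(s) for s in rejected))      # False
```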

3 · Three validity tiers

Strictly nested: standard ⊆ known-atoms ⊆ well-formed.

Tier          Holds when                                                                     Python
well-formed   matches the grammar above (modifiers may be unsorted)                          Sport.parse(s) succeeds
known-atoms   well-formed and the code and every modifier are declared atoms (group-valid)   Sport.uses_known_atoms
standard      the exact canonical string is in the catalogue                                 Sport.is_standard

Any well-formed string is valid and storable — a client may mint cycling.road+race even though it is not (yet) a catalogue entry. The catalogue is a recommended profile over the open string space, not a whitelist.
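To make the nesting concrete, here is a rough sketch that classifies a string into its highest tier. Everything in it is illustrative: the Tier enum and tier function are hypothetical names, the three sets are toy stand-ins for data that would come from schema.yaml, a whole dotted code is treated as one atom, and the group-validity check is not modelled at all.

```python
import re
from enum import IntEnum


class Tier(IntEnum):
    MALFORMED = 0
    WELL_FORMED = 1
    KNOWN_ATOMS = 2
    STANDARD = 3


_SEG = r"[a-z]+(?:_[a-z]+)*"
_WELL_FORMED = re.compile(rf"{_SEG}(?:\.{_SEG})*(?:\+{_SEG})*")

# Toy stand-ins (assumptions); real values come from schema.yaml.
KNOWN_CODES = {"cycling", "cycling.road"}
KNOWN_MODIFIERS = {"race", "stationary", "virtual"}
STANDARD = {"cycling", "cycling.road", "cycling+stationary"}


def tier(s: str) -> Tier:
    """Return the highest tier `s` reaches (group rules are NOT checked)."""
    if not _WELL_FORMED.fullmatch(s):
        return Tier.MALFORMED
    code, *mods = s.split("+")
    canonical = "+".join([code, *sorted(set(mods))])
    if canonical in STANDARD:
        return Tier.STANDARD
    if code in KNOWN_CODES and all(m in KNOWN_MODIFIERS for m in mods):
        return Tier.KNOWN_ATOMS
    return Tier.WELL_FORMED


print(tier("cycling+stationary").name)   # STANDARD
print(tier("cycling.road+race").name)    # KNOWN_ATOMS
print(tier("parkour").name)              # WELL_FORMED
```

Note that even the lowest non-malformed tier is storable; the tiers describe how much of the string the catalogue recognises, not whether to accept it.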

4 · Getting a label

A label is a human display name — presentation only, never identity. To resolve one, fetch schema.yaml and match the canonical string against the sports: list:

sports:
  - sport: cycling
    label: cycling
  - sport: cycling+stationary
    label: indoor cycling
  - sport: cycling.road
    label: road cycling

modifiers:
  - code: stationary
    label: stationary
    group: environment

Labels are looked up, never derived by string manipulation. Reversing or re-spacing the code is wrong: cycling.mountain is "mountain biking", not "mountain cycling", and xc_skiing is "XC skiing". The exact algorithm:

def label(s):                       # s is a canonical sport string
    if s in sports:                 # 1. exact catalogue match wins
        return sports[s].label

    code, mods = split(s)           # e.g. "cycling.road", ["race"]

    if code in sports:              # 2. look up the BARE code's label
        base = sports[code].label   #    (a catalogue lookup, not a transform)
    else:                           # 3. only an UNCATALOGUED code is derived
        base = code.replace(".", " ").replace("_", " ")

    if not mods:
        return base
    return base + " (" + ", ".join(modifiers[m].label for m in mods) + ")"

Worked examples (resolved against the full catalogue, of which the snippet above is an excerpt):

String                        Path                                Label
cycling+stationary            exact match                         indoor cycling
cycling.mountain              exact match                         mountain biking
cycling.road+race             bare code cycling.road + modifier   road cycling (race)
parkour (uncatalogued code)   derive from string                  parkour

Note cycling.road+race is not itself a catalogue entry, so step 1 misses; its bare code cycling.road is, so step 2 supplies "road cycling". Hand-derivation (step 3) only ever applies when the code is unknown to the catalogue.
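The algorithm can be run end to end against a small in-memory catalogue. The sketch below is self-contained and assumes the catalogue has already been loaded into two plain dicts (SPORTS and MODIFIERS are hypothetical names). The entries are limited to those quoted in this brief, and the "race" modifier label is inferred from the worked example rather than copied from schema.yaml.

```python
# Canonical sport string -> label (excerpt; assumption: loaded from sports:)
SPORTS = {
    "cycling": "cycling",
    "cycling+stationary": "indoor cycling",
    "cycling.road": "road cycling",
    "cycling.mountain": "mountain biking",
}

# Modifier code -> label (assumption: loaded from modifiers:)
MODIFIERS = {"stationary": "stationary", "race": "race"}


def label(s: str) -> str:
    """Resolve a display label for a canonical sport string."""
    if s in SPORTS:                       # 1. exact catalogue match wins
        return SPORTS[s]

    code, *mods = s.split("+")            # e.g. "cycling.road", ["race"]

    base = SPORTS.get(code)               # 2. look up the bare code's label
    if base is None:                      # 3. only an uncatalogued code is derived
        base = code.replace(".", " ").replace("_", " ")

    if not mods:
        return base
    # An undeclared modifier raises KeyError, as in the algorithm above.
    return f"{base} ({', '.join(MODIFIERS[m] for m in mods)})"


for s in ["cycling+stationary", "cycling.mountain", "cycling.road+race", "parkour"]:
    print(f"{s} -> {label(s)}")
```

The four printed labels match the worked examples: indoor cycling, mountain biking, road cycling (race), parkour.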

5 · Translating to / from a platform

Each file in mappings/ is a bidirectional table between a platform's identifiers and OST strings (strava.yaml, garmin_fit.yaml, apple_healthkit.yaml, polar.yaml, suunto.yaml, wahoo.yaml, garmin_training_api.yaml). Translation is lossy by design: decode yields the most-specific OST string the platform actually represents; encode may coarsen. Unmatched values fall back via each file's fallback block. The normative encode/decode rules are in docs/translation.md.
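The lossy round trip can be illustrated with a toy table. Nothing below reflects the real mapping files or the normative rules in docs/translation.md: the platform identifiers (BIKE, BIKE_INDOOR, OTHER), the use of generic as the decode fallback, and the coarsening strategy (drop all modifiers, then trim trailing code segments) are all assumptions made for the sake of a runnable sketch.

```python
# Hypothetical platform table: native identifier -> OST string.
DECODE = {
    "BIKE": "cycling",
    "BIKE_INDOOR": "cycling+stationary",
}
ENCODE = {ost: native for native, ost in DECODE.items()}

FALLBACK_DECODE = "generic"   # assumption: what an unmatched native value becomes
FALLBACK_ENCODE = "OTHER"     # assumption: what an unmatched OST string becomes


def decode(native: str) -> str:
    """Native identifier -> most specific OST string the table knows."""
    return DECODE.get(native, FALLBACK_DECODE)


def encode(sport: str) -> str:
    """OST string -> native identifier, coarsening until something matches."""
    if sport in ENCODE:                      # exact match first
        return ENCODE[sport]
    code = sport.split("+", 1)[0]            # drop all modifiers
    while code:
        if code in ENCODE:
            return ENCODE[code]
        code = code.rpartition(".")[0]       # trim the last dotted segment
    return FALLBACK_ENCODE


print(encode("cycling.road+race"))   # BIKE
print(decode("BIKE"))                # cycling
```

The example shows why translation is lossy: cycling.road+race encodes to BIKE, which decodes back to plain cycling.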

6 · Reference implementation (Python)

pip install open-sport-taxonomy

from open_sport_taxonomy import Sport
from open_sport_taxonomy.platforms import garmin_fit

s = Sport("cycling+stationary")
s.label            # "indoor cycling"
s.is_standard      # True
s.resolve()        # nearest standard sport

garmin_fit.decode(2, 0)                   # Sport('cycling.road')
garmin_fit.encode(Sport("cycling.road"))  # GarminFitCode(sport=2, sub_sport=0)

Full API — storage, matching, sub-sport containment, Pydantic integration — is in python/README.md. The package exposes open_sport_taxonomy.version (package release) and open_sport_taxonomy.taxonomy_version (the spec version it implements).

7 · Pinning a version

The spec is versioned in schema.yaml (version:) and tagged spec/vX.Y.Z. Sport codes are stable — once published, never removed, only deprecated. Pin a snapshot via the git tag:

# Latest
https://raw.githubusercontent.com/sweatstack/open-sport-taxonomy/main/schema.yaml

# Pinned to a spec version
https://raw.githubusercontent.com/sweatstack/open-sport-taxonomy/spec/v0.10.0/schema.yaml
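A build script might construct the URL from a pinned spec version instead of hard-coding it. The helper names schema_url and fetch_schema_text below are hypothetical; only the URL layout is taken from this page. The fetch helper uses the standard library and returns the raw YAML text, leaving parsing to whichever YAML library the project already uses.

```python
from typing import Optional
from urllib.request import urlopen

BASE = "https://raw.githubusercontent.com/sweatstack/open-sport-taxonomy"


def schema_url(spec_version: Optional[str] = None) -> str:
    """URL of schema.yaml: the spec/vX.Y.Z tag if pinned, else main."""
    ref = "main" if spec_version is None else f"spec/v{spec_version}"
    return f"{BASE}/{ref}/schema.yaml"


def fetch_schema_text(spec_version: Optional[str] = None) -> str:
    """Download schema.yaml as text (network access required)."""
    with urlopen(schema_url(spec_version), timeout=10) as resp:
        return resp.read().decode("utf-8")


print(schema_url("0.10.0"))
print(schema_url())
```

Keeping the version in one constant makes a spec upgrade a one-line, reviewable change.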

Canonical sources

A sport is missing? Open an issue at github.com/sweatstack/open-sport-taxonomy/issues.