Custom media types

In PineXQ workflows, the platform validates that media types match when data flows between Jobs. By default, DataSlots use well-known media types like application/json from the MediaTypes enum. This works for simple workflows, but is too coarse when different JSON schemas represent fundamentally different data — two DataSlots both typed as application/json will pass validation even though they carry incompatible data structures.

Custom media types solve this by generating a distinct media type for each data schema. The platform can then enforce schema-level type safety: a ProcessingStep producing application/vnd.pinexq.myproject.sensor-reading.v1+json will only connect to a downstream step that expects exactly that type. This applies even to a single Job, which can only be configured with DataSlots whose media types match.

The @media_type_def decorator lets you annotate your data classes with a name and version. The framework builds the full media type string from these parameters — no need to construct it manually. Use get_media_type(MyClass) to retrieve the string and pass it to DataSlot decorators.

Use the @media_type_def decorator to annotate a Pydantic model or dataclass with media type metadata:

    from pydantic import BaseModel
    from pinexq.procon.core.media_type import media_type_def, get_media_type

    @media_type_def(name="sensor-reading", version=1, namespace="myproject")
    class SensorReadingV1(BaseModel):
        timestamp: str
        value: float
        unit: str

    # application/vnd.pinexq.myproject.sensor-reading.v1+json

The decorator works with both Pydantic BaseModel subclasses and standard @dataclass classes.

The generated media type follows the format:

    application/vnd.<vendor>.<namespace>.<name>.v<version>+<suffix>

| Parameter | Required | Default | Description |
|---|---|---|---|
| name | Yes | | Logical name for the data schema. |
| version | Yes | | Schema version number (positive integer, >= 1). |
| namespace | Yes | | Namespace to group schemas under (e.g., project, team, domain). |
| vendor | No | "pinexq" | Vendor prefix (first segment after vnd.). |
| suffix | No | "json" | Structured syntax suffix (json, xml, csv, etc.). Also determines whether DataSlots open files in text or binary mode. |
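To make the format concrete, here is a minimal sketch of how the five parameters combine into the final string. The function `build_media_type` is a hypothetical stand-in written for illustration; it is not the framework's implementation and skips all validation.

```python
def build_media_type(name: str, version: int, namespace: str,
                     vendor: str = "pinexq", suffix: str = "json") -> str:
    """Assemble a media type string from its parts (illustrative only)."""
    return f"application/vnd.{vendor}.{namespace}.{name}.v{version}+{suffix}"


# Defaults for vendor and suffix apply when they are not given.
print(build_media_type("sensor-reading", 1, "myproject"))
# application/vnd.pinexq.myproject.sensor-reading.v1+json

# Overriding vendor and suffix changes the corresponding segments.
print(build_media_type("report", 1, "myproject", vendor="acme", suffix="xml"))
# application/vnd.acme.myproject.report.v1+xml
```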

Use get_media_type() to retrieve the precomputed media type string from an annotated class:

    media_type = get_media_type(SensorReadingV1)
    # "application/vnd.pinexq.myproject.sensor-reading.v1+json"

If the class has no @media_type_def annotation, a TypeError is raised.

Pass get_media_type() to the media_type parameter on DataSlot decorators. You also need to provide reader and writer functions for serialization:

    from pinexq.procon.dataslots.default_reader_writer import DefaultReaderWriter

    # `dataslot` must also be imported, and process_reading is a method of
    # your ProcessingStep class (both omitted here for brevity).
    @dataslot.input('reading',
                    # The lambda binds the target model type for the reader.
                    reader=lambda f: DefaultReaderWriter.pydantic_base_reader(f, SensorReadingV1),
                    media_type=get_media_type(SensorReadingV1))
    @dataslot.returns(writer=DefaultReaderWriter.pydantic_base_writer,
                      media_type=get_media_type(SensorReadingV1))
    def process_reading(self, reading: SensorReadingV1) -> SensorReadingV1:
        ...

DataSlots do not automatically serialize or deserialize Pydantic models or dataclasses — you must provide explicit reader and writer functions. For Pydantic models, the framework provides DefaultReaderWriter with built-in support. For dataclasses, you need to supply custom reader/writer functions (e.g., using dataclasses.asdict and json.dump).

| Utility | Purpose |
|---|---|
| DefaultReaderWriter.pydantic_base_reader | Deserializes JSON into a Pydantic model instance. |
| DefaultReaderWriter.pydantic_base_writer | Serializes a Pydantic model instance to JSON. |
| DefaultReaderWriter.pydantic_list_base_reader | Reads a JSON array into a list of Pydantic model instances. |
| DefaultReaderWriter.pydantic_list_base_writer | Writes a list of Pydantic model instances as a JSON array. |

The reader callable only receives a file handle, so pydantic_base_reader needs a lambda to bind the target type. The writer can be passed directly.
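For the dataclass case, which has no built-in helper, a custom reader/writer pair could look roughly like the sketch below. It uses only the standard library. The names `SensorReading`, `dataclass_reader`, and `dataclass_writer` are hypothetical, and the writer's `(file, value)` argument order is an assumption; check the signature the framework actually passes before relying on it. `functools.partial` is shown as an alternative to a lambda for binding the target type.

```python
import io
import json
from dataclasses import asdict, dataclass
from functools import partial
from typing import IO, Type, TypeVar

T = TypeVar("T")


@dataclass
class SensorReading:
    timestamp: str
    value: float
    unit: str


def dataclass_reader(f: IO[str], cls: Type[T]) -> T:
    """Parse one JSON object from the file handle into an instance of cls."""
    return cls(**json.load(f))


def dataclass_writer(f: IO[str], obj) -> None:
    """Write a dataclass instance as JSON (argument order is an assumption)."""
    json.dump(asdict(obj), f)


# Bind the target type so the resulting callable takes only a file handle.
read_sensor_reading = partial(dataclass_reader, cls=SensorReading)

# Round trip through an in-memory text buffer standing in for a DataSlot file.
buffer = io.StringIO()
dataclass_writer(buffer, SensorReading("2024-01-01T00:00:00Z", 21.5, "C"))
buffer.seek(0)
print(read_sensor_reading(buffer))
```

Under these assumptions, `read_sensor_reading` and `dataclass_writer` would be passed as the `reader` and `writer` arguments in place of the `DefaultReaderWriter` helpers.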

When many classes share common parameters, use with_defaults() to pre-fill namespace, vendor, and/or suffix. Individual calls can still override any pre-filled value:

    my_schema = media_type_def.with_defaults(namespace="myproject")

    @my_schema(name="sensor-reading", version=1)
    class SensorReadingV1(BaseModel): ...

    # Override vendor and suffix for a specific class:
    @my_schema(name="report", version=1, vendor="acme", suffix="xml")
    class AcmeReportV1(BaseModel): ...

    # "application/vnd.acme.myproject.report.v1+xml"
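The pre-fill-and-override behaviour resembles `functools.partial` from the standard library. The sketch below is only an analogy under that assumption: `build_media_type` is a hypothetical stand-in for the generated string, and nothing here reflects how `with_defaults()` is implemented internally.

```python
from functools import partial


def build_media_type(name, version, namespace, vendor="pinexq", suffix="json"):
    # Hypothetical stand-in for the string the decorator generates.
    return f"application/vnd.{vendor}.{namespace}.{name}.v{version}+{suffix}"


# Pre-fill the namespace once, similar in spirit to with_defaults(namespace=...).
my_schema = partial(build_media_type, namespace="myproject")

print(my_schema(name="sensor-reading", version=1))
print(my_schema(name="report", version=1, vendor="acme", suffix="xml"))

# A keyword supplied at call time replaces the pre-filled value.
print(my_schema(name="report", version=1, namespace="other"))
```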

Each annotated class represents one specific version of a data schema. The platform enforces exact match on media type strings, so a ProcessingStep consuming v1 will not accidentally receive v2 data.

When to create a new version:

  • Any change to the schema’s fields (added, removed, renamed, or type-changed) requires a new version number.
  • Changes to field constraints (e.g., tighter validation) that could break existing consumers should bump the version.

Note: While Pydantic (and dataclasses with default values) can technically handle added optional fields without breaking deserialization, we recommend bumping the version for any schema change. Keeping the same version undermines the platform’s exact-match safety and makes contract changes invisible to other workflow participants.

How to version: Create a separate class for each version. The name parameter stays the same; only version changes:

    @media_type_def(name="sensor-reading", version=1, namespace="myproject")
    class SensorReadingV1(BaseModel):
        timestamp: str
        value: float

    @media_type_def(name="sensor-reading", version=2, namespace="myproject")
    class SensorReadingV2(BaseModel):
        timestamp: str
        value: float
        unit: str
        source: str

These produce distinct media types:

  • application/vnd.pinexq.myproject.sensor-reading.v1+json
  • application/vnd.pinexq.myproject.sensor-reading.v2+json
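Because matching is exact, the two versions are as unrelated to each other as either is to plain application/json. A minimal model of that rule, assuming the platform check amounts to string equality (the helper `media_types_match` is hypothetical):

```python
def media_types_match(produced: str, expected: str) -> bool:
    """Exact string equality: no wildcards, no version ranges (assumed model)."""
    return produced == expected


V1 = "application/vnd.pinexq.myproject.sensor-reading.v1+json"
V2 = "application/vnd.pinexq.myproject.sensor-reading.v2+json"

print(media_types_match(V1, V1))                   # True
print(media_types_match(V1, V2))                   # False
print(media_types_match("application/json", V1))   # False
```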

All string parameters are normalized to lowercase. The following rules are enforced at class definition time:

| Parameter | Rule | Pattern | Valid Examples | Invalid Examples |
|---|---|---|---|---|
| name | Starts with letter, lowercase alphanumeric + hyphens, no dots | [a-z][a-z0-9\-]* | sensor-reading, calibration | SensorReading, sensor.reading, 2sensors |
| namespace | Same as name | [a-z][a-z0-9\-]* | myproject, data-pipeline | my.project, My_Project |
| vendor | Same as name | [a-z][a-z0-9\-]* | pinexq, acme | Acme Inc. |
| version | Positive integer (>= 1) | | 1, 2, 42 | 0, -1, "1" |
| suffix | Starts with letter, lowercase alphanumeric, no hyphens | [a-z][a-z0-9]* | json, xml, csv | JSON, json-ld |
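The patterns in the table can be tried out with the standard `re` module. This is a sketch for experimenting with candidate names, not the framework's validator: the helper names are hypothetical, and each value is checked exactly as given, since how the real validation interacts with lowercase normalization is not specified here.

```python
import re

SEGMENT_PATTERN = re.compile(r"[a-z][a-z0-9\-]*")  # name, namespace, vendor
SUFFIX_PATTERN = re.compile(r"[a-z][a-z0-9]*")     # suffix: no hyphens allowed


def is_valid_segment(value: str) -> bool:
    """True if the whole string matches the name/namespace/vendor pattern."""
    return SEGMENT_PATTERN.fullmatch(value) is not None


def is_valid_suffix(value: str) -> bool:
    """True if the whole string matches the suffix pattern."""
    return SUFFIX_PATTERN.fullmatch(value) is not None


def is_valid_version(value: object) -> bool:
    """True for integers >= 1; bool is excluded although it subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 1


for candidate in ["sensor-reading", "SensorReading", "sensor.reading", "2sensors"]:
    print(candidate, is_valid_segment(candidate))

for candidate in ["json", "json-ld"]:
    print(candidate, is_valid_suffix(candidate))

for candidate in [1, 42, 0, -1, "1"]:
    print(repr(candidate), is_valid_version(candidate))
```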

Length limits (RFC 6838):

  • The media subtype (everything after application/) must not exceed 127 characters — a ValueError is raised if exceeded.
  • A UserWarning is emitted if it exceeds 64 characters (recommended limit). With the vnd.pinexq. prefix (11 characters) and a .v<N>+json suffix (8 characters for a single-digit version), about 45 characters remain for <namespace>.<name>, including the separating dot, within the recommended 64.
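To see how much room a given combination uses, the two limits can be checked by hand. The sketch below mirrors the described behaviour (ValueError above 127, UserWarning above 64) under the assumption that the length is measured on everything after `application/`; `check_subtype_length` is a hypothetical helper, not a framework function.

```python
import warnings

MAX_SUBTYPE_LENGTH = 127          # hard limit
RECOMMENDED_SUBTYPE_LENGTH = 64   # recommended limit


def check_subtype_length(media_type: str) -> int:
    """Return the subtype length; warn above 64, raise above 127 characters."""
    subtype = media_type.split("/", 1)[1]
    length = len(subtype)
    if length > MAX_SUBTYPE_LENGTH:
        raise ValueError(f"subtype is {length} characters, limit is {MAX_SUBTYPE_LENGTH}")
    if length > RECOMMENDED_SUBTYPE_LENGTH:
        warnings.warn(
            f"subtype is {length} characters, recommended limit is {RECOMMENDED_SUBTYPE_LENGTH}",
            UserWarning,
        )
    return length


print(check_subtype_length("application/vnd.pinexq.myproject.sensor-reading.v1+json"))
# 43
```

The running example therefore sits well inside the recommended limit, with 21 characters to spare.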

Naming tips:

  • Use hyphens to separate words within a segment: sensor-reading, not sensorreading.
  • Use short, descriptive names: they appear in manifests and platform UIs.
  • The name identifies the data schema, not the Python class. Multiple versions of the same schema share the same name — only version changes. The class name (e.g., SensorReadingV1, SensorReadingV2) is independent and can be chosen freely.