Testing a Processing Step
While it is possible to invoke a ProcessingStep from the command line, ProcessingSteps can also be called from within Python for local unit testing. This test interface is currently under development and subject to change.
To test a ProcessingStep, create an instance of it and invoke one of its functions by name.
Function parameters must be passed in as a dict; DataSlots are passed in via create_dataslot_description(), which takes a dict from argument names to lists of filenames.
Note that output DataSlots must also be specified in the same way as input DataSlots.
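For illustration, the mapping passed to create_dataslot_description() is an ordinary dict from DataSlot argument names to lists of filenames. The slot names and paths below are hypothetical:

```python
# Hypothetical slot names and paths, purely for illustration.
slot_mapping = {
    "license_file": ["/path/to/license.json"],    # single input file
    "training_data": ["part1.csv", "part2.csv"],  # a slot may hold several files
}

# Every value is a list of filenames, even when the slot holds a single file.
assert all(isinstance(files, list) for files in slot_mapping.values())
```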
Result DataSlots are a special case: while they do not need to be specified, you will likely wish to test that the results serialize correctly.
The example below uses a pytest fixture (not shown) to generate a temporary output file for the test.
```python
import os

from conftest import create_tmp_file
from cloud_step import CloudProcessingStep, CloudRegions
from models import Region
from pinexq.procon.dataslots import create_dataslot_description
from pinexq.procon.dataslots.annotation import RETURN_SLOT_NAME  # == "__returns__"
from pinexq.procon.step import ExecutionContext


def test_get_job_results(create_tmp_file):
    """
    Test retrieving jobs from Cloud Service.
    """
    worker = CloudProcessingStep(use_cli=False)  # Suppress pop-up
    tmp_output_file = create_tmp_file()
    result = worker._call(
        ExecutionContext(
            function_name="get_task_results",
            input_dataslots=create_dataslot_description(
                # Load the license file from the path given in an environment variable
                {"license_file": [os.environ["CLOUD_LICENSE_PATH"]]}
            ),
            parameters={
                "region": Region(CloudRegions.EU_WEST_2),
                "backend_name": "cb1",
                "job_id": "e01200e5-c000-4003-9000-c0abb7000c52"
            },
            output_dataslots=create_dataslot_description(
                {RETURN_SLOT_NAME: [str(tmp_output_file)]}  # Write result to temp file
            )
        )
    )
    # "result" is the function return value if the return DataSlot is not specified.
    # Otherwise...
    assert result is None
    result_json = tmp_output_file.read_text()
    assert result_json is not None
```

Some ProcessingSteps may require API keys in order to test properly, especially those accessing external resources. We strongly recommend you do not include these API keys in your source code; see the discussion here for alternatives.
Mocking execution context
In certain cases, it may be necessary to mock various methods your ProcessingStep calls, especially if those methods depend on being called by the JMA.
For example, if your function acquires a Client from its step context, you will need to supply a Client during testing.
An example is shown below, using pytest-mock to mock getting the Client, and creating the Client itself with a fixture.
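The client fixture itself is not shown here; a minimal hypothetical sketch, using a MagicMock as a stand-in for a real API client, might look like:

```python
# conftest.py -- hypothetical sketch of the "client" fixture.
# A real fixture would construct an actual API client; a MagicMock
# stands in here so the sketch stays self-contained.
from unittest.mock import MagicMock

import pytest


def make_stub_client() -> MagicMock:
    """Build a stand-in client; the method name below is hypothetical."""
    stub = MagicMock()
    stub.upload_workdata.return_value = None
    return stub


@pytest.fixture
def client():
    return make_stub_client()
```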
```python
from main import UploadStep
from pinexq.procon.step import ExecutionContext


def test_sync_step(client, mocker):
    """
    Test a step that uploads workdata.

    Mocks get_client() with mocker, supplying an API client via fixture.
    """
    step = UploadStep(use_cli=False)
    # Note: patch where the function is looked up, not where it's defined.
    # See e.g. https://docs.python.org/3/library/unittest.mock.html#where-to-patch.
    mocker.patch("main.get_client", return_value=client)
    step._call(
        ExecutionContext(
            function_name="sync_files"
        )
    )
```

Debugging on a remote JMA
In more complicated cases, it may be necessary to debug your worker while it processes a job on our JMA.
By authenticating with a remote-enabled API key and setting environment variables and launch arguments in your IDE, it is possible to debug a worker in remote mode;
you can then step through your code, line by line, and watch the stack as it executes.
For more information, see Using ProCon from CLI.
Note that, in order to ensure that your worker receives a particular job, it may be necessary to scale down deployment of workers implementing the same function; alternatively, you may wish to register a new pre-release version for testing.