Working with WorkData
WorkData is managed by the cloud platform to ensure the data is safe and passed to and from ProcessingSteps correctly. If any data is needed by a ProcessingStep and is not already in PineXQ, you need to upload it.
Uploading WorkData
To upload a file, follow these steps:
- Create a WorkData object using a configured client.
- Execute create() to upload your data.
- (optional) Edit metadata, such as tags, or allow later deletion.
```python
from pinexq.client.job_management import WorkData
from pinexq.client.core import MediaTypes

with open("TestData/test.npy", "rb") as testfile:
    workdata = (WorkData(client=client)
                .create(filename="test.npy",
                        file=testfile,
                        mediatype=MediaTypes.OCTET_STREAM)
                .set_tags(["TestData"])
                .allow_deletion())
```

Note that in addition to files, WorkData.create() can also accept JSON-serializable objects (json=obj) and binary streams (binary=stream), though supplying more than one input source will raise an error.
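A minimal sketch of those two variants, assuming json= and binary= combine with the same filename= and tagging calls as the file upload above (the exact parameter combinations are not documented on this page):

```python
import io

from pinexq.client.job_management import WorkData
from pinexq.client.core import MediaTypes

# Sketch: upload a JSON-serializable object via json= (mentioned above).
params = {"iterations": 100, "threshold": 0.5}
json_workdata = (WorkData(client=client)
                 .create(filename="params.json", json=params)
                 .set_tags(["Parameters"]))

# Sketch: upload an in-memory binary stream via binary=.
stream = io.BytesIO(b"\x00\x01\x02\x03")
binary_workdata = (WorkData(client=client)
                   .create(filename="blob.bin",
                           binary=stream,
                           mediatype=MediaTypes.OCTET_STREAM))
```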
Alternatively, it is possible to upload WorkData via the portal. You can then use the WorkData URL to configure a step.
As a reminder, media types must match those expected by the processing step to ensure compatibility.
The default media type MediaTypes.OCTET_STREAM = "application/octet-stream" is intended for binary files. For a list of common media types, see: MDN - Common media types
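As an illustration, uploading a CSV file with a matching media type might look like the sketch below; whether mediatype= also accepts a plain string (rather than only MediaTypes members) is an assumption here:

```python
from pinexq.client.job_management import WorkData

# Sketch: match the media type to the file content.
# Passing a raw string for mediatype= is an assumption; prefer a
# matching MediaTypes member if one exists.
with open("TestData/data.csv", "rb") as csvfile:
    csv_workdata = (WorkData(client=client)
                    .create(filename="data.csv",
                            file=csvfile,
                            mediatype="text/csv"))
```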
Using input WorkData
Once we have uploaded WorkData, we can assign it to a job.
```python
# workdata created previously
# partially-configured job
# index denotes the n-th dataslot as specified by the configured ProcessingStep
job.assign_input_dataslot(index=0, work_data_instance=workdata)

# alternative using urls
job.assign_input_dataslot(index=0, work_data_url="<url>")
```

DataSlot collections, which pass multiple WorkData to a single slot, are handled in a similar way:
```python
# workdata created previously
# partially-configured job
# index denotes the n-th dataslot as specified by the configured ProcessingStep
job.assign_collection_input_dataslot(index=0, work_data_instances=[workdata])

# alternative using urls
job.assign_collection_input_dataslot(index=0, work_data_urls=["<url>"])
```

It is also possible to provide input data when configuring a job with the single-call option:
```python
from pinexq.client.job_management import InputDataSlotParameterFlexible, Job, ProcessingStep

input_data_slot = InputDataSlotParameterFlexible(index=0, work_data_instances=[workdata])

processing_step = ProcessingStep.from_name(
    client=client,
    function_name="my_function",
    function_version="1.0.0"
)

job = Job(client).create_and_configure_rapidly(
    name="my_job",
    processing_step_instance=processing_step,  # or processing_step_url="..."
    tags=["test", "my_function"],
    input_data_slots=[input_data_slot],
    start=True
)

result = job.get_result()
job.delete()
```

Downloading output WorkData
As a job executes, output files are created automatically; when configuring a job, you may choose to make these files (more easily) deletable. Result WorkData is handled differently: since it is not known in advance which files (or, in the case of collection DataSlots, how many files) will be written, the files are pre-allocated when the job is scheduled. For normal collection DataSlots, the maximum number of files specified by the ProcessingStep is allocated. The files are given a generic name, which can be changed later, as sketched below.
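The rename itself might look like the following sketch; set_name() is a hypothetical method name used purely for illustration, as this page does not document the actual call:

```python
# Hypothetical: give a pre-allocated result file a meaningful name.
# set_name() is an assumed method; consult the WorkData API reference
# for the actual rename operation.
result_workdata.set_name("my_function_result.npy")
```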
After a job is complete, we can look up its output DataSlots and obtain a list of WorkData. Each WorkData carries metadata, such as the name of the corresponding output DataSlot.
In the following snippet, we wait for a job to complete, then download the output DataSlot “my_output” and the result DataSlot to respective local files:
```python
from pinexq.client.job_management import WorkData

# previously started job
job.wait_for_completion(timeout_s=600.0)

output_data_slots = job.get_output_data_slots()
output_workdata = {slot.title: WorkData.from_hco(slot.assigned_workdatas[0])
                   for slot in output_data_slots}

with open("output.npy", "wb") as file:
    file.write(output_workdata["my_output"].download())

# the result dataslot is an output dataslot with this name
with open("result.npy", "wb") as file:
    file.write(output_workdata["__returns__"].download())
```

As usual, WorkData will be placed in your WorkData tab on the portal, and can be searched and downloaded from there as well.
Uploading large files
HTTP requests for large files might time out on the client side; this is a known limitation at the moment. To avoid it, configure the client with a generous upload timeout so it can wait for the server's response.
The underlying HTTP client, the httpx library, has a default timeout of 5 seconds, which is often not sufficient for large file uploads over a network. The overall timeout of the httpx client can be adjusted like this:
```python
client.timeout = 60.0  # in seconds; default is 5 s
```
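For finer control, httpx itself distinguishes connect, read, write, and pool timeouts. Whether client.timeout also accepts an httpx.Timeout object is an assumption in the sketch below; if it does not, apply the configuration to the underlying httpx client instead:

```python
import httpx

# Sketch: per-phase timeouts (assumes client.timeout accepts an
# httpx.Timeout object, as httpx clients do). The write timeout is
# the relevant one for large uploads; connect can stay strict.
client.timeout = httpx.Timeout(connect=5.0, read=60.0, write=600.0, pool=5.0)
```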