Torchvision Transformers #56

Open
BRUNDA-MS opened this issue Sep 8, 2023 · 2 comments

@BRUNDA-MS

Can I download the images after applying the transformations on custom data? If yes, how do I download them with the local installation?

@divideconcept (Contributor)

Sort of. Those transformations happen on the fly, but if you want to keep a file version of the transformed tensors on disk, you can either modify datasetload.py and save the tensors within this block:

    if msg_type == 'RequestTrainingSamples' or msg_type == 'RequestValidationSamples':
        if meta_dataset is not None:
            meta_dataset.train(msg_type == 'RequestTrainingSamples')
            samples_id = tc.decode_ints(msg_data)  # ids of the requested samples
            for id in samples_id:
                # meta_dataset[id] returns the sample after the transforms have been applied
                tc.send_msg(app_socket, 'TensorData', tc.encode_torch_tensors(meta_dataset[id]))

That's the block that sends all the training and validation samples to the model training script.
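
For example, a minimal sketch of such an addition (the save_transformed helper, the output directory, and the use of torch.save are illustrative and not part of the project's code; it assumes torch and os are importable in datasetload.py):

    import os
    import torch

    def save_transformed(sample, id, out_dir='transformed_samples'):
        # sample is whatever meta_dataset[id] returns after the transforms ran
        os.makedirs(out_dir, exist_ok=True)
        torch.save(sample, os.path.join(out_dir, 'sample_{}.pt'.format(id)))

    # called from the loop above, just before tc.send_msg(...):
    #     save_transformed(meta_dataset[id], id)
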
Or you can modify modeltrain.py and retrieve all the transformed tensors here:

    def add_sample(self, data=None):
        if data:
            if self.cache:
                # each cached record is prefixed with its length (4 bytes, little-endian)
                self.cache.write(len(data).to_bytes(4, 'little'))
                self.cache.write(data)
            # the transformed tensors are decoded here
            self.index.append(tc.decode_torch_tensors(data))
        else:
            # no data means all samples were received: close and finalize the cache file
            if self.cache:
                self.cache.close()
                try:
                    os.rename(self.filename+'.tmp', self.filename)
                except:
                    pass
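
For example, a minimal sketch of keeping a copy on disk at that point (the torch.save call, the filename pattern, and the torch import are illustrative additions, not existing code):

    # illustrative change inside add_sample(), replacing the
    # self.index.append(tc.decode_torch_tensors(data)) line:
    tensors = tc.decode_torch_tensors(data)
    torch.save(tensors, 'transformed_{}.pt'.format(len(self.index)))  # assumes torch is imported
    self.index.append(tensors)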

If you don't want to hack those Python files, check "Cache" next to the "Train" button in a Model tab. This will create one big file containing all the transformed tensors on the training server (locally or remotely, depending on where your training happens). The tensors are packed continuously in this file using fastnumpyio.pack/fastnumpyio.unpack:

endianness: 1 byte, value can be '<', '>', '|'
type: 1 byte, value can be 'b', 'i', 'u', 'f', 'c'
type size: 1 byte, value can be 1, 2, 4, 8, 16
number of dimensions: 1 byte, value can be 0-255
shape: 4 bytes per dimension
raw sample data
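
As an illustration only, here is a minimal sketch of decoding one packed array from such a cache, based purely on the layout listed above; the function name and the assumption that the shape dimensions are stored as little-endian unsigned 32-bit ints are mine, not taken from fastnumpyio's code. Note also that add_sample above prefixes each cached message with a 4-byte little-endian length before the packed data.

    import struct
    import numpy as np

    def parse_packed_array(buf, offset=0):
        endianness = chr(buf[offset])      # '<', '>' or '|'
        kind = chr(buf[offset + 1])        # 'b', 'i', 'u', 'f' or 'c'
        itemsize = buf[offset + 2]         # 1, 2, 4, 8 or 16
        ndim = buf[offset + 3]             # 0-255
        # assumption: each shape dimension is a little-endian unsigned 32-bit int
        shape = struct.unpack_from('<' + 'I' * ndim, buf, offset + 4)
        data_offset = offset + 4 + 4 * ndim
        dtype = np.dtype(endianness + kind + str(itemsize))
        count = int(np.prod(shape, dtype=np.int64)) if ndim else 1
        array = np.frombuffer(buf, dtype=dtype, count=count, offset=data_offset)
        # return the decoded array and the offset just past its raw data
        return array.reshape(shape), data_offset + count * itemsize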

@BRUNDA-MS (Author)

Hi @divideconcept,

I tried the "Cache" approach, but with or without the cache option checked, the training stops at 70% on its own. I could neither see any results on the Dashboard nor find any big file on my local system.

Could you please help me resolve this?
