Commit

Merge branch 'master' of github.com:keithfma/newmatic
keithfma committed Feb 24, 2020
2 parents c8ea54f + 799b9f5 commit 336cdb9
Showing 5 changed files with 80 additions and 63 deletions.
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-0.1.0
+0.2.0
26 changes: 12 additions & 14 deletions docs/newmatic_demo.html
@@ -92,8 +92,8 @@

<span class="comment">% get the file size</span>
native_complete_file_size = dir(native_complete_file).bytes/1024/1024;
-</pre><pre class="codeoutput">Native-complete, write: 3.334 s
-Native-complete, read: 0.728 s
+</pre><pre class="codeoutput">Native-complete, write: 3.264 s
+Native-complete, read: 0.716 s
</pre><h2 id="3">Partial read/write with native MATLAB tools</h2><p>Now let's try using native MATLAB matfile() to read and write the data one image at a time (i.e., partial IO). This is the real use case we are interested in.</p><pre class="codeinput"><span class="comment">% get a temporary file name</span>
native_partial_file = [tempname, <span class="string">'.mat'</span>];
native_partial_cleanup = onCleanup(@() delete(native_partial_file));
@@ -124,8 +124,8 @@

<span class="comment">% get the file size</span>
native_partial_file_size = dir(native_partial_file).bytes/1024/1024;
-</pre><pre class="codeoutput">Native-partial, write: 52.339 s
-Native-partial, read: 4.126 s
+</pre><pre class="codeoutput">Native-partial, write: 52.359 s
+Native-partial, read: 3.698 s
</pre><h2 id="4">Partial read/write with newmatic</h2><p>Now for the good stuff. Let's use <tt>newmatic</tt> to create our file, and then read and write the data one image at a time. We will choose a chunk size that neatly matches our planned access pattern (i.e., an image).</p><pre class="codeinput"><span class="comment">% get a temporary file name</span>
newmatic_partial_file = [tempname, <span class="string">'.mat'</span>];
newmatic_partial_cleanup = onCleanup(@() delete(newmatic_partial_file));
@@ -153,9 +153,8 @@

<span class="comment">% get the file size</span>
newmatic_partial_file_size = dir(newmatic_partial_file).bytes/1024/1024;
-</pre><pre class="codeoutput">h5repack -i /tmp/tp03fd088e_f895_4664_81d7_9d2275dd7f72.mat -o /tmp/tpf6e2a598_4fd5_430c_8baf_c2a1aac9cb0e.mat -l images:CHUNK=1x1000x2000
-Newmatic-partial, write: 2.888 s
-Newmatic-partial, read: 0.158 s
+</pre><pre class="codeoutput">Newmatic-partial, write: 2.987 s
+Newmatic-partial, read: 0.153 s
</pre><h2 id="5">Complete read/write with newmatic</h2><p>To round out the comparison, let's read/write whole variables using <tt>newmatic</tt>.</p><pre class="codeinput"><span class="comment">% get a temporary file name</span>
newmatic_complete_file = [tempname, <span class="string">'.mat'</span>];
newmatic_complete_cleanup = onCleanup(@() delete(newmatic_complete_file));
@@ -179,9 +178,8 @@

<span class="comment">% get the file size</span>
newmatic_complete_file_size = dir(newmatic_complete_file).bytes/1024/1024;
-</pre><pre class="codeoutput">h5repack -i /tmp/tp1713a51c_f6d8_4f4c_8639_fca9af5e6594.mat -o /tmp/tp9943cc43_35e2_4cb4_aa30_a39d7a1c3386.mat -l images:CHUNK=1x1000x2000
-Newmatic-complete, write: 3.151 s
-Newmatic-complete, read: 0.709 s
+</pre><pre class="codeoutput">Newmatic-complete, write: 3.183 s
+Newmatic-complete, read: 0.715 s
</pre><h2 id="6">Comparison</h2><p>To make the comparison a bit easier, check out the tabulated results below:</p><pre class="codeinput">results = table(<span class="keyword">...</span>
round([native_complete_write_time; newmatic_complete_write_time; native_partial_write_time; newmatic_partial_write_time], 2), <span class="keyword">...</span>
round([native_complete_read_time; newmatic_complete_read_time; native_partial_read_time; newmatic_partial_read_time], 2), <span class="keyword">...</span>
@@ -193,10 +191,10 @@
</pre><pre class="codeoutput">                         write-time-seconds    read-time-seconds    file-size-MB
                         __________________    _________________    ____________

-    native-complete             3.33                 0.73              95.53
-    newmatic-complete           3.15                 0.71              95.53
-    native-partial             52.34                 4.13             116.33
-    newmatic-partial            2.89                 0.16              95.82
+    native-complete             3.26                 0.72              95.53
+    newmatic-complete           3.18                 0.72              95.53
+    native-partial             52.36                 3.7              116.33
+    newmatic-partial            2.99                 0.15              95.4

</pre><p class="footer"><br><a href="https://www.mathworks.com/products/matlab/">Published with MATLAB&reg; R2019b</a><br></p></div><!--
##### SOURCE BEGIN #####
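The partial read/write pattern timed in the demo can be sketched with `matfile` alone. This is a minimal sketch, not the demo's source: the array sizes, the variable name `images`, and the choice to store one image per page (third dimension) are assumptions for illustration, and the timings it prints will not match the comparison table.

```matlab
% Minimal sketch of per-image (partial) IO through matfile, timed with tic/toc.
% Hypothetical sizes, much smaller than the demo's data.
nrows = 200;
ncols = 100;
nimages = 10;

file = [tempname, '.mat'];
file_cleanup = onCleanup(@() delete(file));

% matfile creates a v7.3 (HDF5-based) MAT-file on first assignment
mat = matfile(file, 'Writable', true);
mat.images = zeros(nrows, ncols, nimages);   % allocate the full array once

% write one image (page) at a time
tic;
for ii = 1:nimages
    img = rand(nrows, ncols);
    mat.images(:, :, ii) = img;
end
write_time = toc;

% read one image (page) at a time
tic;
for ii = 1:nimages
    img_read = mat.images(:, :, ii);
end
read_time = toc;

fprintf('Partial write: %.3f s\n', write_time);
fprintf('Partial read: %.3f s\n', read_time);
```

Replacing the loops with whole-variable assignment and reads (`mat.images = ...`, `mat.images`) gives the corresponding complete read/write case.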
Binary file added logo.png
Binary file added logo.xcf
Binary file not shown.
115 changes: 67 additions & 48 deletions newmatic.m
@@ -1,10 +1,10 @@
-function mat = newmatic(path, varargin)
-% function mat = newmatic(path, varargin)
+function mat = newmatic(out_file, varargin)
+% function mat = newmatic(out_file, varargin)
%
% Create new MAT-file with allocated arrays and specified chunking
%
% Arguments:
-% path: path to the output file to create, will fail if file exists
+% out_file: path to the output file to create, will fail if file exists
% varargin: one or more variable definition structs, as created by
% newmatic_variable(), see help for that function for more details
%
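The docstring promises that `newmatic` fails if the output file exists. In the new version that guarantee comes from two layers: the `isfile` assertion at the top of the function and the `'H5F_ACC_EXCL'` flag passed to `H5F.create`. The sketch below exercises only those two built-in checks on a throwaway HDF5 file; the scratch path is hypothetical and `newmatic` itself is not called.

```matlab
% Sketch: the two "do not overwrite" guards used by newmatic, on a scratch file.
out_file = [tempname, '.h5'];   % hypothetical scratch path
out_cleanup = onCleanup(@() delete(out_file));

% first creation succeeds because the file does not exist yet
assert(~isfile(out_file), 'newmatic:OverwriteError', 'Output file exists!');
fid = H5F.create(out_file, 'H5F_ACC_EXCL', 'H5P_DEFAULT', 'H5P_DEFAULT');
H5F.close(fid);

% guard 1: the isfile() assertion now raises newmatic:OverwriteError
guard1_failed = false;
try
    assert(~isfile(out_file), 'newmatic:OverwriteError', 'Output file exists!');
catch err
    guard1_failed = strcmp(err.identifier, 'newmatic:OverwriteError');
end

% guard 2: H5F_ACC_EXCL refuses to create a file that already exists
guard2_failed = false;
try
    fid = H5F.create(out_file, 'H5F_ACC_EXCL', 'H5P_DEFAULT', 'H5P_DEFAULT');
    H5F.close(fid);
catch
    guard2_failed = true;
end

fprintf('isfile guard tripped: %d, H5F_ACC_EXCL guard tripped: %d\n', ...
    guard1_failed, guard2_failed);
```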
@@ -13,57 +13,80 @@
% %

% sanity checks
-validateattributes(path, {'char'}, {'nonempty'});
-assert(~isfile(path), 'newmatic:OverwriteError', 'Output file exists!');

-mat = matfile(path, 'Writable', true);
+validateattributes(out_file, {'char'}, {'nonempty'});
+assert(~isfile(out_file), 'newmatic:OverwriteError', 'Output file exists!');

+% filename for reference .mat, deleted on function exit
+ref_file = [tempname, '.mat'];
+ref_file_cleanup = onCleanup(@() delete(ref_file));

+ref_mat = matfile(ref_file, 'Writable', true);
for ii = 1:length(varargin)
var = varargin{ii};
-allocate(mat, var.name, var.type, var.size);

+allocate(ref_mat, var.name, var.type, var.size);
end
+delete(ref_mat);

-h5repack(mat, varargin); % TODO: replace with native version when ready
+% get file property lists from reference
+ref_fid = H5F.open(ref_file, 'H5F_ACC_RDONLY', 'H5P_DEFAULT');
+ref_fcpl = H5F.get_create_plist(ref_fid);

+% create new file (fail if exists)
+out_fcpl = H5P.copy(ref_fcpl);
+out_fid = H5F.create(out_file, 'H5F_ACC_EXCL', out_fcpl, 'H5P_DEFAULT');

-function h5repack(file_obj, vars)
-% Apply specified chunking using external utility h5repack
-%
-% note: requires h5repack be installed on system
-%
-% Arguments:
-% file_obj: matfile object
-% vars: one or more variable definition structs, as created by
-% newmatic_variable(), see help for that function for more details
-% %
-path = file_obj.Properties.Source;

-chunk_args = {};
-for ii = 1:length(vars)
-var = vars{ii};
+% copy over datasets (a.k.a., variables), applying chunking as needed
+for ii = 1:length(varargin)
+var = varargin{ii};

+ref_ds_id = H5D.open(ref_fid, var.name);

+out_ds_cpl = H5P.copy(H5D.get_create_plist(ref_ds_id));
if ~isempty(var.chunks)
-chunks = var.chunks(end:-1:1); % variable dimensions are inverted in HDF file by MATLAB
-chunk_args{end+1} = sprintf('-l %s:CHUNK=%s', var.name, strjoin(string(chunks), 'x')); %#ok!
+H5P.set_chunk(out_ds_cpl, flip(var.chunks))
end

+out_ds_id = H5D.create(...
+out_fid, ...
+var.name, ...
+H5D.get_type(ref_ds_id), ...
+H5D.get_space(ref_ds_id), ...
+out_ds_cpl);

+% note: assume that only this one attribute exists (cribbed from manual inspection of files
+% created by matfile function)
+ref_attr_id = H5A.open(ref_ds_id, 'MATLAB_class');

+out_attr_id = H5A.create(...
+out_ds_id, ...
+'MATLAB_class', ...
+H5A.get_type(ref_attr_id), ...
+H5A.get_space(ref_attr_id), ...
+'H5P_DEFAULT');
+H5A.write(out_attr_id, 'H5ML_DEFAULT', H5A.read(ref_attr_id));

+H5A.close(ref_attr_id);
+H5A.close(out_attr_id);

+H5D.close(ref_ds_id);
+H5D.close(out_ds_id);

end

-if ~isempty(chunk_args)
+H5F.close(ref_fid);
+H5F.close(out_fid);

-temp_file = [tempname, '.mat'];

-args = [{'h5repack', '-i', path, '-o', temp_file}, chunk_args];
-cmd = strjoin(args, ' ');
-fprintf('%s\n', cmd);

-status = system(cmd);
-assert(status == 0, 'Failed to update chunks with h5repack system utility');
+% copy over the userblock
+% read userblock from reference
+% note: the userblock is a binary header prefixed to the file, and is opaque to the HDF5
+% library. It is also essential for MATLAB to believe us that this is a valid MAT file.
+ref_userblock = H5P.get_userblock(ref_fcpl);
+ref_map = memmapfile(ref_file);
+out_map = memmapfile(out_file, 'Writable', true);
+out_map.Data(1:ref_userblock) = ref_map.Data(1:ref_userblock);

+mat = matfile(out_file, 'Writable', true);

-status = movefile(temp_file, path);
-assert(status == 1, 'Failed to overwrite original file');
-end


function allocate(file_obj, var_name, data_type, dimensions)
% function allocate(file_obj, var_name, data_type, dimensions)
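This hunk replaces the external `h5repack` call with MATLAB's low-level HDF5 interface. Two details carry the design: `H5P.set_chunk` expects sizes in HDF5 (row-major) order, hence `flip(var.chunks)`, and a v7.3 MAT-file is an HDF5 file whose userblock holds the header MATLAB relies on, hence the userblock copy. The sketch below checks both on scratch files. The dataset name `images`, the sizes, and the chunk shape are hypothetical, and it assumes `h5info` reports `Dataspace.Size` and `ChunkSize` in MATLAB (column-major) order, as MATLAB's high-level HDF5 functions do.

```matlab
% Part 1: chunked dataset via the low-level H5 API, sizes given in MATLAB order.
h5_file = [tempname, '.h5'];                    % hypothetical scratch file
h5_cleanup = onCleanup(@() delete(h5_file));

ml_size   = [200, 100, 10];   % hypothetical variable size (MATLAB order)
ml_chunks = [200, 100, 1];    % one 200x100 image per chunk (MATLAB order)

fid   = H5F.create(h5_file, 'H5F_ACC_EXCL', 'H5P_DEFAULT', 'H5P_DEFAULT');
dtype = H5T.copy('H5T_NATIVE_DOUBLE');
space = H5S.create_simple(numel(ml_size), flip(ml_size), flip(ml_size));  % HDF5 order
dcpl  = H5P.create('H5P_DATASET_CREATE');
H5P.set_chunk(dcpl, flip(ml_chunks));                                     % HDF5 order
dset  = H5D.create(fid, 'images', dtype, space, dcpl);
H5D.close(dset);
H5P.close(dcpl);
H5S.close(space);
H5T.close(dtype);
H5F.close(fid);

info = h5info(h5_file, '/images');
fprintf('ChunkSize reported by h5info: %s\n', mat2str(info.ChunkSize));

% Part 2: the userblock of a MAT-file written by save -v7.3.
mat_file = [tempname, '.mat'];
mat_cleanup = onCleanup(@() delete(mat_file));
x = magic(4);
save(mat_file, 'x', '-v7.3');

fid  = H5F.open(mat_file, 'H5F_ACC_RDONLY', 'H5P_DEFAULT');
fcpl = H5F.get_create_plist(fid);
userblock_size = H5P.get_userblock(fcpl);
H5P.close(fcpl);
H5F.close(fid);

% the userblock begins with a plain-text MAT-file header
fh = fopen(mat_file, 'r');
header = fread(fh, [1, 19], '*char');
fclose(fh);
fprintf('userblock: %d bytes, header starts with "%s"\n', userblock_size, header);
```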
@@ -76,6 +99,7 @@ function allocate(file_obj, var_name, data_type, dimensions)
% var_name: string, name of variable to allocate in matfile
% dimensions: 1D array, size of variable to allocate
% %
+if isempty(dimensions); dimensions = [1, 1]; end

switch data_type

@@ -99,11 +123,6 @@ function allocate(file_obj, var_name, data_type, dimensions)
error('Bad value for data_type: %s', data_type);
end

-if ~isempty(dimensions)
-file_obj.(var_name) = empty(zeros(size(dimensions)));
-dimensions = num2cell(dimensions);
-file_obj.(var_name)(dimensions{:}) = last;
-else
-% handle unspecified size by creating an empty array of the correct type
-file_obj.(var_name) = empty();
-end
+file_obj.(var_name) = empty(zeros(size(dimensions)));
+dimensions = num2cell(dimensions);
+file_obj.(var_name)(dimensions{:}) = last;
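After this change `allocate` always takes one path: create an empty array with the right number of dimensions, then assign a single element at the far corner so the variable grows to its full size. An unspecified size now defaults to 1x1 instead of producing an empty variable. The sketch below reproduces that pattern on an in-memory array. The `empty` and `last` definitions are hypothetical stand-ins for the class-specific values set in the elided `switch data_type` block, assumed here to be `int16` zeros.

```matlab
% Sketch of allocate()'s "grow by assigning the last element" pattern, in memory.
% Hypothetical stand-ins for the values chosen by allocate()'s switch on data_type:
data_type = 'int16';
empty = @(sz) zeros(sz, data_type);   % array of the requested class, size sz
last = zeros(1, 1, data_type);        % value written to the last element

% case 1: explicit size
dimensions = [4, 3, 2];
x = empty(zeros(size(dimensions)));   % 0x0x0 int16
idx = num2cell(dimensions);
x(idx{:}) = last;                     % grows to 4x3x2, zero-filled

% case 2: unspecified size, defaulted to 1x1 as in the new code
dimensions = [];
if isempty(dimensions); dimensions = [1, 1]; end
y = empty(zeros(size(dimensions)));   % 0x0 int16
idx = num2cell(dimensions);
y(idx{:}) = last;                     % becomes a 1x1 int16 scalar

fprintf('x: %s %s, y: %s %s\n', mat2str(size(x)), class(x), mat2str(size(y)), class(y));
```

In `allocate` the same indexed assignment targets a `matfile` variable (`file_obj.(var_name)(dimensions{:}) = last`) rather than an in-memory array.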
