Data size in GP / MB issue #5
Hi @jetschny
What do you think about it?
Aha, now I understand. Is there a chance you could add a short routine that checks for the largest allocated array and takes this as the dimension? In any case, we would then need to call this parameter "largest allocated array (in grid points)". If possible at all, we could sum up the allocated variable sizes of the whole workspace; it is somewhat similar to the allocated main memory, but more specific to the script. I have also tested the "data size" further: while running my script and exporting 82 MB of data, the Measurer says 0.0 MB. Can you have a look there?
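The "largest allocated array" routine requested above could be sketched roughly as below. This is an illustrative sketch, not the actual measurer.py implementation; the function name `largest_allocated_array` and the idea of scanning a namespace dict for NumPy arrays are assumptions.

```python
# Hypothetical sketch: find the largest allocated NumPy array in a namespace
# (e.g. globals()) and report its shape in grid points.
import numpy as np

def largest_allocated_array(namespace):
    """Return the shape of the ndarray with the most elements in a dict of variables."""
    largest_shape = None
    largest_size = 0
    for value in namespace.values():
        if isinstance(value, np.ndarray) and value.size > largest_size:
            largest_size = value.size
            largest_shape = value.shape
    return largest_shape

a = np.zeros((50024, 334))
b = np.zeros((10, 10))
print(largest_allocated_array({"a": a, "b": b}))  # (50024, 334)
```

Summing `value.nbytes` over the same loop would give the "allocated variable sizes of the whole workspace" variant mentioned above.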
@cozzolinoac11, any update here? We would need some clarification to provide input for deliverable D3_3.
Hi @jetschny I'm working on a procedure to obtain:
The procedure is already available and working in the measurer.py updated a few minutes ago on GitHub (please also have a look at the updated example.py). I'm currently running further tests on the procedure, which seems to work well. Regarding the test mentioned above, was the file already present in the folder? In that case, an overwrite occurs and the measurer returns 0 (this also happened to me during one of my tests 😄)
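One plausible way such a 0 can arise, assuming "Data size" is computed as the difference in total folder size before and after the run (an assumption about the implementation, not confirmed here): overwriting an existing file with output of the same size leaves the difference at zero.

```python
# Illustrative sketch, NOT the actual measurer.py code: if data size is the
# folder-size delta, an overwrite of an equal-sized file yields 0 MB.
import os

def folder_size_bytes(path):
    """Sum the sizes of all files below `path` (assumed measurement approach)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

# usage sketch:
# before = folder_size_bytes(data_path)
# ... run the script that writes its output under data_path ...
# after = folder_size_bytes(data_path)
# data_size_mb = (after - before) / 2**20  # 0 if output overwrote an equal-sized file
```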
Hi @cozzolinoac11, it should run within a cloned repo (only the directory name needs to be modified), so you could reproduce my exact situation.
Hi @jetschny, I copied and pasted the file https://github.com/FAIRiCUBE/uc3-drosophola-genetics/blob/main/projects/gap_filling/src/data/load_csv_apply_GapFil_write_csv.py, updating the paths and adding the start and end calls of the measurer. In my test, the value of "Data size" is around 81 MB and the "Largest allocated array in grid points" is [50024, 334]. The problem in your test is probably related to the paths used by the measurer (maybe the value of the 'data_path' parameter in the start/end methods is different from the folder the file is saved to). The folder https://github.com/FAIRiCUBE/common-code/tree/main/record-computational-demands-automatically/test/uc3_test contains:
cheers |
Hi @cozzolinoac11, still the same issue occurs: data size is reported as 0.0 MB. What can I test now to get to your "working environment"?
Hi, Thank you @cozzolinoac11 for this nice work. Best regards, -Bachir. |
Hi, @jetschny I have also tested the code provided in https://github.com/FAIRiCUBE/uc3-drosophola-genetics/blob/main/projects/gap_filling/src/data/load_csv_apply_GapFil_write_csv_test.py Best regards, -Bachir. |
Hi @BachirNILU and @cozzolinoac11, I get different results for every run: sometimes negative numbers, sometimes closer to 0, never anything like around 80 MB...
@cozzolinoac11 and @BachirNILU, first I thought the Measurer "listens" to I/O access on a specific path (not data_path?): if I change the output destination of my CSV output to the same directory where I write my measurer statistics, then I see the 81 MB data size; if both files are written to different folders, then the Measurer does not report the correct data size. Now I have tested a bit more, and it is actually a matter of whether the program's output file already exists or not. If I remove my output, the Measurer works fine; if my program "overwrites" the output with the same size and data, the Measurer does not detect it! I see the same behavior for General_Test.py: once you run it more than once, it gives wrong results.
Hi @jetschny, yes, if I am not mistaken, I believe this is what @cozzolinoac11 had in mind for 'Data Size'. Best regards, -Bachir.
Hi @BachirNILU cheers |
Many thanks for the clarification. I see the point and I believe it adds value to our table. However, "I/O stream volume" (the amount of data being written, and actually being read as well) is also important, and it was my original thought when we "requested the table". We can keep data_size for now, but we have to explain it properly. In addition, @cozzolinoac11, can you think of a metric to determine the size of the output regardless of whether the file already exists?
Hi @jetschny I can add two fields:
For these fields I can use the psutil.disk_io_counters function, which returns system-wide disk I/O statistics. This also adds up overwrites but, on the other hand, because it is a system-wide measurement, any reads/writes happening concurrently with the script are also included. Regarding 'Data size', we could also consider renaming it and making it more self-explanatory. What do you think?
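The psutil approach described above can be sketched as a before/after delta of the system-wide counters. This is a minimal sketch of the idea, not the measurer's actual code; note the system-wide caveat from the comment above applies.

```python
# Sketch: measure bytes written/read during a monitored section using
# psutil.disk_io_counters (system-wide, so concurrent processes are included).
import psutil

start = psutil.disk_io_counters()

# ... run the monitored script here ...

end = psutil.disk_io_counters()
written_mb = (end.write_bytes - start.write_bytes) / 2**20
read_mb = (end.read_bytes - start.read_bytes) / 2**20
print(f"Disk written: {written_mb:.2f} MB, read: {read_mb:.2f} MB")
```

Because the counters are cumulative since boot, deltas are monotonically non-negative, which avoids the overwrite-detection problem of a folder-size difference.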
@cozzolinoac11 that sounds very good! |
Hi @cozzolinoac11, Thanks again for your work. Best regards, -Bachir. |
Disk I/O or disk activity actually means something different to me: it is the amount of data read from and written to disk, regardless of whether files are being overwritten.
Hi @cozzolinoac11, Thank you again for your work. 1- Main memory available (GB) was reported as 61 GB, whereas 32 GB are available. Can you investigate this? Thanks in advance, -Bachir. |
@cozzolinoac11: while I see that the issue with data I/O and disk space added to disk can be resolved by properly labeling the values, what about the issue reported by Bachir on the EOX Hub? Can this be worked on in the near future?
Hi, I have performed several tests and in none of them did I encounter this issue. The I/O fields have been renamed; regarding main memory, could it be that less memory (32 GB) is allocated to the development environment than the entire machine has (61 GB)?
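The hypothesis above (the environment is capped below the machine's physical RAM) can be checked by comparing what psutil reports with the cgroup memory limit, if one is set. This is a hedged sketch: the cgroup file paths below are the common Linux v1/v2 locations and are assumptions about the EOX Hub environment, not verified facts about it.

```python
# Sketch: compare host RAM (psutil) with a container/cgroup memory cap, if any.
import psutil

def memory_limits_gb():
    host_total = psutil.virtual_memory().total / 2**30
    cgroup_limit = None
    for path in ("/sys/fs/cgroup/memory.max",                     # cgroup v2
                 "/sys/fs/cgroup/memory/memory.limit_in_bytes"):  # cgroup v1
        try:
            raw = open(path).read().strip()
        except OSError:
            continue  # path not present on this system
        if raw.isdigit():  # cgroup v2 reports "max" when unlimited
            cgroup_limit = int(raw) / 2**30
        break
    return host_total, cgroup_limit

host, limit = memory_limits_gb()
print(f"Host RAM: {host:.1f} GB, cgroup limit: {limit}")
```

If `psutil.virtual_memory()` sees the host (61 GB) while the cgroup limit is 32 GB, that would explain the discrepancy Bachir reported.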
@BachirNILU, can you please have a look; if the issue has indeed been resolved, we can close it here...
Hi @cozzolinoac11, thank you for checking this. PS: I took the liberty of correcting a typo in measurer.py. Thanks in advance. Best regards, -Bachir.
Hi @BachirNILU. PS: thanks for the correction in measurer.py
@cozzolinoac11 I have added you to the UC4 EOX group... you can start logging into https://eoxhub.fairicube.eu/ |
Hi @jetschny @BachirNILU, I have just made changes to the measurer, using a different library for the CPU frequency. As a test on EOXHub, I trained a CNN on the CIFAR10 example dataset, and the results returned by the measurer are all consistent. The files are in the EOX_HUB_test folder. Thanks for your feedback.
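The comment above does not name the new CPU-frequency library; as one hedged example of the kind of call involved, psutil exposes `cpu_freq()`, which reports current/min/max frequency in MHz on platforms that support it.

```python
# Sketch only: one way to read CPU frequency; whether this is the library the
# measurer actually switched to is not stated in the thread.
import psutil

freq = psutil.cpu_freq()  # may be None on unsupported platforms
if freq is not None:
    print(f"current: {freq.current:.0f} MHz, min: {freq.min:.0f}, max: {freq.max:.0f}")
else:
    print("CPU frequency not available on this platform")
```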
Hi, I have recently used the measurer in the EOX Lab. I get incorrect results in the Data Size. I get
But two new files (7.5MB and 350 Bytes) are created. This was the first run of the script, so the files were not overwritten. |
I have been running my resource-monitoring validation for the script:
https://github.com/FAIRiCUBE/uc3-drosophola-genetics/blob/main/projects/gap_filling/src/data/load_tsv_kmeans_elbow_sli.py
where I had a previous "manual" estimate.
While the monitoring itself works smoothly (after solving the typical issue of installing libraries), I am missing some info:
0 Data size in grid points
8 Network traffic (MB) 0.23931884765625