AzurePublicDatasetV2 - Workload categories and VM roles #12
Comments
Hello, I am also using the dataset for research purposes, and I could not find a complete description of it. If you found something, please contact me; I would be grateful.
Hi, since no one has commented on this or offered a description from the data provider, it makes sense to do some EDA over time and plot the workload. Alternatively, one could ask domain experts in cloud data about this. Please see the following example, where both VMs are categorized as Delay-insensitive:
# Filter the VM_CPU_dataframe to find the category of the given vmid
vmid_to_check = '/KlNtIK2BCkiGhURgesiA/MQNyTpAgt7daSRsu2kJldWyCBGwnZCbtXR3w+vR4kq'
filtered_df = df[df['vmid'] == vmid_to_check]
# Display the filtered dataframe
#display(filtered_df)
print(filtered_df.to_markdown(tablefmt="grid"))
+--------+-------------------+------------------------------------------------------------------+
| | vmcategory | vmid |
+========+===================+==================================================================+
| 267924 | Delay-insensitive | /KlNtIK2BCkiGhURgesiA/MQNyTpAgt7daSRsu2kJldWyCBGwnZCbtXR3w+vR4kq |
+--------+-------------------+------------------------------------------------------------------+
# Filter the VM_CPU_dataframe to find the category of the given vmid
vmid_to_check = '//20EFdlSE3atYr9P03/9X4nF16d9RXI+JKVFfvpC281ohXWjFoS9L+ldKyb3ple'
filtered_df = df[df['vmid'] == vmid_to_check]
# Display the filtered dataframe
#display(filtered_df)
print(filtered_df.to_markdown(tablefmt="grid"))
+---------+-------------------+------------------------------------------------------------------+
| | vmcategory | vmid |
+=========+===================+==================================================================+
| 1011821 | Delay-insensitive | //20EFdlSE3atYr9P03/9X4nF16d9RXI+JKVFfvpC281ohXWjFoS9L+ldKyb3ple |
+---------+-------------------+------------------------------------------------------------------+
Without domain knowledge and comments from the data providers who collected the data, it is difficult to reason about these labels. Here in the left picture, I see some delay, which is especially clear on the
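The two lookups above repeat the same filter, so it may help to wrap it in a small helper and also look at how often each category occurs. This is only a minimal sketch: it assumes a dataframe with `vmid` and `vmcategory` columns, as in the printed tables. The function names `category_of` and `category_counts` and the toy rows are hypothetical, not part of the dataset or its tooling.

```python
import pandas as pd


def category_of(df: pd.DataFrame, vmid: str) -> pd.DataFrame:
    """Return the vmcategory/vmid rows that match a single VM id."""
    return df.loc[df["vmid"] == vmid, ["vmcategory", "vmid"]]


def category_counts(df: pd.DataFrame) -> pd.Series:
    """Count how many VMs carry each category label."""
    return df["vmcategory"].value_counts()


# Toy stand-in for the real table (hypothetical ids, labels as seen in the thread).
toy = pd.DataFrame(
    {
        "vmid": ["vm-a", "vm-b", "vm-c"],
        "vmcategory": ["Delay-insensitive", "Interactive", "Unknown"],
    }
)

print(category_of(toy, "vm-a"))
print(category_counts(toy))
```

On the real data, the counts give a quick view of how dominant the "Unknown" label is before plotting anything.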
Hello everyone,
I am using the dataset for research purposes, and I have some questions related to the workload. In vmtable.csv, some VMs are labeled as "interactive", others as "delay-insensitive", and most of them as "unknown." I would like to know how this classification was performed and what the labels mean. For example, is it safe to assume that the "interactive" workload includes web services? Related to that, what does a deployment represent? Is it an application? Does it follow the definition of deployments used in container strategies?
Thank you very much in advance.
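Until the data provider clarifies, one way to explore the deployment question empirically is to count the VMs in each deployment and check whether the category labels are mixed within a deployment. This is a rough sketch only: it assumes vmtable.csv has been loaded into a dataframe with columns named `vmid`, `deploymentid`, and `vmcategory` (the column names are an assumption). The helper names and the toy rows are hypothetical.

```python
import pandas as pd


def deployment_summary(vms: pd.DataFrame) -> pd.DataFrame:
    """One row per deployment: VM count and number of distinct categories."""
    return (
        vms.groupby("deploymentid")
        .agg(vm_count=("vmid", "count"), n_categories=("vmcategory", "nunique"))
        .sort_values("vm_count", ascending=False)
    )


def category_mix(vms: pd.DataFrame) -> pd.DataFrame:
    """Cross-tabulate deployments against category labels (cell = VM count)."""
    return pd.crosstab(vms["deploymentid"], vms["vmcategory"])


# Toy stand-in for the real table (hypothetical ids).
toy_vms = pd.DataFrame(
    {
        "vmid": ["vm-a", "vm-b", "vm-c", "vm-d"],
        "deploymentid": ["dep-1", "dep-1", "dep-1", "dep-2"],
        "vmcategory": ["Interactive", "Interactive", "Unknown", "Delay-insensitive"],
    }
)

summary = deployment_summary(toy_vms)
mix = category_mix(toy_vms)
print(summary)
print(mix)
```

If most deployments turn out to hold a single category, that would hint that the label is assigned per deployment rather than per VM, but this is only a heuristic and does not replace an answer from the maintainers.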