Organizing Files for Best Performance and Accessibility in Azure ML Workspace | DP-100 Exam Answer

Best Practices for Storing Files in Azure ML Workspace

Question

You have just set up your ML workspace with several computes.

For your machine learning experiments, you want to use the Python SDK and the Jupyter notebook environment.

You'll have a number of files (scripts, notebooks, data, temporary files, etc.) that need to be organized and stored so that they perform well and are accessible to the computes.

How should you organize the storage of your files?

Answers

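A. Create a storage account in your workspace and store both the scripts and data in there.
B. Store scripts and notebooks on local disks of compute instances; store data in datastores.
C. Store scripts and notebooks in the default storage account of your workspace; store your data in datastores.
D. Store scripts and notebooks in the default storage account of your workspace; store your large data files in the compute's local folder.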

Answer: C.

Option A is incorrect because each workspace has a default storage account attached to it on creation.

This is the place where all the scripts and notebooks are stored by default, and it is shared among all computes in the workspace.

There is no need to create one manually.

Data should be stored in ML datastores.

Option B is incorrect because, although you can store scripts and related files on the local disks of the computes, they won't be accessible to other computes.
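The difference between a compute's local disk and the shared file share can be checked with a small path test. The sketch below is a minimal illustration, not part of the Azure ML SDK: `is_on_shared_mount` is a hypothetical helper, and the mount location `~/cloudfiles/code` is an assumption about how the file share appears on a compute instance, so verify the path on your own compute before relying on it.

```python
from pathlib import Path

# Assumed mount point of the workspace file share on a compute instance.
# This is an assumption for illustration; confirm it on your own compute.
SHARED_ROOT = Path.home() / "cloudfiles" / "code"


def is_on_shared_mount(path, shared_root=SHARED_ROOT):
    """Return True if `path` is the shared root or lies somewhere inside it."""
    resolved = Path(path).expanduser().resolve()
    root = Path(shared_root).expanduser().resolve()
    # Compare whole path components so "/srv/share2" does not match "/srv/share".
    return resolved == root or root in resolved.parents
```

A script for which this returns False sits on one compute's local disk only, so other computes in the workspace cannot open it.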

Option C is CORRECT because each ML workspace has a default storage account attached to it on creation.

The file share in this account is mounted to each compute within the workspace.

Files stored here can easily be shared among computes.

For data, it is recommended to use ML datastores, which are specifically designed to store large data files consumed by ML experiments.
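Moving data into a datastore can be sketched with the Python SDK. The example below assumes the v1 SDK (the `azureml-core` package) and a `config.json` describing the workspace. The helper name `upload_to_default_datastore`, the local folder `./data`, and the target path `training-data` are hypothetical. The `upload` method shown belongs to the v1 datastore classes; other SDK versions may offer a different upload route.

```python
def upload_to_default_datastore(workspace, local_dir, target_path):
    """Upload a local folder to the workspace's default datastore.

    `workspace` is expected to behave like azureml.core.Workspace (SDK v1).
    """
    datastore = workspace.get_default_datastore()
    datastore.upload(
        src_dir=local_dir,
        target_path=target_path,
        overwrite=True,
        show_progress=False,
    )
    return datastore


def main():
    # Needs the azureml-core package; imported here so the helper above
    # can be used and tested without it.
    from azureml.core import Dataset, Workspace

    ws = Workspace.from_config()
    # "./data" and "training-data" are hypothetical example paths.
    datastore = upload_to_default_datastore(ws, "./data", "training-data")
    # Reference the uploaded files so that any compute can consume them.
    return Dataset.File.from_files(path=(datastore, "training-data/**"))
```

Calling `main()` from a notebook in the workspace uploads the folder once. After that, every compute reads the same copy from the datastore instead of keeping its own.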

Option D is incorrect because for large data files consumed by training experiments, the recommended practice is storing them in ML datastores.

Avoid storing them on the local disks of computes.

Option A: Create a storage account in your workspace and store both the scripts and data in there.

In this option, you would create a new storage account and store both your scripts and your data in it. This does give you one central location that every compute can reach, but it is unnecessary: each workspace already has a default storage account attached to it on creation, and that account is shared among all computes in the workspace. It also mixes code and data in one place, whereas data should be stored in ML datastores.

Option B: Store scripts and notebooks on local disks of compute instances; store data in datastores.

In this option, you would store your scripts and notebooks on the local disks of your compute instances and your data in datastores. The data half of this option follows the recommended practice, because datastores make the data available to every compute in the workspace. The drawback is the code half: files on the local disk of one compute are not accessible to the other computes, so scripts and notebooks cannot be shared without copying them around manually.

Option C: Store scripts and notebooks in the default storage account of your workspace; store your data in datastores.

This option uses the default storage account that is attached to your Azure Machine Learning workspace when it is created, so there is nothing extra to set up. The file share in this account is mounted to each compute within the workspace, which means scripts and notebooks stored there are available everywhere. Data is kept separate in datastores, which are designed for the large data files consumed by ML experiments. This is the correct answer.

Option D: Store scripts and notebooks in the default storage account of your workspace; store your large data files in the compute's local folder.

In this option, you would store your scripts and notebooks in the default storage account of your workspace, but keep your large data files in the local folder of a compute. The first half matches the recommended practice. The second half does not: data on a compute's local disk is visible only to that compute, so every other compute would need its own copy. For large data files consumed by training experiments, the recommended practice is to store them in ML datastores.

In summary, Option C is the recommended approach: keep scripts and notebooks in the workspace's default storage account, whose file share is mounted to every compute, and keep data in ML datastores. Options A, B, and D each break one half of that rule, either by creating a storage account that is not needed or by placing files on a local disk that other computes cannot reach.
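The rule in the summary can be written down as a small lookup. This is a sketch under stated assumptions: `recommended_location` is a hypothetical helper, and the extension lists are illustrative choices rather than anything defined by Azure ML, so adjust them to your own project.

```python
from pathlib import Path

# Illustrative extension lists (assumptions, not an Azure ML rule).
CODE_SUFFIXES = {".py", ".ipynb", ".yml", ".yaml", ".md"}
DATA_SUFFIXES = {".csv", ".tsv", ".parquet", ".jpg", ".png", ".zip"}


def recommended_location(filename):
    """Map a file name to the storage location suggested by the answer."""
    suffix = Path(filename).suffix.lower()
    if suffix in CODE_SUFFIXES:
        return "workspace file share"
    if suffix in DATA_SUFFIXES:
        return "datastore"
    # Anything else is left for a human to decide.
    return "review manually"
```

For example, `recommended_location("train.py")` returns `"workspace file share"` and `recommended_location("images.zip")` returns `"datastore"`.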