Monitoring Performance Metrics and Errors in Azure's Notebook Environment | Best Method

Monitor Performance and Errors in Azure's Notebook Environment

Question

You are developing your training scenarios in Azure's notebook environment.

You want to see the progress of your training runs by monitoring performance metrics and possible errors in detail.

Which method should you use?

Answers

Explanations


Answer: A.

Option A is CORRECT because in Azure's notebook environment, the most convenient way to monitor the progress of a run is the Jupyter RunDetails widget.

The widget runs asynchronously and refreshes its output regularly (every 10-15 seconds) until the run completes.

Option B is incorrect because the get_metrics() method of the Run object retrieves the metrics logged on a run as a dictionary.

It is typically used to inspect results after the run has completed, not to watch progress in detail.

Option C is incorrect because get_status() returns only the latest status of the run, such as "Running" or "Failed".

It doesn't provide detailed in-progress information.

Option D is incorrect because wait_for_completion(show_output=True) blocks the notebook and streams plain-text logs to the cell output.

It does not provide the detailed metrics view needed for watching the progress.

Diagram - Notebook widget example output:

from azureml.widgets import RunDetails
RunDetails(run).show()

[Widget output: a Run Properties panel showing Status: Running, Start Time: 9/15/2018 7:15:37 PM, Duration: 0:00:20, Run Id: train-on-local_1537053337_839d0780, Arguments: N/A; a live chart of mse against alpha; a link to view the run in the Azure portal; and streaming output logs:]

Uploading experiment status to history service
Adding run profile attachment azureml-logs/80_driver_log.txt

alpha is 0.00, and mse is 3424.32
alpha is 0.05, and mse is 3408.92
alpha is 0.10, and mse is 3372.65
alpha is 0.15, and mse is 3345.15
alpha is 0.20, and mse is 3325.29
alpha is 0.25, and mse is 3311.56
alpha is 0.30, and mse is 3302.67

Reference:

When developing and training machine learning models in Azure's notebook environment, it's important to monitor the progress of your training runs to ensure that they are running smoothly and to identify any errors or issues that may arise. To do this, there are several methods available in Azure that you can use to monitor your training runs, such as the RunDetails class, run.get_metrics(), get_status(), and run.wait_for_completion(show_output=True).

A. Use the RunDetails class from the Jupyter widgets

The RunDetails class is a Jupyter widget that provides a detailed, continuously updated view of an in-progress training run. It shows logged performance metrics such as accuracy, loss, and validation error alongside the live output of your training script, so you can follow the run in real time and spot errors as they occur.

B. Include the run.get_metrics() in your script

The run.get_metrics() method retrieves the metrics that were logged during a training run, such as accuracy and loss, as a dictionary keyed by metric name. It is useful for inspecting a run's recorded performance, but it reports what has been logged rather than presenting a live progress view.
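The retrieval pattern can be sketched with a minimal stand-in object. The FakeRun class below is hypothetical, used only so the snippet runs without an Azure workspace; the real azureml.core Run.get_metrics() returns the same shape of dictionary, populated by run.log() calls in the training script:

```python
# Hypothetical stand-in for azureml.core.Run, so this sketch runs
# without an Azure workspace. The real get_metrics() also returns a
# dict mapping metric names to the values logged with run.log().
class FakeRun:
    def get_metrics(self):
        return {
            "alpha": [0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30],
            "mse": [3424.32, 3408.92, 3372.65, 3345.15, 3325.29, 3311.56, 3302.67],
        }

run = FakeRun()
metrics = run.get_metrics()

# Find the alpha value that produced the lowest mean squared error.
best_index = metrics["mse"].index(min(metrics["mse"]))
print("best alpha:", metrics["alpha"][best_index])  # best alpha: 0.3
```

Note that this is a post-hoc query: the dictionary only contains whatever has been logged so far, which is why it suits analysis after completion rather than live monitoring.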

C. Use the get_status() of the run

The get_status() method retrieves the current status of a training run: whether it is queued, running, completed successfully, or failed. It is useful for a quick health check, but it returns only a short status string, not detailed metrics or output.
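A common use of get_status() is a simple polling loop that waits for a terminal state. The FakeRun class below is a hypothetical stand-in so the pattern runs without a workspace; the real azureml.core Run.get_status() likewise returns strings such as "Running", "Completed", or "Failed":

```python
import time

# Hypothetical stand-in for azureml.core.Run so the polling pattern
# runs without an Azure workspace: it reports "Running" twice, then
# "Completed", mimicking a short training run.
class FakeRun:
    def __init__(self):
        self._polls = 0

    def get_status(self):
        self._polls += 1
        return "Running" if self._polls < 3 else "Completed"

run = FakeRun()
status = run.get_status()
while status not in ("Completed", "Failed", "Canceled"):
    time.sleep(0.01)  # a real loop would sleep for several seconds between polls
    status = run.get_status()
print("final status:", status)  # final status: Completed
```

The loop illustrates the limitation the answer points out: all you learn between polls is a status string, with no visibility into metrics or logs.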

D. Use the run.wait_for_completion(show_output=True) in your script

The run.wait_for_completion(show_output=True) method blocks until the training run finishes. With show_output=True, it streams the run's console output to the notebook while you wait, which helps surface errors, but the output is plain text rather than the rich, refreshing metrics view that the RunDetails widget provides.
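The blocking behavior can be sketched as follows. FakeRun is again a hypothetical stand-in so the snippet runs without a workspace; the comments mark where the real SDK behaves analogously:

```python
# Hypothetical stand-in for azureml.core.Run illustrating the blocking
# wait_for_completion(show_output=True) pattern: log lines are printed
# while the run executes, and the call returns only once it is done.
class FakeRun:
    LOG_LINES = [
        "alpha is 0.00, and mse is 3424.32",
        "alpha is 0.30, and mse is 3302.67",
    ]

    def wait_for_completion(self, show_output=False):
        if show_output:
            for line in self.LOG_LINES:  # the real SDK streams driver logs here
                print(line)
        return {"status": "Completed"}  # the real SDK returns run details

run = FakeRun()
details = run.wait_for_completion(show_output=True)
print(details["status"])  # Completed
```

Because the call does not return until the run is over, the notebook cell is tied up for the whole run, which is why the widget is preferred for interactive monitoring.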

Overall, the best method to use for monitoring the progress of your training runs in Azure's notebook environment will depend on your specific needs and preferences. However, using a combination of the methods mentioned above, such as the RunDetails class and run.get_metrics(), can help you to monitor your training runs more effectively and to identify any issues or errors that may arise.