Amazon Transcribe: Cost-effective Streaming Audio-to-Text Conversion for Conversational User Interfaces

Leveraging Amazon Transcribe Features for Cost-effective Streaming Audio-to-Text Conversion

Question

You are a machine learning specialist working for the digital banking division of a global banking firm.

Your bank is in the process of introducing a conversational user interface for its digital banking service.

The service will receive streaming audio from the conversational user interface and converse with the user in real-time.

Your machine learning team lead has decided to use the Amazon Transcribe service to convert the streaming audio to streaming text. To handle issues in the network connection when users are on mobile phones, how can you leverage the features of Amazon Transcribe to keep your solution as cost-effective as possible?

Answers

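A. Transcribe JSON streaming client

B. Transcribe HTTP/2 streaming client

C. Transcribe WebSocket protocol

D. Transcribe HTTP streaming client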

Answer: B.

Option A is incorrect.

Transcribe does not offer a JSON streaming client. The streaming client available with Transcribe is an HTTP/2 streaming client.

Option B is CORRECT.

You can use the Transcribe HTTP/2 streaming client to handle retrying the connection when there are intermittent problems on the network.

Option C is incorrect.

The Transcribe WebSocket protocol does not provide retry logic to handle retrying the connection when there are intermittent problems on the network.

With this option, you would have to code the retry logic yourself.
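To illustrate what coding the retry logic yourself might involve, here is a minimal Python sketch of retrying with exponential backoff. It is not taken from the Transcribe documentation or any AWS SDK: `run_with_retries` and `start_stream` are hypothetical names, and `start_stream` stands in for whatever function opens the WebSocket connection and streams the audio. The sketch assumes a dropped connection surfaces as `ConnectionError` or `TimeoutError`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_retries(
    start_stream: Callable[[], T],  # hypothetical: opens the connection and streams audio
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call start_stream, retrying retryable errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return start_stream()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # The delay doubles on each attempt (base, 2*base, 4*base, ...),
            # capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")
```

A caller would wrap its own connection function, for example `run_with_retries(open_and_stream_audio)`, where `open_and_stream_audio` is again a placeholder. A production version would also need to decide which audio to resend after reconnecting, which this sketch leaves out.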

Option D is incorrect.

Transcribe does not offer a plain HTTP streaming client. The streaming client available with Transcribe is an HTTP/2 streaming client.

Reference:

Please see the Amazon Transcribe developer guide titled Streaming Transcription.

Please see the Amazon Transcribe developer guide titled HTTP/2 Streaming Retry Client.

Please see the Amazon Transcribe developer guide titled Using Amazon Transcribe Streaming with WebSockets.

The correct answer is B, the Transcribe HTTP/2 streaming client.

When you use Amazon Transcribe to convert streaming audio to streaming text, the audio reaches the service over either an HTTP/2 stream or a WebSocket connection. Measured against those two interfaces, the four options are:

A. Transcribe JSON streaming client: Transcribe does not provide a streaming client of this kind, so this option is not available.

B. Transcribe HTTP/2 streaming client: The client sends audio to Transcribe in real time over an HTTP/2 stream. The developer guide section titled HTTP/2 Streaming Retry Client describes how this client retries the connection when there are intermittent problems on the network.

C. Transcribe WebSocket protocol: The client streams audio to Transcribe in real time over a WebSocket connection. The protocol does not provide retry logic, so you would have to code the reconnection handling yourself.

D. Transcribe HTTP streaming client: Transcribe does not provide a plain HTTP streaming client. The streaming client it provides uses HTTP/2.

Option B is the best fit for this scenario. Users on mobile phones may lose connectivity or have unstable connections, which interrupts the audio stream to Transcribe. The HTTP/2 streaming client can retry the connection when these intermittent problems occur, whereas the WebSocket protocol leaves that work to you.

Option B also keeps the solution cost-effective: relying on retry handling that is already available avoids the effort of building, testing, and maintaining custom reconnection code, which the WebSocket option would require.

Therefore, option B is the best option for this scenario.