Securing PII Data in Amazon SageMaker with VPC Access

Best Practices for Protecting PII Data in SageMaker

Question

You are a machine learning specialist at a government agency that processes citizen applications (online, mail, and in-person) for government documents such as driver's licenses and passports. Your machine learning team is responsible for using machine learning to detect fraudulent activity in the document application processes.

You are preparing a subset of your agency data for model training.

To use your data in your SageMaker notebook, you have stored it in S3.

By definition, your data contains Personally Identifiable Information (PII).

In order to maintain the required level of security, your data must be accessible only from within your VPC and cannot traverse the public internet.

Which option best meets your security requirements?

Answers

Explanations

A. B. C. D.

Correct Answer: D.

Option A is incorrect.

This option doesn't meet your requirements because it doesn't address access to the S3 bucket or the restriction of not traversing the internet.

Option B is incorrect.

This option also doesn't meet your requirements because it doesn't address the restriction of not traversing the internet.

Option C is incorrect.

The bucket policy must deny any request whose source (the VPC endpoint or the VPC) is not your own VPC endpoint or VPC.

That requires a deny statement in the policy, not an allow statement, because an allow statement grants access to matching requests without blocking any others.

This option describes using an allow statement.
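The difference between the two kinds of statement can be seen in a small simulation. This is a deliberately simplified model of policy evaluation (a request passes when at least one statement allows it and none denies it), not the real AWS evaluation engine; the endpoint ID and the helper names are hypothetical.

```python
# Simplified model for illustration only: a request is allowed when at least
# one statement allows it and no statement explicitly denies it.

MY_ENDPOINT = "vpce-1a2b3c4d"  # hypothetical VPC endpoint ID


def is_allowed(statements, request):
    """Return True if some Allow matches and no Deny matches (explicit deny wins)."""
    matched = [s["effect"] for s in statements if s["matches"](request)]
    return "Allow" in matched and "Deny" not in matched


# Identity policy on a user or role: allows S3 access from any network location.
identity_allow = {"effect": "Allow", "matches": lambda r: True}

# Option C style: bucket policy that allows requests arriving through the endpoint.
bucket_allow = {
    "effect": "Allow",
    "matches": lambda r: r.get("source_vpce") == MY_ENDPOINT,
}

# Option D style: bucket policy that denies requests NOT arriving through the endpoint.
bucket_deny = {
    "effect": "Deny",
    "matches": lambda r: r.get("source_vpce") != MY_ENDPOINT,
}

from_internet = {"source_vpce": None}
from_vpc = {"source_vpce": MY_ENDPOINT}

allow_only = [identity_allow, bucket_allow]
with_deny = [identity_allow, bucket_deny]

print(is_allowed(allow_only, from_internet))  # True: the allow statement blocks nothing
print(is_allowed(with_deny, from_internet))   # False: the explicit deny wins
print(is_allowed(with_deny, from_vpc))        # True: traffic through the endpoint passes
```

In this model, a principal with valid credentials still reaches the bucket from the internet under the allow-only policy, which is why an allow statement alone does not satisfy the requirement.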

Option D is correct.

You can control which VPCs or VPC endpoints have access to your buckets by using S3 bucket policies with deny statements.

Also, when you use a VPC Endpoint, your traffic doesn't traverse the internet.
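As a minimal sketch of such a policy, the snippet below builds a bucket policy whose single deny statement rejects every S3 request that does not arrive through one VPC endpoint, using the `aws:SourceVpce` condition key. The bucket name, endpoint ID, statement ID, and the function name `vpc_only_bucket_policy` are placeholders chosen for illustration.

```python
import json


def vpc_only_bucket_policy(bucket_name, vpce_id):
    """Build a bucket policy that denies S3 requests not made through vpce_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessFromVpcEndpoint",  # hypothetical statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",      # the bucket itself
                    f"arn:aws:s3:::{bucket_name}/*",    # every object in it
                ],
                # The deny applies whenever the request's endpoint is not ours.
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


# Placeholder names, not values from the scenario.
policy = vpc_only_bucket_policy("example-pii-training-data", "vpce-1a2b3c4d")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The resulting JSON could be attached to the bucket in the S3 console or, assuming boto3 is available, with `put_bucket_policy(Bucket=..., Policy=policy_json)`. To restrict by VPC rather than by a single endpoint, the condition key would be `aws:SourceVpc` with the VPC ID. Note that a deny like this also blocks console access to the bucket from outside the VPC, so it is worth testing on a non-critical bucket first.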

References:

AWS PrivateLink documentation, Endpoints for Amazon S3 (https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-s3.html#vpc-endpoints-s3-bucket-policies)

Amazon Simple Storage Service User Guide, Controlling access from VPC endpoints with bucket policies (https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-bucket-policies-vpc-endpoint.html)

The option that best meets the security requirements for the given scenario is option D: use a VPC endpoint and a bucket policy with a deny statement that rejects any request not coming from your VPC endpoint or VPC.

Explanation: The scenario involves storing Personally Identifiable Information (PII) data in S3, which needs to be accessible only from within the VPC and cannot traverse the public internet. In order to achieve this, we can use a VPC endpoint.

A VPC endpoint enables private communication between a VPC and S3, without traversing the public internet. It creates a secure and private connection between the VPC and S3, enabling instances in the VPC to access S3 using the internal AWS network. This eliminates the need for a public IP address or a NAT gateway.
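For illustration, the parameters for creating an S3 gateway endpoint can be assembled as below. This is a sketch under assumptions: the VPC ID, route table ID, and region are placeholders, the helper name `s3_gateway_endpoint_params` is hypothetical, and the dictionary keys follow the EC2 `CreateVpcEndpoint` request as exposed by boto3.

```python
def s3_gateway_endpoint_params(vpc_id, region, route_table_ids):
    """Keyword arguments for an EC2 CreateVpcEndpoint request (S3 gateway endpoint)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Gateway endpoints work by adding routes to these route tables.
        "RouteTableIds": list(route_table_ids),
    }


# Placeholder identifiers, not values from the scenario.
params = s3_gateway_endpoint_params("vpc-0abc1234", "us-east-1", ["rtb-0def5678"])
print(params)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("ec2", region_name="us-east-1").create_vpc_endpoint(**params)
```

The endpoint ID returned by that call is the value a bucket policy would then reference in its `aws:SourceVpce` condition.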

Option A suggests using a security group to restrict access to the VPC endpoint. A security group acts as a virtual firewall for network traffic, but it does not control who can access the S3 bucket: the bucket stays reachable through the public S3 endpoint for any principal with valid credentials. In addition, gateway endpoints for S3 do not support security groups; only interface endpoints do.

Option B, which suggests using a Network Access Control List (NACL), does not fit this scenario because NACLs filter traffic at the subnet level by IP address and port. They cannot restrict traffic to a specific resource such as a VPC endpoint, and they do nothing to stop access to the bucket from outside the VPC.

Option C suggests using a bucket policy to allow access to the S3 bucket from the VPC endpoint. An allow statement grants access through the endpoint, but it does not restrict access to the VPC, because any principal with the correct credentials can still reach the bucket from another location.

Option D suggests using a bucket policy to deny access to the S3 bucket from any source other than the VPC endpoint or the VPC. An explicit deny overrides any allow, so requests that do not arrive through your VPC endpoint (condition key aws:SourceVpce) or from your VPC (condition key aws:SourceVpc) are rejected even when they carry valid credentials. Traffic that does go through the endpoint stays on the AWS network and does not traverse the public internet.

In conclusion, option D, which combines a VPC endpoint with a bucket policy deny statement, is the best option for meeting the security requirements of the given scenario.