Case study
An ML engineer is developing a fraud detection model on AWS. The training dataset
includes transaction logs, customer profiles, and tables from an on-premises MySQL
database. The transaction logs and customer profiles are stored in Amazon S3.
The dataset has a class imbalance that affects the learning of the model's algorithm.
Additionally, many of the features have interdependencies. The algorithm is not capturing
all the desired underlying patterns in the data.
The ML engineer needs to use an Amazon SageMaker built-in algorithm to train the model.
Which algorithm should the ML engineer use to meet this requirement?
A. LightGBM
B. Linear learner
C. K-means clustering
D. Neural Topic Model (NTM)
Explanation:
Why Linear Learner?
SageMaker's Linear Learner algorithm is well-suited for binary classification
problems such as fraud detection. It handles class imbalance effectively by
incorporating built-in options for weight balancing across classes.
Linear Learner can capture patterns in the data while being computationally
efficient.
Key Features of Linear Learner:
Automatically weights minority and majority classes.
Supports both classification and regression tasks.
Handles interdependencies among features effectively through gradient
optimization.
Steps to Implement:
Use the SageMaker Python SDK to set up a training job with the Linear Learner
algorithm.
Configure the hyperparameters to enable balanced class weights.
Train the model with the balanced dataset created using SageMaker Data
Wrangler.
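The weighting step above can be sketched in pure Python. This is a minimal, conceptual illustration (the function and dataset are hypothetical) of what Linear Learner's balanced-weighting option does: the positive (minority) class receives a loss weight proportional to the negative-to-positive ratio, so both classes contribute equally to training.

```python
from collections import Counter

def balanced_positive_weight(labels):
    """Weight multiplier for positive examples so both classes contribute
    equally to the loss (negative examples implicitly weight 1.0)."""
    counts = Counter(labels)
    return counts[0] / counts[1]

# Imbalanced fraud labels: 1 = fraud (minority), 0 = legitimate.
labels = [0] * 990 + [1] * 10
weight = balanced_positive_weight(labels)
print(weight)  # 99.0: each fraud example counts 99x in the loss
```

When using the built-in algorithm itself, the equivalent configuration is setting the Linear Learner hyperparameter `positive_example_weight_mult` to `"balanced"` rather than computing the weight manually.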
A company is building a deep learning model on Amazon SageMaker. The company uses
a large amount of data as the training dataset. The company needs to optimize the model's
hyperparameters to minimize the loss function on the validation dataset.
Which hyperparameter tuning strategy will accomplish this goal with the LEAST
computation time?
A. Hyperband
B. Grid search
C. Bayesian optimization
D. Random search
Explanation:
Hyperband is a hyperparameter tuning strategy designed to minimize computation time by adaptively allocating resources to promising configurations and terminating underperforming ones early. It efficiently balances exploration and exploitation, making it ideal for large datasets and deep learning models where training can be computationally expensive.
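The core mechanism behind Hyperband can be sketched with a successive-halving loop: train many configurations on a small budget, keep the best fraction, and repeat with a larger budget. The loss function and configurations below are hypothetical, chosen only so the example runs standalone.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Core loop behind Hyperband: evaluate every config on a small budget,
    keep the best 1/eta fraction, then repeat with eta-times more budget."""
    budget = min_budget
    while len(configs) > 1:
        scores = [(evaluate(c, budget), c) for c in configs]
        scores.sort(key=lambda s: s[0])  # lower validation loss is better
        configs = [c for _, c in scores[: max(1, len(configs) // eta)]]
        budget *= eta
    return configs[0]

# Hypothetical loss surface: loss approaches each config's true quality
# as the training budget grows.
def evaluate(config, budget):
    return config["quality"] + 1.0 / budget

configs = [{"quality": q} for q in (0.9, 0.5, 0.3, 0.7, 0.1, 0.6, 0.8, 0.4, 0.2)]
best = successive_halving(configs, evaluate)
print(best)  # {'quality': 0.1} — the strongest config survives
```

In SageMaker, this strategy is selected by passing `strategy="Hyperband"` to the `HyperparameterTuner` in the SageMaker Python SDK; the early-stopping behavior sketched above is then handled by the service.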
A company is planning to create several ML prediction models. The training data is stored
in Amazon S3. The entire dataset is more than 5 in size and consists of CSV, JSON,
Apache Parquet, and simple text files.
The data must be processed in several consecutive steps. The steps include complex
manipulations that can take hours to finish running. Some of the processing involves
natural language processing (NLP) transformations. The entire process must be
automated.
Which solution will meet these requirements?
A. Process data at each step by using Amazon SageMaker Data Wrangler. Automate the process by using Data Wrangler jobs.
B. Use Amazon SageMaker notebooks for each data processing step. Automate the process by using Amazon EventBridge.
C. Process data at each step by using AWS Lambda functions. Automate the process by using AWS Step Functions and Amazon EventBridge.
D. Use Amazon SageMaker Pipelines to create a pipeline of data processing steps. Automate the pipeline by using Amazon EventBridge.
Explanation:
Amazon SageMaker Pipelines is designed for creating, automating, and managing end-to-end ML workflows, including complex data preprocessing tasks. It supports handling large
datasets and can integrate with custom steps, such as NLP transformations. By combining
SageMaker Pipelines with Amazon EventBridge, the entire workflow can be triggered and
automated efficiently, meeting the requirements for scalability, automation, and processing
complexity.
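Conceptually, the pipeline is an ordered chain of processing steps in which each step consumes the previous step's output. The pure-Python sketch below illustrates only that control flow (step names and transforms are hypothetical); in SageMaker Pipelines each step would instead be a `ProcessingStep`, with an EventBridge rule triggering pipeline executions.

```python
def run_pipeline(data, steps):
    """Run consecutive processing steps; each step's output feeds the next."""
    for name, step in steps:
        data = step(data)
        print(f"step {name} complete")
    return data

# Hypothetical consecutive steps, including a simple NLP-style transform.
steps = [
    ("parse", lambda records: [r.strip().lower() for r in records]),
    ("tokenize", lambda records: [r.split() for r in records]),
    ("filter", lambda docs: [d for d in docs if len(d) > 1]),
]
result = run_pipeline(["Fraud Alert ", "OK", "Card Not Present"], steps)
print(result)  # [['fraud', 'alert'], ['card', 'not', 'present']]
```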
A company has a team of data scientists who use Amazon SageMaker notebook instances
to test ML models. When the data scientists need new permissions, the company attaches
the permissions to each individual role that was created during the creation of the
SageMaker notebook instance.
The company needs to centralize management of the team's permissions.
Which solution will meet this requirement?
A. Create a single IAM role that has the necessary permissions. Attach the role to each notebook instance that the team uses.
B. Create a single IAM group. Add the data scientists to the group. Associate the group with each notebook instance that the team uses.
C. Create a single IAM user. Attach the AdministratorAccess AWS managed IAM policy to the user. Configure each notebook instance to use the IAM user.
D. Create a single IAM group. Add the data scientists to the group. Create an IAM role.
Attach the AdministratorAccess AWS managed IAM policy to the role. Associate the role
with the group. Associate the group with each notebook instance that the team uses.
Explanation:
Managing permissions for multiple Amazon SageMaker notebook instances can become
complex when handled individually. To centralize and streamline permission management,
AWS recommends creating a single IAM role with the necessary permissions and attaching
this role to each notebook instance used by the data science team.
Steps to Implement the Solution:
Create a Single IAM Role with Necessary Permissions:
Attach the IAM Role to Each Notebook Instance:
Benefits of This Approach:
Centralized Permission Management: By using a single IAM role, you simplify the
process of updating permissions. Changes to the role's policies automatically
propagate to all associated notebook instances, ensuring consistent access
control.
Adherence to Best Practices: AWS recommends using IAM roles to manage
permissions for applications running on services like SageMaker. This approach
avoids the need to manage individual user permissions separately. (IAM Best
Practices for SageMaker)
Alternative Options and Their Drawbacks:
Option B: Creating a single IAM group and adding data scientists to it does not
directly associate the group with notebook instances. IAM groups are used to
manage user permissions, not to assign roles to AWS resources like notebook
instances.
Option C: Using a single IAM user with the AdministratorAccess policy is not
recommended due to security risks associated with granting broad permissions
and the challenges in managing shared user credentials.
Option D: Associating an IAM group with a role and then with notebook instances is
not a valid approach, as IAM groups cannot be directly associated with AWS
resources.
Conclusion: Option A is the most effective solution to centralize and manage permissions
for SageMaker notebook instances, aligning with AWS best practices for IAM role
management.
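The role attachment described above can be sketched with boto3. The role ARN and notebook names are hypothetical; the call builds on the SageMaker `UpdateNotebookInstance` API, which accepts a `RoleArn` parameter and requires the instance to be stopped first.

```python
# Hypothetical shared role: permission changes to this one role
# propagate to every notebook instance it is attached to.
ROLE_ARN = "arn:aws:iam::123456789012:role/TeamDataScienceRole"

def update_params(notebook_name, role_arn=ROLE_ARN):
    """Parameters for sagemaker:UpdateNotebookInstance (instance must be stopped)."""
    return {"NotebookInstanceName": notebook_name, "RoleArn": role_arn}

# Applying the shared role to each team notebook would look like:
# import boto3
# sm = boto3.client("sagemaker")
# for name in ["ds-notebook-1", "ds-notebook-2"]:
#     sm.update_notebook_instance(**update_params(name))
print(update_params("ds-notebook-1"))
```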
A company is planning to use Amazon Redshift ML in its primary AWS account. The
source data is in an Amazon S3 bucket in a secondary account.
An ML engineer needs to set up an ML pipeline in the primary account to access the S3
bucket in the secondary account. The solution must not require public IPv4 addresses.
Which solution will meet these requirements?
A. Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create a VPC peering connection between the accounts. Update the VPC route tables to remove the route to 0.0.0.0/0.
B. Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create an AWS Direct Connect connection and a transit gateway. Associate the VPCs from both accounts with the transit gateway. Update the VPC route tables to remove the route to 0.0.0.0/0.
C. Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an AWS Site-to-Site VPN connection with two encrypted IPsec tunnels between the accounts. Set up interface VPC endpoints for Amazon S3.
D. Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an S3 gateway endpoint. Update the S3 bucket policy to allow IAM principals from the primary account. Set up interface VPC endpoints for SageMaker and Amazon Redshift.
Explanation:
S3 Gateway Endpoint: Allows private access to S3 from within a VPC without requiring a
public IPv4 address, ensuring that data transfer between the primary and secondary
accounts is secure and private.
Bucket Policy Update: The S3 bucket policy in the secondary account must explicitly allow
access from the primary account's IAM principals to provide the necessary permissions.
Interface VPC Endpoints: Required for private communication between the VPC and
Amazon SageMaker and Amazon Redshift services, ensuring the solution operates without
public internet access.
This configuration meets the requirement to avoid public IPv4 addresses and allows secure
and private communication between the accounts.
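The bucket policy update in the secondary account can be sketched as follows. The account ID and bucket name are placeholders; the policy grants the primary account's principals read access to the training data, which is the cross-account piece of the configuration.

```python
import json

# Hypothetical account ID and bucket name: the secondary account's bucket
# policy must allow the primary account's IAM principals to read the data.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowPrimaryAccountRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111111111111:root"},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::training-data-bucket",
            "arn:aws:s3:::training-data-bucket/*",
        ],
    }],
}
print(json.dumps(policy, indent=2))
```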