Google Life Sciences
Seqera Platform integrates with Google Cloud through the Cloud Life Sciences API. This guide assumes you have an existing Google Cloud account. If you do not have one, sign up for a free Google Cloud account before you continue.
This guide is split into two parts:
- How to configure your Google Cloud account to use the Cloud Life Sciences API.
- How to create a Google Life Sciences compute environment in Seqera.
Configure Google Cloud
Create a project
Navigate to the Google Project Selector page and either select an existing project or select Create project.
Enter a name for your new project, e.g., tower-nf.
If you are part of an organization, the location will default to your organization.
Enable billing
Enable billing for your project in your Google Cloud account. See the Google Cloud billing documentation for instructions.
Enable APIs
Enable the following APIs for your project:
- Cloud Life Sciences API
- Compute Engine API
- Cloud Storage API
Select your project from the dropdown menu and select Enable.
Alternatively, select your project in the navigation bar and enable each API manually from its own API page.
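The three APIs listed above can also be enabled from the command line. The following is a minimal sketch that only builds the `gcloud services enable` command rather than running it. The service names are assumptions based on Google Cloud's usual naming and should be confirmed for your project, and `tower-nf` is used as a hypothetical project ID (your project ID may differ from the project name you entered).

```python
import shlex
from typing import List

# Assumed service names for the three APIs listed above.
# Confirm them with `gcloud services list --available` before use.
REQUIRED_SERVICES = [
    "lifesciences.googleapis.com",  # Cloud Life Sciences API
    "compute.googleapis.com",       # Compute Engine API
    "storage.googleapis.com",       # Cloud Storage API
]


def build_enable_command(project_id: str) -> List[str]:
    """Return the argument list for enabling the required APIs in one call."""
    if not project_id:
        raise ValueError("project_id must not be empty")
    return ["gcloud", "services", "enable", *REQUIRED_SERVICES,
            "--project", project_id]


if __name__ == "__main__":
    # Print the command so it can be reviewed before it is run by hand.
    print(shlex.join(build_enable_command("tower-nf")))
```

Printing the command instead of executing it keeps the sketch free of side effects. Pass the returned list to `subprocess.run` only after you have reviewed it.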
IAM
Seqera requires a service account with appropriate permissions to interact with your Google Cloud resources.
Create a service account
- In the navigation menu, select IAM & Admin > Service Accounts.
- Select the email address of the Compute Engine default service account.
- Select Keys > Add key > Create new key.
- Select JSON as the key type.
- Select Create.
A JSON file will be downloaded to your computer. This file contains the credentials needed to configure the compute environment in Seqera.
You can manage your key from the Service Accounts page.
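Before pasting the downloaded key into Seqera, it can help to confirm that the file is a well-formed service account key. The following is a minimal sketch of such a check. The list of expected fields is an assumption based on the usual layout of Google service account key files and is not exhaustive, and the function names are hypothetical.

```python
import json
from typing import Any, Dict, List

# Fields usually present in a service account key file (assumed, not exhaustive).
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a parsed key; empty means it looks fine."""
    problems = []
    for field in EXPECTED_FIELDS:
        if not key.get(field):
            problems.append(f"missing or empty field: {field}")
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


def check_service_account_key_text(text: str) -> List[str]:
    """Parse the raw file contents, then run the field checks."""
    try:
        key = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]
    return check_service_account_key(key)
```

The check only inspects the structure of the file. It does not verify that the key is still active or that the service account has the permissions Seqera needs.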
Cloud Storage bucket
Google Cloud Storage is a type of object storage. To access files and store the results for your pipelines, create a Cloud Storage bucket that your Seqera service account can access.
Create a Cloud Storage bucket
- In the hamburger menu (≡), select Cloud Storage > Create bucket.
- Enter a name for your bucket. You will reference this name when creating the compute environment in Seqera.
Do not use underscores (_) in your bucket name. Use hyphens (-) instead.
- Select Region for the Location type and select the Location for your bucket. You will reference this location when creating the compute environment in Seqera.
- Select Standard for the default storage class.
- Select Uniform for the Access control.
The Cloud Life Sciences API is available in a limited number of locations. These locations are only used to store metadata about the pipeline operations. The storage bucket and compute resources can be in any region.
- Select Create.
- Once the bucket is created, you will be redirected to the Bucket details page.
- Select Permissions, then + Add.
- Copy the email address of the Compute Engine default service account into New principals.
- Select the following roles:
  - Storage Admin
  - Storage Legacy Bucket Owner
  - Storage Legacy Object Owner
  - Storage Object Creator
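The bucket naming advice and the role assignment above can be sketched in code. This is an illustrative example only. The name pattern is deliberately stricter than Cloud Storage's own naming rules so that it rejects underscores, the role identifiers are assumed to be the IAM IDs behind the four display names listed above, and the function names and the example email address are hypothetical.

```python
import re
from typing import Dict, List

# Conservative pattern: 3 to 63 characters, lowercase letters, digits, and
# hyphens only, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")

# Assumed IAM role IDs for the roles selected in the console steps.
ROLES = [
    "roles/storage.admin",              # Storage Admin
    "roles/storage.legacyBucketOwner",  # Storage Legacy Bucket Owner
    "roles/storage.legacyObjectOwner",  # Storage Legacy Object Owner
    "roles/storage.objectCreator",      # Storage Object Creator
]


def is_safe_bucket_name(name: str) -> bool:
    """Return True if the name follows the hyphen-only convention above."""
    return _BUCKET_NAME.fullmatch(name) is not None


def build_bindings(service_account_email: str) -> List[Dict[str, object]]:
    """Build IAM-policy-style bindings granting each role to the account."""
    member = f"serviceAccount:{service_account_email}"
    return [{"role": role, "members": [member]} for role in ROLES]
```

For example, `is_safe_bucket_name("my_bucket")` is rejected because of the underscore, while `is_safe_bucket_name("my-bucket")` is accepted. The bindings are plain dictionaries and are not sent to Google Cloud by this sketch.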
Seqera compute environment
Your Seqera compute environment uses resources that you may be charged for in your Google Cloud account. See Cloud costs for guidelines to manage cloud resources effectively and prevent unexpected costs.
After your Google Cloud resources have been created, create a new Seqera compute environment.
Create a Seqera Google Cloud Life Sciences compute environment
- In a workspace, select Compute Environments > New Environment.
- Enter a descriptive name for this environment, e.g., Google Life Sciences (europe-west2).
- Select Google Life Sciences as the target platform.
- From the Credentials drop-down, select existing Google Cloud credentials, or add new credentials by selecting the + button. If you choose to use existing credentials, skip to step 7.
You can create multiple credentials in your Seqera workspace. See Credentials.
- Enter a name for the credentials, e.g., Google Cloud Credentials.
- Enter the Service account key created previously.
- Select the Region and Zones where you wish to execute pipelines. Leave the Location empty for the Cloud Life Sciences API to use the closest available location.
- In the Pipeline work directory field, enter your storage bucket URL, e.g., gs://my-bucket. This bucket must be accessible in the region selected in the previous step.
When you specify a Cloud Storage bucket as your work directory, this bucket is used for the Nextflow cloud cache by default. You can specify an alternative cache location with the Nextflow config file field on the pipeline launch form.
- You can enable Preemptible to use preemptible instances, which have significantly reduced cost compared to on-demand instances.
- You can use a Filestore file system to automatically mount a Google Filestore volume in your pipelines.
- Apply Resource labels to the cloud resources consumed by this compute environment. Workspace default resource labels are prefilled.
- Expand Staging options to include:
  - Optional pre- or post-run Bash scripts that execute before or after the Nextflow pipeline execution in your environment.
  - Global Nextflow configuration settings for all pipeline runs launched with this compute environment. Configuration settings in this field override the same values in the pipeline Nextflow config file.
- Use the Environment variables option to specify custom environment variables for the Head job and/or Compute jobs.
- Configure any advanced options you need:
  - Enable Use Private Address to ensure that your Google Cloud VMs aren't accessible to the public internet.
  - Use Boot disk size to control the boot disk size of VMs.
  - Use Head Job CPUs and Head Job Memory to specify the CPUs and memory allocated for head jobs.
- Select Create to finalize the compute environment setup.
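The Pipeline work directory in the steps above must be a Cloud Storage URL such as gs://my-bucket. The following minimal sketch shows one way to check a value before entering it, by splitting it into a bucket name and an optional path prefix. The function name `parse_work_dir` is hypothetical and is not part of Seqera or Nextflow.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_work_dir(url: str) -> Tuple[str, str]:
    """Split a gs:// URL into (bucket, prefix); raise ValueError if malformed."""
    parsed = urlparse(url)
    if parsed.scheme != "gs":
        raise ValueError(f"work directory must start with gs://, got: {url!r}")
    if not parsed.netloc:
        raise ValueError(f"work directory is missing a bucket name: {url!r}")
    # Drop leading and trailing slashes so the prefix is a clean relative path.
    return parsed.netloc, parsed.path.strip("/")


if __name__ == "__main__":
    bucket, prefix = parse_work_dir("gs://my-bucket/work")
    print(bucket, prefix)
```

The sketch validates only the shape of the URL. It does not confirm that the bucket exists or that it is accessible from the selected region.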
See Launch pipelines to start executing workflows in your Google Cloud Life Sciences compute environment.