User Guide

What Is GenomeSpace?

GenomeSpace bridges the gaps between popular bioinformatics tools, making it possible for you to move data smoothly between these tools, leveraging the available analyses and visualizations in each of these tools.  GenomeSpace allows you to store your data files in the Amazon cloud, and provides necessary file format transformations whenever you select an analysis or visualization within one of the tools.  It also includes the ability to link your GenomeSpace login to your user accounts in the GenomeSpace tools.

GenomeSpace can be thought of as:

While GenomeSpace is the nexus that links all the GenomeSpace tools to data files stored in the cloud, via scripts that can transform the data files into formats needed by those tools, it also enables users to manipulate files and launch tools needed for the full analysis.

GenomeSpace has a web-based interface that provides functionality to:

Tools and Data Sources

GenomeSpace tools are applications that allow users to either perform analyses on or visualize data.  You can send files to these applications from GenomeSpace and save files from those applications to GenomeSpace.

Data sources are repositories of data files that are linked to GenomeSpace so that you can send files from those repositories to GenomeSpace (and then on to GenomeSpace tools).

Currently, GenomeSpace provides connections between the tools and data sources listed here.

Want to Just Start Working with GenomeSpace?

If you want to just get in there and start working with GenomeSpace, check out the Public/SharedData folder in your GenomeSpace Directories for some sample data files you can use with or without the Recipes.


GenomeSpace User Interface

You will see five primary regions on your screen when you look at GenomeSpace:

Menu Bar

File menu:

  • Upload file to selected folder
  • Create Subdirectory in selected folder
  • Preview provides a view of the selected text-based file
  • Extract rows and columns allows you to extract specified rows and columns from a tab- or comma-delimited file and save them as a separate file
  • Convert allows you to convert the selected file if appropriate converters exist
  • Download selected file
  • View Link URL displays the direct URL to the selected file or folder so that it can be viewed or copied
  • Sharing shows the sharing options for the selected file or folder
  • Rename the selected file or folder
  • Move the selected file or folder
  • Delete the selected file or folder

Launch menu: allows you to launch a tool or data source. If you have selected a file, you can launch a tool on that file.

View menu:

  • Refresh current directory refreshes the directory currently in your view.
  • Recent uploads opens the uploads window to display the upload status of the files you recently queued for upload to GenomeSpace.
  • Customize tool bar... lets you rearrange your tool bar to facilitate your GenomeSpace use.

Manage menu:

  • Groups opens a window that allows you to add and edit GenomeSpace user groups.
  • Private Tools opens a window that allows you to add your own tools to your GenomeSpace tool bar.
  • External Storage opens a window that allows you to add public or private cloud storage buckets to your GenomeSpace.
  • Search for Tools opens a search window that lets you search for a particular GenomeSpace tool.

Recipes menu: allows you to view a GenomeSpace recipe.

Help menu: Provides links to the User Guide, the Tool Guide, list of Format Converters, the System Status, the GenomeSpace Support page, email for the GenomeSpace team, and the About GenomeSpace dialog.

Tool Bar

Each GenomeSpace tool and data source is represented here.

For the tools, you can click the triangle next to the logo to launch the tool, to launch a tool with a file you have selected, or to see the tool guide page for that tool.  You can also drag and drop a file onto the tool logo to launch that tool on that file. 

For the data sources, you can click the triangle next to the logo to launch the data source or to see the tool guide page for that data source.

In both cases, you can select Tags to add a tag to the tool for ease of searching.

Directories

Your directories: You can navigate through your directory structure as you would normally.  If you click the triangle to the right of one of your directories, you will see the following set of options:

Upload allows you to upload a file to this directory.

Create Subdirectory creates a directory inside this directory.

View directory link displays the location of the directory so you can copy it.

Sharing allows you to edit the sharing options for this directory.

Rename allows you to rename the directory.

Move allows you to move this directory within your GenomeSpace directory structure.

Delete will delete the directory and all its contents.

The Shared to [username] directory contains all the directories and files that have been shared with you by other users. Note the icon -- all subdirectories share the same icon, and there is a similar icon for shared files. If you click the triangle next to this directory or any of its subdirectories, you will see an abbreviated set of options.

The Public directory contains all directories and files that have been shared with all users by the GenomeSpace team. Note the icon -- all subdirectories share the same icon, and there is a similar icon for shared files. If you click the triangle next to this directory, you will see that it has the same set of options that the Shared to [username] folder has.

Files & Subdirectories

Files (and directories) in the selected directory: This area displays the files and directories contained in the directory you selected on the left, as well as navigation breadcrumbs to allow you easy access to parent folders.  If you select file checkboxes, you can use the button bar or tool bar to launch a tool with those files. 

If you left-click on a file, you will see a list of tools to which you can send the file:

If you right-click on a file, you will see the following set of options:

Preview provides a view of the selected text-based file.

Extract rows/cols allows you to extract specified rows and columns from a tab- or comma-delimited file and save them as a separate file.

Convert allows you to convert the file to another format. (Note: if there is no converter for this file format, the option will be greyed out.)

Download allows you to download the file to your local machine.

View file link displays the link so you can copy it to your clipboard.

Sharing allows you to edit the sharing options for this file.

Rename allows you to rename the selected file.

Move allows you to relocate the file in the directory structure.

Delete will delete the file.

User Profile

Clicking your username or the icon gives you two options:

  • Profile allows you to change the email address associated with your GenomeSpace account or change your password.
  • Logout logs you out of GenomeSpace

 


Using GenomeSpace

The following sections cover basic usage of GenomeSpace, including:


Registering as a GenomeSpace User

Registering with GenomeSpace is simple: click the Register New GenomeSpace User link on the login dialog (or go here).  Then:

  1. Choose a username and password.
  2. Supply a valid email address.
  3. Click Sign Up.

You will see a message: Email has been sent to [email address] describing how to complete your registration. It will remain valid for 24 hours.

Read the email and follow its instructions to complete your registration.


Logging Into GenomeSpace

When you start GenomeSpace, you will be asked to log in, via OpenID, with the username and password you chose when you registered. If you have not registered, click the Register New User button.

If you have forgotten your username or password, click the Forgot your password? link.  It will ask you to enter your username or email address, and a temporary password will be sent to your email address.

If this is the first time you have logged into GenomeSpace, you will see a welcome dialog that offers you shortcuts for uploading data files.

Logging In Via GenomeSpace Tools

Many GenomeSpace tools recognize your username if you are already logged in via GenomeSpace, though some may require you to enter a password (this is a matter of how each tool handles logins). However, if you are starting from one of the tools and want to send files to your GenomeSpace directories, there are menu items that allow you to log in from that tool.  See the tool guide for more details.


Sending Files to Tools from GenomeSpace

1. Direct your file to a tool.

There are several methods for sending files to a tool:

You can left-click the file and select the tool to which you want to send it.

OR

You can select the file checkbox and select Launch>[Tool].

OR

You can select the file checkbox and click the tool button.

OR

You can select the file checkbox, click the triangle next to the tool, and select Launch on File.

OR

You can drag the file to the tool button.

2. Are there more files you want to send?

All of these options will open a dialog box with a target field.  Drag other files you want to send to the tool to the target field and click Launch.

3. For more information about the GenomeSpace capabilities of the GenomeSpace tools, see the Tool Guide.


Running an Analysis

Loading Data Files

To load data files into GenomeSpace:

  1. Right-click a directory and select Upload.
  2. Browse to the local folder containing your file, select it, and click Upload.

Data Transformations

There are two ways to convert a data file that is in an incorrect format for a given tool's analysis:

Send it to the tool and run the analysis; a GenomeSpace converter will convert the file on its way to analysis.

OR

Right-click the file and select Convert (or select the file and then select File>Convert); you can then choose from the file format options available for that particular file and select either Download (to download to your local machine) or Convert on Server (to save the new file in the same directory in your GenomeSpace cloud storage).

Sending Data to a Specific Tool from GenomeSpace

As discussed here, there are several ways you can send your data to a tool.  In this case, we will left-click on the file to be sent and select the tool from the drop-down list.

This opens a dialog box with a drag-and-drop target so that you can add more files to be sent to that tool.

When all the files are set to be sent, click Launch.  This will launch the tool with the specified file(s) in memory.  There may be additional steps within the tool to complete the analysis.


Customizing Your GenomeSpace

You can customize your GenomeSpace tool bar to show only the tools you are interested in using, in the order you prefer them listed.  Your selections will persist, so that whenever you log into GenomeSpace, the tool bar will still be set up according to your preferences.

To begin customizing your GenomeSpace tool bar, select View>Customize tool bar...

This opens the Customize toolbar dialog box.

After you have the tool bar arranged to your satisfaction, remember to click Save.  If you click Close without saving, your changes will be lost.

If you want to restore the default configuration of the tool bar (the full list, arranged alphabetically), click Revert.  If you want to save it in this default configuration, remember to click Save.

For Example

If you arrange the tools in this configuration in the dialog box:

Your tool bar would look like this:


Managing Your Files

Storing Your Files

GenomeSpace allows you to store your data securely in the Amazon Elastic Compute Cloud (Amazon EC2), which is a web service that provides resizable compute capacity in the cloud.  Having the data in one centralized location makes it easy for GenomeSpace to send data to one of the tools and receive the results files back from analyses, and it enables you to perform analyses on those data anywhere.

For more information about the Amazon Cloud, see the Amazon EC2 web site and the Amazon EC2 FAQ.

Upload Files

To upload a file to an existing directory, you can:

NOTE: If you are using Chrome, the browser loads your file into memory while uploading to GenomeSpace, so you will want to use the Upload menu option (Java Uploader applet) in GenomeSpace to upload files larger than about 1GB.

Browse to the file on your local computer that you want to upload, select it, and then click Upload in the dialog box.  In the file upload dialog, GenomeSpace will show you a list of files (and their destinations in your directory structure) that are queued for uploading.  You can close this window and continue working in GenomeSpace; you can always check the list of uploads by selecting View>Recent Uploads.

Download Files

To download a file stored in the cloud to your local machine, you can:

Managing Your Directories and Files

The Directories and Files panes show the contents of your directories in the cloud.  From here, you can manipulate your data files in a number of ways.

Manage Directories

You can manage your directories by clicking the triangle next to a directory name and:

Manage Files

You can also manage individual files by right-clicking the file name and:

Moving and Copying Your Files

To move your files or folders from one directory to another, you can:

To copy your files from one directory to another, you can:

 


Converting Files

GenomeSpace contains built-in file converters for frictionlessly moving your files from one tool to another. 

If you have a file converter you would like to contribute to GenomeSpace, please contact gs-help@broadinstitute.org.

To Convert a File

In sending files to different tools, you may automatically invoke a file conversion.  Each tool handles these conversions differently, and may offer you options within the tool's user interface, or may convert the file behind the scenes.

You can convert files within GenomeSpace by:

Right-clicking the file and selecting Convert

OR

Selecting the file checkbox and selecting File>Convert

NOTE: The option to convert the file will not be available if there is no converter for that format.

This opens the Convert File Format dialog. If there are multiple converters for the file type, you can select the destination file type from the drop-down menu.

To convert and download the file, click Download.

To convert and save the new converted file in the same directory of your GenomeSpace cloud storage, click Convert on Server.

Available Converters

File converters are being added all the time; for the most current list, select Help>About.  Click the Format Converters tab in the About dialog.

Each source file format below is followed by the file formats it can be converted to.

ADJ
  • XGMML

ATTR (Cytoscape)
  • ATTR (Cytoscape GeneMania)

GCT
  • ATTR (Cytoscape)
  • GXP
  • TAB (Genomica)
  • geneset.TAB (The input is a dataset with expression values for genes/probes, but the output is just the list of probes in Genomica TAB format.)
  • EXP (geWorkbench Affymetrix EXPeriment file)

GMT
  • TAB (Genomica)

GXP
  • GCT

ODF
  • ATTR (Cytoscape)

REG2TARGET
  • geneset.TAB (The input is a file that contains the mapping between regulators and target genes, but the output is just the list of probes in Genomica TAB format.)
  • GMT

RES
  • GXP
  • TAB (Genomica)
  • geneset.TAB (The input is a dataset with expression values for genes/probes, but the output is just the list of probes in Genomica TAB format.)

TAB (Genomica)
  • GCT
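
The converter list above can also be read as a simple lookup from a source format to its possible targets. The following Python sketch is only a transcription of that list for illustration; it is not part of GenomeSpace, the names CONVERTERS, conversion_targets, and can_convert are hypothetical, and the current list in Help>About may differ.

```python
# Transcription of the converter list in this guide (may be out of date).
CONVERTERS = {
    "ADJ": ["XGMML"],
    "ATTR (Cytoscape)": ["ATTR (Cytoscape GeneMania)"],
    "GCT": ["ATTR (Cytoscape)", "GXP", "TAB (Genomica)", "geneset.TAB", "EXP"],
    "GMT": ["TAB (Genomica)"],
    "GXP": ["GCT"],
    "ODF": ["ATTR (Cytoscape)"],
    "REG2TARGET": ["geneset.TAB", "GMT"],
    "RES": ["GXP", "TAB (Genomica)", "geneset.TAB"],
    "TAB (Genomica)": ["GCT"],
}


def conversion_targets(source_format):
    """Return the formats a source format can be converted to (empty if none)."""
    return CONVERTERS.get(source_format, [])


def can_convert(source_format, target_format):
    """True when the list above contains a converter for this pair."""
    return target_format in conversion_targets(source_format)


if __name__ == "__main__":
    print(conversion_targets("RES"))
    print(can_convert("GXP", "GMT"))
```

A format with no entry, such as XGMML, returns an empty list, which corresponds to the Convert option being unavailable for that file.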

 

File Format Information

ADJ Adjacency file, tab-delimited. Used by the ARACNE module in GenePattern.  The ARACNE module is an algorithm that reverse engineers a gene regulatory network from microarray gene expression data.  Further information on this file format can be found here (PDF).
ATTR (Cytoscape) Cytoscape format that describes node and edge attributes.  Further information on this file format can be found here.
ATTR (Cytoscape GeneMania) Cytoscape attribute format for GeneMania networks. See this page for more information about the GeneMania plugin for Cytoscape.
EXP The geWorkbench native tab-delimited format for saving microarray data, providing a way to include both the data matrix for a group of arrays and various set labels grouping these arrays in the same file. Further information on this file format can be found here.
GCT A tab-delimited file format that describes an expression dataset. Further information on the file format can be found here.  Used in GenePattern.
GMT Tab-delimited file format that describes gene sets. Each row represents a gene set.  Further information on this file format can be found here.
GXP Genomica proprietary expression file format.  This file format can be used to store the results of complex analyses, and a single GXP file can store multiple annotation files and analyses.
ODF The Output Description Format (ODF) is similar to the RES or GCT file formats for gene expression datasets. The main difference is in the header; the body of data still contains the expression level values for each gene in each sample. Further information on the file format can be found here. Used in GenePattern.
REG2TARGET A two-column format that contains the mapping between regulators (column 1) and target genes (column 2).
RES A tab-delimited file format that describes an expression dataset. Unlike the GCT file format, the RES file format contains labels for each gene's absent (A) versus present (P) calls, as generated by Affymetrix's GeneChip software, and does not allow missing expression values.  Further information on the file format can be found here.  Used in GenePattern.

TAB Tab-delimited text file that contains gene expression data. The first row is a header row, where the names of the arrays/experiments are specified from column 3 and on. From the second row onward, rows specify expression data for each gene, where the first column is the unique identifier of each gene, the second column specifies the name and the description of the gene (where the name and description are separated by " - " [the surrounding spaces are important]), and column 3 and beyond  specify the expression data for the gene across all experiments. Used by Genomica.
XGMML eXtensible Graph Markup and Modeling Language file. These files contain network data and node/edge/network attributes. Further information on the file format can be found here.  Used by Cytoscape.
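
As an illustration of the Genomica TAB layout described above, the following Python sketch reads such a file into experiment names and per-gene rows. It is a minimal sketch based only on that description, not Genomica's or GenomeSpace's own parser; the names GeneRow and parse_genomica_tab are hypothetical, and expression values are kept as raw strings because the description does not say how missing values are written.

```python
from typing import List, NamedTuple, Tuple


class GeneRow(NamedTuple):
    gene_id: str        # column 1: unique gene identifier
    name: str           # column 2, text before " - "
    description: str    # column 2, text after " - "
    values: List[str]   # columns 3 onward, one raw value per experiment


def parse_genomica_tab(text: str) -> Tuple[List[str], List[GeneRow]]:
    """Parse Genomica TAB text into (experiment names, gene rows)."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    experiments = header[2:]  # experiment names start at column 3
    rows = []
    for line in lines[1:]:
        fields = line.split("\t")
        # The separator is " - " with both spaces, so a hyphen inside
        # the gene name itself does not split the field.
        name, _, description = fields[1].partition(" - ")
        rows.append(GeneRow(fields[0], name, description, fields[2:]))
    return experiments, rows


if __name__ == "__main__":
    sample = (
        "ID\tNAME\tarray1\tarray2\n"
        "G1\tTP53 - tumor protein p53\t1.5\t2.0\n"
        "G2\tA-B - hyphenated name\t0.1\t0.7\n"
    )
    experiments, rows = parse_genomica_tab(sample)
    print(experiments)
    for row in rows:
        print(row)
```

Splitting on " - " rather than on a bare hyphen is what makes the surrounding spaces important.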

 


Extracting Rows and Columns from a File

If you have a tab- or comma-delimited data file from which you want to extract a set of rows and columns into a separate file, you do not need to download the file, perform the operation in Excel or another spreadsheet program, and upload the new file: GenomeSpace has server-side row and column extraction that allows you to pull a set of rows and columns out of your tab- or comma-delimited data file and save it as its own file in GenomeSpace.

To access this feature you can either:

This opens a dialog, showing only the first 10 or so lines (or the first 50kb if there are a lot of columns) of the selected file.

In this dialog, you can select the first row at which to start (so that you can, for example, trim out header lines by starting at a lower row) and the last row to include (leave this blank to take the rest of the file from the starting row). Then select columns by checking the checkbox at the top of each column.

Note that the selected cells that will be copied out to the new file are highlighted in pale blue, while those that will be left out of the new file are displayed as grey text on a white background.

Optionally, you can change the name of the new file.  The default adds .slice to the end of the source filename, leaving the file extension intact.  For example, if your source filename is myfile.gct, the default extracted file name will be myfile.slice.gct.  Note that if you are removing header lines you may also want to change the file extension to match the new format.

Click Save to create a new GenomeSpace file in the same GenomeSpace directory as the original file.
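
The extraction and naming behavior described above can be sketched in Python as follows. This is an illustrative approximation, not GenomeSpace's server-side code; the function names are hypothetical, and the 1-based row and column numbering is an assumption about how the dialog counts.

```python
import os
from typing import Optional, Sequence


def default_slice_name(filename: str) -> str:
    """Insert ".slice" before the file extension, leaving the extension intact."""
    root, ext = os.path.splitext(filename)
    return f"{root}.slice{ext}"


def extract_rows_and_columns(
    text: str,
    first_row: int,
    last_row: Optional[int],
    columns: Sequence[int],
    delimiter: str = "\t",
) -> str:
    """Keep rows first_row..last_row (1-based, inclusive) and the given
    1-based columns. A last_row of None takes the rest of the file."""
    lines = text.splitlines()
    end = len(lines) if last_row is None else last_row
    kept = []
    for line in lines[first_row - 1:end]:
        fields = line.split(delimiter)
        # Skip column numbers that a short row does not have.
        kept.append(delimiter.join(fields[c - 1] for c in columns if c <= len(fields)))
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    data = "#1.2\n2\t2\nName\tDesc\tS1\tS2\nG1\td1\t1\t2\nG2\td2\t3\t4\n"
    print(default_slice_name("myfile.gct"))
    # Start at row 3 to trim the two header lines; keep columns 1 and 3.
    print(extract_rows_and_columns(data, 3, None, [1, 3]))
```

Because trimming header lines can change the file's format, the extracted file in this example is no longer a valid GCT file, which is why you may want to change its extension.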


Sharing Your Data

Getting to the Sharing Options for Your Directories

You can edit the sharing options for one of your directories by clicking the triangle next to the directory name and selecting Sharing.

Getting to the Sharing Options for Your Files

You can edit the sharing options for one of your files by:

Right-clicking the filename and selecting Sharing.

OR

Selecting the file checkbox and selecting File>Sharing.

Sharing & Permissions

Any of the Sharing options above opens the Sharing & Permissions dialog.

This dialog box shows the file or directory owner.  It also lists all users or user groups that have access to that file or directory, and what privileges they have (Read or Read, Write & Delete).

To change the sharing options, click the Edit Sharing button on this dialog.

Edit Sharing Options

Clicking the Edit Sharing button opens the sharing management dialog.

Here, you can add a specific user to the sharing list for the file/directory by entering their GenomeSpace user name under Share with User, selecting the privileges you want them to have, and clicking Grant Permissions.

Groups can be similarly included in the permissions for a file/directory by selecting the group from the Group Name drop-down, selecting the privileges you want to grant to all users in that group, and clicking Grant Permissions.

You can also choose to share your file or directory with all GenomeSpace users by selecting Allow public access, selecting the privileges you want to grant, and clicking Grant Permissions.

If you need to make a new group, or to manage an existing group, click the wrench icon to the right of the Group name drop-down list.

Manage Groups

The manage groups dialog has several major areas.

List of user groups in GenomeSpace.

From here, you can select one of your groups and delete it with the Delete Selected Group button.

You can also select a group to see its group members.

Use this area of the dialog to create a new group by entering the name and a short description, and clicking Add New Group.  Select the Membership is publicly visible checkbox to allow all GenomeSpace users to see the users in the group.

Under Group Members, you can not only see the list of group members, but also delete users from the group (by clicking the red X next to the user name), and add users to the group.  You can make a user a group administrator by selecting the user and the User can administer group checkbox.

To add a user, enter the GenomeSpace user name in the add user field and click the Add user to... button.

 


Managing Private Tools

Adding Private Tools to Your GenomeSpace Tool Bar

You can add any GenomeSpace-enabled tool to your GenomeSpace tool bar as a private instance. Only you and whoever you share it with (individuals or groups) can see or use this instance. For example, you can add your private GenePattern or Galaxy server to your tool bar.  Any GenomeSpace-enabled tool can be similarly added.  If you are a developer and would like more details on making your tool GenomeSpace-enabled, see Adding a Tool to GenomeSpace.

To add a private tool to your GenomeSpace tool bar:

  1. Select Manage>Private Tools.

  2. Click Add new.

  3. Complete the new tool dialog form. See the figure below for an example form.

This figure shows an example of the new tool dialog filled out for a private GenePattern instance.

  4. Click Save.

More Resources

Some of the GenomeSpace tools have  instructions in their help for managing private instances in GenomeSpace:


Managing External Storage

You can add external storage from public Amazon S3 buckets and your own Amazon S3 buckets to your GenomeSpace.

In order to manage your external storage options, select Manage>External Storage from the menu bar.


Adding An Amazon S3 Bucket

The GenomeSpace Data Manager was originally built to save the files you upload to GenomeSpace in an Amazon Simple Storage Service (S3) bucket that is managed by GenomeSpace itself. However, you can add additional Amazon S3 buckets to GenomeSpace that you or a third party has set up, making their file contents available to your GenomeSpace and your GenomeSpace tools. For buckets that are publicly accessible, you only need to tell GenomeSpace the name of the bucket to mount it.  For private buckets, or those with limited non-public accessibility, the process is more complex: you must set up a sub-account and the minimal permissions in Amazon to share the bucket with GenomeSpace.  Once a bucket has been mounted in GenomeSpace, you can share it with other GenomeSpace users using the standard GenomeSpace sharing dialogs.

Mounting a Public Amazon S3 Bucket

A publicly accessible Amazon bucket has its permissions set to allow public access to anyone without authentication. Public buckets may be mounted only with READ permissions in GenomeSpace: you cannot write files back to them.  This is to prevent users from accidentally saving files to Amazon S3 buckets that they do not control.  If you want to have GenomeSpace WRITE privileges to a bucket, follow the instructions for mounting private buckets below. 

To mount a public Amazon S3 bucket (you must log into GenomeSpace to start):

  1. Select Manage>External Storage. This will open the Mount Cloud Storage dialog box.
  2. Select the Public S3 Bucket tab.

  3. Enter the name of the publicly accessible bucket you want to mount in the field.  For this example, we are using the '1000genomes' bucket that is publicly accessible on Amazon S3.

  4. Click the Submit button to mount the bucket.  After a few seconds, the directory view should refresh with the new bucket mounted under /Home/S3:1000genomes.  You can now read files from this bucket as you would any other files in GenomeSpace and share it (Read only) with other GenomeSpace users.

 

Mounting a Private Read/Write Amazon S3 Bucket to GenomeSpace

The following instructions describe how to mount an Amazon S3 bucket with WRITE permissions in GenomeSpace, or one that you own but do not want to expose publicly.  To do this, you will need to follow the steps detailed below to prove that you are the owner of the bucket and to set up the minimal permissions that allow GenomeSpace to access it.  You will be using GenomeSpace and several of the Amazon AWS management consoles to complete this task.

To mount an Amazon bucket with WRITE permissions in GenomeSpace or that you own and do not want to make publicly available (log into GenomeSpace to start):

  1. Select Manage>External Storage. This will open the Mount Cloud Storage dialog box.
  2. Select the Read/Write S3 Bucket tab.

  3. Click the Begin button to start the process of setting up an Amazon sub-account, providing it to GenomeSpace, and setting the permissions on the account to allow it to access your Amazon S3 bucket.  It will open the Mount Read/Write Bucket wizard.

NOTE: When this wizard opens, GenomeSpace automatically generates an account name for your Amazon sub-account and puts it in the 'Pending' status.  'Pending' means that the account may not exist yet and GenomeSpace does not have the necessary account credentials to access the account yet.

Wizard: Create or select a GenomeSpace Amazon AWS account

  1. Click the AWS IAM Management Console link. This will open a window for the Amazon IAM Management Console.
  2. Log into Amazon using the Amazon login that owns the bucket you want to mount.

  3. Click the Create New Users button.
  4. Copy/paste the pending account ID from the GenomeSpace window into the first Enter User Names field.  Make sure the Generate an access key for each User checkbox is selected so that you do not have to share your main Amazon account credentials with GenomeSpace. Click the Create button.

  5. Once you have created your sub-account, Amazon will offer to let you download the credentials.  Download these now and save them in a safe place, as you will be sharing them with GenomeSpace in a moment.  Once this dialog is closed it is NO LONGER POSSIBLE to get the credentials from Amazon, so it is important to save them now.

Wizard: Give AWS Account Credentials to GenomeSpace

  1. Return to the GenomeSpace window and click the second step, Give AWS Account Credentials to GenomeSpace.
  2. Enter the Access ID and Secret Key you downloaded from the Amazon console.

Wizard: Generate Account AWS Policy

The policy document you generate in this step will allow this account to access your S3 bucket on behalf of GenomeSpace.

  1. Click the Generate Account AWS Policy step in the wizard.
  2. Enter your bucket name and Amazon account number (not your account ID).  You can get the account number from the Amazon AWS Manage Account Page (there is a link provided in the wizard).  Once you enter these values, a policy document is generated.

  3. Copy the entire contents of the policy document to the clipboard and return to the Amazon IAM Management Console.
  4. Select your new sub-account and select the Permissions tab. Click the Attach User Policy button.

  5. Select Custom Policy and click Select.

  6. Paste the generated policy into the editor and click the Apply Policy button.
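
For orientation only, the Python sketch below prints a policy document of the general kind that grants a sub-account access to a single bucket. It is an assumption for illustration and not the document the GenomeSpace wizard generates: build_bucket_policy and its parameters are hypothetical, the listed actions are standard Amazon S3 permissions chosen for this example, and the sketch does not use the Amazon account number that the wizard asks for. Always paste the wizard's own output into the Amazon console.

```python
import json


def build_bucket_policy(bucket_name: str, allow_write: bool = True) -> str:
    """Return an illustrative IAM policy (as JSON text) for one S3 bucket."""
    object_actions = ["s3:GetObject"]
    if allow_write:
        # Write access in this sketch also covers deleting objects.
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Reading and writing apply to the objects in the bucket.
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(build_bucket_policy("my-example-bucket"))
```

The two statements are separate because bucket-level actions and object-level actions are granted on different resources.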

Wizard: Verify AWS Credentials

Now that the account and policy are in place in Amazon, and you have provided the access ID and secret key to GenomeSpace, we can test the connectivity to validate the credentials. 

  1. Return to the GenomeSpace wizard and click the Verify AWS Credentials step.
  2. Click the Verify account credentials button.

GenomeSpace will test the connectivity and provide you with a feedback dialog as to whether your credentials are verified or not.

If your credentials fail to verify, go back to confirm that the access ID, username, policy, and secret key are all correct.

Wizard: Connect GenomeSpace to your S3 Bucket

When your credentials are verified, you will need to select the privileges you want for this bucket (read only or read and write).  This is a global setting on the bucket and applies only to your GenomeSpace account.  You may share the bucket with other users later, using the usual sharing functionality, but only with the privileges that you select here: that is, you cannot choose "read only" here and later choose to share it with write permissions unless you remount the bucket.

  1. Click the Connect GenomeSpace to your S3 Bucket step in the wizard.
  2. Select Read only or Read and Write for the permissions on your bucket.

  3. Click the Submit button to mount the bucket.  After a few seconds, the directory view should refresh with the new bucket mounted under /Home/S3:[bucket name].

 You can now read files from this bucket as you would any other files in GenomeSpace and share it (Read only) with other GenomeSpace users.