GenomeSpace bridges the gaps between popular bioinformatics tools, so that you can move data smoothly between them and take advantage of the analyses and visualizations each tool offers. GenomeSpace allows you to store your data files in the Amazon cloud, and provides the necessary file format transformations whenever you select an analysis or visualization within one of the tools. It also lets you link your GenomeSpace login to your user accounts in the GenomeSpace tools.
GenomeSpace can be thought of as:
GenomeSpace is the nexus that links all the GenomeSpace tools to data files stored in the cloud, via scripts that transform the data files into the formats those tools need. It also lets you manipulate files and launch the tools needed for a full analysis.
GenomeSpace has a web-based interface that provides functionality to:
GenomeSpace tools are applications that allow users to either perform analyses on or visualize data. You can send files to these applications from GenomeSpace and save files from those applications to GenomeSpace.
Data sources are repositories of data files that are linked to GenomeSpace so that you can send files from those repositories to GenomeSpace (and then on to GenomeSpace tools).
Currently, GenomeSpace provides connections between the tools and data sources listed here.
If you want to just get in there and start working with GenomeSpace, check out the Public/SharedData folder in your GenomeSpace Directories for some sample data files you can use with or without the Recipes.
You will see five primary regions on your screen when you look at GenomeSpace:
Launch menu: allows you to launch a tool or data source. If you have selected a file, you can launch a tool on that file.
Recipes menu: allows you to view a GenomeSpace recipe.
Help menu: provides links to the User Guide, the Tool Guide, the list of Format Converters, the System Status, the GenomeSpace Support page, an email link for the GenomeSpace team, and the About GenomeSpace dialog.
Each GenomeSpace tool and data source is represented here.
For the tools, you can click the triangle next to the logo to launch the tool, launch it with a file you have selected, or see the tool guide page for that tool. You can also drag and drop a file onto the tool logo to launch that tool on that file.
For the data sources, you can click the triangle next to the logo to launch the data source or see the tool guide page for that data source.
In both cases, you can select Tags to add a tag to the tool or data source so that it is easier to find when searching.
Your directories: You can navigate through your directory structure as you would in any file browser. If you click the triangle to the right of one of your directories, you will see the following set of options:
The Shared to [username] directory contains all the directories and files that have been shared with you by other users. Note the icon -- all subdirectories share the same icon, and there is a similar icon for shared files. If you click the triangle next to this directory or any of its subdirectories, you will see an abbreviated set of options.
The Public directory contains all directories and files that have been shared with all users by the GenomeSpace team. Note the icon -- all subdirectories share the same icon, and there is a similar icon for shared files. If you click the triangle next to this directory, you will see that it has the same set of options that the Shared to [username] folder has.
Files (and directories) in the selected directory: This area displays the files and directories contained in the directory you selected on the left, as well as navigation breadcrumbs to allow you easy access to parent folders. If you select file checkboxes, you can use the button bar or tool bar to launch a tool with those files.
If you left-click on a file, you will see a list of tools to which you can send the file:
If you right-click on a file, you will see the following set of options:
Clicking your username or the icon gives you two options:
The following sections cover basic usage of GenomeSpace, including:
Registering with GenomeSpace is simple: click the Register New GenomeSpace User link on the login dialog (or go here). Then:
You will see a message: Email has been sent to [email address] describing how to complete your registration. It will remain valid for 24 hours.
Read the email and follow its instructions to complete your registration.
When you start GenomeSpace, you will be asked to log in, via OpenID, with the username and password you chose when you registered. If you have not registered, click the Register New User button.
If you have forgotten your username or password, click the Forgot your password? link. It will ask you to enter your username or email address, and a temporary password will be sent to your email address.
If this is the first time you have logged into GenomeSpace, you will see a welcome dialog that offers you shortcuts for uploading data files.
Many GenomeSpace tools recognize your username if you are already logged in via GenomeSpace, though some may require you to enter a password (this depends on how each tool handles logins). If you are starting from one of the tools and want to send files to your GenomeSpace directories, there are menu items that allow you to log in from that tool. See the tool guide for more details.
There are several methods for sending files to a tool:
You can left-click the file and select the tool to which you want to send it.
You can select the file checkbox and select Launch>[Tool].
You can select the file checkbox and click the tool button.
You can select the file checkbox, click the triangle next to the tool, and select Launch on File.
You can drag the file to the tool button.
All of these options will open a dialog box with a target field. Drag other files you want to send to the tool to the target field and click Launch.
To load data files into GenomeSpace:
There are two ways to convert a data file that is in an incorrect format for a given tool's analysis:
Send it to the tool and run the analysis; a GenomeSpace converter will convert the file on its way to analysis.
Right-click the file and select Convert (or select the file and then select File>Convert). You can then choose from the file format options available for that file and click either Download (to download the converted file to your local machine) or Convert on Server (to save the new file in the same directory in your GenomeSpace cloud storage).
As discussed here, there are several ways you can send your data to a tool. In this case, we will left-click on the file to be sent and select the tool from the drop-down list.
This opens a dialog box with a drag-and-drop target so that you can add more files to be sent to that tool.
When all the files are set to be sent, click Launch. This will launch the tool with the specified file(s) in memory. There may be additional steps within the tool to complete the analysis.
You can customize your GenomeSpace tool bar to show only the tools you are interested in using, in the order you prefer them listed. Your selections will persist, so that whenever you log into GenomeSpace, the tool bar will still be set up according to your preferences.
To begin customizing your GenomeSpace tool bar, select View>Customize tool bar...
This opens the Customize toolbar dialog box.
After you have the tool bar arranged to your satisfaction, remember to click Save. If you click Close without saving, your changes will be lost.
If you want to restore the default configuration of the tool bar (the full list, arranged alphabetically), click Revert. If you want to save it in this default configuration, remember to click Save.
If you arrange the tools in this configuration in the dialog box:
Your tool bar would look like this:
GenomeSpace allows you to store your data securely in the Amazon cloud, in an Amazon Simple Storage Service (S3) bucket managed by GenomeSpace. Having the data in one centralized location makes it easy for GenomeSpace to send data to any of the tools and receive result files back from analyses, and it lets you perform analyses on those data from anywhere.
For more information about the Amazon Cloud, see the Amazon EC2 web site and the Amazon EC2 FAQ.
To upload a file to an existing directory, you can:
NOTE: If you are using Chrome, the browser loads your file into memory while uploading it to GenomeSpace, so use the Upload menu option (Java Uploader applet) in GenomeSpace to upload files larger than about 1 GB.
Browse to the file on your local computer that you want to upload, select it, and then click Upload in the dialog box. The file upload dialog lists the files queued for upload, along with their destinations in your directory structure. You can close this window and continue working in GenomeSpace; you can always check the list of uploads by selecting View>Recent Uploads.
To download a file stored in the cloud to your local machine, you can:
The Directories and Files panes show the contents of your directories in the cloud. From here, you can manipulate your data files in a number of ways.
You can manage your directories by clicking the triangle next to a directory name and:
You can also manage individual files by right-clicking the file name and:
To move your files or folders from one directory to another, you can:
To copy your files from one directory to another, you can:
GenomeSpace contains built-in file converters for moving your files smoothly from one tool to another.
If you have a file converter you would like to contribute to GenomeSpace, please email the GenomeSpace team (an email link is available from the Help menu).
Sending a file to a tool may automatically invoke a file conversion. Each tool handles these conversions differently: it may offer you options within its user interface, or it may convert the file behind the scenes.
You can convert files within GenomeSpace by:
Right-clicking the file and selecting Convert
Selecting the file checkbox and selecting File>Convert
NOTE: The option to convert the file will not be available if there is no converter for that format.
This opens the Convert File Format dialog. If there are multiple converters for the file type, you can select the destination file type from the drop-down menu.
To convert and download the file, click Download.
To convert and save the new converted file in the same directory of your GenomeSpace cloud storage, click Convert on Server.
File converters are being added all the time; for the most current list, select Help>About. Click the Format Converters tab in the About dialog.
|Converts this file format...||To this file format.|
|ATTR (Cytoscape)||ATTR (Cytoscape GeneMania)|
|geneset.TAB (The input is a dataset with expression values for genes/probes, but the output is just the list of probes in Genomica TAB format.)|
|EXP (geWorkbench Affymetrix EXPeriment file)|
|REG2TARGET||geneset.TAB (The input is a file that contains the mapping between regulators and target genes, but the output is just the list of probes in Genomica TAB format.)|
|geneset.TAB (The input is a dataset with expression values for genes/probes, but the output is just the list of probes in Genomica TAB format.)|
|ADJ||Adjacency file, tab-delimited. Used by the ARACNE module in GenePattern. The ARACNE module implements an algorithm that reverse-engineers a gene regulatory network from microarray gene expression data. Further information on this file format can be found here (PDF).|
|ATTR (Cytoscape)||Cytoscape format that describes node and edge attributes. Further information on this file format can be found here.|
|ATTR (Cytoscape GeneMania)||Cytoscape attribute format for GeneMania networks. See this page for more information about the GeneMania plugin for Cytoscape.|
|EXP||The geWorkbench native tab-delimited format for saving microarray data, providing a way to include both the data matrix for a group of arrays and various set labels grouping these arrays in the same file. Further information on this file format can be found here.|
|GCT||A tab-delimited file format that describes an expression dataset. Further information on the file format can be found here. Used in GenePattern.|
|GMT||Tab-delimited file format that describes gene sets. Each row represents a gene set. Further information on this file format can be found here.|
|GXP||Genomica proprietary expression file format. This file format can be used to store the results of complex analyses, and a single GXP file can store multiple annotation files and analyses.|
|ODF||The Output Description Format (ODF) is similar to the RES or GCT file formats for gene expression datasets. The main difference is in the header; the body of data still contains the expression level values for each gene in each sample. Further information on the file format can be found here. Used in GenePattern.|
|REG2TARGET||A two-column format that contains the mapping between regulators (column 1) and target genes (column 2).|
|RES||A tab-delimited file format that describes an expression dataset. Unlike the GCT file format, the RES file format contains labels for each gene's absent (A) versus present (P) calls, as generated by Affymetrix's GeneChip software, and does not allow missing expression values. Further information on the file format can be found here. Used in GenePattern.|
|TAB||Tab-delimited text file that contains gene expression data. The first row is a header row, where the names of the arrays/experiments are specified from column 3 and on. From the second row onward, rows specify expression data for each gene, where the first column is the unique identifier of each gene, the second column specifies the name and the description of the gene (where the name and description are separated by " - " [the surrounding spaces are important]), and column 3 and beyond specify the expression data for the gene across all experiments. Used by Genomica.|
|XGMML||eXtensible Graph Markup and Modeling Language file. These files contain network data and node/edge/network attributes. Further information on the file format can be found here. Used by Cytoscape.|
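The TAB, GMT, and REG2TARGET descriptions above are concrete enough to sketch in code, which can help when checking a file before uploading it. The following is a minimal Python sketch, not GenomeSpace's converter code: the function names are hypothetical, REG2TARGET is assumed to be tab-delimited with no header row, and the GMT layout used here (gene set name, then a description, then the member genes on each row) is the standard GenePattern convention rather than something stated in the table.

```python
from typing import Any, Dict, List, Optional, Tuple


def parse_tab(text: str) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Parse a Genomica TAB expression file (hypothetical helper).

    Row 1 is a header; experiment names start in column 3.
    Later rows: gene ID, "name - description", then expression values.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    experiments = lines[0].split("\t")[2:]
    genes = []
    for line in lines[1:]:
        fields = line.split("\t")
        # Split only on " - " (with the surrounding spaces), so a hyphen
        # inside a gene name such as "HLA-A" is not treated as the separator.
        name, _, description = fields[1].partition(" - ")
        # Empty cells are kept as None (an assumption; the format doc is silent).
        values: List[Optional[float]] = [
            float(v) if v.strip() else None for v in fields[2:]
        ]
        genes.append({"id": fields[0], "name": name,
                      "description": description, "values": values})
    return experiments, genes


def parse_gmt(text: str) -> Dict[str, List[str]]:
    """Map each gene set name to its genes (one gene set per row)."""
    gene_sets: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # Column 1 is the set name, column 2 a description (may be empty).
        gene_sets[fields[0]] = [g for g in fields[2:] if g]
    return gene_sets


def parse_reg2target(text: str) -> Dict[str, List[str]]:
    """Group target genes (column 2) by regulator (column 1)."""
    targets: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        regulator, target = line.split("\t")[:2]
        targets.setdefault(regulator, []).append(target)
    return targets
```

For example, a TAB row whose second column is `HLA-A - major histocompatibility complex` yields the name `HLA-A`, because only the spaced separator splits the field.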
If you have a tab- or comma-delimited data file from which you want to extract a set of rows and columns into a separate file, you do not need to download the file, perform the operation in Excel or another spreadsheet program, and upload the new file: GenomeSpace has server-side row and column extraction that allows you to pull a set of rows and columns out of your tab- or comma-delimited data file and save it as its own file in GenomeSpace.
To access this feature you can either:
This opens a dialog that shows only the first 10 or so lines of the selected file (or the first 50 KB if the file has many columns).
In this dialog, select the first row to include (for example, start at a lower row to trim out header lines) and the last row to include (leave this blank to take the rest of the file from the starting row). Then select columns by checking the checkbox at the top of each column.
Note that the selected cells that will be copied out to the new file are highlighted in pale blue, while those that will be left out of the new file are displayed as grey text on a white background.
Optionally, you can change the name of the new file. By default, .slice is inserted before the source file's extension, leaving the extension intact: if your source filename is myfile.gct, the default extracted file name will be myfile.slice.gct. If you are removing header lines, you may also want to change the file extension to match the new format.
Click Save to create a new GenomeSpace file in the same GenomeSpace directory as the original file.
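The extraction itself runs on the GenomeSpace server. To make the row, column, and naming rules concrete, here is a minimal local Python sketch of the same idea; it is not GenomeSpace code, the function names are hypothetical, and it assumes 1-based row and column numbers (as you would count them in the dialog) with the last row inclusive.

```python
import csv
import io
import os
from typing import Iterable, Optional


def default_slice_name(filename: str) -> str:
    """Insert '.slice' before the final extension: myfile.gct -> myfile.slice.gct."""
    root, ext = os.path.splitext(filename)
    return f"{root}.slice{ext}"


def extract_rows_and_columns(
    text: str,
    first_row: int,
    last_row: Optional[int],
    columns: Iterable[int],
    delimiter: str = "\t",
) -> str:
    """Keep rows first_row..last_row and the listed columns (all 1-based).

    last_row=None means "to the end of the file", mirroring a blank
    last-row field in the dialog. Columns keep their original order.
    """
    wanted = sorted(set(columns))
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    end = len(rows) if last_row is None else last_row
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in rows[first_row - 1:end]:
        # Short rows (e.g. a GCT version line) simply yield fewer cells.
        writer.writerow([row[c - 1] for c in wanted if c - 1 < len(row)])
    return out.getvalue()
```

For a GCT file, starting at row 3 drops the `#1.2` version line and the dimensions line, which is exactly the case where the result is no longer valid GCT and a new extension makes sense. A file with no extension, such as `myfile`, would get the default name `myfile.slice`.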
You can edit the sharing options for one of your directories by clicking the triangle next to the directory name and selecting Sharing.
You can edit the sharing options for one of your files by:
Right-clicking the filename and selecting Sharing.
Selecting the file checkbox and selecting File>Sharing.
Any of the Sharing options above opens the Sharing & Permissions dialog.
This dialog box shows the file or directory owner. It also lists all users or user groups that have access to that file or directory, and what privileges they have (Read or Read, Write & Delete).
To change the sharing options, click the Edit Sharing button on this dialog.
Clicking the Edit Sharing button opens the sharing management dialog.
Here, you can add a specific user to the sharing list for the file/directory by entering their GenomeSpace user name under Share with User, selecting the privileges you want them to have, and clicking Grant Permissions.
Groups can be similarly included in the permissions for a file/directory by selecting the group from the Group Name drop-down list, selecting the privileges you want to grant to all users in that group, and clicking Grant Permissions.
You can also choose to share your file or directory with all GenomeSpace users by selecting Allow public access, selecting the privileges you want to grant, and clicking Grant Permissions.
If you need to make a new group, or to manage an existing group, click the wrench icon to the right of the Group name drop-down list.
The manage groups dialog has several major areas.
List of user groups in GenomeSpace.
From here, you can select one of your groups and delete it with the Delete Selected Group button.
You can also select a group to see its group members.
Use this area of the dialog to create a new group by entering the name and a short description, and clicking Add New Group. Select the Membership is publicly visible checkbox to allow all GenomeSpace users to see the users in the group.
Under Group Members, you can not only see the list of group members, but also delete users from the group (by clicking the red X next to the user name), and add users to the group. You can make a user a group administrator by selecting the user and the User can administer group checkbox.
To add a user, enter the GenomeSpace user name in the add user field and click the Add user to... button.
You can add any GenomeSpace-enabled tool to your GenomeSpace tool bar as a private instance. Only you and whoever you share it with (individuals or groups) can see or use this instance. For example, you can add your private GenePattern or Galaxy server to your tool bar. Any GenomeSpace-enabled tool can be similarly added. If you are a developer and would like more details on making your tool GenomeSpace-enabled, see Adding a Tool to GenomeSpace.
To add a private tool to your GenomeSpace tool bar:
This figure shows an example of the new tool dialog filled out for a private GenePattern instance.
Some of the GenomeSpace tools have instructions in their help for managing private instances in GenomeSpace:
You can add external storage from public Amazon S3 buckets and your own Amazon S3 buckets to your GenomeSpace.
In order to manage your external storage options, select Manage>External Storage from the menu bar.
The GenomeSpace Data Manager was originally built to save the files you upload to GenomeSpace in an Amazon Simple Storage Service (S3) bucket that is managed by GenomeSpace itself. However, you can add other Amazon S3 buckets, set up by you or by a third party, to make their contents available to your GenomeSpace account and your GenomeSpace tools. For publicly accessible buckets, you only need to tell GenomeSpace the name of the bucket to mount it. For private buckets, or those with limited non-public access, the process is more complex: you must set up a sub-account in Amazon with the minimal permissions needed to share the bucket with GenomeSpace. Once a bucket has been mounted in GenomeSpace, you can share it with other GenomeSpace users using the standard GenomeSpace sharing dialogs.
A publicly accessible Amazon bucket has its permissions set to allow public access to anyone without authentication. Public buckets may be mounted only with READ permissions in GenomeSpace: you cannot write files back to them. This prevents users from accidentally saving files to Amazon S3 buckets that they do not control. If you want GenomeSpace to have WRITE privileges on a bucket, follow the instructions for mounting private buckets below.
To mount a public Amazon S3 bucket (you must log into GenomeSpace to start):
The following instructions describe how to mount an Amazon S3 bucket with WRITE permissions in GenomeSpace or one that you own but do not want to expose publicly. To do this you will need to follow the steps detailed below in order to prove that you are the owner of the bucket and to set up the minimal permissions to allow GenomeSpace to access it. You will be using GenomeSpace and several of the Amazon AWS management consoles to complete this task.
To mount an Amazon bucket with WRITE permissions in GenomeSpace or that you own and do not want to make publicly available (log into GenomeSpace to start):
NOTE: When this wizard opens, GenomeSpace automatically generates an account name for your Amazon sub-account and puts it in the 'Pending' status. 'Pending' means that the account may not exist yet and GenomeSpace does not have the necessary account credentials to access the account yet.
The policy document you generate in this step will allow this account to access your S3 bucket on behalf of GenomeSpace.
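The wizard produces the actual policy document for you, and that is the one to use. For orientation only, here is a minimal Python sketch of what a least-privilege AWS IAM policy for a single S3 bucket typically looks like; the bucket name is hypothetical, and the policy GenomeSpace generates may list different or additional actions.

```python
import json
from typing import Any, Dict


def bucket_policy(bucket: str, allow_write: bool) -> Dict[str, Any]:
    """Build an illustrative least-privilege IAM policy for one S3 bucket.

    This is NOT the GenomeSpace-generated policy; the action list is an assumption.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    object_actions = ["s3:GetObject"]
    if allow_write:
        # Writing and deleting objects is only needed for a read/write mount.
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Listing keys is granted on the bucket itself...
            {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": bucket_arn},
            # ...while reading/writing objects is granted on the keys inside it.
            {"Effect": "Allow", "Action": object_actions, "Resource": f"{bucket_arn}/*"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(bucket_policy("my-genomics-bucket", allow_write=True), indent=2))
```

The split between the bucket ARN and the `/*` object ARN matters: S3 evaluates listing against the bucket and object reads and writes against individual keys, so granting both on only one of them leaves the sub-account unable to do part of the job.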
Now that the account and policy are in place in Amazon, and you have provided the access ID and secret key to GenomeSpace, you can test connectivity to validate the credentials.
GenomeSpace will test the connectivity and show a feedback dialog indicating whether your credentials were verified.
If your credentials fail to verify, go back to confirm that the access ID, username, policy, and secret key are all correct.
When your credentials are verified, select the privileges you want for this bucket (read only, or read and write). This is a global setting on the bucket and applies only to your GenomeSpace account. You can share the bucket with other users later, using the usual sharing functionality, but only with the privileges you select here: if you choose "read only" here, you cannot later share the bucket with write permissions unless you remount it.
You can now use the files in this bucket as you would any other files in GenomeSpace, and share the bucket with other GenomeSpace users with the privileges you selected.