Setting Up Your Own Instance (version beta 1.0)
This document describes how to set up your own instance of GenomeSpace services.
GenomeSpace is designed to run on Amazon Web Services. The basic steps involve launching our provided Amazon Machine Image and then customizing the settings to correspond to your own Amazon credentials, machine DNS name, email accounts, etc.
This document assumes the person installing the GenomeSpace instance has some familiarity with Amazon AWS and a basic understanding of HTTP and HTTPS security, and is also comfortable working on a Unix command line.
1. An Amazon Web Services account
GenomeSpace currently runs on the Amazon Web Services cloud exclusively and makes extensive use of its APIs. If you do not have an account with Amazon, go to http://aws.amazon.com/.
2. A DNS entry for your server
A DNS entry allows your server to be reached at a URL such as https://genomespace.myinstitution.edu/jsui.
3. A security certificate that matches the DNS entry
All communications with the GenomeSpace servers occur via HTTPS, and thus require a security certificate, preferably from a well-known certificate authority. The public GenomeSpace instance uses a security certificate issued by Network Solutions for domain *.genomespace.org (a “wildcard” certificate), but you can use whatever certificate source you prefer and the certificate can be specific to a server. We have included a dummy certificate so that you can start up the service. However, there will be problems accessing the services without an “official” security certificate, such as not being able to upload files, security warnings from web browsers, etc. Using a self-issued certificate is likely to cause the same problems.
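One way to confirm that a certificate covers the DNS name you plan to use is the -checkhost option of the openssl command-line tool. The sketch below is illustrative only. It assumes OpenSSL 1.1.1 or later is installed, cert_matches_host is a hypothetical helper name, and the throwaway self-signed wildcard certificate for the made-up domain *.example.org exists purely so the check has something to run against. With a real certificate you would skip the generation step and pass your own PEM file.

```shell
#!/bin/sh
# Sketch: check that a PEM certificate covers a given host name.
# cert_matches_host is a hypothetical helper, not part of GenomeSpace.
cert_matches_host() {
    # $1 = PEM certificate file, $2 = host name to test
    openssl x509 -in "$1" -noout -checkhost "$2" \
        | grep -q 'does match certificate'
}

# Demo only: create a throwaway wildcard certificate for *.example.org.
# Replace this step with the certificate issued for your own domain.
WORK_DIR=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=*.example.org" \
    -addext "subjectAltName=DNS:*.example.org" \
    -keyout "$WORK_DIR/demo.key" -out "$WORK_DIR/demo.crt" 2>/dev/null

# The first host is inside the wildcard domain, the second is not.
for host in genomespace.example.org genomespace.other.org; do
    if cert_matches_host "$WORK_DIR/demo.crt" "$host"; then
        echo "$host: covered by certificate"
    else
        echo "$host: NOT covered by certificate"
    fi
done
```

A wildcard certificate covers a single label, so a name nested one level deeper than the wildcard would also be reported as not covered.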
4. An email account that can be used to send email notifications to users during registration
In addition to the email account address and password, the services need the DNS address of the SMTP host. We have been using Gmail accounts for this purpose, and our info looks like this:
1. Create a super-user for your account using the Amazon IAM service, and save its credentials.
GenomeSpace will not run properly using the credentials for the Amazon account owner or “root account” (i.e., the Amazon user associated directly with the Amazon account). Using the main Amazon account, you can create a new user that has sufficient privileges to access the Amazon buckets, SimpleDB domains, etc. To create the user:
Important: When you create the user from the AWS Console, there will be a link to download the credentials. This is the only time that you will be able to do so for this user. Download the credentials and save them; they are required for the rest of the installation procedure.
2. Create an S3 bucket and initialize it.
3. Create a security group for GenomeSpace
The security group allows you to specify the network ports needed to access the GenomeSpace web services remotely.
When you're done, the Inbound tab for your new Security Group should look like this:
4. Create a key pair for GenomeSpace.
You need to generate a key pair with a corresponding SSH key file that will allow you to access your instance remotely.
5. Launch the GenomeSpace AMI
6. Associate an Elastic IP address with your new instance.
Network addresses for EC2 instances are volatile and generally do not survive an instance restart. You can assign a stable IP address by using AWS Elastic IPs.
7. Associate a network domain name with the IP address for your instance.
Your network administrator should be able to create or update a DNS record that points a server name, such as genomespace.company.com, to the Elastic IP address you just created.
8. Remotely log into the new instance.
The rest of the setup process must be performed from a command line on the running instance. Access the instance with an SSH client.
Be sure to have access to:
From your local machine, execute the following command:
ssh -i YOURKEYPAIRFILE.pem ec2-user@YOURINSTANCEPUBLICDNSENTRY
where YOURKEYPAIRFILE.pem is the key file downloaded when you created the key pair and YOURINSTANCEPUBLICDNSENTRY is the Public DNS address for the instance.
Note: You might get an SSH error saying that the permissions on the .pem file are too open. If you do, you need to restrict the permissions on the file. On Mac or Linux, you can get past this error with the chmod command:
chmod 400 genomespace.pem
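The permission fix can also be scripted so that the key file is only changed when needed. This is a rough sketch: restrict_key_file is a hypothetical helper name, and the demo runs against an empty placeholder file in a scratch directory rather than a real key.

```shell
#!/bin/sh
# Sketch: restrict an SSH key file to owner read-only (mode 400) if needed.
# restrict_key_file is a hypothetical helper name.
restrict_key_file() {
    key_file=$1
    # The first 10 characters of "ls -l" are the mode string, e.g. -r--------
    mode=$(ls -l "$key_file" | cut -c1-10)
    case "$mode" in
        -r--------)
            echo "$key_file: permissions already restrictive"
            ;;
        *)
            chmod 400 "$key_file"
            echo "$key_file: permissions changed to 400"
            ;;
    esac
}

# Demo on a placeholder file; use the path of your downloaded .pem instead.
WORK_DIR=$(mktemp -d)
touch "$WORK_DIR/genomespace.pem"
chmod 644 "$WORK_DIR/genomespace.pem"
restrict_key_file "$WORK_DIR/genomespace.pem"
```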
9. Customize configuration files.
This creates a file called genomespace-secret.key.
cp ./genomespace-secret.key ./atm-properties/
find . -type f -name server.properties | xargs sed -i 's/theserver/server.mydomain.edu/'
find . -type f -name server.properties | xargs sed -i 's/thebucketname/my-gs-bucket-name/'
find . -type f -name server.properties | xargs sed -i 's/gsnamespace/gs1/'
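After running the three substitutions, a quick search can confirm that no placeholder was missed. The sketch below rehearses the commands in a scratch directory. The property keys in the sample files are invented for the demo and are not the real contents of server.properties, and the replacement values are the same example values used above. It assumes GNU sed, since the commands above already rely on sed -i.

```shell
#!/bin/sh
# Sketch: apply the placeholder substitutions, then check for leftovers.
# Sample files and their property keys are made up for this demo.
WORK_DIR=$(mktemp -d)
mkdir -p "$WORK_DIR/dm-properties" "$WORK_DIR/atm-properties"
printf 'host=theserver\nbucket=thebucketname\n' \
    > "$WORK_DIR/dm-properties/server.properties"
printf 'host=theserver\nnamespace=gsnamespace\n' \
    > "$WORK_DIR/atm-properties/server.properties"
cd "$WORK_DIR" || exit 1

# The same substitutions as in the step above, with example values.
find . -type f -name server.properties | xargs sed -i 's/theserver/server.mydomain.edu/'
find . -type f -name server.properties | xargs sed -i 's/thebucketname/my-gs-bucket-name/'
find . -type f -name server.properties | xargs sed -i 's/gsnamespace/gs1/'

# Collect every line that still contains one of the placeholders.
leftover=$(find . -type f -name server.properties \
    -exec grep -nE 'theserver|thebucketname|gsnamespace' {} +)

if [ -z "$leftover" ]; then
    echo "all placeholders replaced"
else
    echo "placeholders remain:"
    echo "$leftover"
fi
```

On the real instance, only the final search is needed, run from the directory that contains the properties folders.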
cp ./genomespace-aws.properties ./identityserver-properties
Make a note of the assigned values to use later for registering a user.
zip -r dm-properties.jar *
zip -r identityserver-properties.jar *
10. Initialize GenomeSpace databases.
There are some Amazon SimpleDB databases that should be initialized with data for GenomeSpace to start normally, and so we provide seed data for you to load.
where theNameSpace is the same namespace value that you set earlier in the configuration files (gs1 in the example above).
11. Configure HTTPS.
GenomeSpace runs in the Virgo Application Server, which in turn is built on top of the Apache Tomcat server. Depending on the origin of the SSL certificate and its format, you may have to configure Virgo's HTTPS support with different parameters, convert your certificate to a format that the server accepts, or both. For more information:
Each vendor provides instructions on how to use the certificates they sell with Tomcat. Here are the Network Solutions instructions on how to use their certificates: http://www.networksolutions.com/support/installation-of-an-ev-ssl-certificate-for-tomcat-apache/
This is a blog post discussing how to use certificates acquired from GoDaddy: http://blog.pantek.com/2009/08/add-godaddy-ssl-certificate-to-tomcat-6.html
More information is available from Apache: http://tomcat.apache.org/tomcat-6.0-doc/ssl-howto.html
The file to edit on the GenomeSpace application server is /usr/share/virgo/config/tomcat-server.xml
The relevant portion of the file:
<Connector port="8444" protocol="HTTP/1.1"
In the main GenomeSpace setup, we have placed the certificate file (wildcard.genomespace.org.p12) in the /usr/share/virgo/config directory.
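If your certificate and private key arrive as PEM files and you want a PKCS#12 (.p12) file like the one used in the main setup, OpenSSL can do the conversion. The following is a sketch under stated assumptions, not the procedure used for the public instance. pem_to_p12 is a hypothetical helper, the alias genomespace and the password changeit are placeholders, and the self-signed certificate is generated only so the example can run end to end.

```shell
#!/bin/sh
# Sketch: bundle a PEM certificate and private key into a PKCS#12 file.
# pem_to_p12 is a hypothetical helper; alias and password are placeholders.
pem_to_p12() {
    # $1 = certificate (PEM), $2 = private key (PEM),
    # $3 = output .p12 file, $4 = password protecting the .p12 file
    openssl pkcs12 -export -in "$1" -inkey "$2" -out "$3" \
        -name genomespace -passout "pass:$4"
}

# Demo only: a throwaway self-signed certificate stands in for the
# certificate and key issued by your certificate authority.
WORK_DIR=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=genomespace.example.org" \
    -keyout "$WORK_DIR/demo.key" -out "$WORK_DIR/demo.crt" 2>/dev/null

pem_to_p12 "$WORK_DIR/demo.crt" "$WORK_DIR/demo.key" \
    "$WORK_DIR/demo.p12" "changeit"

# Read the certificate back out of the .p12 file to confirm it is usable.
openssl pkcs12 -in "$WORK_DIR/demo.p12" -passin pass:changeit -nokeys \
    2>/dev/null | openssl x509 -noout -subject
```

Certificates from a commercial authority usually come with intermediate certificates. These can be included with the -certfile option of openssl pkcs12, as described in the vendor's instructions. Passing the password on the command line keeps the sketch short, but it is visible to other users of the machine while the command runs.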
12. Update the admin password for Virgo Console.
The password for the admin user is set in the file: /usr/share/virgo/config/org.eclipse.virgo.kernel.users.properties
The property user.admin is the admin password for the Virgo admin server application. For the best security, we recommend changing it.
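The password change can be made in a text editor or with a one-line substitution. The sketch below assumes the property is stored as a plain user.admin=value line. set_admin_password is a hypothetical helper, and the demo edits a scratch copy with invented contents rather than the real file. It assumes GNU sed.

```shell
#!/bin/sh
# Sketch: replace the value of the user.admin property in a properties file.
# set_admin_password is a hypothetical helper name.
set_admin_password() {
    # $1 = properties file, $2 = new password
    # The password must not contain '/', '&' or '\', which sed treats specially.
    sed -i "s/^user\.admin=.*/user.admin=$2/" "$1"
}

# Demo on a scratch file with invented contents. On the instance, the file
# is /usr/share/virgo/config/org.eclipse.virgo.kernel.users.properties
WORK_DIR=$(mktemp -d)
USERS_FILE="$WORK_DIR/org.eclipse.virgo.kernel.users.properties"
printf 'user.admin=oldpassword\nother.setting=unchanged\n' > "$USERS_FILE"

set_admin_password "$USERS_FILE" 'N3w-Passw0rd'

# Show the updated line.
grep '^user\.admin=' "$USERS_FILE"
```

Keeping a backup copy of the original file before editing makes it easy to recover if the substitution does not match.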
13. Register the user that the Data Manager uses to communicate with the Group Management server.
tail -f /usr/share/virgo/serviceability/logs/virgo-server/log.log &
A form like this one should appear:
14. Start GenomeSpace.
ps -ef | grep java
If the output lists only the grep command itself, the server is completely down.
If the Virgo java process is still active, wait a few more minutes and look for the process again.
If it is still running, kill the process.
nohup ./startup.sh &
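The check, wait, and kill sequence described above can be sketched as a small polling loop. This is an illustration only: stop_when_stuck is a hypothetical helper, it takes the process ID you read from the ps output, and the demo uses a harmless background sleep in place of the Virgo java process. The short delays are for the demo; on a real server you would wait minutes rather than seconds.

```shell
#!/bin/sh
# Sketch: wait for a process to exit on its own, then terminate it.
# stop_when_stuck is a hypothetical helper name.
stop_when_stuck() {
    # $1 = process ID, $2 = number of checks, $3 = seconds between checks
    pid=$1
    attempts=$2
    delay=$3
    while [ "$attempts" -gt 0 ]; do
        # kill -0 sends no signal; it only tests whether the process exists
        # (run this as the user that owns the process).
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "process $pid has exited"
            return 0
        fi
        sleep "$delay"
        attempts=$((attempts - 1))
    done
    echo "process $pid is still running; sending TERM"
    kill "$pid"
}

# Demo: a background sleep stands in for the Virgo java process.
sleep 300 >/dev/null 2>&1 &
DEMO_PID=$!
stop_when_stuck "$DEMO_PID" 2 1
```

Once the process is gone, the server can be started again with the startup command shown above.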