RESTful Access to Data Manager (version beta 2.0)

Introduction

Part of the functionality that we are including in the GenomeSpace CDK is a Java API that provides client applications with authentication and authorization services for the GenomeSpace system, as well as access to data file services. However, if your application is not written in Java and you want to GenomeSpace-enable it, you can still access these services by calling them directly over the web.

HTTP Client Requirements

In addition to supporting the usual HTTP POST, GET, PUT, etc., your client will have to be able to do the following:

Handle cookies

Accept cookies and send their values to the correct domains.

Support HTTP redirects

Accept as a response HTTP code 303 and automatically GET the response from the specified location URL.

Support Basic Authentication

The client should be ready to respond to a Basic authentication credentials challenge, which the server signals with HTTP response code 401.

Support JSON marshalling and unmarshalling

The Data Manager will generate responses using JSON format and some operations require the client to PUT or POST JSON payloads.

Support for HTTPS

All communications to the Identity Server and the Data Manager service occur through secure HTTP.  We have obtained third-party certificates for our servers, so updating trust stores should not be necessary in most cases.

Common Request Headers
Every request to the Data Manager should include the Accept header with a preference for the media types application/json and application/text.
The gs-token cookie should also be included. More on gs-token below.

Login and Authentication

You can do an explicit login at the beginning of a session by issuing a GET to:
https://identitytest.genomespace.org:8443/identityServer/basic
You will get an HTTP 401 response code with a request for credentials using Basic Authentication. Basic authentication is described in RFC 2617.
If you are utilizing an HTTP library, it is very likely that it already includes support for Basic authentication.
On successful submission of credentials, the server’s response will include a cookie named gs-token. The cookie should be included in all subsequent requests to the Data Manager. If your library has cookie handling enabled, the inclusion of the cookie in requests should happen automatically.
If a Data Manager request does not include the gs-token cookie or the session has expired, the server will respond with another 401 authentication challenge. If the client responds to it successfully, it will be redirected to the original request.
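The login flow above can be sketched with Python's standard library. This is a minimal sketch, not an official client: the function names make_session, login and gs_token are hypothetical, and it assumes the gs-token cookie is scoped so that it is also sent to the Data Manager host. The default urllib opener already follows 303 redirects.

```python
import http.cookiejar
import urllib.request

IDENTITY_URL = "https://identitytest.genomespace.org:8443/identityServer/basic"
DM_BASE = "https://dmtest.genomespace.org:8444/datamanager"


def make_session(username, password):
    """Return (opener, cookie_jar) with cookie and Basic auth handling enabled."""
    jar = http.cookiejar.CookieJar()
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # Answer 401 challenges from both the Identity Server and the Data Manager.
    passwords.add_password(None, [IDENTITY_URL, DM_BASE], username, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar),
        urllib.request.HTTPBasicAuthHandler(passwords),
    )
    # Accept header recommended under "Common Request Headers".
    opener.addheaders.append(("Accept", "application/json, application/text"))
    return opener, jar


def gs_token(jar):
    """Return the value of the gs-token cookie, or None if it is not set."""
    return next((c.value for c in jar if c.name == "gs-token"), None)


def login(opener, jar):
    """Perform the explicit login (network call) and return the gs-token value."""
    opener.open(IDENTITY_URL).close()
    return gs_token(jar)
```

After a successful login(opener, jar), the same opener can be reused for Data Manager requests so that the cookie is sent automatically.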

WADL

All the operations available from the Data Manager are described in a Web Application Description Language (WADL) specification. The WADL can be accessed at:
https://dmtest.genomespace.org:8444/datamanager/application.wadl

GSFileMetadata, GSDataFormat, and GSDirectoryListing Objects

Information about files managed by Data Manager will be returned in JSON objects. GSFileMetadata objects provide information about an individual file or directory (name, path, size, etc.). GSDirectoryListing carries a GSFileMetadata object for the corresponding directory and a list of GSFileMetadata objects, with one entry for each file and subdirectory.
GSDataFormat carries information about the format of a file and the available format transformations. Appendices A, B, and C of this document include samples of each.

Directories

Each user name will have a default directory assigned and created. To reach this default directory, issue the following request.
(All URLs are preceded by https://dmtest.genomespace.org:8444)

Verb    URL
GET     /datamanager/defaultdirectory

This request will redirect to the user’s default directory URL.
Example:
If the user name is “test”, /defaultdirectory will be redirected to:

https://dmtest.genomespace.org:8444/datamanager/files/users/test
 

Verb    URL
GET     /datamanager/files/users/test/dir1

Whenever a GET is submitted on a URL that corresponds to a directory, the Data Manager will respond with a GSDirectoryListing.
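As an illustration of consuming such a response, the sketch below summarizes a GSDirectoryListing. It assumes the shape shown in the Appendix B sample, where "contents" holds a single object when the directory has one entry and boolean values arrive as the strings "true" and "false". The helper names are hypothetical.

```python
import json


def as_list(value):
    """Normalize a JSON value that may be missing, a single object, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def is_directory(metadata):
    """Interpret the 'directory' property, which the samples encode as a string."""
    return str(metadata.get("directory", "")).lower() == "true"


def summarize_listing(listing_json):
    """Return the directory path and a (name, is_directory) pair per entry."""
    listing = json.loads(listing_json)
    return {
        "directory": listing["directory"]["path"],
        "entries": [
            (entry["name"], is_directory(entry))
            for entry in as_list(listing.get("contents"))
        ],
    }
```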

To create a directory:

Verb    URL                                         Body
PUT     /datamanager/files/users/test/newdirname    {"isDirectory":true}

where newdirname is the new directory name. Note that the Data Manager expects the parent directory to exist (in this case, /users/test). If directory creation is successful, the GSFileMetadata for the new directory will be returned.
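A directory-creation request could be built as follows. This is a sketch only: mkdir_request is a hypothetical helper, and sending a Content-Type of application/json with the body is an assumption, since this document does not state which request content type the service expects.

```python
import json
import urllib.request
from urllib.parse import quote

DM_BASE = "https://dmtest.genomespace.org:8444/datamanager"


def mkdir_request(dir_path):
    """Build (but do not send) the PUT request that creates dir_path."""
    body = json.dumps({"isDirectory": True}).encode("utf-8")
    return urllib.request.Request(
        DM_BASE + "/files" + quote(dir_path),  # quote() keeps "/" separators intact
        data=body,
        headers={
            "Content-Type": "application/json",  # assumption, see lead-in
            "Accept": "application/json",
        },
        method="PUT",
    )
```

The request would then be sent through an opener that carries the gs-token cookie, and the response body parsed as a GSFileMetadata object.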

Uploading files

Uploading a file is a two-step process. You must first obtain a signed Amazon S3 URL from the Data Manager. Then you will PUT your file to the generated URL.

To obtain the signed URL, you will do a GET with a URL that looks like:

/datamanager/uploadurls/users/test/mydir/AnotherLittleFile.txt?Content-Length=6&Content-MD5=nwYkOry4nHDgwzHGHYcfpw%3D%3D&Content-Type=application/octet-stream 

Note that the base URL (https://dmtest.genomespace.org:8444/datamanager) is followed by /uploadurls and then by the destination path and file name (/users/test/mydir/AnotherLittleFile.txt).

The three query parameters included in the URL are:

Content-Length
The size in bytes of the file.

Content-MD5
The Base64-encoded MD5 hash of the file. Note that in the example the value is URL-encoded (%3D is the URL-encoded form of the character ‘=’); you should URL-encode your values as well. To check your code, on Mac OS X you can obtain the correct Content-MD5 value for a file by issuing the following command:

openssl md5 -binary THEFILENAME | openssl base64

Content-Type
The content type you would like to assign to the file.
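The first step can be sketched in Python as below. The function name upload_url_request is hypothetical. The sketch URL-encodes every query value, so the "/" in the content type is sent as %2F rather than literally as in the example URL above; both forms decode to the same value.

```python
import base64
import hashlib
from urllib.parse import quote, urlencode

DM_BASE = "https://dmtest.genomespace.org:8444/datamanager"


def upload_url_request(dest_path, data, content_type="application/octet-stream"):
    """Return the URL to GET in order to obtain a signed S3 upload URL."""
    # Content-MD5 is the Base64 encoding of the raw 16-byte MD5 digest.
    md5_b64 = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
    query = urlencode({
        "Content-Length": len(data),
        "Content-MD5": md5_b64,        # "=" padding becomes %3D
        "Content-Type": content_type,
    })
    return f"{DM_BASE}/uploadurls{quote(dest_path)}?{query}"
```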

In response, the Data Manager service will return an Amazon S3 URL that you will use to PUT the file.
The returned URL might look something like this (included here for illustration, but what it looks like is not important. Just use whatever was returned by the web service):

https://genomespace-input.s3.amazonaws.com/users/test/mydir/AnotherLittleFile.txt?AWSAccessKeyId=AKIAIDXKHSCMYX5BHNLA&Expires=1296076583&Signature=G4tZJCObhPsvcdZxKJkBY7%2Bq378%3D

With your file PUT you will need to include 4 HTTP headers:

Content-Length      
Content-MD5 (should not be URL encoded)      
Content-Type      
x-amz-meta-md5-hash     

The first three should have the same values used to obtain the upload URL. The last one is the MD5 hash of the file as a hexadecimal string, without Base64 encoding. To check your code, on Mac OS X you can obtain the value for x-amz-meta-md5-hash with the following command line:

openssl md5 THEFILENAME 

Amazon S3 will return an HTTP status code of 200 on the successful completion of the upload.
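The second step, the PUT to the signed URL, might look like the following sketch. s3_put_request is a hypothetical helper that only builds the request. It derives both hash headers from the same MD5 digest: Base64 for Content-MD5 and hexadecimal for x-amz-meta-md5-hash.

```python
import base64
import hashlib
import urllib.request


def s3_put_request(signed_url, data, content_type="application/octet-stream"):
    """Build (but do not send) the PUT request that uploads data to S3."""
    digest = hashlib.md5(data)
    headers = {
        "Content-Length": str(len(data)),
        # Same Base64 value as in the query string, but NOT URL-encoded here.
        "Content-MD5": base64.b64encode(digest.digest()).decode("ascii"),
        "Content-Type": content_type,
        # Plain hexadecimal MD5, as printed by "openssl md5".
        "x-amz-meta-md5-hash": digest.hexdigest(),
    }
    return urllib.request.Request(signed_url, data=data, headers=headers, method="PUT")
```

Passing the request to urllib.request.urlopen would perform the upload; a 200 status indicates success.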

Downloading Files

To download a file, you need either its absolute path or its URL.
Both the absolute file path and the URL are available in the GSFileMetadata object (properties named path and url respectively).

The URL to GET will look something like the one below (minus protocol, server, and port number):

/datamanager/files/users/test/mydir/AnotherLittleFile.txt

To avoid problems with special characters, you should URL-encode each URL path element and the file name itself. (Do not simply URL-encode the whole URL, or you will end up with a URL that does not work.)
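One way to encode each path element separately is sketched below. The helper names encode_path and file_url are hypothetical.

```python
from urllib.parse import quote

DM_BASE = "https://dmtest.genomespace.org:8444/datamanager"


def encode_path(path):
    """URL-encode every path element while leaving the "/" separators alone."""
    return "/".join(quote(segment, safe="") for segment in path.split("/"))


def file_url(path):
    """Return the download URL for an absolute Data Manager file path."""
    return DM_BASE + "/files" + encode_path(path)
```

Encoding the whole URL in one call with safe="" would also escape the ":" and "/" characters of the scheme and host, which is why the elements are encoded one at a time.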

The Data Manager will respond with a redirect to the Amazon S3 location of the file. If your HTTP client library has redirects enabled, the redirection should happen automatically.

Downloading Files in Different Data Formats

Data Manager has the ability to convert some files into formats that can be consumed by other applications.
The data format that the Data Manager believes the original file to be in is identified in the GSFileMetadata property dataFormat (which may be empty if the format is not recognized).
GSFileMetadata also specifies the property availableDataFormats. This will be an array of GSDataFormat objects that identify the formats this file can be requested in.

To GET the file in a specific format you will build a URL that looks as follows:

/datamanager/files/dir1/dir2/fileName.ext?dataFormat=http://www.genomespace.org/datamanager/dataformat/lowercasetxt/0.0.0

The query parameter dataFormat value is the URL for the format. This URL can be obtained from the GSDataFormat url property in the availableDataFormats array.
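The lookup and URL construction described above can be sketched as follows. The helper names are hypothetical, and the sketch assumes that availableDataFormats may arrive as a single object when only one format is offered. The ":" and "/" characters of the format URL are left unescaped to match the example above.

```python
from urllib.parse import urlencode


def as_list(value):
    """Normalize a JSON value that may be missing, a single object, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def format_url(metadata, format_name):
    """Return the url of the named entry in availableDataFormats, or None."""
    for fmt in as_list(metadata.get("availableDataFormats")):
        if fmt.get("name") == format_name:
            return fmt["url"]
    return None


def converted_file_path(file_path, data_format_url):
    """Return the path and query to GET for the file in the given format."""
    query = urlencode({"dataFormat": data_format_url}, safe=":/")
    return f"/datamanager/files{file_path}?{query}"
```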

Other File and Directory Operations

Delete A File Or Directory

Verb      URL
DELETE    /datamanager/files/dir1/dir2/fileOrDirName

Note: A directory must be empty before it can be deleted.
 

Copy a File or Directory

Verb    URL                                          Headers
PUT     /datamanager/files/dir1/destFileOrDirName    x-gs-copy-source

Note: The URL identifies the new object that will be created by the copy.
The source file or directory is identified in the custom x-gs-copy-source header. The header value should look like /dir1/dir3/sourceFile (neither the base URL nor the URL path segment /datamanager/files should be included).
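A copy request could be assembled like this. copy_request is a hypothetical helper, and passing the source path in the header without URL encoding is an assumption, since this document does not say whether the header value should be encoded.

```python
import urllib.request
from urllib.parse import quote

DM_BASE = "https://dmtest.genomespace.org:8444/datamanager"


def copy_request(source_path, dest_path):
    """Build (but do not send) the PUT request that copies source to dest."""
    return urllib.request.Request(
        DM_BASE + "/files" + quote(dest_path),
        # Path only: no base URL and no /datamanager/files prefix.
        headers={"x-gs-copy-source": source_path},
        method="PUT",
    )
```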

Obtain Metadata on a File or Directory

Verb    URL
GET     /datamanager/filemetadata/dir1/dir2/destFileOrDirName


This will return a JSON GSFileMetadata object. See Appendix A and B.
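Because the sample in Appendix A encodes sizes and booleans as strings, a client may want to convert the metadata into typed values. The following is a sketch under that assumption; the FileMetadata class and parse_metadata function are hypothetical and cover only a subset of the properties.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FileMetadata:
    name: str
    path: str
    url: str
    size: int
    is_directory: bool
    last_modified: Optional[datetime]  # absent for the directory in Appendix B


def parse_metadata(meta):
    """Convert a decoded GSFileMetadata JSON object into typed values."""
    stamp = meta.get("lastModified")
    modified = None
    if stamp:
        # Timestamps in the samples look like 2011-06-02T17:16:10Z (UTC).
        modified = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return FileMetadata(
        name=meta["name"],
        path=meta["path"],
        url=meta["url"],
        size=int(meta["size"]),
        is_directory=str(meta["directory"]).lower() == "true",
        last_modified=modified,
    )
```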
 

(DEPRECATED) Obtain a Temporary File Download URL

If you have a use case where your application needs a URL from which it can download a file directly, bypassing authentication challenges, you can request a temporary download URL:

Verb    URL
GET     /datamanager/downloadurls/dir1/dir2/destFileOrDirName

Data Manager will return, in plain text, a signed S3 URL that is valid for downloads for a limited time (the expiration is currently set to 5 minutes from the time the URL was created).
 

Appendix A: Sample GSFileMetadata JSON

   {"name":"volumeCheck.txt",
   "path":"/users/test/volumeCheck.txt",
   "size":"37",
   "url":"https://dmtest.genomespace.org:8444/datamanager/files/users/test/volumeCheck.txt",
   "dataFormat":
      {"name":"txt",
       "url":"http://www.genomespace.org/datamanager/dataformat/txt/0.0.0",
       "version":"0.0.0"},
   "directory":"false",
   "lastModified":"2011-06-02T17:16:10Z",
   "owner":"test",
   "availableDataFormats":[
      {"name":"lowercasetxt",
       "url":"http://www.genomespace.org/datamanager/dataformat/lowercasetxt/0.0.0",
       "version":"0.0.0"},
      {"name":"uppercasetxt",
       "url":"http://www.genomespace.org/datamanager/dataformat/uppercasetxt/0.0.0",
       "version":"0.0.0"},
      {"name":"txt",
       "url":"http://www.genomespace.org/datamanager/dataformat/txt/0.0.0",
       "version":"0.0.0"}],
   "contentType":"application/octet-stream"
   }

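Note that in the JSON returned by the server the backward slash (\) is used to escape the forward slash (/), so a path may appear as \/users\/test\/volumeCheck.txt. The escapes are not shown in the samples in this document.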

Appendix B: Sample GSDirectoryListing JSON

{"contents":
      {"availableDataFormats":[
          {"name":"lowercasetxt",
           "url":"http://www.genomespace.org/datamanager/dataformat/lowercasetxt/0.0.0",
           "version":"0.0.0"},
          {"name":"txt",
           "url":"http://www.genomespace.org/datamanager/dataformat/txt/0.0.0",
           "version":"0.0.0"}],
       "dataFormat":
          {"name":"txt",
           "url":"http://www.genomespace.org/datamanager/dataformat/txt/0.0.0",
           "version":"0.0.0"},
       "directory":"false",
       "lastModified":"2011-06-02T17:57:36Z",
       "name":"datamanagerTestFile.txt",
       "owner":"test",
       "path":"/users/test/dmClientTest26468d1cb/datamanagerTestFile.txt",
       "size":"6",
       "url":"https://dmtest.genomespace.org:8444/datamanager/files/users/test/dmClientTest/datamanagerTestFile.txt"},
   "directory":
      {"contentType":"",
       "directory":"true",
       "name":"dmClientTest26468d1cb",
       "owner":"test",
       "path":"/users/test/dmClientTest26468d1cb",
       "size":"0",
       "url":"https://dmtest.genomespace.org:8444/datamanager/files/users/test/dmClientTest"}}

Appendix C: Sample GSDataFormat JSON

{"name":"gmt",
   "url":"http://www.genomespace.org/datamanager/dataformat/gmt/0.0.0",
   "version":"0.0.0"}