Transferring Files From the HPC To Google Drive

You can use the rclone tool to transfer data between the HPC and a Google Drive. This tool requires a one-time setup to work.

One-Time Setup

You need to go through the following steps once to configure rclone and authorize it to interact with your Google Drive account.

Generate your client id for use with rclone

Creating your own Google client ID for use with rclone gives you the best available performance. (It's possible to use rclone without generating a client ID, but it will be much slower.)
  1. Go to https://console.developers.google.com (Google Cloud Console)
  2. Log in with a Google account. Make sure the account you use can create projects. You may use your SCU account, but it might not let you create a new project in Google Cloud.
  3. Click “Select a project” at the top
  4. Click “New Project”
  5. Name the project whatever you want (“rclone” is fine)
  6. Select “No organization” if you're using a personal account. Optionally, if you have Google Workspace, you can create an organization instead
  7. Click “Create”
  8. Again click “Select a project” at the top
  9. Select the newly created project (eg, “rclone”)
  10. Click “Library” on the left
  11. In the “Search for APIs & Services” box, enter “drive”
  12. Select “Google Drive API”
  13. Click “Enable”
  14. Click “OAuth consent screen” on the left and go through the steps of configuring OAuth
  15. Set the “App Name” to something you'll recognize (eg, “rclone”) and enter a “User support email”
  16. Click “Next”
  17. Under Audience, select “External” if you're not a Google Workspace user and “Internal” if you are
  18. Then, finish the rest of the creation process and click “Create”
  19. Go to “Audience” on the left and scroll down to “Test Users”
  20. Click “Add users” and enter your Gmail address
  21. Click the navigation menu (three lines) in the top-left corner and then go back to “APIs & Services”
  22. Now click “Credentials” on the left
  23. Click on “Create credentials” and select “OAuth client ID”
  24. For “Application type”, select “Desktop App”
  25. Set the “Name” to whatever you like (eg, “rclone”)
  26. Click “Create”
  27. Copy both the “client ID” and “client secret” values for use below.

See this link for a GUI walkthrough of this process: https://github.com/Cloudbox/Cloudbox/wiki/Google-Drive-API-Client-ID-and-Client-Secret

Getting your Folder ID

rclone can be configured to confine all of its operations to a particular folder. This is optional but strongly recommended; otherwise rclone has access to your entire Google Drive, which could lead to unintentional data loss if you're not careful.
  1. Browse to https://mail.google.com/ (Gmail)
  2. Log in with the account you created the Google Cloud project with, if prompted
  3. Click the “Google apps” (9 dots) button in the top-right
  4. Select “Drive”
  5. You're now in the Google Drive UI
  6. Right-click and select “New Folder”
  7. Name it something like “rclone” or “WAVE”
  8. Click “Create”
  9. Double-click the newly-created folder to open it
  10. The URL bar at the top of the browser will now read:
      https://drive.google.com/drive/folders/<long-string-starting-with-1>
  11. The <long-string-starting-with-1> is your FolderID
  12. Copy the FolderID for use below (but not the leading "https://drive.google.com/drive/folders/")
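
If you would rather not pick the ID out of the URL by hand, the extraction can be sketched in shell. This is a minimal sketch and not part of rclone: the function name folder_id_from_url is hypothetical, the example URL is a placeholder, and it assumes the URL contains /folders/<FolderID>, possibly followed by a query string.

```shell
# Hypothetical helper: print the FolderID portion of a Google Drive folder URL.
folder_id_from_url() {
  id=${1##*/folders/}     # drop everything up to and including "/folders/"
  id=${id%%[?#/]*}        # drop any trailing query string, fragment, or path
  printf '%s\n' "$id"
}

# Placeholder URL for illustration only; substitute the one from your browser.
folder_id_from_url 'https://drive.google.com/drive/folders/1ExampleFolderId?usp=sharing'
```

The printed value is what you would paste at the root_folder_id prompt during configuration.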

Configure rclone

Configuring rclone also only needs to be done once.

Phase One

$ rclone config

No remotes found - make a new one

n) New remote
s) Set configuration password
q) Quit config

n/s/q> n
name> gdrive

Type of storage to configure.

Enter a string value. Press Enter for the default ("").

Choose a number from below, or type in your own value

<snip>

11 / Google Cloud Storage (this is not Google Drive)

   \ "google cloud storage"

12 / Google Drive

   \ "drive"

<snip>

Storage> 12 (or whatever the listed number is for Google Drive)

** See help for drive backend at: https://rclone.org/drive/ **

Google Application Client Id

Setting your own is recommended.

See https://rclone.org/drive/#making-your-own-client-id for how to create your own.

If you leave this blank, it will use an internal key which is low performance.

Enter a string value. Press Enter for the default ("").

client_id> <your client id from above>

Google Application Client Secret

Setting your own is recommended.

Enter a string value. Press Enter for the default ("").

client_secret> <your client secret from above>

Scope that rclone should use when requesting access from drive.

Enter a string value. Press Enter for the default ("").

Choose a number from below, or type in your own value

 1 / Full access all files, excluding Application Data Folder.

   \ "drive"

 2 / Read-only access to file metadata and file contents.

   \ "drive.readonly"

   / Access to files created by rclone only.

 3 | These are visible in the drive website.

   | File authorization is revoked when the user deauthorizes the app.

   \ "drive.file"

   / Allows read and write access to the Application Data folder.

 4 | This is not visible in the drive website.

   \ "drive.appfolder"

   / Allows read-only access to file metadata but

 5 | does not allow any access to read or download file content.

   \ "drive.metadata.readonly"

scope> 1

ID of the root folder

Leave blank normally.

Fill in to access "Computers" folders. (see docs).

Enter a string value. Press Enter for the default ("").

root_folder_id> <your FolderID from above, or just press Enter>

Service Account Credentials JSON file path

Leave blank normally.

Needed only if you want use SA instead of interactive login.

Enter a string value. Press Enter for the default ("").

service_account_file> [Press Enter]

Edit advanced config? (y/n)

y) Yes
n) No

y/n> n

Remote config

Use auto config?

 * Say Y if not sure

 * Say N if you are working on a remote or headless machine

y) Yes
n) No

y/n> n

If your browser doesn't open automatically go to the following link: https://accounts.google<whatever, copy this URL>

Phase Two

  1. Copy the above URL from your rclone config session
  2. Paste it into a browser
  3. Select the Google account you want to log in with (your @scu.edu account)
  4. Click “Allow” to confirm you want to allow rclone to access your Google Drive
  5. Copy the resulting code for use below

Phase Three

Enter verification code> <Paste the code received above>

Configure this as a team drive?

y) Yes
n) No

y/n> n

--------------------

[gdrive]

type = drive

client_id = [your_client_id]

client_secret = [your_client_secret]

scope = drive

token = {"access_token":"[redacted]","token_type":"Bearer","refresh_token":"[redacted]","expiry":"2019-08-01T11:26:47.679442332-07:00"}

--------------------

y) Yes this is OK
e) Edit this remote
d) Delete this remote

y/e/d> y

Current remotes:

 

Name                 Type

====                 ====

gdrive               drive

 

e) Edit existing remote

n) New remote

d) Delete remote

r) Rename remote

c) Copy remote

s) Set configuration password

q) Quit config

e/n/d/r/c/s/q> q
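
Before moving on, it can be worth confirming that the new remote works. The sketch below assumes the remote was named gdrive, as in the session above. remote_exists is a hypothetical helper; rclone listremotes and rclone lsd are standard rclone subcommands.

```shell
# Hypothetical helper: succeed if the named remote is listed by `rclone listremotes`,
# which prints one remote per line with a trailing colon (e.g. "gdrive:").
remote_exists() {
  rclone listremotes | grep -qxF "$1:"
}

# Usage on a login node (left commented so the snippet only defines the helper):
#   remote_exists gdrive && rclone lsd gdrive:
```

If the remote exists, rclone lsd gdrive: lists the directories at the top of the configured folder. An empty listing with no error is normal for a newly created folder.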

Using rclone on the HPC

Copying files from the HPC to Google Drive

Here's an example of writing the current date to a file and copying it to your Google Drive folder.

login1$ cd

login1$ date > date.txt

login1$ rclone copy date.txt gdrive:

Listing files in Google Drive

login1$ rclone ls gdrive:

       29 date.txt

login1$

Copying a large directory structure from the HPC to Google Drive

login1$ du -sh test/

329G    test/

login1$ du -s --inodes test/

11247   test/

login1$ rclone copy -P -vv --tpslimit 10 --transfers 5 --drive-chunk-size 128M test/ gdrive:test

<snip>

Transferred:      132.419G / 328.156 GBytes, 40%, 130.961 MBytes/s, ETA 25m30s

Errors:                 0

Checks:                 0 / 0, -

Transferred:         1212 / 9995, 12%

Elapsed time:    17m15.4s

</snip>

login1$
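
For a transfer this large, a preview beforehand and a verification afterwards may be worthwhile. This is a minimal sketch, assuming the same test/ directory and gdrive remote as above. upload_and_verify is a hypothetical wrapper; --dry-run, check, and --one-way are standard rclone features.

```shell
# Hypothetical wrapper: preview, copy, then verify a directory upload.
# Arguments: local source directory, remote destination (e.g. gdrive:test).
upload_and_verify() {
  src=$1
  dst=$2
  # Show what would be copied without transferring anything.
  rclone copy --dry-run "$src" "$dst" || return 1
  # Perform the copy with the flags used in the example above.
  rclone copy -P --tpslimit 10 --transfers 5 --drive-chunk-size 128M "$src" "$dst" || return 1
  # Confirm that every source file exists on the remote with matching size and hash.
  rclone check --one-way "$src" "$dst"
}

# Usage on a login node (left commented so the snippet only defines the wrapper):
#   upload_and_verify test/ gdrive:test
```

The --one-way flag limits the check to files present in the source, so unrelated files already in the destination are not reported as differences.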

Availability

rclone is available ONLY on the frontend login nodes. It is NOT available on the backend compute nodes, which do not have Internet access.

Performance

  • Google Drive limits transfers to two files per second, so lots of small files will take much longer to transfer.
    • Consider combining small files into a single larger tarball before uploading
  • Google Drive at SCU provides “unlimited” space, but limits uploads to 750GB in any 24-hour period
    • If you exceed this limit, you will be unable to upload additional data for 24 hours
    • If you intend to upload more than 750GB, you can give rclone the --bwlimit 8.8M argument, which restricts the transfer to 8.8MB/s (about 742GB per 24 hours) and ensures you never exceed the limit.
  • If you want to upload as quickly as possible:
    • Make sure you're uploading less than 750GB total, or you'll hit the limit and be denied further Google Drive uploads for 24 hours
    • Make sure your average file size is large (>100MB)
    • Use the following command line arguments: --tpslimit 10 --transfers 5 --drive-chunk-size 128M
      • --tpslimit 10 limits rclone to 10 transactions per second to avoid getting rate limited by Google Drive
      • --transfers 5 tells rclone to run 5 simultaneous transfer threads
      • --drive-chunk-size 128M tells rclone to talk to Google Drive using 128M data chunks (putting more data into a single transaction, which reduces overall transactions per second and so increases overall bytes per second)
    • Using the above, it is possible to sustain transfers between the HPC and Google Drive at over 1Gbit/s (125MB/s).
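
The arithmetic behind the bandwidth limit, and the tarball suggestion, can be sketched in shell. This is a rough sketch under stated assumptions: daily_gib and bundle_dir are hypothetical helper names, and the calculation uses binary units (MiB and GiB) because rclone's M suffix on --bwlimit means MiB/s. Whether the 750GB quota is counted in binary or decimal units is not stated here, so treat the result as an estimate.

```shell
# GiB transferred in 24 hours at a steady rate given in MiB/s.
# 86400 is the number of seconds in a day; 1024 converts MiB to GiB.
daily_gib() {
  awk -v rate="$1" 'BEGIN { printf "%.1f\n", rate * 86400 / 1024 }'
}

# Hypothetical helper: pack a directory of small files into one gzip-compressed tarball.
# Arguments: source directory, output tarball path.
bundle_dir() {
  tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

daily_gib 8.8   # prints 742.5, the daily volume at --bwlimit 8.8M

# Usage on a login node (commented out; test/ and gdrive: are the names used above):
#   bundle_dir test/ test.tar.gz && rclone copy -P test.tar.gz gdrive:
```

Uploading one tarball instead of thousands of small files sidesteps the per-file rate limit, at the cost of having to download and unpack the whole archive to retrieve any single file.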

Getting Help

You can see all available rclone subcommands by simply running rclone

You can see all global (ie, non-subcommand-specific) flags by running: rclone help flags

You can see subcommand-specific help by running rclone <subcommand> --help. For example, rclone copy --help | less