Code Ocean imposes a standard capsule structure, with associated storage limits and other usage patterns. This document outlines recommendations for working with your data in Code Ocean capsules.

Which files go where?

The Code Ocean UI shows the files within a capsule on the left-hand side. All capsules share the same basic structure: a fixed set of folders, each with an intended use case and size limits.

See the Code Ocean user manual for more detail on the structure of a compute capsule’s default folders.

Data management flow using an external data asset in S3

To maintain a store of data that is independent of (external to) Code Ocean, you can use a bucket that you own in AWS S3. We advise against doing this with the S3 bucket that we provided to ease the transition from Posit; this guidance applies only to a bucket over which you have complete ownership.

We recommend the following data management flow for this case:

  1. Set up an external data asset linking to your bucket in S3 for data storage.

    1. The bucket name and path in an S3 URI correspond to the following: s3://{bucket name}/{path, including any applicable sub-folders}
  2. In your capsule, attach the data asset so that it appears under data/, then read or copy data from it.

    1. At the start of a project, you may skip this step if you haven’t yet pulled your data from a database (e.g. with spatialmap_from_db).
  3. In your interactive Cloud Workstation session, operate on the local version of your data.

  4. Save updated data and intermediate files to scratch/ during the session.

  5. When you reach a key milestone in analysis (e.g. finished QC, or finished clustering), move any data files that should be saved in a more persistent manner into results/, which will facilitate moving them to an external data asset. Put the Cloud Workstation on hold or shut it down.

    1. Warning: if a file in results/ has the same name as a file that already exists at the destination in the bucket, the existing file will be overwritten! You will NOT be warned when creating the new data asset!
  6. After hold/shut down is complete, use the Reproducibility pane to create a new data asset from the results snapshot.

    Right-click on the Run / results snapshot > Create Data Asset

    1. Fill out the appropriate information for the external data asset (i.e., matching the S3 bucket/folder setup in Step 1).
    2. The new results data asset will now show up in the My Data tab, and the contents of the original folder on S3 and the original data asset will also have been updated. Since both the new and old data assets should point to the same location, they will contain identical data and one can be discarded.
    3. You may choose to re-index your original data asset, but this is not required in order to access the updated files there. Re-indexing only refreshes the list of contents shown in the Code Ocean data asset UI.
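
The S3 URI layout from Step 1 and the overwrite warning in Step 5 can be checked before you create the data asset. The following is a minimal Python sketch, not part of any Code Ocean or AWS API: split_s3_uri and find_collisions are hypothetical helper names, the bucket and paths are placeholders, and the sketch assumes that files in results/ are written under the data asset's path with their relative paths preserved. The list of existing object keys is something you supply yourself, for example copied from a listing of your bucket.

```python
from pathlib import Path
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split s3://{bucket name}/{path} into (bucket name, path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Drop leading/trailing slashes so the path can be used as a key prefix.
    return parsed.netloc, parsed.path.strip("/")


def find_collisions(results_dir, existing_keys, prefix=""):
    """Return files under results_dir whose destination key already exists.

    existing_keys: object keys already present in the bucket (assumed input).
    prefix: the path part of the S3 URI (assumed destination folder).
    """
    results_dir = Path(results_dir)
    existing = set(existing_keys)
    collisions = []
    for file in sorted(results_dir.rglob("*")):
        if not file.is_file():
            continue
        rel = file.relative_to(results_dir).as_posix()
        key = f"{prefix}/{rel}" if prefix else rel
        if key in existing:
            collisions.append(rel)
    return collisions


if __name__ == "__main__":
    # Placeholder URI and keys, for illustration only.
    bucket, prefix = split_s3_uri("s3://my-bucket/project/qc")
    keys_in_bucket = ["project/qc/counts.csv"]
    for name in find_collisions("results", keys_in_bucket, prefix):
        print(f"would overwrite s3://{bucket}/{prefix}/{name}")
```

Running a check like this before putting the Cloud Workstation on hold gives you a chance to rename files in results/ that would otherwise silently replace existing objects.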
