Code Ocean imposes a standard capsule structure, with associated storage limits and other usage patterns. This document outlines recommendations for working with your data in Code Ocean capsules.
The Code Ocean UI shows the files within a capsule on the left-hand side. All capsules share the same basic structure, made up of the following folders, each with an intended use case and size limitations:
code/
: Scripts, notebooks, and helpers. Data should not be stored here.
data/
: Read-only input data for your code to operate on (attached as data assets, which do not count toward capsule storage limits).
scratch/
: Read/write folder for in-progress data and intermediate files. Persists between sessions, and does not count toward capsule storage limits. Performance may suffer if too much is put into this folder.
results/
: Read/write folder for key data milestones and final takeaways. Does not persist between sessions, but is snapshotted at the end of each session. Can be used to create a new data asset.
See the Code Ocean user manual for more detail on the structure of a compute capsule’s default folders.
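Because scratch/ persists between sessions and can slow down when it grows too large, it can help to check its size from time to time. The following is a minimal sketch, not an official Code Ocean utility. The function names `folder_size_bytes` and `format_size` are hypothetical, and the `/scratch` location is an assumption about where the folder is mounted in your capsule; adjust it if your layout differs.

```python
from pathlib import Path


def folder_size_bytes(folder: Path) -> int:
    """Return the total size in bytes of all regular files under `folder`."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def format_size(num_bytes: int) -> str:
    """Render a byte count in binary units (B, KiB, MiB, GiB, TiB)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


if __name__ == "__main__":
    # Assumption: the capsule's scratch folder is mounted at /scratch.
    scratch = Path("/scratch")
    if scratch.is_dir():
        total = folder_size_bytes(scratch)
        print(f"scratch/ currently holds {format_size(total)}")
```

Running this at the start of a session gives a quick indication of whether old intermediate files are worth clearing out.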
If you’d like to maintain a separate store of data independent from (external to) Code Ocean, you can make use of a bucket that you own in AWS S3. Note that we advise against doing this with the S3 bucket that we provided to you to ease the transition from Posit; rather, this guidance is for a bucket that you have complete ownership over.
We recommend the following data management flow for this case:
Set up an external data asset linking to your bucket in S3 for data storage.
s3://{bucket name}/{path, including any applicable sub-folders}
In your capsule, attach the data asset to data/ and read/copy data from it (spatialmap_from_db).
In your interactive Cloud Workstation session, operate on the local version of your data.
Save updated data and intermediate files to scratch/ during the session.
When you reach a key milestone in analysis (e.g. finished QC, or finished clustering), move any data files that should be saved in a more persistent manner into results/, which will facilitate moving them to an external data asset. Put the Cloud Workstation on hold or shut it down.
After hold/shut down is complete, use the Reproducibility pane to create a new data asset from the results snapshot.
Right-click on the Run / results snapshot > Create Data Asset
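The file-handling part of this flow (copying inputs out of data/, working in scratch/, then moving milestone files into results/) could be sketched in Python as below. This is an illustration under stated assumptions rather than a Code Ocean API: the helper names `stage_inputs` and `promote_milestone` are hypothetical, and the folder locations are passed in as arguments because the exact mount points in your capsule may differ.

```python
import shutil
from pathlib import Path


def stage_inputs(data_dir: Path, scratch_dir: Path, name: str) -> Path:
    """Copy one file or folder from read-only data/ into scratch/ for editing."""
    src = data_dir / name
    dst = scratch_dir / name
    if src.is_dir():
        # dirs_exist_ok (Python 3.8+) lets a later session refresh the copy.
        shutil.copytree(src, dst, dirs_exist_ok=True)
    else:
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    return dst


def promote_milestone(
    scratch_dir: Path, results_dir: Path, name: str, milestone: str
) -> Path:
    """Move a finished file or folder from scratch/ into results/<milestone>/."""
    target_dir = results_dir / milestone
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(name).name
    # shutil.move copies and then deletes when source and target are on
    # different filesystems, so it also works across separate mounts.
    return Path(shutil.move(str(scratch_dir / name), str(destination)))
```

For example, assuming the folders are mounted at `/data`, `/scratch`, and `/results`, and using a hypothetical file name, `stage_inputs(Path("/data"), Path("/scratch"), "my_asset/counts.csv")` would create a working copy, and `promote_milestone(Path("/scratch"), Path("/results"), "my_asset/counts.csv", "qc")` would move it to `/results/qc/counts.csv` once QC is finished. Creating the data asset from the results snapshot is still done through the Reproducibility pane.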