Skip to content
This repository has been archived by the owner on Sep 21, 2024. It is now read-only.

feat: Introduce a streaming way to encode BodyChunkIpld. Fixes #498 #586

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jsantell
Copy link
Contributor

  • Introduce streaming BodyChunkIpld::encode and BodyChunkIpld::decode methods.
  • Streaming mechanisms store data in memory until a threshold is hit, which then stores to disk.
  • Mark non-streaming functions BodyChunkIpld::load_all_bytes and BodyChunkIpld::store_bytes as deprecated.
  • Remove BodyChunkDecoder (now implemented as BodyChunkIpld::decode.

Also bytes has been moved to a workspace dep

@jsantell jsantell requested a review from cdata August 10, 2023 20:22
@jsantell jsantell force-pushed the body-chunk-stream branch 3 times, most recently from 3c91d1d to 656e2f6 Compare August 10, 2023 22:51
@jsantell jsantell requested a review from cdata August 10, 2023 22:52
Copy link
Collaborator

@cdata cdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a high level this looks like the right approach. I think we should revise it to leverage a impl Store that is provided by [caller who wants to do the encoding] that is considered a persistence scratch space; that way the solution can address the problem of finite resources for any environment where we have some implementation for storage persistence.

Cargo.toml Outdated Show resolved Hide resolved
rust/noosphere-storage/src/implementation/web.rs Outdated Show resolved Hide resolved
rust/noosphere-storage/src/lib.rs Show resolved Hide resolved
rust/noosphere-storage/Cargo.toml Outdated Show resolved Hide resolved
@jsantell jsantell force-pushed the body-chunk-stream branch 5 times, most recently from c3c87f4 to 95e3007 Compare August 14, 2023 20:04
@jsantell jsantell requested a review from cdata August 14, 2023 20:12
@jsantell
Copy link
Contributor Author

jsantell commented Aug 14, 2023

Updates:

  • NativeStorage's scratch space cleans up on drop
  • WebStorage's scratch space is partitioned under SCRATCH_STORE and wiped on storage instantiation. (If we could easily async-drop, other solutions are available)
  • Storage now must implement Scratch -- we could roll these together (e.g. Storage::get_scratch_store), keep them independent (at the cost of Storage + Scratch everywhere), or keep this current middle ground.

@jsantell jsantell force-pushed the body-chunk-stream branch 2 times, most recently from 7c557f2 to 203073c Compare August 16, 2023 18:13
@jsantell jsantell force-pushed the body-chunk-stream branch 8 times, most recently from 554165d to 1108ed2 Compare August 17, 2023 21:58
* Introduce a `Scratch` trait, for `Storage` providers to provide a
  temporary read/write space that does not persist.
* Introduce streaming `BodyChunkIpld::encode` and
  `BodyChunkIpld::decode` methods.
* Streaming mechanisms store data in memory until a threshold is hit,
  which then stores to disk.
* Mark non-streaming functions `BodyChunkIpld::load_all_bytes` and
  `BodyChunkIpld::store_bytes` as deprecated.
* Remove `BodyChunkDecoder` (now implemented as `BodyChunkIpld::decode`.
* Promote `bytes` to a workspace dependency.
@jsantell
Copy link
Contributor Author

The tests were timing out the wasm bindgen test runner (20s per crate) due to large size; decreased the size of unit tests (with customizable memory limit) and started work on a benchmark for future improvements in the encoding time

@jsantell jsantell requested a review from cdata August 18, 2023 16:59
Copy link
Collaborator

@cdata cdata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited for this
giphy

pub fn reverse_stream<'a, St, S, T, E>(
stream: St,
provider: &'a S,
memory_limit: u64,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this just be usize?

/// forward or in reverse.
struct ReversableStore<T: Reversable, S: ReversableStorage<T>> {
item_count: usize,
byte_length: u64,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, why not usize?


/// An accumulating store that can stream out its items
/// forward or in reverse.
struct ReversableStore<T: Reversable, S: ReversableStorage<T>> {
Copy link
Collaborator

@cdata cdata Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would like to see this re-imagined as a generic container-like thing. Let's call it StorageBuffer for the sake of me writing this comment, but it could be called something else.

StorageBuffer for the most part is implemented the same way, with a high-level distinction: it is not a Store/BlockStore itself. It wraps any S: Storage that it is given, and gives you a buffer you can push on to and stream in either direction.

My rationale is: this sort of tool should be chosen at the point where it is needed, by the code that needs it. BodyChunkIpld should always encode using a mechanism like this. It should not be a matter for the calling code to decide. And, while it makes use of storage, I don't know if we would ever want to think of it as storage itself (like, we would never want all of Noosphere storage to be some kind of SphereDb<ReversableStorage<_>>).

rust/noosphere-core/examples/encoding.rs Show resolved Hide resolved

/// Simple timer util to record duration of processing.
/// Does not support nested, overlapping, or duplicate time ranges.
struct EncodingBenchmark {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this overlap at all with the benchmarking work done in #623 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, similar constructs here; haven't updated this PR within that context, but some scaffolding can be shared here

@@ -304,7 +312,7 @@ where
#[cfg_attr(target_arch = "wasm32", async_trait(?Send))]
impl<S> UcanStore for SphereDb<S>
where
S: Storage,
S: Storage + Scratch,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment re: implied bound; this is probably redundant.

#[cfg_attr(target_arch = "wasm32", async_trait(?Send))]
impl<S> Scratch for SphereDb<S>
where
S: Storage + Scratch,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment re: implied bound; this is probably redundant.

#[derive(Clone)]
/// A [NativeStore] that does not persist data after dropping.
/// Can be created from [NativeStorage]'s [Scratch] implementation.
pub struct TempNativeStore {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Temporary preferred over Temp. Alternatively, I also like Ephemeral.

/// a generic `SCRATCH_STORE` table that all scratch stores use,
/// each partitioning keys by a random key prefix. In lieu of
/// removing these values on [TempWebStore] drop (no async drop), we
/// clear out the scratch storage on [WebStorage] instantiation.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like us to think of ways that we might be able to drop the scratch space as it is no longer needed. For example: what if scratch storage could be dropped cooperatively by the user?


impl Drop for TempNativeStore {
fn drop(&mut self) {
if let Err(e) = self.db.drop_tree(&self.name) {
Copy link
Collaborator

@cdata cdata Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could better enforce this "storage disappears when dropped" aspect by saying something like:

pub trait DroppableStore {
  async fn drop_store(&mut self);
}

pub struct TemporaryStore where T: Store + DroppableStore {
  inner: OnceCell<T>
};

impl Store for TemporaryStore { /* ... */ }

impl Drop for TemporaryStore {
  fn drop(&mut self) {
    if let Some(mut store) = self.0.take() {
      noosphere_common::spawn(store.drop_store());
    }
  }
}

And later we can implement async drop for IndexedDbStore. Then, we also do a clean up check for droppable stores on every initialization of all storage layers, to cover the possibility that the application exited before async work could be scheduled.

Copy link
Contributor Author

@jsantell jsantell Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

edit: excellent!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants