-
Notifications
You must be signed in to change notification settings - Fork 37
feat: Introduce a streaming way to encode BodyChunkIpld
. Fixes #498
#586
base: main
Are you sure you want to change the base?
Conversation
3c91d1d
to
656e2f6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a high level this looks like the right approach. I think we should revise it to leverage a impl Store
that is provided by [caller who wants to do the encoding] that is considered a persistence scratch space; that way the solution can address the problem of finite resources for any environment where we have some implementation for storage persistence.
656e2f6
to
84d7ea2
Compare
84d7ea2
to
171ffea
Compare
c3c87f4
to
95e3007
Compare
Updates:
|
7c557f2
to
203073c
Compare
554165d
to
1108ed2
Compare
* Introduce a `Scratch` trait, for `Storage` providers to provide a temporary read/write space that does not persist. * Introduce streaming `BodyChunkIpld::encode` and `BodyChunkIpld::decode` methods. * Streaming mechanisms store data in memory until a threshold is hit, which then stores to disk. * Mark non-streaming functions `BodyChunkIpld::load_all_bytes` and `BodyChunkIpld::store_bytes` as deprecated. * Remove `BodyChunkDecoder` (now implemented as `BodyChunkIpld::decode`. * Promote `bytes` to a workspace dependency.
1108ed2
to
615dd41
Compare
The tests were timing out the wasm bindgen test runner (20s per crate) due to large size; decreased the size of unit tests (with customizable memory limit) and started work on a benchmark for future improvements in the encoding time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub fn reverse_stream<'a, St, S, T, E>( | ||
stream: St, | ||
provider: &'a S, | ||
memory_limit: u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this just be usize
?
/// forward or in reverse. | ||
struct ReversableStore<T: Reversable, S: ReversableStorage<T>> { | ||
item_count: usize, | ||
byte_length: u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, why not usize
?
|
||
/// An accumulating store that can stream out its items | ||
/// forward or in reverse. | ||
struct ReversableStore<T: Reversable, S: ReversableStorage<T>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would like to see this re-imagined as a generic container-like thing. Let's call it StorageBuffer
for the sake of me writing this comment, but it could be called something else.
StorageBuffer
for the most part is implemented the same way, with a high-level distinction: it is not a Store
/BlockStore
itself. It wraps any S: Storage
that it is given, and gives you a buffer you can push on to and stream in either direction.
My rationale is: this sort of tool should be chosen at the point where it is needed, by the code that needs it. BodyChunkIpld
should always encode using a mechanism like this. It should not be a matter for the calling code to decide. And, while it makes use of storage, I don't know if we would ever want to think of it as storage itself (like, we would never want all of Noosphere storage to be some kind of SphereDb<ReversableStorage<_>>
).
|
||
/// Simple timer util to record duration of processing. | ||
/// Does not support nested, overlapping, or duplicate time ranges. | ||
struct EncodingBenchmark { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this overlap at all with the benchmarking work done in #623 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, similar constructs here; haven't updated this PR within that context, but some scaffolding can be shared here
@@ -304,7 +312,7 @@ where | |||
#[cfg_attr(target_arch = "wasm32", async_trait(?Send))] | |||
impl<S> UcanStore for SphereDb<S> | |||
where | |||
S: Storage, | |||
S: Storage + Scratch, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment re: implied bound; this is probably redundant.
#[cfg_attr(target_arch = "wasm32", async_trait(?Send))] | ||
impl<S> Scratch for SphereDb<S> | ||
where | ||
S: Storage + Scratch, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment re: implied bound; this is probably redundant.
#[derive(Clone)] | ||
/// A [NativeStore] that does not persist data after dropping. | ||
/// Can be created from [NativeStorage]'s [Scratch] implementation. | ||
pub struct TempNativeStore { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Temporary
preferred over Temp
. Alternatively, I also like Ephemeral
.
/// a generic `SCRATCH_STORE` table that all scratch stores use, | ||
/// each partitioning keys by a random key prefix. In lieu of | ||
/// removing these values on [TempWebStore] drop (no async drop), we | ||
/// clear out the scratch storage on [WebStorage] instantiation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like us to think of ways that we might be able to drop the scratch space as it is no longer needed. For example: what if scratch storage could be dropped cooperatively by the user?
|
||
impl Drop for TempNativeStore { | ||
fn drop(&mut self) { | ||
if let Err(e) = self.db.drop_tree(&self.name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could better enforce this "storage disappears when dropped" aspect by saying something like:
pub trait DroppableStore {
async fn drop_store(&mut self);
}
pub struct TemporaryStore where T: Store + DroppableStore {
inner: OnceCell<T>
};
impl Store for TemporaryStore { /* ... */ }
impl Drop for TemporaryStore {
fn drop(&mut self) {
if let Some(mut store) = self.0.take() {
noosphere_common::spawn(store.drop_store());
}
}
}
And later we can implement async drop for IndexedDbStore
. Then, we also do a clean up check for droppable stores on every initialization of all storage layers, to cover the possibility that the application exited before async work could be scheduled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
edit: excellent!
BodyChunkIpld::encode
andBodyChunkIpld::decode
methods.BodyChunkIpld::load_all_bytes
andBodyChunkIpld::store_bytes
as deprecated.BodyChunkDecoder
(now implemented asBodyChunkIpld::decode
.Also
bytes
has been moved to a workspace dep