Search before asking
I searched in the issues and found nothing similar.
Motivation
Currently, the Table API of the Fluss client is neither clean nor consistent, and extending it to support new features is becoming increasingly complicated.
The problems of the current Table API:
- limitScan, getLogScanner, and getSnapshotScanner all create scanners, but their APIs are very different.
- SnapshotScanner needs to be refactored:
  - It is not an interface. We need a more general interface for bounded scans rather than a scanner interface designed specifically for snapshots.
  - The snapshot scanner should be created from a snapshot id instead of snapshot files. Snapshot files are internal objects, and users do not know which files belong to a snapshot.
- getLookuper and getPrefixLookuper can be unified: the lookuper interface is the same, and only the configuration differs.
Solution
So here is a proposal to refactor the Table API.

The new Table API:

public interface Table extends AutoCloseable {
    Scan newScan();
    Lookup newLookup();
    Append newAppend();
    Upsert newUpsert();
}

/**
 * Used to configure and create scanners to scan data for a table.
 *
 * <p>{@link Scan} objects are immutable and can be shared between threads. Refinement methods, like
 * {@link #project} and {@link #limit(int)}, create new Scan instances.
 */
public interface Scan {
    Scan project(@Nullable int[] projectedColumns);
    Scan project(List<String> projectedColumnNames);
    Scan limit(int rowNumber);
    LogScanner createLogScanner();
    BatchScanner createBatchScanner(TableBucket tableBucket);
    BatchScanner createBatchScanner(TableBucket tableBucket, long snapshotId);
}
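
The Scan Javadoc says that refinement methods return new instances instead of mutating the receiver. The following is only a minimal sketch of that pattern, not the Fluss implementation: `ScanSketch` and its accessors are hypothetical names, and the real `Scan` would also carry table metadata and create scanners.

```java
/** Hypothetical stand-in that only illustrates immutable refinement. */
final class ScanSketch {
    private final int[] projectedColumns; // null means "all columns" (assumption)
    private final Integer limit;          // null means "no limit" (assumption)

    ScanSketch() {
        this(null, null);
    }

    private ScanSketch(int[] projectedColumns, Integer limit) {
        // Defensive copy so callers cannot mutate the stored array later.
        this.projectedColumns = projectedColumns == null ? null : projectedColumns.clone();
        this.limit = limit;
    }

    /** Returns a new instance with the projection set; this instance is unchanged. */
    ScanSketch project(int[] projectedColumns) {
        return new ScanSketch(projectedColumns, limit);
    }

    /** Returns a new instance with the limit set; this instance is unchanged. */
    ScanSketch limit(int rowNumber) {
        return new ScanSketch(projectedColumns, rowNumber);
    }

    int[] projectedColumns() {
        return projectedColumns == null ? null : projectedColumns.clone();
    }

    Integer limit() {
        return limit;
    }
}
```

Because every refinement copies the configuration, a base instance can be shared between threads and refined independently, for example `base.limit(10)` in one place and `base.project(new int[] {0, 2})` in another.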

// no changes to LogScanner

public interface BatchScanner extends Closeable {
    /**
     * Poll one batch records. The method should return null when reaching the end of the input.
     */
    @Nullable
    CloseableIterator<InternalRow> pollBatch(Duration timeout) throws IOException;
}

public interface Lookup {
    Lookup lookupBy(List<String> lookupColumnNames);
    Lookuper createLookuper();
}

public interface Lookuper {
    CompletableFuture<LookupResult> lookup(InternalRow lookupKey);
}
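
To illustrate how one `Lookup` configuration object could cover both the primary key lookup and the prefix lookup, here is a small in-memory sketch. Everything in it is an assumption made for illustration: `LookupSketch` and `LookuperSketch` are hypothetical names, rows are plain maps instead of `InternalRow`, and the default of looking up by the primary key columns is a guess about the intended behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for the proposed Lookuper; same interface for every lookup kind. */
interface LookuperSketch {
    CompletableFuture<List<Map<String, Object>>> lookup(List<Object> lookupKey);
}

/** Hypothetical stand-in for the proposed Lookup; only the configured columns differ. */
final class LookupSketch {
    private final List<Map<String, Object>> rows;
    private final List<String> lookupColumns;

    /** Assumption: without lookupBy, the lookup uses the primary key columns. */
    LookupSketch(List<Map<String, Object>> rows, List<String> primaryKeyColumns) {
        this.rows = rows;
        this.lookupColumns = Collections.unmodifiableList(new ArrayList<>(primaryKeyColumns));
    }

    /** Returns a new configuration that looks up by the given columns instead. */
    LookupSketch lookupBy(List<String> lookupColumnNames) {
        return new LookupSketch(rows, lookupColumnNames);
    }

    LookuperSketch createLookuper() {
        return key -> {
            if (key.size() != lookupColumns.size()) {
                throw new IllegalArgumentException("key arity does not match lookup columns");
            }
            List<Map<String, Object>> matches = new ArrayList<>();
            for (Map<String, Object> row : rows) {
                boolean match = true;
                for (int i = 0; i < lookupColumns.size(); i++) {
                    if (!Objects.equals(row.get(lookupColumns.get(i)), key.get(i))) {
                        match = false;
                        break;
                    }
                }
                if (match) {
                    matches.add(row);
                }
            }
            return CompletableFuture.completedFuture(matches);
        };
    }
}
```

The caller sees a single lookuper type in both cases, and the choice between a full-key lookup and a prefix lookup is expressed only through `lookupBy`.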

public interface Upsert {
    Upsert partialUpdate(@Nullable int[] targetColumns);
    Upsert partialUpdate(String... targetColumnNames);
    UpsertWriter createWriter();
}

public interface UpsertWriter extends TableWriter {
    CompletableFuture<UpsertResult> upsert(InternalRow row);
    CompletableFuture<DeleteResult> delete(InternalRow row);
}

How to use:

UpsertWriter upsertWriter = table.newUpsert().createWriter();
// or with partial update of the specified columns
// UpsertWriter upsertWriter = table.newUpsert().partialUpdate("a", "b").createWriter();
upsertWriter.upsert(row1);
upsertWriter.delete(key1);

Scanning log data
LogScanner logScanner = table.newScan().createLogScanner();
// or with projection pushdown
// LogScanner logScanner = table.newScan().project(projectedFields).createLogScanner();
logScanner.subscribeFromBeginning(bucketId);
ScanRecords scanRecords = logScanner.poll();
...
Batch scan data with limit
BatchScanner scanner = table.newScan().limit(limitSize).createBatchScanner(tableBucket);
// or with projection pushdown
// BatchScanner scanner = table.newScan().limit(limitSize).project(projectedFields).createBatchScanner(tableBucket);
List<InternalRow> result = collectRows(scanner);
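
The `collectRows` helper used above is not defined in this proposal. A minimal sketch of how it could drain a scanner under the `pollBatch` contract (return null at the end of the input) is shown below. The types `BatchScannerSketch` and `CloseableIteratorSketch`, the one-second poll timeout, the `inMemory` helper, and the decision to close the scanner inside the helper are all assumptions for illustration, not part of the Fluss client.

```java
import java.io.Closeable;
import java.io.IOException;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for CloseableIterator. */
interface CloseableIteratorSketch<T> extends Iterator<T>, Closeable {}

/** Hypothetical stand-in for BatchScanner, generic over the row type. */
interface BatchScannerSketch<T> extends Closeable {
    /** Returns null once the end of the input is reached. */
    CloseableIteratorSketch<T> pollBatch(Duration timeout) throws IOException;
}

final class BatchScans {
    private BatchScans() {}

    /** Polls until pollBatch returns null, closing each batch and finally the scanner. */
    static <T> List<T> collectRows(BatchScannerSketch<T> scanner) throws IOException {
        List<T> rows = new ArrayList<>();
        try (BatchScannerSketch<T> s = scanner) {
            CloseableIteratorSketch<T> batch;
            while ((batch = s.pollBatch(Duration.ofSeconds(1))) != null) {
                try (CloseableIteratorSketch<T> it = batch) {
                    while (it.hasNext()) {
                        rows.add(it.next());
                    }
                }
            }
        }
        return rows;
    }

    /** In-memory scanner over fixed batches, only for trying out collectRows. */
    static <T> BatchScannerSketch<T> inMemory(List<List<T>> batches) {
        Iterator<List<T>> remaining = batches.iterator();
        return new BatchScannerSketch<T>() {
            @Override
            public CloseableIteratorSketch<T> pollBatch(Duration timeout) {
                if (!remaining.hasNext()) {
                    return null; // end of input
                }
                Iterator<T> rows = remaining.next().iterator();
                return new CloseableIteratorSketch<T>() {
                    @Override public boolean hasNext() { return rows.hasNext(); }
                    @Override public T next() { return rows.next(); }
                    @Override public void close() {}
                };
            }

            @Override
            public void close() {}
        };
    }
}
```

An empty batch is treated as "no data yet" rather than "end of input", so the loop keeps polling until it sees null.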
Scan snapshot data
BatchScanner scanner = table.newScan().createBatchScanner(tableBucket, snapshotId);
// or with projection pushdown
// BatchScanner scanner = table.newScan().project(projectedFields).createBatchScanner(tableBucket, snapshotId);
...
Anything else?
To make the new snapshot scanner API work (table.newScan().createBatchScanner(tableBucket, snapshotId)), we need to refactor the Admin interface and the RPC slightly so that clients can get the latest snapshot id for a given table.
The existing CompletableFuture<KvSnapshotInfo> getKvSnapshot(TablePath tablePath) and CompletableFuture<PartitionSnapshotInfo> getPartitionSnapshot(TablePath tablePath, String partitionName) APIs will be removed, and the following two methods will be introduced:

/**
 * Get the latest kv snapshots of the given table asynchronously. A kv snapshot is a snapshot of
 * a bucket of a primary key table at a certain point in time. Therefore, there are at most
 * {@code N} snapshots for a primary key table, where {@code N} is the number of buckets.
 *
 * <p>The following exceptions can be anticipated when calling {@code get()} on the returned future.
 *
 * <ul>
 *   <li>{@link TableNotExistException} if the table does not exist.
 *   <li>{@link NonPrimaryKeyTableException} if the table is not a primary key table.
 *   <li>{@link InvalidTableException} if the table is partitioned, use {@link
 *       #getLatestKvSnapshots(TablePath, String)} instead to get the latest kv snapshot of a
 *       partition of a partitioned table.
 * </ul>
 *
 * @param tablePath the table path of the table.
 */
CompletableFuture<KvSnapshots> getLatestKvSnapshots(TablePath tablePath[, String partitionName]);

/**
 * Get the kv snapshot metadata of the given kv snapshot asynchronously. The kv snapshot
 * metadata includes the snapshot files for the kv tablet and the log offset for the changelog
 * at the snapshot time.
 *
 * <p>The following exceptions can be anticipated when calling {@code get()} on the returned future.
 *
 * <ul>
 *   <li>{@link KvSnapshotNotExistException} if the snapshot does not exist.
 * </ul>
 *
 * @param bucket the table bucket of the kv snapshot.
 * @param snapshotId the snapshot id.
 */
CompletableFuture<KvSnapshotMetadata> getKvSnapshotMetadata(TableBucket bucket, long snapshotId);

Corresponding RPC methods need to be added, and the existing RPC methods (GetKvSnapshot, GetPartitionSnapshot) will be removed.
Willingness to contribute
I'm willing to submit a PR!