[Spec][Doc] fury cross-language serialization specification proposal #1418

Closed
chaokunyang opened this issue Mar 22, 2024 · 0 comments · Fixed by #1413
Labels
enhancement New feature or request

Collaborator

chaokunyang commented Mar 22, 2024

Is your feature request related to a problem? Please describe.

We standardized the Java serialization spec in #1240, but the cross-language serialization spec has never been formalized.

The current implementations of Fury xlang serialization in the various languages are all derived from the code of one of those languages. The result is incomplete and prone to inconsistencies.

If someone wants to implement Fury for a new language, such as Fury C# in #686, they must read all of the Fury Java serialization code. This is a huge burden for new developers, not to mention that some of them may not write Java at all.

Another problem is that, because our xlang serialization is not standardized, we have no foundation for discussing how to improve the protocol.

Our current xlang serialization also has many places to improve. For example, it does not resolve the type inconsistencies between languages. Such issues should be resolved too.

Describe the solution you'd like

We should design a new protocol for Fury and standardize it as a document.

Additional context

Serialization frameworks such as arrow/avro/hessian/thrift/flatbuffers/msgpack all have a serialization spec:

chaokunyang added the enhancement (New feature or request) label Mar 22, 2024
chaokunyang changed the title from "[Spec] standardizing fury cross-language serialization specification" to "[Spec][Doc] standardizing fury cross-language serialization specification" Mar 22, 2024
chaokunyang self-assigned this Mar 26, 2024
chaokunyang changed the title from "[Spec][Doc] standardizing fury cross-language serialization specification" to "[Spec][Doc] fury cross-language serialization specification proposal" Mar 26, 2024
chaokunyang added a commit that referenced this issue Mar 30, 2024
…tion (#1413)

## What does this PR do?

This PR standardizes the Fury cross-language serialization specification. It
comes with the following changes:
- Remove the type tag from the protocol, since it introduces space and
performance overhead in the implementation. The `type tag` version can
be seen in
https://github.com/apache/incubator-fury/blob/6ea2e0b83d5449d63ca62296ff0dfd67b96c5bc5/docs/protocols/xlang_object_graph_spec.md
.
- Fury reserves `0~63` for internal types, but lets users register types
by id starting from `0` (64 is added automatically) to set up the type
mapping between languages.
- Streamline the type system: only
`bool/byte/i16/i32/i64/half-float/float/double/string/enum/list/set/map/Duration/Timestamp/decimal/binary/array/tensor/sparse tensor/arrow record batch/arrow table`
are allowed.
- Formalize the binary format for the above types.
- Add type disambiguation: deserialization is determined jointly by the
data type in the serialized binary and the target type.
- Introduce a meta string encoding algorithm for field names to reduce
space cost by 3/8.
- Introduce a schema consistent mode format for structs.
- Introduce a schema evolution mode for structs:
  - this mode can embed meta in the data or share it across multiple
  messages,
  - it can avoid the cost of type tag comparison found in frameworks like
  protobuf
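
The 3/8 saving quoted for meta string encoding is what you get when each character takes 5 bits instead of 8. A minimal sketch of that arithmetic follows. The 30-symbol alphabet, the function names, and the bit layout are assumptions made for illustration, not the encoding defined in the spec:

```python
# Hypothetical alphabet: every symbol must fit in 5 bits (at most 32 entries).
ALPHABET = "abcdefghijklmnopqrstuvwxyz._$|"
BITS_PER_CHAR = 5


def encode_5bit(name: str) -> bytes:
    """Pack each character of `name` into 5 bits, most significant bits first."""
    acc = 0
    for ch in name:
        # str.index raises ValueError for characters outside the alphabet.
        acc = (acc << BITS_PER_CHAR) | ALPHABET.index(ch)
    nbits = len(name) * BITS_PER_CHAR
    nbytes = (nbits + 7) // 8
    acc <<= nbytes * 8 - nbits  # left-align and pad the tail with zero bits
    return acc.to_bytes(nbytes, "big")


def decode_5bit(data: bytes, length: int) -> str:
    """Inverse of encode_5bit; the character count is supplied by the caller."""
    acc = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    chars = []
    for i in range(length):
        shift = total_bits - (i + 1) * BITS_PER_CHAR
        chars.append(ALPHABET[(acc >> shift) & 0b11111])
    return "".join(chars)


name = "order.item.price"  # 16 characters
packed = encode_5bit(name)
print(len(name.encode("utf-8")), len(packed))  # prints: 16 10
```

Here 16 bytes shrink to 10, a saving of 6/16 = 3/8. A real format also has to record the character count or an encoding flag, which this sketch passes out of band.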

This protocol also supports object inheritance for xlang serialization.
This is a feature that users have been requesting for a long time
in protobuf/flatbuffers:
- google/flatbuffers#4006
- protocolbuffers/protobuf#5645

Although some languages such as `rust/golang` don't support
inheritance, there are many cases where only languages like
`java/c#/python/javascript` are involved, and supporting inheritance
is not complex at the protocol level, so we added inheritance
support to the protocol. In languages such as `rust/golang`, we can
use an annotation to mark a composition field as the parent class for
the serialization layout, or we can disable inheritance for such languages
at the protocol level.
 
The protocol supports polymorphism natively by type id, so I don't include
types such as `OneOf/Union`. With this protocol, you can even serialize
multiple Rust `dyn trait` objects which implement the same trait, and get
exactly the same objects back on deserialization.
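
To illustrate how type ids can carry polymorphism, together with the rule that ids `0~63` are reserved and user ids are offset by 64, here is a minimal sketch. The `TypeRegistry` class, the one-byte id prefix, and the JSON payload are hypothetical simplifications for illustration, not Fury's API or wire format:

```python
import json
from dataclasses import asdict, dataclass

INTERNAL_ID_COUNT = 64  # ids 0~63 are reserved for internal types


class TypeRegistry:
    """Hypothetical registry mapping user-chosen ids to classes."""

    def __init__(self):
        self._by_id = {}
        self._by_cls = {}

    def register(self, cls, user_id: int) -> None:
        type_id = user_id + INTERNAL_ID_COUNT  # offset applied automatically
        if not INTERNAL_ID_COUNT <= type_id <= 255:
            raise ValueError("user id out of range for this one-byte sketch")
        self._by_id[type_id] = cls
        self._by_cls[cls] = type_id

    def dumps(self, obj) -> bytes:
        # Write the concrete type id first, then the fields.
        type_id = self._by_cls[type(obj)]
        return bytes([type_id]) + json.dumps(asdict(obj)).encode("utf-8")

    def loads(self, data: bytes):
        # The leading id selects the concrete class, so no Union/OneOf
        # wrapper type is needed to tell the variants apart.
        cls = self._by_id[data[0]]
        return cls(**json.loads(data[1:].decode("utf-8")))


@dataclass
class Circle:
    radius: float


@dataclass
class Square:
    side: float


registry = TypeRegistry()
registry.register(Circle, 0)  # stored as type id 64
registry.register(Square, 1)  # stored as type id 65

shapes = [Circle(1.5), Square(2.0)]
restored = [registry.loads(registry.dumps(s)) for s in shapes]
```

In this sketch, each peer language would register its own equivalent classes under the same user ids, and that shared numbering is what establishes the type mapping between languages.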

## Related issue
This PR Closes #1418

---------

Co-authored-by: Twice <[email protected]>