Feature #3789
rewrite writeObject and readObject to rely on NativeTypeSerializer or a similar approach
History
#1 Updated by Constantin Asofiei over 5 years ago
Created task branch 3789a from trunk rev 11289. This includes an initial approach of using NativeTypeSerializer to send native arrays over the socket.
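As a rough illustration of what sending a native array without standard object serialization involves, here is a minimal Java sketch that writes an int[] as a length prefix followed by the raw elements over a DataOutput. The class and method names (NativeIntArrayDemo, writeIntArray, readIntArray) are hypothetical and do not reflect the actual NativeTypeSerializer API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical helper: sends an int[] as a length prefix plus raw elements. */
public class NativeIntArrayDemo {
   static void writeIntArray(DataOutput out, int[] arr) throws IOException {
      out.writeInt(arr.length);   // element count first
      for (int v : arr) {
         out.writeInt(v);         // 4 bytes per element, no per-object metadata
      }
   }

   static int[] readIntArray(DataInput in) throws IOException {
      int[] arr = new int[in.readInt()];   // allocate using the length prefix
      for (int i = 0; i < arr.length; i++) {
         arr[i] = in.readInt();
      }
      return arr;
   }

   public static void main(String[] args) throws IOException {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      writeIntArray(new DataOutputStream(bytes), new int[] { 1, 2, 3 });
      int[] back = readIntArray(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      System.out.println(Arrays.toString(back) + " in " + bytes.size() + " bytes"); // [1, 2, 3] in 16 bytes
   }
}
```

Because ObjectOutput extends DataOutput (and ObjectInput extends DataInput), helpers like these could also be called from inside a writeExternal/readExternal pair. A null array is not handled here; it would need its own marker.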
- String serialization - many cases use readObject and writeObject so that a null value can be handled. This can be replaced with a NativeTypeSerializer API which relies on readUTF and writeUTF (see below for how to handle null values).
- null values - even NativeTypeSerializer relies on readObject and writeObject to handle null values. This can be replaced by a custom NativeTypeSerializer sub-class which writes only its marker; when reading the data, it will know to assume a null value, instead of serializing the entire object.
- BaseDataType instances - there are cases where read/writeObject is used to serialize a BDT (see PutField.writeExternal for an example). This can be solved by writing a specific sub-class of NativeTypeSerializer, which writes its custom marker first, then the class name to be instantiated, and after that relies on read/writeExternal to serialize the BDT.
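The three cases above could be sketched with a single marker byte per value: a marker alone for null, a marker plus writeUTF data for a String, and a marker plus the class name plus writeExternal data for an Externalizable value (standing in for a BDT). This is a hypothetical sketch, not FWD code: MarkerCodec, the marker values and SampleValue are all invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

/** Hypothetical marker-based codec; not the real NativeTypeSerializer API. */
public class MarkerCodec {
   static final byte NULL_MARKER = 0;           // marker only: reader returns null
   static final byte STRING_MARKER = 1;         // marker + writeUTF data
   static final byte EXTERNALIZABLE_MARKER = 2; // marker + class name + writeExternal data

   /** Hypothetical stand-in for a BaseDataType; needs a public no-arg constructor. */
   public static class SampleValue implements Externalizable {
      long value;
      public SampleValue() { }
      public SampleValue(long value) { this.value = value; }
      @Override public void writeExternal(ObjectOutput out) throws IOException { out.writeLong(value); }
      @Override public void readExternal(ObjectInput in) throws IOException { value = in.readLong(); }
   }

   static void write(ObjectOutput out, Object value) throws IOException {
      if (value == null) {
         out.writeByte(NULL_MARKER);
      } else if (value instanceof String) {
         out.writeByte(STRING_MARKER);
         out.writeUTF((String) value);   // writeUTF is limited to 65535 encoded bytes
      } else if (value instanceof Externalizable) {
         out.writeByte(EXTERNALIZABLE_MARKER);
         out.writeUTF(value.getClass().getName());   // class to instantiate when reading
         ((Externalizable) value).writeExternal(out);
      } else {
         throw new IOException("unsupported type: " + value.getClass().getName());
      }
   }

   static Object read(ObjectInput in) throws IOException {
      byte marker = in.readByte();
      switch (marker) {
         case NULL_MARKER:
            return null;
         case STRING_MARKER:
            return in.readUTF();
         case EXTERNALIZABLE_MARKER:
            try {
               // a real implementation should restrict which classes may be instantiated here
               Externalizable obj = (Externalizable) Class.forName(in.readUTF())
                  .getDeclaredConstructor().newInstance();
               obj.readExternal(in);
               return obj;
            } catch (ReflectiveOperationException e) {
               throw new IOException("cannot instantiate serialized class", e);
            }
         default:
            throw new IOException("unknown marker: " + marker);
      }
   }

   public static void main(String[] args) throws IOException {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
         write(out, null);
         write(out, "hello");
         write(out, new SampleValue(42L));
      }
      try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
         System.out.println(read(in));                        // null
         System.out.println(read(in));                        // hello
         System.out.println(((SampleValue) read(in)).value);  // 42
      }
   }
}
```

Reading mirrors writing: the marker selects the branch, so a null costs one byte and no class descriptor is ever written, since writeObject is never called.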
The idea behind this performance optimization is that writing and reading fully serialized objects is expensive in Java, so we should always rely on the Externalizable approach. Even though write/readObject will at some point call write/readExternal if the object is Externalizable, cutting down the overhead around it should help.
This performance issue was initially observed in #3741, where a test case relies on lots of PUT statements (tens of thousands).
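To make the overhead argument concrete, the following Java sketch compares the bytes produced by a full writeObject call with those produced by calling writeExternal directly on an ObjectOutputStream. Point and SerializationOverhead are hypothetical; the sketch measures only stream size, not CPU time, so it illustrates rather than proves the performance claim.

```java
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class SerializationOverhead {
   /** Hypothetical Externalizable value carrying two ints of real data. */
   public static class Point implements Externalizable {
      int x, y;
      public Point() { }
      Point(int x, int y) { this.x = x; this.y = y; }
      @Override public void writeExternal(ObjectOutput out) throws IOException { out.writeInt(x); out.writeInt(y); }
      @Override public void readExternal(ObjectInput in) throws IOException { x = in.readInt(); y = in.readInt(); }
   }

   /** Bytes produced by a full writeObject call (class descriptor + external data). */
   static int viaWriteObject(Point p) throws IOException {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
         out.writeObject(p);
      }
      return bytes.size();
   }

   /** Bytes produced by calling writeExternal directly (external data only). */
   static int viaWriteExternal(Point p) throws IOException {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
         p.writeExternal(out);
      }
      return bytes.size();
   }

   public static void main(String[] args) throws IOException {
      Point p = new Point(3, 4);
      System.out.println("writeObject:   " + viaWriteObject(p) + " bytes");
      System.out.println("writeExternal: " + viaWriteExternal(p) + " bytes");
   }
}
```

Both streams include the same 4-byte stream header; the extra bytes in the writeObject case come mostly from the class descriptor (class name, serialVersionUID, flags) written before the external data.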
#2 Updated by Constantin Asofiei over 5 years ago
The native array serialization changes are in 3789a rev 11290.
#3 Updated by Constantin Asofiei over 5 years ago
Another part to think about is the read/writeObject usage in the Protocol$Reader and Writer classes - these use standard object serialization, because they don't know what kind of objects are sent as messages.
I'm not sure why this approach was taken, but in our case messages are taken via Queue.dequeueOutbound, and this always returns a Message instance. In turn, the Message instance is Externalizable, but it contains a Serializable payload, which relies on read/writeObject.
- if the instance is an Externalizable, it will work almost the same: write the marker, a flag indicating that this is Externalizable, the class name to be instantiated, and read/write the data using the read/writeExternal APIs.
- otherwise, write the marker and a false flag indicating that we are not an Externalizable, and rely on read/writeObject to get the object.
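The two cases above could look roughly like this for a payload field: a marker, a boolean Externalizable flag, and then either the class name plus writeExternal data or a plain writeObject fallback. PayloadCodec, PAYLOAD_MARKER and Counter are hypothetical names, and this is a sketch under those assumptions rather than a change to the actual Message class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

/** Hypothetical payload codec: marker, Externalizable flag, then custom or standard serialization. */
public class PayloadCodec {
   static final byte PAYLOAD_MARKER = 10; // hypothetical marker value

   /** Hypothetical Externalizable payload used for the demo. */
   public static class Counter implements Externalizable {
      int count;
      public Counter() { }
      public Counter(int count) { this.count = count; }
      @Override public void writeExternal(ObjectOutput out) throws IOException { out.writeInt(count); }
      @Override public void readExternal(ObjectInput in) throws IOException { count = in.readInt(); }
   }

   static void writePayload(ObjectOutput out, Object payload) throws IOException {
      boolean externalizable = payload instanceof Externalizable;
      out.writeByte(PAYLOAD_MARKER);
      out.writeBoolean(externalizable);                // flag tells the reader which branch follows
      if (externalizable) {
         out.writeUTF(payload.getClass().getName());   // class to instantiate on the reading side
         ((Externalizable) payload).writeExternal(out);
      } else {
         out.writeObject(payload);                     // fallback: standard Java serialization
      }
   }

   static Object readPayload(ObjectInput in) throws IOException, ClassNotFoundException {
      byte marker = in.readByte();
      if (marker != PAYLOAD_MARKER) {
         throw new IOException("unexpected marker: " + marker);
      }
      if (!in.readBoolean()) {
         return in.readObject();                       // not Externalizable: standard deserialization
      }
      try {
         // a real implementation should restrict which classes may be instantiated here
         Externalizable obj = (Externalizable) Class.forName(in.readUTF())
            .getDeclaredConstructor().newInstance();
         obj.readExternal(in);
         return obj;
      } catch (ReflectiveOperationException e) {
         throw new IOException("cannot instantiate payload class", e);
      }
   }

   public static void main(String[] args) throws IOException, ClassNotFoundException {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
         writePayload(out, new Counter(7));   // Externalizable branch
         writePayload(out, "plain string");   // writeObject fallback
      }
      try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
         System.out.println(((Counter) readPayload(in)).count); // 7
         System.out.println(readPayload(in));                   // plain string
      }
   }
}
```

A null payload also goes through the writeObject fallback, which ObjectOutputStream encodes compactly as a null reference.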
#4 Updated by Eric Faulhaber about 5 years ago
- Related to Feature #4026: ensure all objects transmitted over the DAP implement Externalizable added
#5 Updated by Greg Shah almost 4 years ago
- Assignee set to Igor Skornyakov
#6 Updated by Igor Skornyakov almost 4 years ago
The approach based on NativeTypeSerializer has at least two disadvantages:
- It requires a lot of coding.
- Such serialization is difficult to maintain.
Maybe it is worth considering one of the Java serialization libraries; see a comparison of several at https://github.com/eishay/jvm-serializers/wiki. I have used one such library in the past, Kryo (https://github.com/iskorn/kryo), in an application where (de)serialization performance was very important and a lot of objects required it. The library proved to be very efficient.
Please note that a "shaded" version of Kryo is already indirectly used by FWD (it is part of gremlin-shaded-3.2.3.jar).
#7 Updated by Constantin Asofiei almost 4 years ago
Igor, how would Message.payload be changed to use Kryo?
#8 Updated by Igor Skornyakov almost 4 years ago
Constantin Asofiei wrote:
Igor, how would Message.payload be changed to use Kryo?
Constantin,
I haven't analysed Message.payload yet. Kryo (or another library) can be used to simplify the implementation of the Externalizable interface. In this case there will still be some overhead from using writeObject for the Message.payload serialization, but if the payload is an array of Object and all the objects are serialized efficiently, this overhead will be just the names of the classes.