Bug #5446

PUT UNFORMATTED performance (stream is flushed on each call)

Added by Constantin Asofiei almost 3 years ago. Updated over 2 years ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
Start date:
Due date:
% Done: 0%
billable: No
vendor_id: GCD
case_num:
version:

History

#1 Updated by Constantin Asofiei almost 3 years ago

This test case executes in 4GL almost immediately:

def var ch as char.

ch = "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890".
ch = ch + ch.

def stream rpt.
output stream rpt to a.txt.

def var i as int.
do i = 1 to 100000:
   put stream rpt unformatted ch skip.
end.

output stream rpt close.

In FWD, it takes minutes to complete. Even though there are 100k server-client round trips, the most time-consuming factor is the buffer flush in Stream.putWorker line 5785, which makes the 4096-byte buffer useless.

If this line is commented out, FWD completes a lot faster (~20 times faster, as each line is 200 bytes and the buffer is 4096 bytes).
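
The ~20x estimate can be checked with a little arithmetic. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes 200-byte records (ignoring the line terminator written by SKIP) and a buffer that is written out only when full.

```java
/** Hypothetical helper, not FWD code: compares flush counts for the test case. */
final class FlushEstimate {
    /** One flush per PUT call, which is the behavior reported in this issue. */
    static long flushesPerCall(long records) {
        return records;
    }

    /** Flushes needed if data is written out only when the buffer is full (rounded up). */
    static long flushesWhenBuffered(long records, int recordBytes, int bufferBytes) {
        long totalBytes = records * recordBytes;
        return (totalBytes + bufferBytes - 1) / bufferBytes;
    }

    public static void main(String[] args) {
        long perCall = flushesPerCall(100_000);
        long buffered = flushesWhenBuffered(100_000, 200, 4096);
        // The ratio of the two counts is the expected reduction in flushes.
        System.out.println(perCall + " flushes vs " + buffered + " flushes");
    }
}
```

With these assumptions the flush count drops from 100,000 to a few thousand, which matches the roughly 20-fold speedup observed.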

The question is: is the buffer flush for a PUT statement required in 4GL?

#4 Updated by Greg Shah over 2 years ago

The question is: is the buffer flush for a PUT statement required in 4GL?

I think we needed to downcall and flush for each statement for some use cases in the 4GL. If I recall correctly, it was related to child process streams. Anytime PUT UNFORMATTED is used on a stream created by OUTPUT THROUGH or INPUT-OUTPUT THROUGH, the pipeline won't work unless there is an immediate downcall AND a flush for each downcall.

Hopefully I'm not forgetting any other dependencies here. I think we can do a much better job for the common case where the downcall and flush are not needed. And I do want us to avoid both the downcall and the flush. We will need to implement a deeper automated refactoring of the code, where possible. The idea is to detect when it is safe to emit loops of PUT UNFORMATTED in a way that will perform acceptably.

This is happening for enough customer cases that I think we need to address it now.

#5 Updated by Constantin Asofiei over 2 years ago

Something I need to clarify: this flush in FWD is not a problem just for PUT UNFORMATTED (this is just the statement I kept seeing in the customer code).

This is related to any caller of Stream.putWorker, which are PUT, PUT CONTROL, PUT UNFORMATTED and EXPORT. All have the same performance problem: the stream is flushed on each call.

In OE on Windows, the buffer size looks to be around 1029 bytes (it varies if there are page-top header frames) - but this may be dependent on the OS.

My approach at this time would be to let the stream flush on putWorker only for non-file streams.

#6 Updated by Greg Shah over 2 years ago

This is related to any caller of Stream.putWorker, which are PUT, PUT CONTROL, PUT UNFORMATTED and EXPORT. All have the same performance problem: the stream is flushed on each call.

Yes, understood.

My approach at this time would be to let the stream flush on putWorker only for non-file streams.

Hmm. Good point. For the flushing part of the problem, we can do this at runtime, on the client. This seems reasonable and will provide a large, immediate improvement. We probably should add an isFileResource() abstract method to Stream to let each stream subclass report this.
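
A minimal sketch of that idea follows. It is not the FWD Stream class: SketchStream, its subclasses and CountingSink are hypothetical stand-ins, and only the isFileResource() name comes from the suggestion above. The flush after each put is skipped for file-backed streams and kept for everything else.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical sink that counts how often data physically reaches the target. */
final class CountingSink extends OutputStream {
    int physicalWrites;
    @Override public void write(int b) { physicalWrites++; }
    @Override public void write(byte[] b, int off, int len) { physicalWrites++; }
}

/** Hypothetical stand-in for a stream class with a 4096-byte buffer. */
abstract class SketchStream {
    private final OutputStream out;

    SketchStream(OutputStream target) {
        this.out = new BufferedOutputStream(target, 4096);
    }

    /** Each subclass reports whether it is backed by a plain file. */
    abstract boolean isFileResource();

    void putWorker(byte[] data) {
        try {
            out.write(data);
            // Pipes, terminals, etc. still need the immediate flush.
            if (!isFileResource()) {
                out.flush();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void close() {
        try {
            out.close(); // flushes whatever is still buffered
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class FileSketchStream extends SketchStream {
    FileSketchStream(OutputStream target) { super(target); }
    @Override boolean isFileResource() { return true; }
}

final class ProcessSketchStream extends SketchStream {
    ProcessSketchStream(OutputStream target) { super(target); }
    @Override boolean isFileResource() { return false; }
}
```

With 200-byte records, the file variant reaches the sink once per 20 records, while the process variant still reaches it on every put.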

Please run ChUI regression testing. It is very sensitive to changes in these statements.

In regard to the avoidance of round trips, we could buffer these on the server side for file streams. I guess StreamWrapper may be the right place for that. What do you think?

#7 Updated by Constantin Asofiei over 2 years ago

Greg Shah wrote:

In regard to the avoidance of round trips, we could buffer these on the server side for file streams. I guess StreamWrapper may be the right place for that. What do you think?

Yes, I was thinking of buffering the StreamWrapper.putWorker calls - but it is kind of tricky, because we can't just flush when the buffer is filled. We need to consider that a PUT can be mixed with any other stream output statement - so we need to flush this buffer before any other Stream API (or maybe just output-related APIs?) is executed.

Also, my current approach would be this: take the arguments from the putWorker(FieldEntry[] data, int mode, char delim), use NativeTypeSerializer to serialize them to a stream, and when a flush is needed just send this stream to the FWD client to execute the putWorker operations. Serializing everything (especially FieldEntry instances) ensures we have the correct data to output (and any errors are reported when the PUT is attempted).
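
The serialize-then-replay idea could look roughly like the sketch below. It does not use NativeTypeSerializer or FieldEntry: plain java.io data streams and String fields stand in for them, and PutCallCodec with its methods is a hypothetical name. Each call's arguments are captured at the time of the PUT and decoded later in the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical codec: batches putWorker arguments on the server, replays them on the client. */
final class PutCallCodec {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);

    /** Server side: serialize one call's arguments immediately, so later changes cannot affect them. */
    void record(String[] data, int mode, char delim) {
        try {
            out.writeInt(mode);
            out.writeChar(delim);
            out.writeInt(data.length);
            for (String field : data) {
                out.writeUTF(field);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Server side: hand over the batch that would be sent to the client, and start a new one. */
    byte[] takeBatch() {
        byte[] batch = bytes.toByteArray();
        bytes.reset();
        return batch;
    }

    /** Client side: decode the calls in order; here each one is just rendered as text. */
    static List<String> replay(byte[] batch) {
        List<String> calls = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(batch))) {
            while (in.available() > 0) {
                int mode = in.readInt();
                char delim = in.readChar();
                String[] fields = new String[in.readInt()];
                for (int i = 0; i < fields.length; i++) {
                    fields[i] = in.readUTF();
                }
                calls.add(mode + ":" + String.join(String.valueOf(delim), fields));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return calls;
    }
}
```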

#8 Updated by Greg Shah over 2 years ago

Serializing everything (especially FieldEntry instances) ensures we have the correct data to output (and any errors are reported when the PUT is attempted).

Yes, that is smart.

We need to consider that a PUT can be mixed with any other stream output statement - so we need to flush this buffer before any other Stream API (or maybe just output-related APIs?) is executed.

Send to client in these cases:

  • server-side buffer is full
  • output stream is closed (implicitly or explicitly)
  • any other downcall

The downcall case can be handled using state synchronization like we do with the other UI features. This way it is always applied in the right order and there is no extra trip to the client. Code that doesn't intermix client calls will be very efficient, gated by the server side buffer size.
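
The three triggers can be sketched as below. This is an illustration only: BufferedPutChannel is a hypothetical class, a list of strings stands in for real round trips to the client, and "state synchronization" is modeled simply as the pending data travelling with the next downcall.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical server-side buffer for PUT data; each list entry is one simulated round trip. */
final class BufferedPutChannel {
    final List<String> trips = new ArrayList<>();
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final int capacity;

    BufferedPutChannel(int capacity) {
        this.capacity = capacity;
    }

    /** Trigger 1: send only when the next record would overflow the server-side buffer. */
    void put(byte[] record) {
        if (pending.size() > 0 && pending.size() + record.length > capacity) {
            send("buffer-full");
        }
        pending.write(record, 0, record.length);
    }

    /** Trigger 2: closing the stream (implicitly or explicitly) sends what is left. */
    void close() {
        send("close");
    }

    /** Trigger 3: any other downcall carries the pending data along, so no extra trip is needed. */
    void downcall(String name) {
        send(name);
    }

    private void send(String reason) {
        trips.add(reason + ":" + pending.size());
        pending.reset();
    }
}
```

For example, 25 puts of 200 bytes followed by one other downcall, two more puts and a close cost three trips in this model instead of 27.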

#9 Updated by Constantin Asofiei over 2 years ago

Greg Shah wrote:

The downcall case can be handled using state synchronization like we do with the other UI features. This way it is always applied in the right order and there is no extra trip to the client. Code that doesn't intermix client calls will be very efficient, gated by the server side buffer size.

Well, the scenario I'm looking at has exactly this: a single REPEAT loop which reads a file via IMPORT and writes it into another file via PUT. So using state synchronization doesn't work here - best way would be to somehow intercept any other API calls for the same stream, and flush only then... but I don't know if this is possible.

#10 Updated by Constantin Asofiei over 2 years ago

The only feasible way I see to make caching the StreamWrapper.putWorker calls work is this:
  • encapsulate all usage of StreamWrapper.stream in a getter
  • this getter will perform any flushing as needed
  • add a flag to StreamWrapper that indicates whether to cache the putWorker calls (this flag needs to be set to true only if the stream references a file, and not a process, terminal, etc.).
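
The three points above can be sketched as follows. This is not the real StreamWrapper: CachingStreamWrapper, ClientStreamStub, seekTo and the maxCached limit are hypothetical, and strings stand in for the actual PUT data. Every access to the underlying stream goes through the getter, which flushes cached puts first, so ordering against other stream APIs is preserved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the client-side stream; it logs the calls it receives. */
final class ClientStreamStub {
    final List<String> log = new ArrayList<>();
    void putBatch(List<String> batch) { log.add("putBatch" + batch); }
    void seek(long pos) { log.add("seek:" + pos); }
    void close() { log.add("close"); }
}

/** Hypothetical wrapper that caches put calls for file-backed streams only. */
final class CachingStreamWrapper {
    private final ClientStreamStub stream;
    private final boolean cachePuts;   // true only when the stream references a file
    private final int maxCached;
    private final List<String> cached = new ArrayList<>();

    CachingStreamWrapper(ClientStreamStub stream, boolean cachePuts, int maxCached) {
        this.stream = stream;
        this.cachePuts = cachePuts;
        this.maxCached = maxCached;
    }

    /** All access to the stream goes through here, so cached puts are always sent first. */
    private ClientStreamStub getStream() {
        flushCached();
        return stream;
    }

    void putWorker(String data) {
        if (!cachePuts) {
            getStream().putBatch(Collections.singletonList(data));
            return;
        }
        cached.add(data);
        if (cached.size() >= maxCached) {
            flushCached();
        }
    }

    /** Any other stream API simply uses the getter. */
    void seekTo(long pos) { getStream().seek(pos); }

    void close() { getStream().close(); }

    private void flushCached() {
        if (!cached.isEmpty()) {
            stream.putBatch(new ArrayList<>(cached));
            cached.clear();
        }
    }
}
```

Because the flush lives in one place, a new stream API cannot bypass it as long as it reaches the stream through the getter.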

#11 Updated by Greg Shah over 2 years ago

The only feasible way I see to make caching the StreamWrapper.putWorker calls work is this

It seems reasonable.
