fix: Limit length of response body read to 4mb #2080
kaylareopelle wants to merge 11 commits into open-telemetry:main
Conversation
Limiting the read size may help prevent memory exhaustion exploits when the configured collector endpoint is attacker-controlled.
truncated = false

response.read_body do |chunk|
  if body.bytesize + chunk.bytesize <= RESPONSE_BODY_LIMIT
Before starting to read the response body, should we not first read the content-length or transfer-encoding and short circuit if the body exceeds the limit?
IIUC the Go collector will switch to chunked responses when the body exceeds its 2 KiB default buffer size. In cases where the response is less than 2 KiB, or the backend supports larger buffer sizes, we may get a Content-Length response and can exit early.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Length
https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Transfer-Encoding
I like the idea of using the headers to make this more performant!
Just to make sure I'm on the same page, are you proposing:
- Use the transfer-encoding / content-length headers to determine body size
- Skip chunking when the body is less than the limit and just return the body
- Truncate the response using chunking for bodies that exceed the limit
Or are you suggesting something else?
My understanding is that it's either/or: the response will have either a Content-Length header or Transfer-Encoding: chunked.
If Content-Length is present and exceeds the limit, discard the response body and exit immediately.
If it's a chunked response, use the code you've introduced here.
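Roughly the dispatch I have in mind, sketched with stand-in names (read_response_body, RESPONSE_BODY_LIMIT, and collect_response_chunks are illustrative, not necessarily the exporter's real API):

def read_response_body(response)
  content_length = response.content_length # nil when Transfer-Encoding: chunked

  if content_length && content_length > RESPONSE_BODY_LIMIT
    # Content-Length tells us up front that the body is oversized:
    # drain it without buffering and report truncation immediately.
    response.read_body { |_chunk| } # drain so the connection stays reusable
    return ['', true]
  end

  # No Content-Length, so a chunked response: fall back to the
  # incremental reader that stops accumulating at the limit.
  collect_response_chunks(response)
end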
In a hacky reproduction of the problem (some malicious or misbehaving OTLP receiver responds with a large body), I still saw the whole large response body getting loaded into memory. The exporter's @http.request(request) call reads the full body before read_response_body runs, so the 4 MB cap never activates against a real HTTP server. The existing tests pass because WebMock stubs don't have real socket behavior. 😭
Adds a failing test that exercises the OTLP exporter's response body limit against a real TCPServer instead of WebMock stubs. WebMock patches Net::HTTP's read_body internals, so the existing stub-based tests pass even though the chunked reader doesn't work against real HTTP. With WebMock's adapter fully disabled, Net::HTTP#request eagerly reads the full body before read_response_body runs, causing IOError: "read_body called twice". The IOError is the symptom; the actual problem is that the full oversized response body is already in memory by that point, defeating the 4 MB cap. Reproduces the problem described in review comments on PR open-telemetry#2080.
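The setup looks roughly like this; read_request, respond_with_large_body, and span_data are invented helper/fixture names for illustration:

require 'socket'

WebMock.disable! # fully disable the adapter so Net::HTTP hits a real socket

server = TCPServer.new('127.0.0.1', 0) # port 0: let the OS pick a free port
port = server.addr[1]

server_thread = Thread.new do
  client = server.accept
  read_request(client)                              # drain the exporter's request
  respond_with_large_body(client, 10 * 1024 * 1024) # write a 10 MB response
  client.close
end

exporter = OpenTelemetry::Exporter::OTLP::Exporter.new(
  endpoint: "http://127.0.0.1:#{port}/v1/traces"
)
result = exporter.export([span_data])
server_thread.join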
Net::HTTP#request without a block eagerly reads the entire response
body into memory via reading_body's self.body call.
Refactor send_bytes to use @http.request(request) { |response| }
so the case statement runs before the body is read. The error path
uses streaming access for the chunked reader. Non-error paths
drain-and-discard with read_body { |_| } to preserve keep-alive
connections (read_body(nil) leaves the socket undrained).
redo doesn't work from within a nested block, so add a should_redo
flag to trigger it.
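A condensed sketch of that flow; SUCCESS, FAILURE, MAX_RETRIES, and the helpers here are placeholders rather than the exporter's exact names:

def send_bytes(bytes)
  request = build_request(bytes) # assumed helper
  retry_count = 0

  loop do
    should_redo = false

    @http.request(request) do |response| # block form: the case runs before the body is read
      case response
      when Net::HTTPOK
        response.read_body { |_chunk| } # drain-and-discard preserves keep-alive;
                                        # read_body(nil) would leave the socket undrained
        return SUCCESS
      when Net::HTTPServiceUnavailable
        response.read_body { |_chunk| }
        should_redo = true # redo can't fire from inside this nested block
      else
        body, truncated = read_response_body(response) # streaming error path
        log_status(body, truncated)
        return FAILURE
      end
    end

    next if should_redo && (retry_count += 1) <= MAX_RETRIES

    return FAILURE
  end
end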
Got past the IOError from trying to read the body twice; now verify that the chunked reader hits the 4 MB cap against a real HTTP response.
After read_response_body breaks out of the chunked read at 4 MB, Net::HTTP's reading_body still calls self.body which reads more data from the socket into memory. Our String is capped but response.body held 5.8 MB against a 10 MB response. Spies on the exporter to observe response.body size after read_response_body returns.
When read_response_body truncates at the 4MB limit, Net::HTTP's reading_body calls self.body after our block returns. This re-reads the entire remaining response from the socket into memory. Our break doesn't prevent this because reading_body has its "ensure to read body" safety net that fires before the socket closes. Calling @http.finish before break closes the socket. reading_body's self.body call hits a closed stream instead of re-reading. The resulting IOError is rescued only when truncation caused it. Unexpected IOErrors still propagate.
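Sketched out, with illustrative names, the reader and its call site look something like:

def collect_response_chunks(response)
  body = +'' # mutable buffer
  truncated = false

  response.read_body do |chunk|
    if body.bytesize + chunk.bytesize <= RESPONSE_BODY_LIMIT
      body << chunk
    else
      truncated = true
      # Close the connection before bailing out; otherwise reading_body's
      # "ensure to read body" safety net calls self.body after this block
      # returns and re-reads the rest of the response into memory.
      @http.finish
      break
    end
  end

  [body, truncated]
end

# The closed stream surfaces as an IOError raised out of Net::HTTP#request
# once reading_body's ensure runs, so rescue it only when we caused it:
body = ''
truncated = false
begin
  @http.request(request) do |response|
    body, truncated = collect_response_chunks(response)
  end
rescue IOError
  raise unless truncated # unexpected IOErrors still propagate
end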
send_bytes's sad path needs to know whether the chunked reader truncated the response body so it can log that appropriately. Instead of managing a shared-state instance variable, return that state from the reader and pass it into the logger.
Decode will raise an error on a truncated body and then the exporter will log a confusing "unexpected error decoding rpc.Status".
A truncated protobuf is invalid binary. Passing it to Google::Rpc::Status.decode raises "unexpected error decoding rpc.Status", which is rescued by our own error handling. The resulting log output about the decode error mentions nothing about the known-truncated body. When we know the body was truncated, log a direct oversized-body message and return early.
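The guard looks something like this sketch (the log_status name and message text are placeholders; Google::Rpc::Status.decode is the decoder named above):

def log_status(body, truncated)
  if truncated
    # We already know why decoding would fail, so say so directly.
    OpenTelemetry.handle_error(
      message: "OTLP exporter received a response body exceeding #{RESPONSE_BODY_LIMIT} bytes; body discarded"
    )
    return
  end

  status = Google::Rpc::Status.decode(body)
  OpenTelemetry.handle_error(message: "OTLP exporter received rpc.Status: #{status.message}")
rescue StandardError => e
  OpenTelemetry.handle_error(exception: e, message: 'unexpected error decoding rpc.Status')
end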
fix: refactor response reader
return ['', false] if response.nil?

body, truncated = collect_response_chunks(response)
body.force_encoding('UTF-8')
If the response is truncated, the code points may be incomplete, which could produce an invalid byte sequence when translating the binary into a UTF-8 String.
e.g. if the last character is multibyte but was cut off, it will fail. Take the Chinese character for the number one, 一 (U+4E00), which is encoded in UTF-8 as three bytes:
11100100 10111000 10000000
The first byte in that sequence declares that two more bytes follow, but when truncated we are left with:
11100100 10111000
IIUC the force_encoding method only changes the encoding attribute on the String and does not actually attempt to validate it. It will fail once someone tries to read the string... effectively false advertising. The consumer of the body then has to guard against an invalid encoding or handle the invalid byte sequences itself.
# U+4E00 (一, yī) is encoded in UTF-8 as 3 bytes: E4 B8 80
# Here we only have the first 2 bytes; the sequence is incomplete.
truncated = "\xE4\xB8".b
truncated.force_encoding("UTF-8")
puts truncated.encoding        #=> UTF-8
puts truncated.valid_encoding? #=> false
begin
  # Note: encode("UTF-8") on a string already tagged UTF-8 is a no-op and
  # would not raise; transcoding to another encoding forces validation.
  puts truncated.encode("UTF-16")
rescue Encoding::InvalidByteSequenceError => e
  puts "Error: #{e.message}"
  #=> Error: incomplete "\xE4\xB8" on UTF-8
end
What should we do?
I am not sure, but I am not a fan of flow control by exception. If the body is truncated, perhaps we should return the string with placeholders for invalid byte sequences, or a blank string?
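For example, String#scrub would swap invalid byte sequences for a placeholder up front instead of leaving the problem for the body's consumer (a sketch, not the actual diff):

body.force_encoding('UTF-8')
# scrub with no argument inserts U+FFFD (�); scrub('') drops the bad bytes
body = body.scrub unless body.valid_encoding?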
  [body, truncated]
rescue StandardError => e
  OpenTelemetry.handle_error(exception: e, message: 'error reading response body')
  ['', false]
This does not make sense to me. Does returning an empty string effectively mean we are truncating the string in its entirety?
Won't the protobuf parser be unhappy with an empty string?
client.read(content_length) if content_length > 0

client.print "HTTP/1.1 #{status} Bad Request\r\n"
client.print "Content-Type: application/octet-stream\r\n"
IIUC, this is incorrect per the contract of the otel collector. It will respond with the same content type it received, which in our case is application/x-protobuf.
The exporter should probably check the response's Content-Type header to ensure it is getting back what it requested.
If for some reason an intermediary responds with, say, text/plain, then we know something went wrong and should not attempt to parse the body as protobuf anyway.
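Something along these lines, as a sketch; the exact failure handling and result constant are placeholders:

content_type = response['Content-Type']
unless content_type&.start_with?('application/x-protobuf')
  OpenTelemetry.handle_error(
    message: "OTLP exporter received unexpected Content-Type #{content_type.inspect}; not parsing body as protobuf"
  )
  return FAILURE # placeholder result
end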
client.print "HTTP/1.1 #{status} Bad Request\r\n"
client.print "Content-Type: application/octet-stream\r\n"
client.print "Content-Length: #{body.bytesize}\r\n"
Can you add a test case for Transfer-Encoding: chunked without the Content-Length header, which will likely be the most common response from the collector?
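For the test server, such a response could be written roughly like this (the 64 KiB chunk size is arbitrary):

client.print "HTTP/1.1 #{status} Bad Request\r\n"
client.print "Content-Type: application/x-protobuf\r\n"
client.print "Transfer-Encoding: chunked\r\n" # note: no Content-Length
client.print "\r\n"
body.scan(/.{1,65536}/m) do |chunk|
  client.print "#{chunk.bytesize.to_s(16)}\r\n" # chunk size in hex
  client.print "#{chunk}\r\n"
end
client.print "0\r\n\r\n" # terminating zero-length chunk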
Our exporters read HTTP response bodies without any limit. A misconfigured or misbehaving server could send an arbitrarily large response, causing the exporter to read it all into memory.
This implements a 4 MB response body limit also implemented by:
Based on https://cwe.mitre.org/data/definitions/789.html
Fixes #2079
Combined with kaylareopelle#15 by @robbkidd