
Help with strings in byte buffer slices#409

Open
tjpalmer wants to merge 6 commits into main from struct-encode-string

Conversation


@tjpalmer tjpalmer commented Apr 27, 2026

  • Support external and possible future needs for easy string storage in struct slices
  • Allow encoding into and decoding from byte buffer slices in single statements and/or expressions
  • Run direct mvn tests for be-java temper-core
  • The unit tests are mostly written by AI, but they look reasonable to me

Signed-off-by: Tom <tom@temper.systems>
ByteBuffer source,
int sourceStart,
int sourceLength,
CharsetDecoder decoder

The charset decoder allows configuring how to handle incomplete chars at the end.
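As a rough illustration of what that configuration can look like, here is a minimal sketch using only the standard java.nio.charset API. The class and method names are hypothetical and not part of this PR. It decodes a byte sequence that ends in the middle of a multibyte character, once with a decoder set to replace malformed input and once with a default decoder, which reports it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of temper-core.
final class DecoderConfigSketch {
    // 'a' followed by only the first two bytes of a three-byte UTF-8 sequence.
    static final byte[] TRUNCATED = {'a', (byte) 0xE2, (byte) 0x9C};

    /** Lenient: a trailing incomplete sequence is replaced instead of reported. */
    static String decodeLenient(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    /** Strict: a fresh decoder reports malformed input, so decode throws. */
    static boolean strictRejects(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    public static void main(String[] args) throws CharacterCodingException {
        String lenient = decodeLenient(TRUNCATED);
        // The leading 'a' survives and the incomplete tail becomes U+FFFD.
        System.out.println(lenient.startsWith("a") && lenient.indexOf('\uFFFD') >= 0);
        System.out.println(strictRejects(TRUNCATED));
    }
}
```

Passing the decoder in leaves this choice to the caller rather than fixing one policy inside the helper.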

CharsetDecoder decoder
) throws CharacterCodingException {
if (decoder == null) {
decoder = StandardCharsets.UTF_8.newDecoder();

We likely want either the platform encoding or UTF-8 in most cases. Which one probably depends on how often we expect structs to be used for interchange versus only in-process.

String s,
ByteBuffer target,
int targetStart,
int targetLength,

Seems like length is better than end here since this is about fitting field definitions in data structs.
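To make the start-plus-length reasoning concrete, here is a small sketch of a fixed-width record whose field offsets are derived from field lengths. The record layout, constants, and helper are hypothetical and only illustrate the idea; they are not taken from this PR.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-width record layout, for illustration only.
final class FieldLayoutSketch {
    // A struct definition naturally states widths; starts follow from them.
    static final int NAME_LENGTH = 16;
    static final int CITY_LENGTH = 8;
    static final int NAME_START = 0;
    static final int CITY_START = NAME_START + NAME_LENGTH;
    static final int RECORD_SIZE = CITY_START + CITY_LENGTH;

    /** Returns a view of one field; the record's own position and limit are untouched. */
    static ByteBuffer field(ByteBuffer record, int start, int length) {
        ByteBuffer view = record.duplicate();
        view.limit(start + length);
        view.position(start);
        return view.slice();
    }

    public static void main(String[] args) {
        ByteBuffer record = ByteBuffer.allocate(RECORD_SIZE);
        ByteBuffer city = field(record, CITY_START, CITY_LENGTH);
        city.put(0, (byte) 'X'); // writes through to the record at CITY_START
        System.out.println(city.remaining());
        System.out.println((char) record.get(CITY_START));
    }
}
```

With an end index instead, every call site would have to repeat the start + length arithmetic that the field definition already implies.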

int targetStart,
int targetLength,
CharsetEncoder encoder,
byte padByte

Could make overloads that default the coder to null and the pad byte to zero, but I haven't done that yet.
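A sketch of what such an overload might look like. The method name encodeIntoSlice and the body of the full version are assumptions made for illustration; only the parameter list and the padding loop mirror the hunks shown in this PR.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the PR's helper; names and body are assumed.
final class EncodeOverloadsSketch {
    /** Convenience overload: null encoder (UTF-8 by default) and zero padding. */
    static int encodeIntoSlice(String s, ByteBuffer target, int targetStart, int targetLength) {
        return encodeIntoSlice(s, target, targetStart, targetLength, null, (byte) 0);
    }

    /** Full version: encodes as much of s as fits, pads the rest, returns bytes written. */
    static int encodeIntoSlice(
        String s, ByteBuffer target, int targetStart, int targetLength,
        CharsetEncoder encoder, byte padByte
    ) {
        if (encoder == null) {
            encoder = StandardCharsets.UTF_8.newEncoder();
        }
        // Work on a view so the caller's position and limit are not disturbed.
        ByteBuffer slice = target.duplicate();
        slice.limit(targetStart + targetLength);
        slice.position(targetStart);
        encoder.reset();
        encoder.encode(CharBuffer.wrap(s), slice, true);
        encoder.flush(slice);
        int written = slice.position() - targetStart;
        while (slice.hasRemaining()) {
            slice.put(padByte);
        }
        return written;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        System.out.println(encodeIntoSlice("hi", buffer, 2, 4));
    }
}
```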

while (target.hasRemaining()) {
target.put(padByte);
}
return written;

Returning the number of bytes written means the caller has to compare it with the expected byte count to know whether the whole string fit. Maybe different overloads that report different things would be useful?
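For UTF-8 specifically, the caller-side comparison could look roughly like the sketch below. Everything here is hypothetical: utf8Length counts the encoded size without building a byte array (assuming the string has no unpaired surrogates), and encodeUtf8 stands in for an encode helper that returns bytes written.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical caller-side check, not part of this PR.
final class FitCheckSketch {
    /** UTF-8 byte count of s, assuming s contains no unpaired surrogates. */
    static int utf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            i += Character.charCount(cp);
        }
        return bytes;
    }

    /** Encodes as much of s as fits into slice and returns the bytes written. */
    static int encodeUtf8(String s, ByteBuffer slice) {
        int start = slice.position();
        StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s), slice, true);
        return slice.position() - start;
    }

    public static void main(String[] args) {
        String text = "\u2728\uD83D\uDE00"; // sparkles (3 bytes) + grinning face (4 bytes)
        int written = encodeUtf8(text, ByteBuffer.allocate(5));
        // The string fit only if every wanted byte was written.
        System.out.println(written == utf8Length(text));
    }
}
```

The same count also allows checking utf8Length(s) <= targetLength before encoding, at the cost of walking the string twice.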

CharsetDecoder decoder = StandardCharsets.ISO_8859_1.newDecoder();
String result = Core.decodeFromSlice(buffer, 0, 1, decoder);
assertEquals("£", result, "Should decode correctly using Latin-1");
}

At first, I only got UTF-8 tests, so I requested tests for other charsets.

String text = "✨😀";
ByteBuffer buffer = ByteBuffer.allocate(10);
// Slice of 5 bytes at offset 0
// Result: "✨" (3 bytes) fits, "😀" (4 bytes) fails, 2 bytes padding

I also specifically requested a case for a partially fitting multibyte char.
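The behaviour that test case relies on can be reproduced with the plain JDK encoder, independent of this PR's helper. In the sketch below (class and method names are hypothetical), the UTF-8 encoder stops before a character that does not fully fit instead of writing part of it.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of how the JDK UTF-8 encoder treats a too-small target.
final class PartialFitSketch {
    /** Returns {bytesWritten, charsConsumed} after encoding s into capacity bytes. */
    static int[] encodePrefix(String s, int capacity) {
        CharBuffer in = CharBuffer.wrap(s);
        ByteBuffer out = ByteBuffer.allocate(capacity);
        StandardCharsets.UTF_8.newEncoder().encode(in, out, true);
        return new int[] {out.position(), in.position()};
    }

    public static void main(String[] args) {
        // Sparkles needs 3 bytes, the grinning face needs 4; only 5 are available.
        int[] result = encodePrefix("\u2728\uD83D\uDE00", 5);
        System.out.println(result[0] + " bytes written, " + result[1] + " chars consumed");
    }
}
```

The two bytes left over are what the padding loop then fills.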

byte[] data = buffer.array();
assertEquals((byte)0xA3, data[0], "Latin-1 encoding for £");
assertEquals((byte)'?', data[1], "Replacement char for unmappable emoji");
assertEquals((byte)'.', data[2], "Padding");

I guess all this demonstrates why "how many bytes got used" and "did the entire string fit" are both potentially interesting questions, depending on the use case. I don't know how to return both without allocation. Or maybe the caller could pass in an object to receive the info, which could be reused throughout a loop.

But what I have so far is likely good enough for my current needs.
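One possible shape for the reusable receiver idea, sketched under assumptions: the Result class, the encodeInto name, and the body are all hypothetical and not part of this PR. The caller allocates the holder once and each call overwrites its fields. Note that this sketch still creates short-lived buffer views per call; only the result object is reused.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical variant that reports both answers through a caller-owned holder.
final class EncodeResultSketch {
    /** Mutable holder meant to be allocated once and reused across a loop. */
    static final class Result {
        int bytesWritten;
        boolean fit;
    }

    static void encodeInto(
        String s, ByteBuffer target, int start, int length,
        CharsetEncoder encoder, byte padByte, Result out
    ) {
        ByteBuffer slice = target.duplicate();
        slice.limit(start + length);
        slice.position(start);
        encoder.reset();
        CoderResult result = encoder.encode(CharBuffer.wrap(s), slice, true);
        if (result.isUnderflow()) {
            result = encoder.flush(slice);
        }
        out.bytesWritten = slice.position() - start;
        // Underflow means all input was consumed; overflow or an error means it was not.
        out.fit = result.isUnderflow();
        while (slice.hasRemaining()) {
            slice.put(padByte);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(10);
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        Result result = new Result();
        String[] values = {"hi", "\u2728\uD83D\uDE00"};
        for (int i = 0; i < values.length; i++) {
            encodeInto(values[i], buffer, i * 5, 5, encoder, (byte) '.', result);
            System.out.println(result.bytesWritten + " " + result.fit);
        }
    }
}
```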

<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>3.0.0</version>
</plugin>

I used the same versions that we generate into temper-built Java projects.

Comment thread be-java/build.gradle

tasks.named("check") {
dependsOn tasks.named("testJavaTemperCore")
}

Test results here.

[INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.144 s - in temper.core.SliceCoderTest

@tjpalmer tjpalmer marked this pull request as ready for review April 27, 2026 22:58
1 participant