Signed-off-by: Tom <tom@temper.systems>
    ByteBuffer source,
    int sourceStart,
    int sourceLength,
    CharsetDecoder decoder
Passing a CharsetDecoder lets the caller configure how an incomplete character at the end of the slice is handled.
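As a rough illustration of that configurability, here is a minimal sketch using only the standard java.nio.charset API. The class and method names (TruncatedInputDemo, decodeLenient, decodesStrictly) are hypothetical and not part of this change; the sketch only assumes that a truncated multibyte sequence at the end of the input is treated as malformed input, which is what the decoder's configured error action then governs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the change under review.
public class TruncatedInputDemo {
    /** Decodes UTF-8, substituting U+FFFD for malformed or truncated input. */
    static String decodeLenient(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE, but decode() declares it.
            throw new IllegalStateException(e);
        }
    }

    /** Returns false if strict (REPORT) decoding rejects the bytes. */
    static boolean decodesStrictly(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

With REPORT, a slice that cuts a character in half surfaces as a CharacterCodingException; with REPLACE, the caller gets a string with a replacement character instead. Which is right likely depends on whether the struct field is trusted data.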
        CharsetDecoder decoder
    ) throws CharacterCodingException {
        if (decoder == null) {
            decoder = StandardCharsets.UTF_8.newDecoder();
In most cases we likely want either the platform encoding or UTF-8 as the default. Which one probably depends on how often we expect structs to be used for interchange rather than only in-process.
    String s,
    ByteBuffer target,
    int targetStart,
    int targetLength,
A length seems better than an end offset here, since this is about fitting strings to the field definitions in data structs.
    int targetStart,
    int targetLength,
    CharsetEncoder encoder,
    byte padByte
We could add overloads that default the encoder to null and the pad byte to zero, but I haven't done that yet.
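For reference, such overloads might look roughly like the sketch below. Everything here is an assumption for illustration: the method name encodeToSlice, the class SliceEncodeOverloads, and the body of the full form are stand-ins built from the parameter list visible in the diff, not the actual implementation in this PR.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names and body are assumptions, not the PR's code.
public class SliceEncodeOverloads {
    /** Convenience overload: default (UTF-8) encoder, zero padding. */
    static int encodeToSlice(String s, ByteBuffer target, int targetStart, int targetLength) {
        return encodeToSlice(s, target, targetStart, targetLength, null, (byte) 0);
    }

    /** Convenience overload: caller-supplied encoder, zero padding. */
    static int encodeToSlice(String s, ByteBuffer target, int targetStart, int targetLength,
                             CharsetEncoder encoder) {
        return encodeToSlice(s, target, targetStart, targetLength, encoder, (byte) 0);
    }

    /** Full form: encodes s into the field, pads the rest, returns bytes written. */
    static int encodeToSlice(String s, ByteBuffer target, int targetStart, int targetLength,
                             CharsetEncoder encoder, byte padByte) {
        if (encoder == null) {
            // Assumed default: UTF-8 with replacement on bad input.
            encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        }
        // Work on a view limited to the field so the encoder cannot write past it
        // and the caller's position and limit stay untouched.
        ByteBuffer slice = target.duplicate();
        slice.clear();
        slice.limit(targetStart + targetLength);
        slice.position(targetStart);

        encoder.reset();
        // An error result from a REPORT-configured encoder is ignored in this sketch.
        CoderResult result = encoder.encode(CharBuffer.wrap(s), slice, true);
        if (result.isUnderflow()) {
            encoder.flush(slice);
        }
        int written = slice.position() - targetStart;
        while (slice.hasRemaining()) {
            slice.put(padByte);
        }
        return written;
    }
}
```

Under these assumptions, calling the four-argument overload with a sparkles-plus-emoji string and a 5-byte field should write the 3-byte first character, leave the 4-byte emoji out, zero-fill the last 2 bytes, and return 3.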
    while (target.hasRemaining()) {
        target.put(padByte);
    }
    return written;
Returning the number of bytes written means the caller has to compare it against the expected byte count to know whether everything fit. Maybe separate overloads that report different things would be useful?
        CharsetDecoder decoder = StandardCharsets.ISO_8859_1.newDecoder();
        String result = Core.decodeFromSlice(buffer, 0, 1, decoder);
        assertEquals("£", result, "Should decode correctly using Latin-1");
    }
At first I only got UTF-8 tests, so I requested tests for other encodings as well.
    String text = "✨😀";
    ByteBuffer buffer = ByteBuffer.allocate(10);
    // Slice of 5 bytes at offset 0
    // Result: "✨" (3 bytes) fits, "😀" (4 bytes) fails, 2 bytes padding
I also specifically requested a case where a multibyte character only partially fits.
    byte[] data = buffer.array();
    assertEquals((byte)0xA3, data[0], "Latin-1 encoding for £");
    assertEquals((byte)'?', data[1], "Replacement char for unmappable emoji");
    assertEquals((byte)'.', data[2], "Padding");
I guess all this demonstrates why "how many bytes got used" and "did the entire string fit" are both potentially interesting questions, depending on the use case. I don't know how to return both without allocating. Alternatively, the caller could pass in an object to receive the info, which could then be reused throughout a loop.
But what I have so far is likely good enough for my current needs.
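If the caller-supplied result object turns out to be worth pursuing later, it could look something like the following sketch. The names SliceEncodeReport, EncodeResult, bytesWritten, and fullyEncoded are hypothetical, and the encoding body is a stand-in rather than the code in this PR. Note that this sketch still creates small buffer views internally; only the result holder is reused across calls.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

// Hypothetical sketch; not the PR's implementation.
public class SliceEncodeReport {
    /** Mutable holder the caller allocates once and reuses across calls. */
    static final class EncodeResult {
        int bytesWritten;      // bytes of encoded text, excluding padding
        boolean fullyEncoded;  // true only if every char of the string was encoded
    }

    static void encodeToSlice(String s, ByteBuffer target, int targetStart, int targetLength,
                              CharsetEncoder encoder, byte padByte, EncodeResult out) {
        // View restricted to the field; the caller's buffer state is untouched.
        ByteBuffer slice = target.duplicate();
        slice.clear();
        slice.limit(targetStart + targetLength);
        slice.position(targetStart);

        CharBuffer chars = CharBuffer.wrap(s);
        encoder.reset();
        CoderResult result = encoder.encode(chars, slice, true);
        if (result.isUnderflow()) {
            result = encoder.flush(slice);
        }
        out.bytesWritten = slice.position() - targetStart;
        // Overflow or an encoding error leaves input unconsumed or result non-underflow.
        out.fullyEncoded = result.isUnderflow() && !chars.hasRemaining();

        while (slice.hasRemaining()) {
            slice.put(padByte);
        }
    }
}
```

A caller writing many fields in a loop would create one EncodeResult up front and check fullyEncoded after each call, which answers both questions without a per-call result allocation.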
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>3.0.0</version>
    </plugin>
I used the same versions that we generate into Temper-built Java projects.
    tasks.named("check") {
        dependsOn tasks.named("testJavaTemperCore")
    }
[INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.144 s - in temper.core.SliceCoderTest