From d40b2703a87bddc99db43e38a23a4be47400681e Mon Sep 17 00:00:00 2001
From: Scott Myron
Date: Tue, 28 Oct 2025 08:37:32 +0100
Subject: [PATCH] Use Vector API in the Java Extension
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Overview

This PR uses the [jdk.incubator.vector module](https://docs.oracle.com/en/java/javase/20/docs/api/jdk.incubator.vector/jdk/incubator/vector/package-summary.html) as mentioned in [issue #739](https://github.com/ruby/json/issues/739) to accelerate generating JSON with the same algorithm as the C extension.

As it stands, the PR attempts to build the `json.ext.VectorizedEscapeScanner` class with a target release of `16`, the first version of Java with support for the `jdk.incubator.vector` module. The remaining code is built for Java 1.8. The code will attempt to load `json.ext.VectorizedEscapeScanner` only if the `json.enableVectorizedEscapeScanner` system property is set to `true` (or `1`).

I'm not entirely sure how this is packaged / included with JRuby, so I'd love @byroot's and @headius's (and others'?) thoughts on how to package and/or structure the JARs. I did consider adding `json.ext.VectorizedEscapeScanner` to a separate `generator-vectorized.jar`, but I thought I'd solicit feedback before spending any more time on the build / package process.

Benchmarks

Machine

M1 MacBook Air

Note: I've had trouble modifying the `compare.rb` I was using for the C extension to work reliably with the Java extension. I'll probably spend more time trying to get it to work, but as of right now these are pretty raw benchmarks.

Below are two sample runs of the real-world benchmarks. The results vary much more from run to run than they do with the C extension, for some reason. I'm not sure if HotSpot is doing something slightly different per execution.

Vector API Enabled

```
scott@Scotts-MacBook-Air json % ONLY=json JAVA_OPTS='--add-modules jdk.incubator.vector -Djson.enableVectorizedEscapeScanner=true' ruby -I"lib" benchmark/encoder-realworld.rb
WARNING: Using incubator modules: jdk.incubator.vector
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 1.384k i/100ms
Calculating -------------------------------------
json 15.289k (± 0.8%) i/s (65.41 μs/i) - 153.624k in 10.048481s
== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 76.000 i/100ms
Calculating -------------------------------------
json 753.787 (± 3.6%) i/s (1.33 ms/i) - 7.524k in 9.997059s
== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 173.000 i/100ms
Calculating -------------------------------------
json 1.751k (± 1.1%) i/s (571.24 μs/i) - 17.646k in 10.081260s
== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 2.390k i/100ms
Calculating -------------------------------------
json 23.829k (± 0.8%) i/s (41.97 μs/i) - 239.000k in 10.030503s
```

Vector API Disabled

```
scott@Scotts-MacBook-Air json % ONLY=json JAVA_OPTS='--add-modules jdk.incubator.vector -Djson.enableVectorizedEscapeScanner=false' ruby -I"lib" benchmark/encoder-realworld.rb
WARNING: Using incubator modules: jdk.incubator.vector
VectorizedEscapeScanner disabled.
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 1.204k i/100ms
Calculating -------------------------------------
json 12.937k (± 1.1%) i/s (77.30 μs/i) - 130.032k in 10.052234s
== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 80.000 i/100ms
Calculating -------------------------------------
json 817.378 (± 1.0%) i/s (1.22 ms/i) - 8.240k in 10.082058s
== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 147.000 i/100ms
Calculating -------------------------------------
json 1.499k (± 1.3%) i/s (667.08 μs/i) - 14.994k in 10.004181s
== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 2.269k i/100ms
Calculating -------------------------------------
json 22.366k (± 5.7%) i/s (44.71 μs/i) - 224.631k in 10.097069s
```

`master` as of commit `c5af1b68c582335c2a82bbc4bfa5b3e41ead1eba`

```
scott@Scotts-MacBook-Air json % ONLY=json ruby -I"lib" benchmark/encoder-realworld.rb
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 886.000 i/100ms
Calculating -------------------------------------
json^C%
scott@Scotts-MacBook-Air json % ONLY=json ruby -I"lib" benchmark/encoder-realworld.rb
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 1.031k i/100ms
Calculating -------------------------------------
json 10.812k (± 1.3%) i/s (92.49 μs/i) - 108.255k in 10.014260s
== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 82.000 i/100ms
Calculating -------------------------------------
json 824.921 (± 1.0%) i/s (1.21 ms/i) - 8.282k in 10.040787s
== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 141.000 i/100ms
Calculating -------------------------------------
json 1.421k (± 0.7%) i/s (703.85 μs/i) - 14.241k in 10.023979s
== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a Java HotSpot(TM) 64-Bit Server VM 21.0.7+8-LTS-245 on 21.0.7+8-LTS-245 +jit [arm64-darwin]
Warming up --------------------------------------
json 2.274k i/100ms
Calculating -------------------------------------
json 22.612k (± 0.9%) i/s (44.22 μs/i) - 227.400k in 10.057516s
```

Observations

`activitypub.json` and `twitter.json` seem to be consistently faster with the Vector API enabled. `citm_catalog.json` seems consistently a bit slower and `ohai.json` is fairly close to even.
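
The opt-in loading described in the overview can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this patch: `OptionalLoader`, `loadOptional`, and the `demo.vector.enabled` property are hypothetical names. Catching `LinkageError` is an assumption about how a class compiled against `jdk.incubator.vector` might fail to load when the module has not been added at runtime. The patch itself catches only the reflective exceptions.

```java
import java.lang.reflect.Constructor;

final class OptionalLoader {
    /** Returns true for "true" (any case) or "1", mirroring the opt-in rule described above. */
    static boolean isEnabled(String value) {
        return "true".equalsIgnoreCase(value) || "1".equals(value);
    }

    /**
     * Instantiates className through its no-arg constructor when the given
     * system property opts in. Returns null on any failure so that the caller
     * can fall back to a non-vectorized implementation.
     */
    static Object loadOptional(String propertyName, String className) {
        if (!isEnabled(System.getProperty(propertyName, "false"))) {
            return null;
        }
        try {
            Class<?> cls = OptionalLoader.class.getClassLoader().loadClass(className);
            Constructor<?> ctor = cls.getDeclaredConstructor();
            return ctor.newInstance();
        } catch (ReflectiveOperationException | LinkageError e) {
            // Assumption: treat a missing module or unsupported class version as "not available".
            return null;
        }
    }

    public static void main(String[] args) {
        System.setProperty("demo.vector.enabled", "1");
        // A class that exists stands in for the optional encoder.
        System.out.println(loadOptional("demo.vector.enabled", "java.util.ArrayList") != null); // prints true
        // A class that does not exist triggers the fallback path.
        System.out.println(loadOptional("demo.vector.enabled", "no.such.Encoder") == null);     // prints true
    }
}
```

Returning `null` keeps the decision in one place: the caller checks once at class-initialization time and then picks the fallback encoder without further reflection.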
---
 Rakefile                                      |  21 +++-
 .../AbstractByteListDirectOutputStream.java   |   2 -
 java/src/json/ext/SWARBasicStringEncoder.java |   4 +-
 java/src/json/ext/StringEncoder.java          |  37 ++++++-
 .../src/json/ext/VectorizedStringEncoder.java | 104 ++++++++++++++++++
 test/json/json_encoding_test.rb               |   4 +
 6 files changed, 163 insertions(+), 9 deletions(-)
 create mode 100644 java/src/json/ext/VectorizedStringEncoder.java

diff --git a/Rakefile b/Rakefile
index 9681be0d..d09e8194 100644
--- a/Rakefile
+++ b/Rakefile
@@ -86,7 +86,8 @@ end
 JAVA_DIR = "java/src/json/ext"
 JAVA_RAGEL_PATH = "#{JAVA_DIR}/ParserConfig.rl"
 JAVA_PARSER_SRC = "#{JAVA_DIR}/ParserConfig.java"
-JAVA_SOURCES = FileList["#{JAVA_DIR}/*.java"]
+JAVA_SOURCES = FileList["#{JAVA_DIR}/*.java"].exclude("#{JAVA_DIR}/Vectorized*.java")
+JAVA_VEC_SOURCES = FileList["#{JAVA_DIR}/Vectorized*.java"]
 JAVA_CLASSES = []
 JRUBY_PARSER_JAR = File.expand_path("lib/json/ext/parser.jar")
 JRUBY_GENERATOR_JAR = File.expand_path("lib/json/ext/generator.jar")
@@ -142,8 +143,8 @@ if defined?(RUBY_ENGINE) and RUBY_ENGINE == 'jruby'
 
   JRUBY_JAR = File.join(CONFIG["libdir"], "jruby.jar")
   if File.exist?(JRUBY_JAR)
+    classpath = (Dir['java/lib/*.jar'] << 'java/src' << JRUBY_JAR) * path_separator
     JAVA_SOURCES.each do |src|
-      classpath = (Dir['java/lib/*.jar'] << 'java/src' << JRUBY_JAR) * path_separator
       obj = src.sub(/\.java\Z/, '.class')
       file obj => src do
         if File.exist?(File.join(ENV['JAVA_HOME'], "lib", "modules"))
@@ -154,6 +155,20 @@ if defined?(RUBY_ENGINE) and RUBY_ENGINE == 'jruby'
       end
       JAVA_CLASSES << obj
     end
+
+    JAVA_VEC_SOURCES.each do |src|
+      obj = src.sub(/\.java\Z/, '.class')
+      file obj => src do
+        sh 'javac', '--add-modules', 'jdk.incubator.vector', '-classpath', classpath, '--release', '16', src do |success, status|
+          if success
+            puts "*** 'jdk.incubator.vector' support enabled ***"
+          else
+            puts "*** 'jdk.incubator.vector' support disabled ***"
+          end
+        end
+      end
+      JAVA_CLASSES << obj
+    end
   else
     warn "WARNING: Cannot find jruby in path => Cannot build jruby extension!"
   end
@@ -199,11 +214,13 @@ if defined?(RUBY_ENGINE) and RUBY_ENGINE == 'jruby'
       generator_classes = FileList[
         "json/ext/*ByteList*.class",
         "json/ext/OptionsReader*.class",
+        "json/ext/EscapeScanner*.class",
         "json/ext/Generator*.class",
         "json/ext/RuntimeInfo*.class",
         "json/ext/*StringEncoder*.class",
         "json/ext/Utils*.class"
       ]
+      puts "Creating generator jar with classes: #{generator_classes.join(', ')}"
       sh 'jar', 'cf', File.basename(JRUBY_GENERATOR_JAR), *generator_classes
       mv File.basename(JRUBY_GENERATOR_JAR), File.dirname(JRUBY_GENERATOR_JAR)
     end
diff --git a/java/src/json/ext/AbstractByteListDirectOutputStream.java b/java/src/json/ext/AbstractByteListDirectOutputStream.java
index c3175c84..b94d7f59 100644
--- a/java/src/json/ext/AbstractByteListDirectOutputStream.java
+++ b/java/src/json/ext/AbstractByteListDirectOutputStream.java
@@ -15,8 +15,6 @@ abstract class AbstractByteListDirectOutputStream extends OutputStream {
     static {
         String useSegmentedOutputStream = System.getProperty(PROP_SEGMENTED_BUFFER, PROP_SEGMENTED_BUFFER_DEFAULT);
         USE_SEGMENTED_BUFFER = Boolean.parseBoolean(useSegmentedOutputStream);
-        // XXX Is there a logger we can use here?
-        // System.out.println("Using segmented output stream: " + USE_SEGMENTED_BUFFER);
     }
 
     public static AbstractByteListDirectOutputStream create(int estimatedSize) {
diff --git a/java/src/json/ext/SWARBasicStringEncoder.java b/java/src/json/ext/SWARBasicStringEncoder.java
index a6695d99..bd0d1a70 100644
--- a/java/src/json/ext/SWARBasicStringEncoder.java
+++ b/java/src/json/ext/SWARBasicStringEncoder.java
@@ -71,7 +71,7 @@ void encode(ByteList src) throws IOException {
         }
     }
 
-    private boolean skipChunk(long x) {
+    boolean skipChunk(long x) {
         long is_ascii = 0x8080808080808080L & ~x;
         long xor2 = x ^ 0x0202020202020202L;
         long lt32_or_eq34 = xor2 - 0x2121212121212121L;
@@ -80,7 +80,7 @@ private boolean skipChunk(long x) {
         return ((lt32_or_eq34 | eq92) & is_ascii) == 0;
     }
 
-    private boolean skipChunk(int x) {
+    boolean skipChunk(int x) {
         int is_ascii = 0x80808080 & ~x;
         int xor2 = x ^ 0x02020202;
         int lt32_or_eq34 = xor2 - 0x21212121;
diff --git a/java/src/json/ext/StringEncoder.java b/java/src/json/ext/StringEncoder.java
index 7f75476d..b874ad78 100644
--- a/java/src/json/ext/StringEncoder.java
+++ b/java/src/json/ext/StringEncoder.java
@@ -7,6 +7,8 @@
 
 import java.io.IOException;
 import java.io.OutputStream;
+import java.lang.reflect.Constructor;
+import java.lang.reflect.InvocationTargetException;
 import java.nio.charset.StandardCharsets;
 
 import org.jcodings.Encoding;
@@ -114,15 +116,37 @@ class StringEncoder extends ByteListTranscoder {
 
     protected final byte[] escapeTable;
 
+    private static final String VECTORIZED_STRING_ENCODER_CLASS = "json.ext.VectorizedStringEncoder";
+    private static final String USE_VECTORIZED_BASIC_ENCODER_PROP = "jruby.json.useVectorizedBasicEncoder";
+    private static final String USE_VECTORIZED_BASIC_ENCODER_DEFAULT = "false";
+    private static final boolean USE_VECTORIZED_BASIC_ENCODER;
+    private static final StringEncoder VECTORIZED_SCANNER;
+
     private static final String USE_SWAR_BASIC_ENCODER_PROP = "jruby.json.useSWARBasicEncoder";
     private static final String USE_SWAR_BASIC_ENCODER_DEFAULT = "true";
     private static final boolean USE_BASIC_SWAR_ENCODER;
 
     static {
+        String enableVectorizedScanner = System.getProperty(USE_VECTORIZED_BASIC_ENCODER_PROP, USE_VECTORIZED_BASIC_ENCODER_DEFAULT);
+        if ("true".equalsIgnoreCase(enableVectorizedScanner) || "1".equalsIgnoreCase(enableVectorizedScanner)) {
+            StringEncoder scanner;
+            try {
+                Class<?> vectorizedStringEncoderClass = StringEncoder.class.getClassLoader().loadClass(VECTORIZED_STRING_ENCODER_CLASS);
+                Constructor<?> vectorizedStringEncoderConstructor = vectorizedStringEncoderClass.getDeclaredConstructor();
+                scanner = (StringEncoder) vectorizedStringEncoderConstructor.newInstance();
+            } catch (ClassNotFoundException | NoSuchMethodException | InstantiationException | IllegalAccessException | InvocationTargetException e) {
+                // Fallback to the StringEncoder if we cannot load the VectorizedStringEncoder.
+                scanner = null;
+            }
+            VECTORIZED_SCANNER = scanner;
+            USE_VECTORIZED_BASIC_ENCODER = scanner != null;
+        } else {
+            VECTORIZED_SCANNER = null;
+            USE_VECTORIZED_BASIC_ENCODER = false;
+        }
+
         USE_BASIC_SWAR_ENCODER = Boolean.parseBoolean(
             System.getProperty(USE_SWAR_BASIC_ENCODER_PROP, USE_SWAR_BASIC_ENCODER_DEFAULT));
-        // XXX Is there a logger we can use here?
-        // System.out.println("Using SWAR basic encoder: " + USE_BASIC_SWAR_ENCODER);
     }
 
     OutputStream out;
@@ -149,8 +173,15 @@ class StringEncoder extends ByteListTranscoder {
         this.escapeTable = escapeTable;
     }
 
+    @Override
+    public StringEncoder clone() {
+        return new StringEncoder(escapeTable);
+    }
+
     static StringEncoder createBasicEncoder() {
-        if (USE_BASIC_SWAR_ENCODER) {
+        if (USE_VECTORIZED_BASIC_ENCODER) {
+            return (StringEncoder) VECTORIZED_SCANNER.clone();
+        } else if (USE_BASIC_SWAR_ENCODER) {
             return new SWARBasicStringEncoder();
         } else {
             return new StringEncoder(false);
diff --git a/java/src/json/ext/VectorizedStringEncoder.java b/java/src/json/ext/VectorizedStringEncoder.java
new file mode 100644
index 00000000..14b3c8d7
--- /dev/null
+++ b/java/src/json/ext/VectorizedStringEncoder.java
@@ -0,0 +1,104 @@
+package json.ext;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+import org.jruby.util.ByteList;
+
+import jdk.incubator.vector.ByteVector;
+import jdk.incubator.vector.VectorMask;
+import jdk.incubator.vector.VectorOperators;
+import jdk.incubator.vector.VectorSpecies;
+
+class VectorizedStringEncoder extends SWARBasicStringEncoder {
+    private static final VectorSpecies<Byte> SP = ByteVector.SPECIES_PREFERRED;
+    private static final ByteVector ZERO = ByteVector.zero(SP);
+    private static final ByteVector TWO = ByteVector.broadcast(SP, 2);
+    private static final ByteVector THIRTY_THREE = ByteVector.broadcast(SP, 33);
+    private static final ByteVector BACKSLASH = ByteVector.broadcast(SP, '\\');
+
+    @Override
+    public StringEncoder clone() {
+        return new VectorizedStringEncoder();
+    }
+
+    @Override
+    void encode(ByteList src) throws IOException {
+        byte[] ptrBytes = src.unsafeBytes();
+        int ptr = src.begin();
+        int len = src.realSize();
+        int beg = 0;
+        int pos = 0; // logical offset into src; ptr is added at each array access
+
+        while ((pos + SP.length() <= len)) {
+            ByteVector chunk = ByteVector.fromArray(SP, ptrBytes, ptr + pos);
+            // bytes are signed in java, so we need to remove negative values
+            VectorMask<Byte> negative = chunk.lt(ZERO);
+            VectorMask<Byte> tooLowOrDblQuote = chunk.lanewise(VectorOperators.XOR, TWO).lt(THIRTY_THREE).andNot(negative);
+            VectorMask<Byte> needsEscape = chunk.eq(BACKSLASH).or(tooLowOrDblQuote);
+            if (needsEscape.anyTrue()) {
+                int chunkStart = pos;
+                long mask = needsEscape.toLong();
+
+                while (mask != 0) { // != 0 so a set bit 63 (64-lane species) is not skipped
+                    // nextMatch inlined
+                    int index = Long.numberOfTrailingZeros(mask);
+                    mask &= (mask - 1);
+                    pos = chunkStart + index;
+                    int ch = Byte.toUnsignedInt(ptrBytes[ptr + pos]);
+
+                    beg = pos = flushPos(pos, beg, ptrBytes, ptr, 1);
+                    escapeAscii(ch, aux, HEX);
+                }
+
+                // Skip over any remaining characters in the current chunk
+                pos = chunkStart + SP.length();
+                continue;
+            }
+
+            pos += SP.length();
+        }
+
+        ByteBuffer bb = ByteBuffer.wrap(ptrBytes, ptr, len);
+        if (pos + 8 <= len) {
+            long x = bb.getLong(ptr + pos);
+            if (skipChunk(x)) {
+                pos += 8;
+            } else {
+                int chunkEnd = pos + 8; // pos is already relative to ptr
+                while (pos < chunkEnd) {
+                    int ch = Byte.toUnsignedInt(ptrBytes[ptr + pos]);
+                    int ch_len = ESCAPE_TABLE[ch];
+                    if (ch_len > 0) {
+                        beg = pos = flushPos(pos, beg, ptrBytes, ptr, 1);
+                        escapeAscii(ch, aux, HEX);
+                    } else {
+                        pos++;
+                    }
+                }
+            }
+        }
+
+        if (pos + 4 <= len) {
+            int x = bb.getInt(ptr + pos);
+            if (skipChunk(x)) {
+                pos += 4;
+            }
+        }
+
+        while (pos < len) {
+            int ch = Byte.toUnsignedInt(ptrBytes[ptr + pos]);
+            int ch_len = ESCAPE_TABLE[ch];
+            if (ch_len > 0) {
+                beg = pos = flushPos(pos, beg, ptrBytes, ptr, 1);
+                escapeAscii(ch, aux, HEX);
+            } else {
+                pos++;
+            }
+        }
+
+        if (beg < len) {
+            append(ptrBytes, ptr + beg, len - beg);
+        }
+    }
+}
diff --git a/test/json/json_encoding_test.rb b/test/json/json_encoding_test.rb
index 2789e94b..7ac06b2a 100644
--- a/test/json/json_encoding_test.rb
+++ b/test/json/json_encoding_test.rb
@@ -37,6 +37,10 @@ def test_generate_shared_string
     assert_equal '"234567890"', JSON.dump(s[2..-1])
     s = '01234567890123456789"a"b"c"d"e"f"g"h'
     assert_equal '"\"a\"b\"c\"d\"e\"f\"g\""', JSON.dump(s[20, 15])
+    s = "0123456789001234567890012345678900123456789001234567890"
+    assert_equal '"23456789001234567890012345678900123456789001234567890"', JSON.dump(s[2..-1])
+    s = "0123456789001234567890012345678900123456789001234567890"
+    assert_equal '"567890012345678900123456789001234567890012345678"', JSON.dump(s[5..-3])
   end
 
   def test_unicode
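
For anyone reviewing the lane arithmetic in `VectorizedStringEncoder.encode`, the per-byte rule behind `chunk.lanewise(VectorOperators.XOR, TWO).lt(THIRTY_THREE).andNot(negative)` can be checked with plain scalar Java, no incubator module required. The sketch below is illustrative only: `EscapePredicateCheck` and its method names are hypothetical, and it models a single lane, not the vector code itself. The idea is that XOR with 2 maps `"` (34) to 32 and leaves 0..31 inside 0..31, so a single signed "less than 33" comparison covers both control characters and the double quote once negative (non-ASCII) bytes are masked out.

```java
final class EscapePredicateCheck {
    /** Reference rule: control characters, double quote, and backslash need escaping. */
    static boolean needsEscapeReference(int unsignedByte) {
        return unsignedByte < 0x20 || unsignedByte == '"' || unsignedByte == '\\';
    }

    /** Scalar model of one vector lane, operating on a signed Java byte. */
    static boolean needsEscapeLane(byte b) {
        boolean negative = b < 0; // bytes >= 0x80 are negative and never escaped here
        boolean tooLowOrDblQuote = ((byte) (b ^ 2)) < 33 && !negative;
        return b == '\\' || tooLowOrDblQuote;
    }

    /** Counts the byte values flagged by the lane model, checking each against the reference. */
    static int countEscaped() {
        int count = 0;
        for (int v = 0; v < 256; v++) {
            boolean lane = needsEscapeLane((byte) v);
            if (lane != needsEscapeReference(v)) {
                throw new IllegalStateException("mismatch at byte value " + v);
            }
            if (lane) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 32 control characters plus '"' and '\\'
        System.out.println(countEscaped()); // prints 34
    }
}
```

Exhaustively checking all 256 byte values is cheap and gives a quick guard against sign-related slips when the predicate is changed.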