perf(lib): streamline print function using inline caching and table.c…#31
Stricky4 wants to merge 3 commits into nanos-world:master
…oncat
Optimized the print function by streamlining the argument processing flow.
- Initialized buffer and cached global functions (tostring, table.concat) in a single assignment.
- Used a direct index-based loop for type conversion, avoiding repeated table insertions.
- Benchmarks show significant performance gains, reducing execution time by up to 51%.
- Maintained full compatibility with nil values and varied data types.
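The approach described above can be sketched roughly as follows. This is a minimal illustration, not the PR's exact code: `join_args` is a hypothetical name, and it returns the joined string rather than calling `Console.Log`, so that it runs in a plain Lua 5.4 interpreter.

```lua
-- Hypothetical helper; the PR assigns equivalent logic to the global `print`.
local function join_args(...)
	-- Pack the varargs and cache the globals in a single assignment.
	local buffer, _tostring, _concat = {...}, tostring, table.concat
	-- select("#", ...) counts embedded and trailing nils, unlike #buffer.
	local n = select("#", ...)
	-- Convert in place by index instead of inserting into a second table.
	for i = 1, n do
		buffer[i] = _tostring(buffer[i])
	end
	-- Explicit bounds keep the result independent of the length operator.
	return _concat(buffer, "\t", 1, n)
end

-- In nanos world the result would be handed to Console.Log.
local line = join_args("a", nil, 2, true)
```

Passing the bounds to `table.concat` is a defensive choice made for this sketch; the diff under review calls `_concat(buffer, '\t')` without them.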
@@ -1,19 +1,15 @@
  -- Overrides print to call Console Log instead
  print = function(...)
The file mixes tabs and spaces for indentation. Perhaps fix that too. (Use tabs for a smaller file size.)
  -- After all, concatenate the results
- return Console.Log(table.concat(buffer))
+ return Console.Log(_concat(buffer, '\t'))
Mixed quote style. select uses " so stick with that.
  -- Table used to store the final output, which will be concatenated in the end
- local buffer = {}
+ --caching
+ local buffer, _tostring, _concat = {...}, tostring, table.concat
{...} creates a new table on every call, which is bad; it would be better to do that in C.
I would like to see benchmark against my raw table.pack version:
-- Localized global functions at module-level for better performance
local rawget, rawset, tostring, table_concat, table_pack = rawget, rawset, tostring, table.concat, table.pack
print = function(...)
	local packed = table_pack(...)
	for i = 1, packed.n do
		rawset(packed, i, tostring(rawget(packed, i)))
	end
	return Console.Log(table_concat(packed, "\t"))
end
I ran multiple benchmarks, and table.pack is consistently slower:
Run 1:
v1 ({...}) : 2.69s
v2 (table.pack) : 3.52s (~30% slower)
Run 2:
v1 ({...}) : 1.68s
v2 (table.pack) : 3.14s (~86% slower)
Both implementations allocate a table anyway, so table.pack doesn’t avoid allocations. In practice it introduces extra overhead (the .n field) and seems to be less JIT-friendly.
Given that, {...} is not only simpler but also significantly faster here, so we’d prefer keeping the current implementation.
nanos world does not use LuaJIT, it uses Lua 5.4.8. Did you benchmark this ingame? Also, yes they both allocate, but table.pack does it in C.
btw, just in time: I released a benchmark library, check it out:
https://github.com/Cheatoid/nanos-world-vault/tree/main/library#benchmark
Yeah, thanks for the correction regarding Lua 5.4. That said, it doesn’t change the conclusion, since the benchmarks still show {...} is consistently faster.
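For anyone who wants to reproduce the comparison outside the game, a standalone harness along these lines could be used. It is only a sketch under assumptions: `join_constructor`, `join_pack` and `time_it` are hypothetical names, both variants return the joined string instead of calling `Console.Log`, the table.pack variant uses plain indexing rather than rawget/rawset, and timings from a stock Lua 5.4 interpreter may not match in-game results.

```lua
local tostring, table_concat, table_pack = tostring, table.concat, table.pack

-- Variant 1: table constructor, as in the PR.
local function join_constructor(...)
	local buffer, n = {...}, select("#", ...)
	for i = 1, n do buffer[i] = tostring(buffer[i]) end
	return table_concat(buffer, "\t", 1, n)
end

-- Variant 2: table.pack, as in the review suggestion (simplified).
local function join_pack(...)
	local packed = table_pack(...)
	for i = 1, packed.n do packed[i] = tostring(packed[i]) end
	return table_concat(packed, "\t", 1, packed.n)
end

-- Hypothetical timing helper: CPU seconds spent on `count` calls of `fn`.
local function time_it(fn, count, ...)
	local start = os.clock()
	for _ = 1, count do fn(...) end
	return os.clock() - start
end

local count = 100000
print(("{...}      : %.3fs"):format(time_it(join_constructor, count, "test", nil, 2, true, 2.5)))
print(("table.pack : %.3fs"):format(time_it(join_pack, count, "test", nil, 2, true, 2.5)))
```

Running each variant several times and in both orders helps separate a real difference from garbage-collector noise.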
I have a better solution:
local function print_voltaism(...)
	return Console.Log(string.format(string.rep("%s\t", select("#", ...)), ...))
end
My testing code:
local args = {"test", "args", nil, 2, {}, true, 2.5, n=7}
local count = 100000
local ret_current = NanosUtils.Benchmark("print_current", count, print_current, table.unpack(args, 1, args.n))
local ret_stricky4 = NanosUtils.Benchmark("print_stricky4", count, print_stricky4, table.unpack(args, 1, args.n))
local ret_cheatoid = NanosUtils.Benchmark("print_cheatoid", count, print_cheatoid, table.unpack(args, 1, args.n))
local ret_voltaism = NanosUtils.Benchmark("print_voltaism", count, print_voltaism, table.unpack(args, 1, args.n))
print("print_current", tostring(ret_current) .. "ms")
print("print_stricky4", tostring(ret_stricky4) .. "ms")
print("print_cheatoid", tostring(ret_cheatoid) .. "ms")
print("print_voltaism", tostring(ret_voltaism) .. "ms")
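One detail of the one-liner worth noting: `string.rep("%s\t", n)` appends a tab after the last value, whereas `table.concat(buffer, "\t")` does not. A possible variation, assuming Lua 5.2 or later (where `string.rep` accepts a separator and `%s` converts any value as `tostring` would), is sketched below. `join_format` is a hypothetical name, and it returns the string instead of calling `Console.Log`.

```lua
local string_format, string_rep = string.format, string.rep

-- Hypothetical helper; in nanos world the result would go to Console.Log.
local function join_format(...)
	-- string.rep(s, n, sep) places sep between repetitions only,
	-- so the generated format string has no trailing tab.
	return string_format(string_rep("%s", select("#", ...), "\t"), ...)
end

local line = join_format("test", nil, 2, true)
```

Whether the extra argument to `string.rep` changes the timings is untested here; it would need to go through the same benchmark as the other variants.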
Replaced spaces with tabs for consistent indentation. Converted single quotes to double quotes for better string literal consistency.
Maybe maybe... add
do
	local _print, string_format = print, string.format
	--- Prints a formatted string to the console.
	---@param ... any # Format string and values to format.
	printf = function(...) return _print(string_format(...)) end
end
( would deprecate this )
Replaced the manual loop/buffer logic with a more efficient one-liner. Using string.format and string.rep is significantly faster than manual table insertion in this environment. Credits to vugi99 (https://github.com/vugi99) for this optimization.