
perf(lib): streamline print function using inline caching and table.c… #31

Open
Stricky4 wants to merge 3 commits into nanos-world:master from Stricky4:patch-2


@Stricky4

…oncat

Optimized the print function by streamlining the argument processing flow.

  • Initialized buffer and cached global functions (tostring, table.concat) in a single assignment.
  • Used a direct index-based loop for type conversion, avoiding repeated table insertions.
  • Benchmarks show significant performance gains, reducing execution time by up to 51%.
  • Maintained full compatibility with nil values and varied data types.
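A minimal sketch of what these bullets describe, pieced together from the diff fragments quoted in the review threads (the {...} buffer, the cached tostring and table.concat, and the tab-separated Console.Log call). This is not the exact PR diff: using select("#", ...) as the loop bound is an assumption about how nil arguments are preserved, and the Console stub exists only so the sketch runs in plain Lua 5.4 outside nanos world.

```lua
-- Assumption: outside nanos world there is no Console global, so stub one
-- that records the last logged line and returns it.
if Console == nil then
	Console = {}
	function Console.Log(message)
		Console.last = message
		return message
	end
end

-- Overrides print to call Console.Log instead
print = function(...)
	-- Capture the arguments and cache the globals in a single assignment
	local buffer, _tostring, _concat = {...}, tostring, table.concat
	-- select("#", ...) counts every argument, trailing nils included,
	-- so each nil slot is filled with the string "nil"
	for i = 1, select("#", ...) do
		buffer[i] = _tostring(buffer[i])
	end
	-- After the loop there are no holes, so concat sees every argument
	return Console.Log(_concat(buffer, "\t"))
end
```

For example, print("test", nil, 2, true) logs the line test, nil, 2 and true separated by tabs.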

Comment thread Shared.lua
@@ -1,19 +1,15 @@
-- Overrides print to call Console Log instead
print = function(...)

Cheatoid commented Apr 18, 2026


The file has mixed indentation (tabs and spaces). Perhaps fix that too (use tabs for a smaller file size).

Stricky4 (author)


thx

Comment thread Shared.lua Outdated

-- After all, concatenate the results
return Console.Log(table.concat(buffer))
return Console.Log(_concat(buffer, '\t'))

Mixed quote style: select uses double quotes, so stick with those.

Stricky4 (author)


thx

Comment thread Shared.lua Outdated
-- Table used to store the final output, which will be concatenated in the end
local buffer = {}
--caching
local buffer, _tostring, _concat = {...}, tostring, table.concat

{...} creates a new table on every call, which is bad; it is better done in C.
I would like to see a benchmark against my raw table.pack version:

-- Localized global functions at module-level for better performance
local rawget, rawset, tostring, table_concat, table_pack = rawget, rawset, tostring, table.concat, table.pack

print = function(...)
    local packed = table_pack(...)
    for i = 1, packed.n do
        rawset(packed, i, tostring(rawget(packed, i)))
    end
    return Console.Log(table_concat(packed, "\t"))
end
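Both versions need the real argument count, nils included, as the loop bound; that count is what the .n field stores. A small illustration in plain Lua 5.4 (count_args is a hypothetical helper, not part of either proposal):

```lua
-- Hypothetical helper: report the argument count two reliable ways.
local function count_args(...)
	local packed = table.pack(...) -- table.pack stores the count in packed.n
	return select("#", ...), packed.n -- both include trailing nils
end

local by_select, by_pack = count_args("test", nil, nil)
print(by_select, by_pack) -- prints 3 and 3, tab-separated
```

By contrast, when nils are present, #{...} is only guaranteed to return some border of the table, so it is not a safe loop bound for printing nil arguments.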


I ran multiple benchmarks, and table.pack is consistently slower:

Run 1:
v1 ({...}) : 2.69s
v2 (table.pack) : 3.52s (~30% slower)

Run 2:
v1 ({...}) : 1.68s
v2 (table.pack) : 3.14s (~86% slower)

Both implementations allocate a table anyway, so table.pack doesn’t avoid allocations. In practice it introduces extra overhead (the .n field) and seems to be less JIT-friendly.

Given that, {...} is not only simpler but also significantly faster here, so we’d prefer keeping the current implementation.
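The setup behind these timings isn't stated. A rough way to repeat the comparison is a standalone harness like the one below, assuming a plain Lua 5.4 interpreter rather than the game. format_braces, format_pack and time_it are hypothetical names; both variants return the joined string instead of calling Console.Log, and os.clock measures CPU time, so in-game numbers may differ.

```lua
-- Hypothetical harness comparing the {...} and table.pack variants.
local tostring, rawget, rawset = tostring, rawget, rawset
local table_concat, table_pack = table.concat, table.pack

-- {...} variant: the constructor captures the arguments
local function format_braces(...)
	local buffer = {...}
	for i = 1, select("#", ...) do
		buffer[i] = tostring(buffer[i])
	end
	return table_concat(buffer, "\t")
end

-- table.pack variant: packed.n holds the argument count
local function format_pack(...)
	local packed = table_pack(...)
	for i = 1, packed.n do
		rawset(packed, i, tostring(rawget(packed, i)))
	end
	return table_concat(packed, "\t")
end

-- Runs fn `iterations` times and returns the elapsed CPU seconds
local function time_it(fn, iterations, ...)
	local start = os.clock()
	for _ = 1, iterations do
		fn(...)
	end
	return os.clock() - start
end

local ITERATIONS = 100000
print(("{...}      : %.3fs"):format(time_it(format_braces, ITERATIONS, "test", "args", nil, 2, true, 2.5)))
print(("table.pack : %.3fs"):format(time_it(format_pack, ITERATIONS, "test", "args", nil, 2, true, 2.5)))
```

Running each variant several times and alternating their order helps reduce noise from warm-up and garbage collection.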


Cheatoid commented Apr 18, 2026


nanos world does not use LuaJIT; it uses Lua 5.4.8. Did you benchmark this in-game? Also, yes, they both allocate, but table.pack does it in C.
By the way, just in time, I released a benchmark library; check it out:
https://github.com/Cheatoid/nanos-world-vault/tree/main/library#benchmark


Yeah, thanks for the correction regarding Lua 5.4. That said, it doesn’t change the conclusion, since the benchmarks still show {...} is consistently faster.

@vugi99
Contributor

vugi99 commented Apr 18, 2026

I have a better solution:

local function print_voltaism(...)
    return Console.Log(string.format(string.rep("%s\t", select("#", ...)), ...))
end
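How the one-liner works: select("#", ...) counts every argument including nils, string.rep builds one "%s\t" per argument, and in Lua 5.4 the %s specifier converts a non-string argument following the same rules as tostring, so nil, booleans and tables are accepted. One side effect is a trailing tab, which the table.concat(buffer, "\t") versions don't produce. The sketch below shows this next to a hypothetical variant, print_separated, that uses string.rep's separator argument instead; the Console stub is an assumption for running outside nanos world.

```lua
-- Assumption: stub Console so the sketch runs in plain Lua 5.4.
if Console == nil then
	Console = {}
	function Console.Log(message)
		Console.last = message
		return message
	end
end

-- vugi99's one-liner: one "%s\t" per argument, so the line ends with a tab
local function print_voltaism(...)
	return Console.Log(string.format(string.rep("%s\t", select("#", ...)), ...))
end

-- Hypothetical variant: string.rep(s, n, sep) places "\t" only between
-- copies, matching table.concat(buffer, "\t") output exactly
local function print_separated(...)
	return Console.Log(string.format(string.rep("%s", select("#", ...), "\t"), ...))
end
```

With the arguments "a", nil, 2, the first version logs a, nil and 2 each followed by a tab, while the variant omits the final tab. Both log an empty string when called with no arguments.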

My testing code:

local args = {"test", "args", nil, 2, {}, true, 2.5, n=7}
local count = 100000

local ret_current = NanosUtils.Benchmark("print_current", count, print_current, table.unpack(args, 1, args.n))
local ret_stricky4 = NanosUtils.Benchmark("print_stricky4", count, print_stricky4, table.unpack(args, 1, args.n))
local ret_cheatoid = NanosUtils.Benchmark("print_cheatoid", count, print_cheatoid, table.unpack(args, 1, args.n))
local ret_voltaism = NanosUtils.Benchmark("print_voltaism", count, print_voltaism, table.unpack(args, 1, args.n))

print("print_current", tostring(ret_current) .. "ms")
print("print_stricky4", tostring(ret_stricky4) .. "ms")
print("print_cheatoid", tostring(ret_cheatoid) .. "ms")
print("print_voltaism", tostring(ret_voltaism) .. "ms")

Results: [image]

Replaced spaces with tabs for consistent indentation.

Converted single quotes to double quotes for better string literal consistency.
@Cheatoid

Cheatoid commented Apr 18, 2026

Maybe maybe... add printf for convenience (after print is defined):

do
    local _print, string_format = print, string.format
    --- Prints a formatted string to the console.
    ---@param ... any # Format string and values to format.
    printf = function(...) return _print(string_format(...)) end
end

(would deprecate this)
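A usage sketch for the suggested printf, assuming plain Lua 5.4. The Console stub and the print stand-in are assumptions (any of the print overrides discussed in this thread would work); the printf definition is Cheatoid's snippet with tab indentation.

```lua
-- Assumption: stub Console so this runs outside nanos world.
if Console == nil then
	Console = {}
	function Console.Log(message)
		Console.last = message
		return message
	end
end

-- Stand-in print override (assumption); tab-separated, no trailing tab
print = function(...)
	return Console.Log(string.format(string.rep("%s", select("#", ...), "\t"), ...))
end

-- printf must be defined after print so _print captures the override
do
	local _print, string_format = print, string.format
	--- Prints a formatted string to the console.
	---@param ... any # Format string and values to format.
	printf = function(...) return _print(string_format(...)) end
end

-- Usage: logs "3 players online, 99.5% loaded"
printf("%d players online, %.1f%% loaded", 3, 99.5)
```

Because _print is captured when the do block runs, redefining print afterwards does not affect printf.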

Replaced the manual loop/buffer logic with a more efficient one-liner.
Using string.format and string.rep is significantly faster than manual 
table insertion in this environment.

Credits to vugi99 (https://github.com/vugi99) for this optimization.

Labels

None yet


4 participants