float 32 to s32 conversion optimisation #243

Open: henkmuller wants to merge 2 commits into develop from feature/f32-optimisation
Conversation

@henkmuller

Contributor

Assembly optimisation using VX4 features.

Provides a 1.5x round-trip speedup, useful for floating-point FFTs.

Notifying @andrewxcav, @alexyiuxmos
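For readers unfamiliar with the operation being optimised, here is a minimal scalar sketch in C of a float-to-s32 conversion with a shared exponent, plus its inverse. It is not the assembly from this PR. The function names `ref_f32_to_s32` and `ref_s32_to_f32`, the shared-exponent convention (value ≈ mantissa × 2^exp), and the rounding and saturation behaviour are all assumptions made for illustration. The actual routines may differ.

```c
#include <stdint.h>

/* Hypothetical helper: returns 2^e for small integer e without needing libm. */
static double pow2_int(int e)
{
    double s = 1.0;
    for (; e > 0; e--) s *= 2.0;
    for (; e < 0; e++) s *= 0.5;
    return s;
}

/* Hypothetical scalar reference: out[i] ~= in[i] / 2^exp.
 * Assumes finite inputs; rounds half away from zero and saturates to int32. */
static void ref_f32_to_s32(int32_t out[], const float in[], unsigned n, int exp)
{
    const double scale = pow2_int(-exp);
    for (unsigned i = 0; i < n; i++) {
        double scaled = (double)in[i] * scale;
        double r = scaled + (scaled >= 0.0 ? 0.5 : -0.5);
        if (r >= 2147483648.0) {
            out[i] = INT32_MAX;          /* positive overflow: saturate */
        } else if (r <= -2147483649.0) {
            out[i] = INT32_MIN;          /* negative overflow: saturate */
        } else {
            out[i] = (int32_t)r;         /* truncation completes the rounding */
        }
    }
}

/* Hypothetical inverse: out[i] = in[i] * 2^exp, rounded to float. */
static void ref_s32_to_f32(float out[], const int32_t in[], unsigned n, int exp)
{
    const double scale = pow2_int(exp);
    for (unsigned i = 0; i < n; i++) {
        out[i] = (float)((double)in[i] * scale);
    }
}
```

Under these assumptions, converting 0.5f with an exponent of -30 gives the mantissa 536870912, and converting back returns 0.5f exactly. A floating-point FFT built on fixed-point kernels would make this round trip once per transform, which is presumably why the round-trip speedup is the figure quoted.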

@henkmuller henkmuller requested a review from uvvpavel June 6, 2026 14:21

@uvvpavel uvvpavel left a comment

Contributor

Good optimisation! I forgot that we had vector APIs for those conversions and had only optimised the scalar ones.
There are a few comments, mostly about code style.

Contributor

Now that xm.entsp is gone, I would remove NSTACKWORDS altogether, so it's not modifiable anymore

Contributor

I see you removed s4, but s5 is still used, and both of them are saved to the stack. Could you use s4 instead of s5 and only save s4 to the stack? This won't save us any cycles, but it will be a bit cleaner.

#define tmp1 s2
#define tmp0 s3

#define _0 x28

Contributor

Could you change x28 to t3, as it's more idiomatic, please?
