mastodonien.de

fosstodon.org

Timestamp                 Users  Delta      Toots  Toots/user  Title      Version  Max toot length
Thu 2024-07-11 00:00:37  61,941    +12  3,533,951        57.1  Fosstodon  4.2.10   500
Wed 2024-07-10 00:00:53  61,929     +5  3,530,507        57.0  Fosstodon  4.2.10   500
Tue 2024-07-09 00:00:08  61,924     +2  3,527,175        57.0  Fosstodon  4.2.10   500
Mon 2024-07-08 00:00:19  61,922     +5  3,524,369        56.9  Fosstodon  4.2.10   500
Sun 2024-07-07 00:00:04  61,917      0  3,522,885        56.9  Fosstodon  4.2.10   500
Sat 2024-07-06 00:00:07  61,917     -2  3,523,632        56.9  Fosstodon  4.2.10   500
Fri 2024-07-05 00:00:27  61,919      0  3,520,710        56.9  Fosstodon  4.2.10   500
Thu 2024-07-04 00:00:52  61,919     +2  3,517,337        56.8  Fosstodon  4.2.9    500
Wed 2024-07-03 00:00:12  61,917     +2  3,513,906        56.8  Fosstodon  4.2.9    500
Tue 2024-07-02 00:01:44  61,915      0  3,510,479        56.7  Fosstodon  4.2.9    500

Thu 2024-07-11 12:49

{callme} update: writing a sqrt() that is 2.5x faster than R's built-in sqrt()

First I create a naive version in C, then unroll the loops, then introduce SIMD instructions.

What's really nice about the SIMD version is that I wrote x86 AVX SIMD instructions and ran the code on my Mac's ARM CPU using the SIMDe library.

*x64 SIMD* instructions on a Mac *ARM* CPU.

That's freaking wild! All thanks to the SIMDe "translation" library.

github.com/coolbutuseless/call

github.com/simd-everywhere/sim

In-depth case study: A faster sqrt()

In this case study, I am investigating a faster sqrt() in R.

R’s sqrt() is already pretty fast because it calls the C math library's sqrt() function internally. But can we go faster??

In the following sections I have written:

    sqrt_simple(), a naive implementation in C.
    sqrt_unrolled(), an implementation in which I have manually unrolled the for loop.
    sqrt_simd_avx(), which uses AVX SIMD instructions. This is particularly interesting because the code is running on an ARM Mac, whose CPU doesn't support AVX instructions!

A simple bespoke sqrt for R written in C.

A simple bespoke sqrt for R written in C with unrolled loops for faster execution.

A bespoke sqrt for R written in C that uses AVX instructions on a Mac ARM CPU via the SIMDe "translation" library. This is 2.5x faster than R's built-in sqrt.

[Public] Replies: 0 Boosts: 0 Favourites: 0
