Time                     Users   Delta  Toots      TNR   Title      Version  maxTL
Thu 11.07.2024 00:00:37  61,941  +12    3,533,951  57.1  Fosstodon  4.2.10   500
Wed 10.07.2024 00:00:53  61,929  +5     3,530,507  57.0  Fosstodon  4.2.10   500
Tue 09.07.2024 00:00:08  61,924  +2     3,527,175  57.0  Fosstodon  4.2.10   500
Mon 08.07.2024 00:00:19  61,922  +5     3,524,369  56.9  Fosstodon  4.2.10   500
Sun 07.07.2024 00:00:04  61,917  0      3,522,885  56.9  Fosstodon  4.2.10   500
Sat 06.07.2024 00:00:07  61,917  -2     3,523,632  56.9  Fosstodon  4.2.10   500
Fri 05.07.2024 00:00:27  61,919  0      3,520,710  56.9  Fosstodon  4.2.10   500
Thu 04.07.2024 00:00:52  61,919  +2     3,517,337  56.8  Fosstodon  4.2.9    500
Wed 03.07.2024 00:00:12  61,917  +2     3,513,906  56.8  Fosstodon  4.2.9    500
Tue 02.07.2024 00:01:44  61,915  0      3,510,479  56.7  Fosstodon  4.2.9    500
coolbutuseless (@coolbutuseless) · 02/2023 · Toots: 4,391 · Followers: 704
Thu 11.07.2024 12:49
{callme} update - Writing a faster sqrt() which is 2.5x faster than R's builtin sqrt()
First I create a naive version in C, then unroll the loops, then introduce SIMD instructions.
What's really nice about the SIMD version is that I wrote x86 AVX SIMD instructions and ran the code on my Mac's ARM CPU using the SIMDe library.
*x86 SIMD* instructions on a Mac *ARM* CPU.
That's freaking wild! All thanks to the SIMDe "translation" library.
https://github.com/coolbutuseless/callme
https://github.com/simd-everywhere/simde
In-depth case study: A faster sqrt()
In this case study, I am investigating a faster sqrt() in R. R’s sqrt() is already pretty fast as it calls the system math sqrt function internally. But can we go faster? In the following sections I have written three versions: sqrt_simple(), a naive implementation in C; sqrt_unrolled(), an implementation in which I have manually unrolled the for loop; and sqrt_simd_avx(), which uses AVX SIMD instructions. This is particularly interesting as the code is running on an ARM Mac, whose CPU doesn’t have AVX instructions!
A simple bespoke sqrt for R written in C.
A simple bespoke sqrt for R written in C with unrolled loops for faster execution.
A bespoke sqrt for R written in C which uses AVX instructions on a Mac ARM CPU via the SIMDe "translation" library. This is 2.5x faster than the built-in sqrt in R.
[Public] Replies: 0 Boosts: 0 Favs: 0