List of commits

blake3 package


Rev. Time Author
af0ef07 2020-01-30 03:01:40 Jack O'Connor

update the c/README.md example to hash stdin

37e153c 2020-01-29 05:59:16 Jack O'Connor

add NEON support to blake3_dispatch.c

Currently this requires setting the BLAKE3_USE_NEON preprocessor flag.
In the future we may enable this automatically on AArch32/64 or include
some kind of dynamic feature detection. (Though ARM makes this harder
than x86.)

As part of this, get rid of the IS_ARM flag. It wasn't being set
properly when I tried it on a Raspberry Pi.

Closes #30.

d7a37fa 2020-01-29 04:11:26 Jack O'Connor

clear errno before strtoull

I ran into a bug on ARM where we were getting a non-zero value here,
left over in errno from something earlier.
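
The reason for the fix can be sketched as follows. strtoull sets errno on overflow but never resets it on success, so a stale value from an earlier call looks like a parse failure. The helper name parse_u64 is hypothetical and is not taken from the BLAKE3 sources; this is a minimal sketch of the pattern, not the actual patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical helper (not from the BLAKE3 sources): parse a decimal
// unsigned integer, returning false on any failure.
static bool parse_u64(const char *text, uint64_t *out) {
  char *end = NULL;
  errno = 0; // strtoull does not reset errno on success, so clear stale values
  unsigned long long value = strtoull(text, &end, 10);
  if (errno != 0) {
    return false; // e.g. ERANGE when the value does not fit
  }
  if (end == text || *end != '\0') {
    return false; // no digits consumed, or trailing garbage
  }
  *out = (uint64_t)value;
  return true;
}
```

Without the `errno = 0` line, a leftover error code from unrelated earlier code would make a valid input such as "42" be rejected.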

4304cd1 2020-01-29 03:26:37 Jack O'Connor

one more warning

d980514 2020-01-29 03:25:22 Jack O'Connor

fix unused variable warning

6742722 2020-01-28 06:21:34 Jack O'Connor

add a note about testing in main.c

8ce1cdd 2020-01-28 06:17:09 TheVice

[memset] remove the 'memset' call, since blake3_hasher_finalize
overwrites the buffer's contents anyway.

4730ab2 2020-01-28 06:17:09 TheVice

[memset] call the function only after the memory it applies to
has been checked.

dec0c49 2020-01-28 03:10:25 Jack O'Connor

add a note about AVX-512 flags

444a338 2020-01-28 03:04:36 Jack O'Connor

remove an obsolete remark about performance

5ef22de 2020-01-28 03:02:00 Jack O'Connor

link to the C implementation from the README

71e605f 2020-01-27 06:12:10 Jack O'Connor


typo

1db856a 2020-01-27 06:07:51 Jack O'Connor

expand the C README for public consumption

214c70d 2020-01-24 09:42:41 Samuel Neves


Merge pull request #40 from erijo/cpp

Add extern "C" to blake3.h

182aea4 2020-01-24 04:42:34 Erik Johansson

Add extern "C" to blake3.h

So that the header can be included in C++ programs without getting linker
errors.
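
The usual shape of such a guard looks roughly like this. The function example_digest_length is a hypothetical stand-in for the real declarations in blake3.h, and its return value is a placeholder. The point is only that a C++ compiler sees the declarations with C linkage, so the unmangled symbol names match the compiled C objects.

```c
#include <stddef.h>

#ifdef __cplusplus
extern "C" { // only seen by C++ compilers: use C linkage (no name mangling)
#endif

// Hypothetical declaration standing in for the real API in blake3.h.
size_t example_digest_length(void);

#ifdef __cplusplus
}
#endif

// Placeholder definition so the sketch links on its own.
size_t example_digest_length(void) { return 32; }
```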

a830ab2 2020-01-23 21:17:43 Samuel Neves

streamline load_counters

avx2 before:

mov eax, esi
neg rax
vmovq xmm0, rax
vpbroadcastq ymm0, xmm0
vpand ymm0, ymm0, ymmword ptr [rip + .LCPI1_0]
vmovq xmm2, rdi
vpbroadcastq ymm1, xmm2
vpaddq ymm1, ymm0, ymm1
vmovdqa ymm0, ymmword ptr [rip + .LCPI1_1] # ymm0 = [0,2,4,6,4,6,6,7]
vpermd ymm3, ymm0, ymm1
mov r8d, eax
and r8d, 5
add r8, rdi
mov esi, eax
and esi, 6
add rsi, rdi
and eax, 7
vpshufd xmm4, xmm3, 231 # xmm4 = xmm3[3,1,2,3]
vpinsrd xmm4, xmm4, r8d, 1
add rax, rdi
vpinsrd xmm4, xmm4, esi, 2
vpinsrd xmm4, xmm4, eax, 3
vpshufd xmm3, xmm3, 144 # xmm3 = xmm3[0,0,1,2]
vpinsrd xmm3, xmm3, edi, 0
vmovdqa xmmword ptr [rdx], xmm3
vmovdqa xmmword ptr [rdx + 16], xmm4
vpermq ymm3, ymm1, 144 # ymm3 = ymm1[0,0,1,2]
vpblendd ymm2, ymm3, ymm2, 3 # ymm2 = ymm2[0,1],ymm3[2,3,4,5,6,7]
vpsrlq ymm2, ymm2, 32
vpermd ymm2, ymm0, ymm2
vextracti128 xmm1, ymm1, 1
vmovq xmm3, rax
vmovq xmm4, rsi
vpunpcklqdq xmm3, xmm4, xmm3 # xmm3 = xmm4[0],xmm3[0]
vmovq xmm4, r8
vpalignr xmm1, xmm4, xmm1, 8 # xmm1 = xmm1[8,9,10,11,12,13,14,15],xmm4[0,1,2,3,4,5,6,7]
vinserti128 ymm1, ymm1, xmm3, 1
vpsrlq ymm1, ymm1, 32
vpermd ymm0, ymm0, ymm1

avx2 after:

neg esi
vmovd xmm0, esi
vpbroadcastd ymm0, xmm0
vmovd xmm1, edi
vpbroadcastd ymm1, xmm1
vpand ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
vpaddd ymm1, ymm1, ymm0
vpbroadcastd ymm2, dword ptr [rip + .LCPI0_1] # ymm2 = [2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648]
vpor ymm0, ymm0, ymm2
vpxor ymm2, ymm1, ymm2
vpcmpgtd ymm0, ymm0, ymm2
shr rdi, 32
vmovd xmm2, edi
vpbroadcastd ymm2, xmm2
vpsubd ymm0, ymm2, ymm0
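
Read lane by lane, the shorter listing appears to add a masked per-lane offset to the low 32 bits of the counter, detect wraparound with an unsigned comparison, and fold the carry into the high 32 bits. The scalar sketch below illustrates that idea only. The function name, the LANES constant and the array-based signature are assumptions for illustration, and this is not the intrinsics code from the commit.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 8 // assumed lane count (eight 32-bit lanes in a ymm register)

// Scalar illustration: split a 64-bit counter into per-lane low/high words,
// optionally adding the lane index to each lane.
static void load_counters_sketch(uint64_t counter, bool increment_counter,
                                 uint32_t lo[LANES], uint32_t hi[LANES]) {
  // All ones when incrementing, all zeros otherwise.
  uint32_t mask = 0u - (uint32_t)increment_counter;
  uint32_t base_lo = (uint32_t)counter;
  uint32_t base_hi = (uint32_t)(counter >> 32);
  for (uint32_t i = 0; i < LANES; i++) {
    uint32_t add = mask & i;
    lo[i] = base_lo + add;        // 32-bit add, may wrap around
    uint32_t carry = lo[i] < add; // wrapped iff the sum is below the addend
    hi[i] = base_hi + carry;
  }
}
```

Working in 32-bit lanes throughout avoids the 64-bit additions and the shuffling needed afterwards to separate the low and high halves.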

de1458c 2020-01-23 20:51:46 Samuel Neves

name collision

37ea737 2020-01-23 19:58:45 Samuel Neves

more robust bit-trickery functions

e17c45d 2020-01-23 11:35:24 Jack O'Connor

version 0.1.3

Changes since 0.1.2:
- All x86 implementations include _mm_prefetch optimizations. These
improve performance for very large inputs.
- The C implementation performs parallel parent hashing, matching the
performance of the single-threaded Rust implementation.
- b3sum supports --no-mmap. Contributed by @cesarb.

163f522 2020-01-23 11:32:39 Jack O'Connor

port compress_subtree_to_parent_node from Rust to C

This recursive function performs parallel parent node hashing, which is
an important optimization.

de1cf00 2020-01-23 11:32:39 Jack O'Connor

add the round_down_to_power_of_2 algorithm

This could probably be sped up by detecting LZCNT support, but it's
unlikely to be a bottleneck.
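
One common portable way to round down to a power of two, without LZCNT, is to smear the highest set bit into every lower position and then keep only the top bit. The sketch below shows that technique under the assumption of a non-zero 64-bit input. It is not necessarily the algorithm this commit added, and the function name carries a _sketch suffix to make that clear.

```c
#include <stdint.h>

// Sketch: largest power of two that is <= x. Assumes x != 0 (returns 0 for 0).
static uint64_t round_down_to_power_of_2_sketch(uint64_t x) {
  // Copy the highest set bit into every lower bit position.
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  x |= x >> 32;
  // x is now of the form 0...01...1; subtracting the lower half
  // leaves only the highest set bit.
  return x - (x >> 1);
}
```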

087d72e 2020-01-23 11:32:35 Jack O'Connor

clang-format

92d421d 2020-01-23 11:19:47 Jack O'Connor

add a larger test case

One thing I like to test is that, if I hack simd_degree to be higher
than MAX_SIMD_DEGREE, assertions fire. This requires a test case long
enough to exceed that number of chunks.

78e858d 2020-01-22 02:09:42 Jack O'Connor

expand comments about lazy merging

ccadbad 2020-01-22 01:41:20 Jack O'Connor

stack size in the optimized impl should be MAX_DEPTH + 1

d0c8fc1 2020-01-22 00:47:00 Jack O'Connor

use a better popcnt fallback algorithm

This one loops once for every set bit, rather than once for each bit
position to the right of the highest set bit.

https://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation
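
The loop described here matches the well-known trick of clearing the lowest set bit on each iteration. A minimal sketch, with a hypothetical function name and not copied from the BLAKE3 sources:

```c
#include <stdint.h>

// Count set bits, looping once per set bit rather than once per bit position.
static unsigned popcnt_sketch(uint64_t x) {
  unsigned count = 0;
  while (x != 0) {
    x &= x - 1; // clears the lowest set bit
    count++;
  }
  return count;
}
```

For an input like 1 << 63 this loops once, where a shift-and-test loop would run 64 times.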

67262df 2020-01-21 09:25:55 Jack O'Connor

double the maximum incremental subtree size

Because compress_subtree_to_parent_node effectively cuts its input in
half, we can give it an input that's twice as big, without violating the
CV stack invariant.

4a92e8e 2020-01-21 06:36:30 Jack O'Connor

add the reference impl doc test to CI

4021636 2020-01-21 06:19:16 Jack O'Connor

test the BLAKE3_NO_* vars in CI

40f4bdc 2020-01-21 05:24:03 Jack O'Connor

switch from BLAKE3_USE_* to BLAKE3_NO_*

This means that compiling C sources includes all implementations by
default, which is what most callers are going to want.
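
The difference between opt-in and opt-out flags can be sketched with the preprocessor. All names below (EXAMPLE_NO_FAST, example_pick_backend, the enum) are hypothetical and are not the identifiers used in blake3_dispatch.c. The sketch only shows that with an opt-out flag the optimized path is compiled in unless the caller explicitly disables it.

```c
// Hypothetical names throughout; not the identifiers used in the BLAKE3 sources.
enum example_backend { EXAMPLE_PORTABLE = 0, EXAMPLE_FAST = 1 };

static enum example_backend example_pick_backend(void) {
#if !defined(EXAMPLE_NO_FAST)
  return EXAMPLE_FAST;     // compiled in by default
#else
  return EXAMPLE_PORTABLE; // caller opted out, e.g. with -DEXAMPLE_NO_FAST
#endif
}
```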