The Mathematics behind PQC: hash functions

06 May

Introduction

After analyzing in detail the Post Quantum algorithms that base their security on mathematical problems defined over lattices (Kyber, Dilithium, Falcon) and on error correcting codes (Classic McEliece), we can now focus on hash functions, that are the mathematical objects at the foundation of SPHINCS+, a digital signature scheme that has been selected at the end of the third round of the NIST competition. In August 2023, the American institute has published the first draft of this algorithm upcoming standard “FIPS-205: Stateless Hash-Based Digital Signature Standard”. It is expected that the final standard will be published in the first half of 2024 and it will represent a reference point for the entire cryptographic community, guiding researchers in implementing the scheme.

Road map

The next three articles of this PQC blog series will describe the main components that are functional to the construction of SPHINCS+. We will start by introducing the fundamental properties of hash functions that guarantee the security of algorithms such as SPHINCS+. We will then analyze the main characteristics of hash-based signatures: we will mostly focus on some peculiarities that set them apart from schemes based on different mathematical problems and that make them a very conservative choice in terms of security.

After seeing the more general aspects that are shared by the majority of hash-based signature algorithms, we will focus on SPHINCS+. Given the modular structure of this scheme we will proceed in an incremental way, starting from the description of the basic building blocks that will be combined to obtain SPHINCS+ and whose security properties guarantee that SPHINCS+ reaches the security levels required by NIST, even when the attacker has access to a quantum computer. We will introduce the concept of One-Time Signatures (OTS) and Few-Time Signatures (FTS), highlighting the limitations that preclude their use in many application scenarios; we will then present other mathematical objects, such as Merkle hash trees and Hypertress, that allow us to extend OTS and FTS and make them useful in real-world contexts. We will conclude by describing the general structure of SPHINCS+, followed by some observations regarding the main parametrization and the algorithm performances.

Hash functions and security properties

Before describing the main advantages in terms of security of the hash-based signature algorithms, it is useful to define the meaning of the term “cryptographic hash function”. A cryptographic hash function is a function $f\colon \{0,1\}^* \to \{0,1\}^n$ that is able to map arbitrary long strings in strings of fixed length, satisfying the following conditions:

Collision resistance: an attacker [1] must not be able to find two inputs $x, y$ such that $f(x) = f(y)$ ;
Pre-image resistance: given $y\in \{0,1\}^n$ , an attacker must not be able to find $x$ such that $f(x) = y$ ;
Second pre-image resistance: given $x_1$ and $y=f(x_1)$ , an attacker must not be able to find $x_2\ne x_1$ such that $f(x_2)=y$ .

Cryptographic hash functions and constructions defined using them are extremely popular in cryptography: they are in fact part of almost every communication protocol used today. For this reason cryptographic hash functions have been thoroughly analyzed, thus increasing the confidence that researchers have in their security.

The hash-based digital signature algorithms are seen by the international cryptographic community as the most conservative choice in terms of security, much like what happens among KEMs with Classic McEliece. The digital signature schemes based on hash functions are not the best choice in terms of performance or dimensions of public keys/signatures: nonetheless, when compared with other algorithms, they derive their security from a limited number of assumptions that are widely studied by cryptanalysts and are thus believed to be extremely stable with respect to known attacks.
When considering a generic digital signature algorithm that is able to sign messages of abitrary length (e.g. Dilithium of Falcon), it is possible to see that such scheme derives its security from two assumptions:

The intractability of a given mathematical problem (e.g., the Shortest Vector Problem on lattices);
The security of the cryptographic hash function used inside the algorithm.

This implies that if an attacker can break even only one assumption, then the security of the scheme will be compromised.

Hash-based signatures like SPHINCS+ base their security only on the second assumption: this implies that a hash-based algorithm could be attacked if it were possible to compromise the security of the hash function used to define it, but in that case an attacker could also compromise every other scheme that uses the same hash function.
Moreover, in some cryptographic schemes such as SPHINCS+, the security can be guaranteed even if the selected hash function is not collision resistant. It is still advisable to only choose primitives that are believed to be secure and for which there are not any known attacks. The recently proposed algorithms use functions such as SHA2 and SHAKE (belonging to the SHA3 family), while the use of insecure or obsolete algorithms (such as MD5 and SHA1) is deprecated.

Stateful vs stateless digital signatures

The first two hash-based quantum-resistant digital signature schemes – XMSS and LMS – have been proposed by the Internet Engineering Task Force (IETF) and then standardized by the NIST. Observe that their approval has been carried out earlier and independently with respect to the post-quantum cryptography NIST standardization process. Even if long-term security is ensured for most application scenarios, the main reason why these schemes have not been considered completely satisfying is their stateful property.

The term stateful identifies the need for keeping track of all signatures performed starting from the same private key. This turns out to be crucial to guarantee the overall security. Indeed, the reuse of certain signature parameters for different messages can be catastrophic. This requirement rules out many application scenarios, in which the memory availability is constrained and the number of signatures to be performed is huge as examples. Moreover, it is crucial to pay much attention in implementation to avoid synchronization issues; indeed, a desynchronization between non-volatile memory and RAM may produce the reuse of the above-mentioned parameters. For similar reasons, another issue to be considered is the cloning problem which happens when a private key is copied and used in different contexts without coordination. This may be the case when considering virtual machines or backup restoring.
Therefore, the NIST standardization process is focused on stateless solutions, which do not require tracking the performed signatures. Moreover, in view of easily substituting pre-quantum schemes like ECDSA ed RSA that are stateless, it’s fundamental to preserve this property in the post-quantum alternatives.

WOTS+

As reported in the introduction, Winternitz One-Time Signature Plus (WOTS+) is the first building block in the SPHINCS+ design. Considering WOTS+ as a standalone solution, it must be pointed out that allows the secure signature of just one message for each private key-public key pair.

In the following, a high-level description of the scheme is reported.

Let $n$ be the byte length of the message $\texttt{msg}$ to be signed and let $w$ be a positive integer number, $w$ is called Winternitz parameter. $\texttt{msg}$ is split into $lg_w:=\log_2{w}$ bit blocks, each of which is interpreted as a positive integer less than $w$ :

\texttt{msg}=\underbrace{x_1x_2\dots x_{lg_w}}_{m_1}\underbrace{x_{lg_w+1}\dots x_{2lg_w}}_{m_2}\,\dots\,\underbrace{x_{8n-lg_w+1}\dots x_{8n}}_{m_{len_1}}

where $x_i\in\{0,1\}$ , $m_j\in\{0,1,\dots,w-1\}$ and $len_1=\frac{8n}{lg_w}$ .

The private key $s_k$ consists in $len$ (later defined) $n$ byte secret strings, while the public key $pk$ is obtained from $sk$ and a hash function $\text{H}$ [2] through the below-described chains.

\begin{align*} sk_1 & \rightarrow \text{H}(sk_1) \rightarrow \text{H}(\text{H}(sk_1)) = \text{H}^2(sk_1) \rightarrow \cdots \rightarrow \text{H}^{w-1}(sk_1)=:pk_1 \\ sk_2 & \rightarrow \text{H}(sk_2) \rightarrow \text{H}(\text{H}(sk_2)) = \text{H}^2(sk_2) \rightarrow \cdots \rightarrow \text{H}^{w-1}(sk_2)=:pk_2 \\ & \vdots \\ sk_{len} & \rightarrow \text{H}(sk_{len}) \rightarrow \text{H}(\text{H}(sk_{len})) = \text{H}^2(sk_{len}) \rightarrow \cdots \rightarrow \text{H}^{w-1}(sk_{len})=:pk_{len} \end{align*}

From a computational point of view, observe that it’s easy to “unroll” these chains from the left side to the right side. Indeed, given a chain element, it’s enough to repeatedly apply the hash function to obtain other elements to its right. On the other hand, moving to the left is impractical due to the hash function “pre-image resistance” property. WOTS+ usability and security are based on this asymmetry.

The signature of $\texttt{msg}$ is computed as

$\sigma_1\sigma_2\dots\sigma_{len_1}=\text{H}^{m_1}(sk_1)\text{H}^{m_2}(sk_2)\dots\text{H}^{m_{len_1}}(sk_{len_1}),$
where $\sigma_i = \text{H}^{m_i}(sk_i)$ .

The receiver checks the following equalities in the verification phase.

\begin{align*} \text{H}^{w - m_1 - 1}(\sigma_1) & \stackrel{?}{=} pk_1 \\ \text{H}^{w - m_2 - 1}(\sigma_2) & \stackrel{?}{=} pk_2 \\ & \vdots \\ \text{H}^{w - m_{len_1} - 1}(\sigma_{len_1}) & \stackrel{?}{=} pk_{len_1} \end{align*}

However, starting from the signature of $\texttt{msg}$ it is possible to deduece a valid signature for any $\texttt{msg}'$ with $m_j\leq m_j'$ for each $j$ . Indeed:

\begin{align*} \sigma_1' & = \text{H}^{m_1' - m_1}(\sigma_1), \\ \sigma_2' & = \text{H}^{m_2' - m_2}(\sigma_2), \\ & \vdots \\ \sigma_{len_1}' & = \text{H}^{m_{len_1}' - m_{len_1}}(\sigma_{len_1}). \end{align*}

To overcome this vulnerability, the signature of the checksum $C=\sum_{j=1}^{len_{1}}(w-m_j-1)$ is attached to $\sigma_1\dots\sigma_{len_1}$ . It can be seen that the $C$ bit length is given by

len_2 := \left\lfloor\frac{\log_2{(len_1(w-1))}}{lg_w}\right\rfloor+1

and so

len = len_1 + len_2.

Thanks to the checksum, it is clear that the above-mentioned attack strategy is not doable anymore. Indeed, if $m_j\leq m_j'$ for each $j$ then $C\geq C'$ and so it is needed to go back in at least one of the hash chains corresponding to $C$ . However, the hash function inversion is assumed to be impractical.

Finally, the one time nature of the scheme is highlighted. This property makes WOTS+ unattractive in most applicative scenarios unless it is involved as a component of more complex cryptographic construction like SPHINCS+. As an example, consider a couple of distinct messages $\texttt{msg} = 01001010$ and $\texttt{msg}'=00101110$ signed with a unique private key $sk$ and the same following parameters

\begin{align*} n & = 1, \\ w & = 16, \\ lg_w & = 4, \\ len_1 & = 2, \\ len_2 & = 2, \\ len & = 4. \end{align*}

We have

\begin{align*} \texttt{msg} & = 0100\,1010 = 2\,5 = m_1\,m_2, \\ \texttt{msg}' & = 0010\,1110 = 4\,7 = m_1'\,m_2', \end{align*}

while the respective checksums are

\begin{align*} C & = 1110\,1000 = 7\,1 = C_1\,C_2, \\ C' & = 1000\,1000 = 1\,1 = C_1'\,C_2'. \end{align*}

Given the private key $sk=sk_1sk_2sk_3sk_4$ , the corresponding signatures are

\begin{align*} \sigma & = \sigma_1\sigma_2\sigma_3\sigma_4 = \text{H}^2(sk_1)\text{H}^5(sk_2)\text{H}^7(sk_3)\text{H}(sk_4), \\ \sigma' & = \sigma_1'\sigma_2'\sigma_3'\sigma_4' =\text{H}^4(sk_1)\text{H}^7(sk_2)\text{H}(sk_3)\text{H}(sk_4). \end{align*}

Now, consider the message $\dot{\texttt{msg}}=11000110$ , we have

\begin{align*} \dot{\texttt{msg}} & = 1100\,0110 = 3\,6 = \dot{m_1}\,\dot{m_2}, \\ \dot{C} & = 1010\,1000 = 5\,1 = \dot{C_1}\,\dot{C_2}. \end{align*}

To forge the corresponding signature $\dot{\sigma}$ , the following values have to be computed.

\text{H}^3(sk_1), \text{H}^6(sk_2), \text{H}^5(sk_3), \text{H}(sk_4)

This can be done without the $sk$ knowledge, starting from $\sigma_1$ , $\sigma_2$ , $\sigma_3'$ , $\sigma_4'$ , by applying $\text{H}$ and proving that the key reuse in WOTS+ is unsafe.

This article has described the main security characteristics of hash-based digital signatures, showing also a first example belonging to this family of schemes. In the next contributions, other constructions that overcome WOTS+ restrictions will be reported in order to reach the SPHINC+ description.

[1] The considered attacker, potentially equipped with quantum computing capabilities, is not computationally unbounded.
[2] instantiated as SHA-2 or SHAKE

This article belongs to a series of contributions, edited by the Telsy Cryptography Research Group, devoted to quantum computing and its implications on Cryptography. For reference to other articles, please refer to the index.

For other articles related to Quantum and Cryptography topics, please refer to the related categories in the blog.

The authors

Francesco Stocco, a master’s degree in Mathematics at the University of Padua and the Université de Bordeaux attending the course of study “Algebra Geometry And Number Theory” (ALGANT), joined the Telsy research group in Cryptography at the end of 2020 focusing in particular on issues related to quantum technologies.

Marco Rinaudo, a bachelor’s degree in Mathematics from the University of Turin and a master’s degree with a specialization in Cryptography from the University of Trento. Following a 2022 curricular internship at Telsy, he has been part of the Cryptography Research Group since January 2023.