T-count and T-depth of any multi-qubit unitary

Gheorghiu, Vlad; Mosca, Michele; Mukhopadhyay, Priyanka

doi:10.1038/s41534-022-00651-y


Article
Open access
Published: 30 November 2022


Vlad Gheorghiu^1,2,
Michele Mosca^1,2,3,4 &
Priyanka Mukhopadhyay ORCID: orcid.org/0000-0001-6463-9100^1,3

npj Quantum Information volume 8, Article number: 141 (2022)


Abstract

We design an algorithm to determine the (minimum) T-count of any n-qubit (n ≥ 1) unitary W of size 2ⁿ × 2ⁿ, over the Clifford+T gate set. The space and time complexity of our algorithm are $O\left({2}^{2n}\right)$ and $O\left({2}^{2n{{{{\mathcal{T}}}}}_{\epsilon }(W)+4n}\right)$, respectively. ${{{{\mathcal{T}}}}}_{\epsilon }(W)$ (ϵ-T-count) is the (minimum) T-count of an exactly implementable unitary U (${{{\mathcal{T}}}}(U)$), such that d(U,W) ≤ ϵ and ${{{\mathcal{T}}}}(U)\le {{{\mathcal{T}}}}({U}^{{\prime} })$ where ${U}^{{\prime} }$ is any exactly implementable unitary with $d({U}^{{\prime} },W)\le \epsilon$. d(. , .) is the global phase invariant distance. Our algorithm can also be used to determine the (minimum) T-depth as well as the minimum non-Clifford-gate count or depth required to implement any multi-qubit unitary with a finite universal gate set like Clifford+CS, Clifford+V, etc. For small enough ϵ, we can synthesize the optimal circuits.


Introduction

The vision of Feynman¹ that a quantum computer can be used to overcome the limitations of classical computers gained momentum with the design of quantum algorithms that outperform their classical counterparts for popular challenging problems like factorization^2,3 and searching an unstructured solution space⁴. Quantum circuits are one of the most popular ways of describing and implementing quantum algorithms. These consist of a series of elementary operations or gates belonging to a universal set and dictated by the implementing technologies. As in the classical case, circuit synthesis and optimization is a significant part of any quantum computer compilation process. A compiler primarily translates a human-readable input (programming language) into instructions that can be executed directly on hardware. An integral part of this process is quantum circuit synthesis, whose aim is to decompose a unitary operation into a sequence of gates from a universal set. Often, additional constraints are imposed on a synthesis task, like minimization of a certain resource such as qubits, gates (total number), non-Clifford gates, multi-qubit gates, etc. We call algorithms that satisfy such constraints resource-optimal synthesis algorithms.

Our work primarily focuses on the “Clifford+T” gate set, a popular finite universal fault-tolerant set. Fault-tolerant quantum error correction^5,6 is required to control the accumulation of errors due to noise on quantum information, faulty quantum gates, faulty quantum state preparation, faulty measurements, etc. This is especially important for long computations, since otherwise the accumulated errors make the likelihood of obtaining a reliable and useful answer negligible. The non-Clifford T gate has known constructions in most error correction schemes, and the cost of implementing it fault-tolerantly exceeds the cost of the Clifford group gates by as much as a factor of a hundred or more^7,8. The minimum number of T-gates required to implement certain unitaries is a quantifier of difficulty in many algorithms^9,10 that try to classically simulate quantum computation. So, even though alternative fault-tolerance methods such as the completely transversal Clifford+T scheme¹¹ and anyonic quantum computing¹² are also being explored, minimization of the number of T gates (or T-count) in quantum circuits remains an important and widely studied goal. It has been argued in refs. ^13,14,15 that it is also important to reduce the maximum number of T gates in any circuit path (or T-depth).

The Solovay-Kitaev algorithm^16,17 guarantees that given an n-qubit unitary W, we can generate a circuit with a “discrete finite” universal gate set like Clifford+T, such that the unitary U implemented by the circuit is at most a certain distance from W. Here we note that there exist “infinite continuous” universal gate sets like Clifford+R_z(θ), with which we can implement any unitary without any approximation. In this paper we focus on finite universal gate sets, which are more suitable for quantum error correction and fault tolerance. In fact, in quantum computation a set of gates is said to be universal if any quantum operation can be approximated to arbitrary accuracy by a quantum circuit involving only those gates¹⁸. A unitary is called exactly implementable by a gate set if there exists a quantum circuit with these gates that implements it (up to some global phase). Otherwise, it is approximately implementable. Accordingly, a synthesis algorithm can be (a) exact, when U = e^{iϕ}W (ϕ is the global phase); or (b) approximate, when d(U,W) ≤ ϵ for some ϵ > 0. Here d(. , .) is a distance metric. For a unitary U that is exactly implementable by the Clifford+T set, its T-count (denoted by ${{{\mathcal{T}}}}(U)$) is the minimum number of T-gates required to implement it, while its T-depth (denoted by ${{{{\mathcal{T}}}}}_{d}(U)$) is the minimum T-depth of any circuit that implements it. These definitions can be generalized to approximately implementable unitaries, as described in the section “Preliminaries”. In this paper we give algorithms for the following two problems.

ϵ-T-COUNT: Given an n-qubit unitary W and $\epsilon \in {{\mathbb{R}}}_{\ge 0}$, determine the T-count of a unitary U such that ${{{\mathcal{T}}}}(U)\le {{{\mathcal{T}}}}({U}^{{\prime} })$, where $U,{U}^{{\prime} }$ are n-qubit exactly implementable unitaries and $d(U,W),d({U}^{{\prime} },W)\le \epsilon$.

ϵ-T-DEPTH: Given an n-qubit unitary W and $\epsilon \in {{\mathbb{R}}}_{\ge 0}$, determine the T-depth of a unitary U such that ${{{{\mathcal{T}}}}}_{d}(U)\le {{{{\mathcal{T}}}}}_{d}({U}^{{\prime} })$, where $U,{U}^{{\prime} }$ are n-qubit exactly implementable unitaries and $d(U,W),d({U}^{{\prime} },W)\le \epsilon$.

${{{\mathcal{T}}}}(U)$ and ${{{{\mathcal{T}}}}}_{d}(U)$ are called the ϵ-T-count (${{{{\mathcal{T}}}}}_{\epsilon }(W)$) and ϵ-T-depth (${{{{\mathcal{T}}}}}_{d\epsilon }(W)$) of W, respectively. The T-count-optimal and T-depth-optimal circuits of U are called the ϵ-T-count-optimal and ϵ-T-depth-optimal circuits for W, respectively. In this paper, we use the global phase invariant distance (see the “Preliminaries” section) as the distance metric and not the operator norm. This is because the global phase invariant distance ignores the global phase and hence avoids unnecessarily long approximating sequences that achieve a specific global phase. This may be why the bound on the T-count of single-qubit Z-rotations is lower in ref. ¹⁹, which works with this distance, than in refs. ^20,21, which work with the operator norm. (More discussion can be found in ref. ²².) This distance is composable²² and has been used to synthesize unitaries in other models like topological quantum computation^23,24. It is not hard to see that if ϵ = 0 then we get the problem of synthesizing T-count-optimal and T-depth-optimal circuits for exactly implementable unitaries. In this case, both provable^25,26,27 and much more efficient heuristic^26,27 algorithms have been developed (see Table 1 for a comparison). We say that an algorithm is provable if its claimed efficiency and correctness or quality of solution (in this case optimality) can be proved by rigorous arguments. An algorithm is heuristic if one or both of these factors are only conjectured to be true.

Table 1 Complexity of some state-of-the-art optimal synthesis algorithms.


Any synthesis algorithm will have complexity at least O(2ⁿ), the input size. Placing further optimality constraints makes the problem even harder, and in fact impractical to solve on a PC beyond a certain value of n. So re-synthesis algorithms have been developed, which take a circuit implementing a unitary and then try to reduce (not minimize) a certain resource (see for example, refs. ^28,29,30). In the literature, these algorithms usually do not account for the complexity of synthesizing the initial circuit and claim a running time of poly(n). A detailed study of the relative merits and demerits of synthesis and re-synthesis algorithms is beyond the scope of this work. But we would like to point out that the importance of designing better (optimal) synthesis algorithms, or of studying their complexity, should not be underestimated, not only for theoretical reasons but also for the various applications they can have. Apart from guaranteeing optimality, they can be used as sub-routines in compiling large unitaries^22,31,32, for assessing the quality of re-synthesis algorithms, etc. For example, the T-depth-optimal synthesis algorithm of ref. ²⁷ was able to generate optimal circuits for standard unitaries like Fredkin, Peres, and Quantum OR, which could not be done by the re-synthesis methods used in ref. ³³. In the section “Discussion of implementation results”, we show that we get a much lower T-count for widely-used multi-qubit unitaries like controlled rotations, compared to the number of T-gates obtained by first compiling them into single-qubit rotations and then substituting a T-count-optimal circuit for each such single-qubit rotation.

To the best of our knowledge, before this paper there was no algorithm to determine the ϵ-T-count or ϵ-T-depth of arbitrary multi-qubit (n ≥ 1) unitaries, for any distance metric. Previous algorithms like those in refs. ^19,20,21 synthesize ϵ-T-count-optimal circuits for single-qubit Z-rotations. In fact, even if we consider other discrete, finite universal gate sets like Clifford+V or Clifford+CS, there are no algorithms that work for arbitrary multi-qubit unitaries and minimize the non-Clifford gate count or depth. It is not even clear how to modify or generalize the methods introduced in these papers. However, our results not only work for multi-qubit unitaries but can also be applied in these alternate bases, as explained in the next section.

In this paper, we give algorithms that can be used to synthesize (provably) ϵ-T-count and ϵ-T-depth-optimal circuits. We treat arithmetic operations on the entries of a unitary at unit cost and we do not account for the bit-complexity associated with specifying or manipulating them. Suppose the input n-qubit unitary is W, having size 2ⁿ × 2ⁿ. Then the space complexity of our algorithms, described in “Methods”, is poly(2ⁿ). The time complexity has an exponential dependence on ${{{{\mathcal{T}}}}}_{\epsilon }(W)$ or ${{{{\mathcal{T}}}}}_{d\epsilon }(W)$ while synthesizing ϵ-T-count or ϵ-T-depth optimal circuit, respectively (Table 1).

For the design and analysis of our algorithm the following results (see the “An exponential time and polynomial space algorithm” section) have been crucial. Suppose E is a unitary that is close to ${\mathbb{I}}$ in the global phase invariant distance, i.e., $d(E,{\mathbb{I}})\le \epsilon$. C₀ is an n-qubit Clifford operator. EC₀ behaves almost like Clifford C₀, i.e., it approximately inherits some characteristics from C₀. First, expanding both EC₀ and C₀ in the Pauli basis, we can see that the absolute value of the coefficients (at each point) are almost equal (amplitude test). Second, if we expand $(E{C}_{0})P{(E{C}_{0})}^{{\dagger} }$ in the Pauli basis then the absolute value of the coefficients is almost 1 at ${P}^{{\prime} }$ if ${C}_{0}P{C}_{0}^{{\dagger} }={P}^{{\prime} }$, and nearly zero at other points (conjugation test). These results may be of independent interest and can be used for resource-optimal synthesis in other bases as described below.

Most discrete universal gate sets consist of Clifford gates and one non-Clifford gate. Consider one such set Clifford+A, where A is a non-Clifford gate, and let U_A be a unitary exactly implementable by this set. Since the cost of fault-tolerantly implementing the non-Clifford gate is usually higher, we are required to optimize the A-count or A-depth, which are defined analogously to T-count and T-depth. One of the tricks in many resource-optimal synthesis algorithms is to find a nice generating set ${{{\mathcal{G}}}}$ such that it has finite cardinality and U_A can be decomposed as follows.

$${U}_{A}={e}^{i\phi }\left(\mathop{\prod }\limits_{i=m}^{1}{G}_{i}\right){C}_{0}\qquad {G}_{i}\in {{{\mathcal{G}}}},\,{C}_{0}\in {{{{\mathcal{C}}}}}_{n},\,\phi \in \left[0,2\pi \right),\,m={{{\rm{A}}}}-{{{\rm{count}}}}/{{{\rm{A}}}}-{{{\rm{depth}}}}$$

(1)

Each ${G}_{i}\in {{{\mathcal{G}}}}$ has a specific property: A-count 1 or A-depth 1. Then we design a search algorithm to search for products of G_i such that, we get ${U}_{A}{C}_{0}^{-1}$ (up to a global phase), or rather ${U}_{A}{\left({\prod }_{i}{G}_{i}\right)}^{-1}$ is a Clifford. We know that for any discrete finite universal gate set not all unitaries are exactly implementable. Let V_A be one such unitary. U_A is a unitary such that d(U_A, V_A) ≤ ϵ and it has the minimum A-count or A-depth among all exactly implementable unitaries within ϵ distance of V_A. Then we can perform amplitude test (Theorem 3.1) and conjugation test (Theorem 3.3, Corollary 3.1) and get an ϵ-A-count-optimal or ϵ-A-depth-optimal decomposition of V_A. So it will be interesting and useful to find such nice generating set for other bases, as has been found for Clifford+T²⁵ (T-count), (T-depth)²⁷, and Clifford+CS³⁴ (CS-count, only for 2-qubit unitaries). One simple way of constructing ${{{\mathcal{G}}}}$ for count-optimality is to write U_A as follows.

$$\begin{array}{l}{U}_{A}\,\,=\,\,{e}^{i\phi }{C}_{1}{A}_{({q}_{1})}{C}_{2}{A}_{({q}_{2})}{C}_{3}\ldots {C}_{m}{A}_{({q}_{m})}{C}_{0}\\ \qquad=\,\,{e}^{i\phi }\left({C}_{1}{A}_{({q}_{1})}{C}_{1}^{{\dagger} }\right)\left({C}_{1}{C}_{2}{A}_{({q}_{2})}{C}_{2}^{{\dagger} }{C}_{1}^{{\dagger} }\right)\ldots\\ \qquad\qquad \left({C}_{1}\ldots {C}_{m}{A}_{({q}_{m})}{C}_{m}^{{\dagger} }\ldots {C}_{1}^{{\dagger} }\right){C}_{1}\ldots {C}_{m}{C}_{0}\\ \qquad=\,\,{e}^{i\phi }\left({C}_{1}{A}_{({q}_{1})}{C}_{1}^{{\dagger} }\right)\left({C}_{2}^{{\prime} }{A}_{({q}_{2})}{C}_{2}^{{\prime} {\dagger} }\right)\ldots \left({C}_{m}^{{\prime} }{A}_{({q}_{m})}{C}_{m}^{{\prime} {\dagger} }\right){C}_{0}^{{\prime} }\end{array}$$

In the above ${A}_{({q}_{i})}$ denotes the A-gate applied on qubit q_i. ${C}_{0},\ldots ,{C}_{m},{C}_{0}^{{\prime} }\ldots ,{C}_{m}^{{\prime} }\in {{{{\mathcal{C}}}}}_{n}$. So ${{{\mathcal{G}}}}=\{C{A}_{({q}_{i})}{C}^{{\dagger} }:C\in {{{{\mathcal{C}}}}}_{n},i\in [n]\}$, which is obviously finite since the Clifford group is finite^35,36. It is possible to get more compact sets by exploiting other algebraic properties (for example, ref. ²⁵). A generating set for depth-optimality can be constructed by conjugating products of at least n A-gates on distinct qubits by Clifford, as has been done in ref. ²⁷.
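The rewriting above can be checked numerically on a small case. The following is a minimal sketch, assuming a single qubit, A = T, m = 2, and an arbitrary illustrative choice of Cliffords (C₁ = H, C₂ = S, C₀ = H); the helper names `matmul`, `dagger`, `product` and `close` are hypothetical and not taken from the paper's implementation.

```python
import cmath
import math
from functools import reduce

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def product(*mats):
    """Left-to-right matrix product of all arguments."""
    return reduce(matmul, mats)

def close(a, b, tol=1e-12):
    """Entry-wise comparison of two matrices."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]                            # Clifford
S = [[1, 0], [0, 1j]]                            # Clifford
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]   # plays the role of A

C1, C2, C0 = H, S, H  # illustrative choice only

# Circuit form C1 A C2 A C0 (global phase omitted).
original = product(C1, T, C2, T, C0)

# Conjugated form (C1 A C1^dagger)(C2' A C2'^dagger) C0',
# with C2' = C1 C2 and C0' = C1 C2 C0.
C2p = matmul(C1, C2)
C0p = product(C1, C2, C0)
G1 = product(C1, T, dagger(C1))
G2 = product(C2p, T, dagger(C2p))
rewritten = product(G1, G2, C0p)

forms_agree = close(original, rewritten)
print(forms_agree)
```

Each factor `G1`, `G2` is a Clifford conjugate of a single A-gate, so it belongs to the generating set described above, and the trailing Clifford `C0p` absorbs everything else.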

From Table 1 we see that for exactly implementable unitaries the provable algorithms of refs. ^25,26,27 have an exponential dependence on T-count and T-depth. Significant improvements were achieved in refs. ^26,27, where heuristics were designed that led to algorithms with a polynomial dependence on T-count and T-depth. The algorithm in our paper also suffers from an exponential dependence on the ϵ-T-count and ϵ-T-depth, which usually have an inverse dependence on ϵ, i.e., ${{{{\mathcal{T}}}}}_{\epsilon },{{{{\mathcal{T}}}}}_{d\epsilon }\propto f(1/\epsilon )$. We conjecture that for approximately implementable unitaries it is not possible to have algorithms with a polynomial dependence on these parameters.

Conjecture 1. It is not possible to have ϵ-T-count or ϵ-T-depth-optimal synthesis algorithms with complexity ${{{\rm{poly}}}}\left({2}^{n},{{{{\mathcal{T}}}}}_{\epsilon }\right)$ or ${{{\rm{poly}}}}\left({2}^{n},{{{{\mathcal{T}}}}}_{d\epsilon }\right)$, respectively.

However, from the improvements in T-count obtained by us (see the “Discussion of implementation results” section), we feel it is important to design efficient multi-qubit resource-optimal synthesis algorithms. In many practical quantum algorithms it is not too difficult to decompose a large unitary into smaller ones. We can apply composability rules (for example, see ref. ²² for global phase invariant distance) and distribute the errors among these small unitaries, such that the overall error remains within the desired bound. The complexity of resource-optimal synthesis algorithms will determine the maximum size of the component unitaries in a decomposition. The larger the components, the better the resource estimates, as evident from the results in our paper (see the “Discussion of implementation results” section). It might be possible to decrease the exponent in the time complexity by applying techniques like meet-in-the-middle²⁵ or nested meet-in-the-middle^26,27. It is not hard to see that our algorithm can be parallelized. It will be interesting to investigate if additional tricks can be used. Another interesting question is to find more compact generating sets for other universal gate sets for multi-qubit unitaries.

Much work has been done to synthesize a circuit for any multi-qubit unitary (without provable optimality on any resource)^{16,17,18,20,37,38,39,40,41}. Comparatively little has been done for arbitrary multi-qubit unitaries when additional constraints are imposed, like minimizing the T-count or T-depth. To the best of our knowledge, all previous works on approximately implementable unitaries have been exclusively for single-qubit unitaries, in fact specifically for R_z(θ) gates. They work with either the operator norm^20,21 or the global phase invariant distance¹⁹. However, a considerable amount of work has been done to synthesize T-count-optimal and T-depth-optimal circuits for exactly implementable multi-qubit unitaries. These include algorithms with provable optimality like those in refs. ^25,26,27, which employ meet-in-the-middle (MITM) and nested MITM techniques, as well as more efficient heuristic algorithms whose optimality depends on some conjecture^26,27. A summary of the complexity of some state-of-the-art optimal synthesis algorithms is given in Table 1.

Work has also been done to approximate single-qubit unitaries in Clifford+V^42,43,44,45 basis and perform a CS-count-optimal (exact) synthesis³⁴ of two-qubit unitaries in Clifford+CS basis.

Results and discussion

Preliminaries

We write [K] = {1, 2, …, K}. We denote the n × n identity matrix by ${{\mathbb{I}}}_{n}$, or by ${\mathbb{I}}$ if the dimension is clear from the context. We denote the set of n-qubit unitaries by ${{{{\mathcal{U}}}}}_{n}$. The size of an n-qubit unitary is N × N where N = 2ⁿ. A detailed description of the n-qubit Pauli operators (${{{{\mathcal{P}}}}}_{n}$), the Clifford group (${{{{\mathcal{C}}}}}_{n}$) and the group (${{{{\mathcal{J}}}}}_{n}$) generated by the Clifford and T gates is given in Supplementary Note 1. In this subsection we give some additional definitions and results required for the rest of the paper. We observe the following when expanding a Clifford in the Pauli basis.

Fact 2.1 (ref. ⁴⁶). If $C\in {{{{\mathcal{C}}}}}_{n}$ then for each $P\in {{{{\mathcal{P}}}}}_{n}$ there exists ${r}_{P}\in {\mathbb{C}}$ such that $C={\sum }_{P\in {{{{\mathcal{P}}}}}_{n}}{r}_{P}P$. Further, if ${r}_{P},{r}_{{P}^{{\prime} }}\,\ne\, 0$ for any pair of $P,{P}^{{\prime} }$, then $| {r}_{P}| =| {r}_{{P}^{{\prime} }}| =r$, for some $r\in {\mathbb{R}}$.

Fact 2.2. Let $Q={\sum }_{P\in {{{{\mathcal{P}}}}}_{n}}{q}_{P}P$ be the expansion of a matrix Q in the Pauli basis. Then

$${q}_{P}=Tr\left(QP\right)/N.$$

Further, if Q is a unitary then ${\sum }_{P\in {{{{\mathcal{P}}}}}_{n}}{\left\vert {q}_{P}\right\vert }^{2}=1$.

The proof is given in Supplementary Note 1 (Fact 2). Now consider a Clifford C which has an expansion as given in Fact 2.1. Let there be M (≤N²) non-zero coefficients in this expansion. Since C is a unitary, we can apply Fact 2.2 and get the following.

$$M{r}^{2}=1\quad \ \Rightarrow \ r=\frac{1}{\sqrt{M}}\ge \frac{1}{N}.$$

(2)
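As an illustration of Facts 2.1 and 2.2 and of Eq. (2), the following minimal sketch expands the single-qubit Hadamard gate (a Clifford) in the Pauli basis using q_P = Tr(QP)/N. The helper names and the numerical tolerance are assumptions made for this example only.

```python
import math

# Single-qubit Pauli matrices as lists of rows.
PAULIS = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def pauli_coefficients(q):
    """Coefficients q_P = Tr(Q P) / N of Fact 2.2 (single-qubit case, N = 2)."""
    n = len(q)
    return {name: trace(matmul(q, p)) / n for name, p in PAULIS.items()}

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]

coeffs = pauli_coefficients(H)
# Keep only coefficients that are non-zero up to rounding (tolerance is an assumption).
nonzero = {name: c for name, c in coeffs.items() if abs(c) > 1e-12}
M = len(nonzero)                                   # number of non-zero coefficients
norm_sq = sum(abs(c) ** 2 for c in coeffs.values())

print(sorted(nonzero), M, norm_sq)
```

For H the non-zero coefficients sit at X and Z, so M = 2, each has absolute value 1/√M, and this value is at least 1/N = 1/2, in agreement with Eq. (2).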

We say that a unitary U is exactly implementable if there exists a Clifford+T circuit that implements it (up to some global phase), else it is approximately implementable. Specifically, we say W is ϵ-approximately implementable if there exists an exactly implementable unitary U such that d(U,W) ≤ ϵ. The Solovay-Kitaev algorithm^16,17 guarantees that any unitary is ϵ-approximately implementable, for arbitrary precision ϵ > 0. We denote the set of exactly implementable unitaries by ${{{{\mathcal{J}}}}}_{n}$. In this paper, we use the following distance measure d(. , .), which has been used in previous works like refs. ^19,38 (qubit-based computing) and refs. ^23,24 (topological quantum computing).

Definition 2.1 (Global phase invariant distance). Given two unitaries $U,W\in {{{{\mathcal{U}}}}}_{n}$, we define the global phase invariant distance between them as follows.

$$d(U,W)=\sqrt{1-\frac{\left\vert Tr\left({U}^{{\dagger} }W\right)\right\vert }{N}}$$
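A minimal sketch of Definition 2.1 in plain Python follows. It shows that the distance vanishes (up to rounding) between a unitary and a copy of it multiplied by a global phase, and is strictly positive between two genuinely different gates. The function and variable names are assumptions for this example and do not come from the paper's code.

```python
import cmath
import math

def dagger(a):
    """Conjugate transpose of a square matrix stored as a list of rows."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def global_phase_distance(u, w):
    """d(U, W) = sqrt(1 - |Tr(U^dagger W)| / N)."""
    n = len(u)
    overlap = abs(trace(matmul(dagger(u), w))) / n
    # Clamp at zero: rounding can push 1 - overlap slightly below zero.
    return math.sqrt(max(0.0, 1.0 - overlap))

T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]
Z = [[1, 0], [0, -1]]
# The same gate with an arbitrary global phase e^{0.3 i} attached.
T_phased = [[cmath.exp(0.3j) * x for x in row] for row in T]

d_phase = global_phase_distance(T, T_phased)   # numerically zero
d_tz = global_phase_distance(T, Z)             # strictly positive

print(d_phase, d_tz)
```

Because only the absolute value of the trace enters, no search over the phase ϕ is needed when comparing a candidate circuit against the target.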

Composability of this distance with respect to tensor product and multiplication of unitaries, has been derived in ref. ²². This implies that if $U={\prod }_{i}\left({\otimes }_{j}{U}_{ij}\right)$, $V={\prod }_{i}\left({\otimes }_{j}{V}_{ij}\right)$ and d(U_ij, V_ij) ≤ ϵ_ij then we can upper bound d(U, V) as function of ϵ_ij.

Definition 2.2. The T-count of a circuit is the number of T-gates in it. The T-count of an exactly implementable unitary $U\in {{{{\mathcal{J}}}}}_{n}$, denoted by ${{{\mathcal{T}}}}(U)$, is the minimum number of T-gates required to implement it (up to a global phase).

In ref. ²⁵, the authors proved the following decomposition result, by which any exactly implementable unitary over the Clifford+T set can be written as a product of T-count 1 unitaries.

Theorem 2.1 (Proposition 1 in ref. ²⁵ (re-stated)). For any $U\in {{{{\mathcal{J}}}}}_{n}$ there exists a phase ϕ ∈ [0, 2π), Cliffords ${C}_{i}\in {{{{\mathcal{C}}}}}_{n}$ and Paulis ${P}_{i}\in {{{{\mathcal{P}}}}}_{n}\setminus \{{\mathbb{I}}\}$ for $i\in [{{{\mathcal{T}}}}(U)]$ such that

$$U={e}^{i\phi }\left(\mathop{\prod }\limits_{i={{{\mathcal{T}}}}(U)}^{1}R({P}_{i})\right){C}_{0}$$

(3)

where $R({P}_{i})={C}_{i}{T}_{({q}_{i})}{C}_{i}^{{\dagger} }=\frac{1}{2}(1+{e}^{\frac{i\pi }{4}}){\mathbb{I}}+\frac{1}{2}(1-{e}^{\frac{i\pi }{4}}){C}_{i}{Z}_{({q}_{i})}{C}_{i}^{{\dagger} } =\frac{1}{2}(1+{e}^{\frac{i\pi }{4}}){\mathbb{I}}+\frac{1}{2}(1-{e}^{\frac{i\pi }{4}}){P}_{i}$.

Using Fact 1 in Supplementary Note 1, given P and ${Z}_{({q}_{i})}$ we can compute (circuit for) C_i efficiently such that $P={C}_{i}{Z}_{({q}_{i})}{C}_{i}^{{\dagger} }$. A decomposition of U, as in Theorem 2.1, with the minimum number of T-gates is called a T-count-optimal decomposition of U.
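As a worked single-qubit instance of the formula for $R({P}_{i})$ (our illustration, taking ${C}_{i}={\mathbb{I}}$ so that ${P}_{i}=Z$), the expression reduces to the T gate itself:

$$R(Z)=\frac{1}{2}\left(1+{e}^{\frac{i\pi }{4}}\right)\left[\begin{array}{cc}1&0\\ 0&1\end{array}\right]+\frac{1}{2}\left(1-{e}^{\frac{i\pi }{4}}\right)\left[\begin{array}{cc}1&0\\ 0&-1\end{array}\right]=\left[\begin{array}{cc}1&0\\ 0&{e}^{\frac{i\pi }{4}}\end{array}\right]=T$$

The top-left entry is $\frac{1}{2}(1+{e}^{\frac{i\pi }{4}})+\frac{1}{2}(1-{e}^{\frac{i\pi }{4}})=1$ and the bottom-right entry is $\frac{1}{2}(1+{e}^{\frac{i\pi }{4}})-\frac{1}{2}(1-{e}^{\frac{i\pi }{4}})={e}^{\frac{i\pi }{4}}$. Choosing ${C}_{i}=H$ instead gives $R(X)=HT{H}^{{\dagger} }$, since $HZ{H}^{{\dagger} }=X$.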

Definition 2.3 (ϵ-T-count of approximately implementable unitaries). Let $W\in {{{{\mathcal{U}}}}}_{n}$ be an approximately implementable unitary. The ϵ-T-count of W, denoted by ${{{{\mathcal{T}}}}}_{\epsilon }(W)$, is equal to ${{{\mathcal{T}}}}(U)$, the T-count of an exactly implementable unitary $U\in {{{{\mathcal{J}}}}}_{n}$ such that d(U,W) ≤ ϵ and ${{{\mathcal{T}}}}(U)\le {{{\mathcal{T}}}}({U}^{{\prime} })$ for any ${U}^{{\prime} }\in {{{{\mathcal{J}}}}}_{n}$ within distance ϵ of W, i.e., $d({U}^{{\prime} },W)\le \epsilon$. We call a T-count-optimal circuit for any such U as the ϵ-T-count-optimal circuit for W.

It is not hard to see that the above definitions are very general and can be applied to any unitary $W\in {{{{\mathcal{U}}}}}_{n}$, exactly or approximately implementable. If a unitary is exactly implementable then we can take ϵ = 0. In fact, nearly all the following results can be deduced for the special case of exactly implementable unitaries by setting ϵ = 0.

Discussion of implementation results

We implemented our algorithm ${{{{\mathcal{A}}}}}_{MIN}$ in standard C++17 on an Intel(R) Core(TM) i7-7700K CPU at 4.2GHz, with 8 cores and 48 GB RAM, running FreeBSD 13.1. We compiled the code using clang++ 13.0.0. We used OpenMP⁴⁷ for parallelization and the Eigen 3 matrix library⁴⁸ for some of the matrix operations. We applied our algorithm to return the T-count-optimal decomposition of the following unitaries: (i) 1-qubit R_z(θ) and R_k; (ii) 2-qubit controlled-R_z(θ) (cR_z(θ)); (iii) 2-qubit controlled R_k (cR_k); (iv) 3-qubit double controlled-R_z(θ) (ccR_z(θ)); (v) 3-qubit double controlled-R_k (ccR_k); (vi) 2-qubit Givens rotation (Givens(θ)); (vii) 2-qubit QFT.

$$(i)\quad {R}_{z}(\theta )=\left[\begin{array}{cc}{e}^{-i\theta /2}&0\\ 0&{e}^{i\theta /2}\end{array}\right];\qquad {R}_{k}=\left[\begin{array}{cc}1&0\\ 0&{e}^{2\pi i/{2}^{k}}\end{array}\right]={e}^{2\pi i/{2}^{k+1}}{R}_{z}\left(\frac{2\pi }{{2}^{k}}\right)$$

$$(ii)\quad c{R}_{z}(\theta )={{{\rm{diagonal}}}}\left(1,1,{e}^{-i\theta /2},{e}^{i\theta /2}\right)$$

$$(iii)\quad c{R}_{k}={{{\rm{diagonal}}}}\left(1,1,1,{e}^{2\pi i/{2}^{k}}\right)$$

$$(iv)\quad cc{R}_{z}(\theta )={{{\rm{diagonal}}}}\left(1,1,1,1,1,1,{e}^{-i\theta /2},{e}^{i\theta /2}\right)$$

$$(v)\quad cc{R}_{k}={{{\rm{diagonal}}}}\left(1,1,1,1,1,1,1,{e}^{2\pi i/{2}^{k}}\right)$$

$$(vi)\quad Givens(\theta )=\left[\begin{array}{cccc}1&0&0&0\\ 0&\cos \theta &-\sin \theta &0\\ 0&\sin \theta &\cos \theta &0\\ 0&0&0&1\end{array}\right]$$

For convenience, we have denoted some diagonal matrices by diagonal(a, b, c, …) which implies that the matrix has elements a, b, c, … along the diagonal and 0 elsewhere. We used Quantum++⁴⁹ to obtain the unitary from the circuit given in ref. ¹⁸ (Supplementary Fig. 1).
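The relation between R_k and R_z(2π/2ᵏ) in (i) can be checked numerically. The sketch below, which uses hypothetical helper names and the global phase invariant distance of Definition 2.1, also illustrates that the controlled versions cR_k and cR_z(2π/2ᵏ) are not equal up to a global phase, because the phase factor becomes a relative phase once a control is added.

```python
import cmath
import math

def rz(theta):
    """R_z(theta) = diag(e^{-i theta/2}, e^{i theta/2})."""
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]

def rk(k):
    """R_k = diag(1, e^{2 pi i / 2^k})."""
    return [[1, 0], [0, cmath.exp(2j * math.pi / 2 ** k)]]

def controlled(u):
    """4x4 block matrix diag(I, u): u acts only when the first qubit is 1."""
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, u[0][0], u[0][1]],
            [0, 0, u[1][0], u[1][1]]]

def distance(u, w):
    """Global phase invariant distance sqrt(1 - |Tr(U^dagger W)| / N)."""
    n = len(u)
    tr = sum(complex(u[k][i]).conjugate() * w[k][i]
             for i in range(n) for k in range(n))
    return math.sqrt(max(0.0, 1.0 - abs(tr) / n))

ks = range(2, 12)  # k = 2, ..., 11, the range used in the experiments
d_single = {k: distance(rk(k), rz(2 * math.pi / 2 ** k)) for k in ks}
d_ctrl = {k: distance(controlled(rk(k)), controlled(rz(2 * math.pi / 2 ** k)))
          for k in ks}

for k in ks:
    print(k, d_single[k], d_ctrl[k])
```

The single-qubit distances are zero up to rounding, while the controlled distances are positive and shrink as k grows, so cR_k and cR_z(θ) have to be treated as separate targets.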

Controlled-R_z(θ) gates appear in many important quantum algorithms like Quantum Fourier Transform (QFT), phase estimation, factorization, order finding, the hidden subgroup problem, Grover’s search¹⁸, and quantum simulations⁵⁰. Givens rotations appear in a number of quantum chemistry applications⁵¹. Before our work there was no algorithm for the T-count of multi-qubit approximately implementable unitaries. In fact, previous algorithms returned optimal results only for single-qubit R_z(θ). Where possible, other unitaries were decomposed into a sequence of R_z(θ) gates, for which we know the following empirical formula from ref. ¹⁹, where the T-count is averaged over θ.

$${{{{\mathcal{T}}}}}_{\epsilon }({R}_{z}(\theta ))=3.067\,{\log }_{2}\left(1/\epsilon \right)-4.322$$

(4)

Then an upper bound was given by adding the T-counts of the component unitaries. For example, in Fig. 1 we show two implementations of the cR_z(θ) gate that we found in the literature. In Fig. 1a, two Fredkin gates (each with T-count 7, ref. ²⁶), one R_z(θ), and an extra ancilla⁵² are used. In Fig. 1b, the implementation uses two R_z gates. So the upper bounds on the T-count of cR_z(θ), averaged over θ, are as follows.

$$\#{{{\rm{T}}}}(c{R}_{z}(\theta ))=3.067\,{\log }_{2}\left(1/\epsilon \right)-4.322+14=3.067\,{\log }_{2}\left(1/\epsilon \right)+9.678\quad [\,{{\mbox{Figure 1a}}}\,]$$

$$\#{{{\rm{T}}}}(c{R}_{z}(\theta ))=2\left(3.067\,{\log }_{2}\left(1/\epsilon \right)-4.322\right)=6.134\,{\log }_{2}\left(1/\epsilon \right)-8.644\quad [\,{{\mbox{Figure 1b}}}\,]$$

The first upper bound is better (gives less T-count) for every ϵ < 0.016, but the implementation uses an extra ancilla. In Fig. 1c and d we show an implementation of cR_k and ccR_z(θ), respectively. The latter circuit can be used to implement ccR_k by replacing the cR_z(θ) with cR_k. Upper bounds on T-count can be deduced in a similar manner for the respective unitaries.
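The crossover quoted above follows directly from Eq. (4). A minimal sketch, with hypothetical function names, that evaluates both upper bounds and solves for the precision at which they coincide:

```python
import math

def rz_tcount(eps):
    """Empirical average T-count of R_z(theta) from Eq. (4)."""
    return 3.067 * math.log2(1 / eps) - 4.322

def bound_fig1a(eps):
    """One R_z(theta) plus two Fredkin gates of T-count 7 each."""
    return rz_tcount(eps) + 14

def bound_fig1b(eps):
    """Two R_z gates."""
    return 2 * rz_tcount(eps)

# The bounds are equal when rz_tcount(eps) = 14,
# i.e. log2(1/eps) = (14 + 4.322) / 3.067.
crossover = 2 ** (-(14 + 4.322) / 3.067)

print(round(crossover, 4))
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, round(bound_fig1a(eps), 2), round(bound_fig1b(eps), 2))
```

For ϵ below the crossover (about 0.016) the Fig. 1a bound is smaller, and for larger ϵ the Fig. 1b bound is smaller. Both are only upper bounds obtained by summing component costs.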

**Fig. 1: Implementations of multi-controlled rotations.**

We took $\theta =\frac{2\pi }{{2}^{k}}$ and varied k from 2 to 11. We were more interested in synthesizing multi-qubit unitaries, since these had not been T-count-optimally synthesized before. On average, it took 48 min to synthesize a 2-qubit unitary with T-count at most 7, and about 5.7 h for a 3-qubit unitary with T-count at most 4. We have synthesized only one 2-qubit unitary with T-count 9. This is the 2-qubit QFT at ϵ = 10⁻¹⁸ and it took more than 4 days. We have not synthesized 2-qubit and 3-qubit unitaries with higher T-count because of time constraints. We made the following observations.

1. The T-count of controlled rotation gates decreases as we increase the number of controls, at least for many of the angles and precisions tested by us. This is shown in Table 2. The average running time has been stated above.
Table 2 Comparison of ϵ-T-count of different (controlled) rotation gates for various angles and precision.
2. The T-count of the 2-qubit QFT is equal to the T-count of R₂ and is shown in Table 3. In this table we also show the T-count of the 3-qubit QFT for some precisions. The running time for these tests is explicitly mentioned.
Table 3 ϵ-T-count of 2 and 3 qubit QFT.
3. The T-count of Givens(θ) is similar to that of cR_z(θ), on average. (So we have not shown it separately.)
4. The T-count of R_z(θ) (and hence R_k) agrees with the results given in ref. ¹⁹.

The numerical results of this subsection, together with instructions on how to reproduce them, are available online at https://github.com/vsoftco/approx-t.

Methods

An exponential time and polynomial space algorithm

In this section, we describe an algorithm for determining the ϵ-T-count of an n-qubit unitary $W\in {{{{\mathcal{U}}}}}_{n}$. This algorithm has space and time complexity $O\left({2}^{2n}\right)$ and $O\left({2}^{2n{{{{\mathcal{T}}}}}_{\epsilon }(W)+4n}\right)$, respectively. First, we derive some results that will help us design our algorithm.

Let U be an exactly synthesizable unitary such that d(U,W) ≤ ϵ.

$$d(W,U)=\sqrt{1-\frac{| Tr({W}^{{\dagger} }U)| }{N}}\le \epsilon \ \Rightarrow \ | Tr({W}^{{\dagger} }U)| \ge N(1-{\epsilon }^{2})$$

(5)

Let $U=\left(\mathop{\prod }\nolimits_{i = t}^{1}R({P}_{i})\right){C}_{0}{e}^{i\phi }$ for some ${C}_{0}\in {{{{\mathcal{C}}}}}_{n}$ and global phase ϕ, and let W = UE. Then from Eq. (5) we have

$$\left\vert Tr\left({E}^{{\dagger} }\right)\right\vert =\left\vert Tr\left(E\right)\right\vert \ge N(1-{\epsilon }^{2}).$$

(6)
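The step from Eq. (5) to Eq. (6) uses only W = UE and the unitarity of U. Spelled out:

$$Tr\left({W}^{{\dagger} }U\right)=Tr\left({E}^{{\dagger} }{U}^{{\dagger} }U\right)=Tr\left({E}^{{\dagger} }\right)=\overline{Tr\left(E\right)}\ \Rightarrow \ \left\vert Tr\left({E}^{{\dagger} }\right)\right\vert =\left\vert Tr\left(E\right)\right\vert =\left\vert Tr\left({W}^{{\dagger} }U\right)\right\vert \ge N(1-{\epsilon }^{2})$$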

The above implies that $d(E,{\mathbb{I}})\le \epsilon$. We have

$$\begin{array}{l}\left\vert Tr\left({W}^{{\dagger} }\left(\mathop{\prod }\limits_{i=t}^{1}R({P}_{i})\right)\right)\right\vert \,\,=\,\,\left\vert Tr\left({E}^{{\dagger} }{U}^{{\dagger} }\left(\mathop{\prod }\limits_{i=t}^{1}R({P}_{i})\right)\right)\right\vert \\ \qquad\qquad\qquad\qquad\qquad\quad=\,\,\left\vert Tr\left({E}^{{\dagger} }{e}^{-i\phi }{C}_{0}^{{\dagger} }\left(\mathop{\prod }\limits_{i=1}^{t}{R}^{{\dagger} }({P}_{i})\right)\left(\mathop{\prod }\limits_{i=t}^{1}R({P}_{i})\right)\right)\right\vert \\ \qquad\qquad\qquad\qquad\qquad\quad=\,\,\left\vert Tr\left({E}^{{\dagger} }{C}_{0}^{{\dagger} }\right)\right\vert =\left\vert Tr\left(E{C}_{0}\right)\right\vert \end{array}$$

and similarly

$$\left\vert Tr\left({W}^{{\dagger} }\left(\mathop{\prod }\limits_{i=t}^{1}R({P}_{i})\right){P}_{1}\right)\right\vert =\left\vert Tr\left({E}^{{\dagger} }{C}_{0}^{{\dagger} }{P}_{1}\right)\right\vert =\left\vert Tr\left(E{C}_{0}{P}_{1}\right)\right\vert \qquad [{P}_{1}\in {{{{\mathcal{P}}}}}_{n}]$$

(7)

We now study some properties of $\left\vert Tr(E{C}_{0}{P}_{1})\right\vert$, which will help us check if we have identified a correct $\mathop{\prod }\nolimits_{i = t}^{1}R({P}_{i})$. For this, we prove the following theorem.

Theorem 3.1. Let $E\in {{{{\mathcal{U}}}}}_{n}$ be such that $\left\vert Tr\left(E\right)\right\vert \ge N\left(1-{\epsilon }^{2}\right)$, for some ϵ ≥ 0. ${C}_{0}={\sum }_{P\in {{{{\mathcal{P}}}}}_{n}}{r}_{P}P$ is an n-qubit Clifford. If $\left\vert \left\{P:{r}_{P}\,\,\ne\,\, 0\right\}\right\vert =M$ then

$$\begin{array}{l}\frac{1-{\epsilon }^{2}}{\sqrt{M}}-\sqrt{M}\sqrt{2{\epsilon }^{2}-{\epsilon }^{4}}\le \left\vert Tr\left(E{C}_{0}{P}_{1}\right)/N\right\vert \le \frac{1}{\sqrt{M}}\\ \qquad +\sqrt{M}\sqrt{2{\epsilon }^{2}-{\epsilon }^{4}}\quad [if\,{r}_{{P}_{1}}\ne \,0]\end{array}$$

(8)

$$and\quad 0\le \left\vert Tr\left(E{C}_{0}{P}_{1}\right)/N\right\vert \le \sqrt{M}\sqrt{2{\epsilon }^{2}-{\epsilon }^{4}}\quad [if\,{r}_{{P}_{1}}=0]$$

(9)

Proof Since E is unitary, we can expand it in the Pauli basis as

$$E=\mathop{\sum}\limits_{P\in {{{{\mathcal{P}}}}}_{n}}{e}_{P}P$$

(10)

where ${e}_{P}=Tr\left(EP\right)/N$ and ${\sum }_{P}{\left\vert {e}_{P}\right\vert }^{2}=1$ (Fact 2.2). Thus

$$1\ge | {e}_{{\mathbb{I}}}| =\left\vert Tr\left(E\right)/N\right\vert \ge (1-{\epsilon }^{2})\qquad [\,{{\mbox{From inequality 6}}}\,]$$

(11)

$$\,{{\mbox{and}}}\,\,\mathop{\sum}\limits_{P\ne {\mathbb{I}}}| {e}_{P}{| }^{2}\le 1-{(1-{\epsilon }^{2})}^{2}=2{\epsilon }^{2}-{\epsilon }^{4}$$

(12)

$$\ \Rightarrow \ | {e}_{P}| \le \sqrt{2{\epsilon }^{2}-{\epsilon }^{4}}\qquad [\forall P\,\,\ne \,\,{\mathbb{I}}]$$

(13)

Since ${C}_{0}={\sum }_{P\in {{{{\mathcal{P}}}}}_{n}}{r}_{P}P$, from Fact 2.1 and Eq. (2), we know that $| {r}_{P}| =r=\frac{1}{\sqrt{M}}$ or r_P = 0. Then

$$\begin{array}{l}E{C}_{0}\,\,=\,\,\mathop{\sum}\limits_{P\in {{{{\mathcal{P}}}}}_{n}}{r}_{P}EP=\mathop{\sum}\limits_{{P\in {{{{\mathcal{P}}}}}_{n}}\atop{{r}_{P}\,\ne\, 0}}{r}_{P}EP\\ \,{{\mbox{and}}}\,\,E{C}_{0}{P}_{1}={r}_{{P}_{1}}E+\mathop{\sum}\limits_{{P\in {{{{\mathcal{P}}}}}_{n}\setminus \{{P}_{1}\}}\atop{{r}_{P}\,\ne\, 0}}{r}_{P}E{P}_{1}^{{\prime} }\quad \,{{\mbox{where}}}\,\,{P}_{1}\in {{{{\mathcal{P}}}}}_{n}\setminus \{{\mathbb{I}}\}\,{{\mbox{and}}}\,\,P{P}_{1}={P}_{1}^{{\prime} }\,\ne \,{\mathbb{I}}.\end{array}$$

So

$$\begin{array}{l}\left\vert Tr\left(E{C}_{0}{P}_{1}\right)\right\vert \,=\,\left\vert {r}_{{P}_{1}}Tr\left(E\right)+\mathop{\sum}\limits_{{P\in {{{{\mathcal{P}}}}}_{n}\setminus \{{P}_{1}\}}\atop{{r}_{P}\ne 0}}{r}_{P}Tr\left(E{P}_{1}^{{\prime} }\right)\right\vert \\ \qquad\qquad\qquad=\left\vert \left({r}_{{P}_{1}}{e}_{{\mathbb{I}}}+\mathop{\sum}\limits_{{P\in {{{{\mathcal{P}}}}}_{n}\setminus \{{P}_{1}\}}\atop{{r}_{P}\ne 0}}{r}_{P}{e}_{{P}_{1}^{{\prime} }}\right)\right\vert N\quad [\,{{\mbox{Using Fact 2.2}}}\,]\\ \end{array}$$

(14)

$$\begin{array}{l}\le \left\vert {r}_{{P}_{1}}\right\vert \left\vert {e}_{{\mathbb{I}}}\right\vert N+\mathop{\sum}\limits_{{P\in {{{{\mathcal{P}}}}}_{n}\setminus \{{P}_{1}\}}\atop{{r}_{P}\ne 0}}\left\vert {r}_{P}\right\vert \left\vert {e}_{{P}_{1}^{{\prime} }}\right\vert N\quad [\because | {e}_{{\mathbb{I}}}| \le 1]\\ \le \left\vert {r}_{{P}_{1}}\right\vert N+\mathop{\sum}\limits_{{P\in {{{{\mathcal{P}}}}}_{n}\setminus \{{P}_{1}\}}\atop{{r}_{P}\ne 0}}r\sqrt{2{\epsilon }^{2}-{\epsilon }^{4}}N\qquad [\,{{\mbox{Using inequality 13}}}\,]\end{array}$$

(15)

From Eq. (14) we can also obtain the following lower bound.

$$\begin{array}{l}\left\vert Tr\left(E{C}_{0}{P}_{1}\right)\right\vert \,\,\ge \,\,\left\vert {r}_{{P}_{1}}\right\vert \left\vert {e}_{{\Bbb{I}}}\right\vert N-\mathop{\sum}\limits_{{P\in {{{{\mathcal{P}}}}}_{n}\setminus \{{P}_{1}\}}\atop{{r}_{P}\ne 0}}\left\vert {r}_{P}\right\vert \left\vert {e}_{{P}_{1}^{{\prime} }}\right\vert N\\ \qquad\qquad\qquad\,\,\ge \,\,\left\vert {r}_{{P}_{1}}\right\vert (1-{\epsilon }^{2})N-\mathop{\sum}\limits_{{P\in {{{{\mathcal{P}}}}}_{n}\setminus \{{P}_{1}\}}\atop{{r}_{P}\ne 0}}r\sqrt{2{\epsilon }^{2}-{\epsilon }^{4}}N\quad [\,{{\mbox{Inequality 13 and 11}}}\,]\end{array}$$

(16)

Since $r=\frac{1}{\sqrt{M}}$, we prove the following inequalities.

$$\begin{array}{l}\frac{1-{\epsilon }^{2}}{\sqrt{M}}-\sqrt{M}\sqrt{2{\epsilon }^{2}-{\epsilon }^{4}}\,\,\le \,\,\left\vert Tr\left(E{C}_{0}{P}_{1}\right)/N\right\vert \le \frac{1}{\sqrt{M}}+\sqrt{M}\sqrt{2{\epsilon }^{2}-{\epsilon }^{4}}\quad [\,{{\mbox{if}}}\,\,{r}_{{P}_{1}}\ne 0]\\ \qquad\qquad\qquad {{\mbox{and}}}\,\quad 0\,\,\le \,\,\left\vert Tr\left(E{C}_{0}{P}_{1}\right)/N\right\vert \le \sqrt{M}\sqrt{2{\epsilon }^{2}-{\epsilon }^{4}}\quad [\,{{\mbox{if}}}\,\,{r}_{{P}_{1}}=0]\end{array}$$

□
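The bounds of Theorem 3.1 can be checked numerically on a small instance. The sketch below is only an illustration under stated assumptions: it uses NumPy, takes the single-qubit Hadamard as C₀ (so M = 2, with non-zero coefficients on X and Z), and takes E to be a small rotation about a hypothetical axis (1, 2, 2)/3. The helper name `in_bounds` is not from the paper.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = {"I": I2, "X": X, "Y": Y, "Z": Z}

N = 2
C0 = (X + Z) / np.sqrt(2)   # Hadamard: r_X = r_Z = 1/sqrt(2), so M = 2
M = 2

# E: rotation by a small angle about the unit axis (1, 2, 2)/3, so E is close to I.
theta = 0.1
axis = (X + 2 * Y + 2 * Z) / 3
E = np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * axis

# Smallest eps^2 for which |Tr(E)| >= N (1 - eps^2) holds.
eps_sq = 1 - abs(np.trace(E)) / N
slack = np.sqrt(M) * np.sqrt(2 * eps_sq - eps_sq**2)

def in_bounds(label):
    """Check Eq. (8) or Eq. (9) for the Pauli P_1 named by `label`."""
    value = abs(np.trace(E @ C0 @ PAULIS[label])) / N
    if label in ("X", "Z"):   # r_{P_1} != 0, Eq. (8)
        low = (1 - eps_sq) / np.sqrt(M) - slack
        high = 1 / np.sqrt(M) + slack
        return low <= value <= high
    return value <= slack     # r_{P_1} = 0, Eq. (9)

results = {label: bool(in_bounds(label)) for label in PAULIS}
```

For this instance every entry of `results` is expected to be `True`, with the X and Z coefficients close to 1/√2 and the I and Y coefficients close to 0.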

In essence, this theorem says that if E is close to the identity, then the absolute values of the Pauli-basis coefficients of EC₀ are distributed almost like those of C₀. In fact, a more general theorem can be deduced from the calculations in the proof of Theorem 3.1.

Theorem 3.2. Let $E\in {{{{\mathcal{U}}}}}_{n}$ be such that $\left\vert Tr\left(E\right)\right\vert \ge N\left(1-{\epsilon }^{2}\right)$, for some ϵ ≥ 0. $Q={\sum }_{P\in {{{{\mathcal{P}}}}}_{n}}{q}_{P}P$ is an n-qubit unitary. Then for each ${P}_{1}\in {{{{\mathcal{P}}}}}_{n}$,

$$\begin{array}{l}(1-{\epsilon }^{2})| {q}_{{P}_{1}}| -\mathop{\sum}\limits_{P\in {{{{\mathcal{P}}}}}_{n}\setminus \{{P}_{1}\}}| {q}_{P}| \sqrt{2{\epsilon }^{2}-{\epsilon }^{4}}\le \left\vert Tr\left(EQ{P}_{1}\right)/N\right\vert \le | {q}_{{P}_{1}}|\\ \qquad\qquad\quad + \mathop{\sum}\limits_{P\in {{{{\mathcal{P}}}}}_{n}\setminus \{{P}_{1}\}}| {q}_{P}| \sqrt{2{\epsilon }^{2}-{\epsilon }^{4}}.\end{array}$$

So we can define two sets ${{{{\mathcal{S}}}}}_{1}$ and ${{{{\mathcal{S}}}}}_{0}$ as follows.

$${{{{\mathcal{S}}}}}_{1}=\left\{\left\vert Tr\left(E{C}_{0}{P}_{1}\right)/N\right\vert :{r}_{{P}_{1}}\,\,\ne\,\, 0\right\}$$

(17)

$${{{{\mathcal{S}}}}}_{0}=\left\{\left\vert Tr\left(E{C}_{0}{P}_{1}\right)/N\right\vert :{r}_{{P}_{1}}=0\right\}$$

(18)

From our results so far, it follows that for small enough ϵ (which is the case in nearly all applications) the values in ${{{{\mathcal{S}}}}}_{1}$ are nearly equal, while those in ${{{{\mathcal{S}}}}}_{0}$ are nearly 0. Let $\Delta =\mathop{\max }\nolimits_{{t}_{1}\in {{{{\mathcal{S}}}}}_{1},{t}_{0}\in {{{{\mathcal{S}}}}}_{0}}({t}_{1}-{t}_{0})$. To get a positive difference we require the following.

$$\begin{array}{l}\Delta \,\,\ge \,\,\frac{1-{\epsilon }^{2}}{\sqrt{M}}-2\sqrt{M(2{\epsilon }^{2}-{\epsilon }^{4})} >\, 0\\ \quad\, \Rightarrow \, {\epsilon }^{4}-2{\epsilon }^{2}+\frac{1}{1+4{M}^{2}}\,\ge\, 0\end{array}$$

Solving this we obtain the following conditions.

$${\epsilon }^{2}\ge 1+\sqrt{1-\frac{1}{1+4{M}^{2}}}\quad \,{{\mbox{or}}}\,\quad {\epsilon }^{2}\le 1-\sqrt{1-\frac{1}{1+4{M}^{2}}}$$

(19)

Since usually ϵ < 1, we consider the second inequality. Expanding the square-root term as a power series, we obtain

$${\epsilon }^{2}\le \frac{1}{2}\cdot \frac{1}{1+4{M}^{2}}+\frac{1}{2!}\cdot \frac{1}{2\cdot 2}{\left(\frac{1}{1+4{M}^{2}}\right)}^{2}+\frac{1}{3!}\cdot \frac{3}{2\cdot 2\cdot 2}{\left(\frac{1}{1+4{M}^{2}}\right)}^{3}+\ldots$$

Since the right-hand side decreases with M and 1 ≤ M ≤ N², its largest value is attained at M = 1, which gives

$${\epsilon }^{2}\le 0.105572809\, \Rightarrow \ \epsilon \le 0.3249196962.$$

(20)

For all practical purposes, the value of ϵ is much smaller than this. So we can easily distinguish the sets ${{{{\mathcal{S}}}}}_{0}$ and ${{{{\mathcal{S}}}}}_{1}$.
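The threshold in Eq. (20) follows from the second inequality of Eq. (19). As a small worked check, the sketch below evaluates that bound for a few values of M; the function name `max_epsilon` is hypothetical and only the Python standard library is assumed.

```python
import math

def max_epsilon(M):
    """Largest eps allowed by the second inequality of Eq. (19) for a given M."""
    x = 1.0 / (1.0 + 4.0 * M * M)
    return math.sqrt(1.0 - math.sqrt(1.0 - x))

# The bound shrinks as M grows; M = 1 reproduces the value quoted in Eq. (20).
thresholds = {M: max_epsilon(M) for M in (1, 2, 4, 16)}
```

The entry for M = 1 agrees with the value ϵ ≤ 0.3249196962 in Eq. (20), and the entries for larger M are smaller, in line with the remark that the bound decreases with M.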

Algorithm

Now we are in a position to describe our exhaustive search algorithm, ${{{{\mathcal{A}}}}}_{MIN}$ (Algorithm 1), that determines the ϵ-T-count of a unitary $W\in {{{{\mathcal{U}}}}}_{n}$. This is an iterative procedure, where in every iteration we decide if ${{{{\mathcal{T}}}}}_{\epsilon }(W)=m$ for increasing values of a variable m.

Algorithm 1: ${{{{\mathcal{A}}}}}_{MIN}$

Algorithm 2: ${{{{\mathcal{A}}}}}_{DECIDE}$

Algorithm 3: ${{{{\mathcal{A}}}}}_{CONJ}$

The main idea to solve the decision version is as follows. Suppose we have to test if ${{{{\mathcal{T}}}}}_{\epsilon }(W)=m$. From the definitions given in the “Preliminaries” section, we know that if this is true then $\exists U\in {{{{\mathcal{J}}}}}_{n}$ such that ${{{\mathcal{T}}}}(U)\,=\,m$ and d(U,W) ≤ ϵ. Let $U=\left(\mathop{\prod }\nolimits_{i = m}^{1}{U}_{i}\right){C}_{0}$ where ${C}_{0}\in {{{{\mathcal{C}}}}}_{n}$ and ${U}_{i}\in \{R(P):P\,\ne\, {\mathbb{I}}\}$. Let $\widetilde{U}=\mathop{\prod }\nolimits_{i = m}^{1}{U}_{i}$. To test if we have guessed the correct $\widetilde{U}$, we can apply the results deduced in the previous section. Specifically, we calculate ${W}^{{\prime} }={W}^{{\dagger} }\widetilde{U}$ and then calculate the set of coefficients ${{{{\mathcal{S}}}}}_{c}=\left\{| Tr({W}^{{\prime} }P)/N| :P\in {{{{\mathcal{P}}}}}_{n}\right\}$. Then we check if we can distinguish two subsets ${{{{\mathcal{S}}}}}_{0}$ and ${{{{\mathcal{S}}}}}_{1}$, as shown in Eqs. (17) and (18), for some 1 ≤ M ≤ N². Further details are given in Algorithm 2. Let us call this the amplitude test. After passing this test we have a unitary of the form E^†Q, where Q is a unitary. In effect, this test filters on the approximate values of the coefficients of Q in the Pauli basis (Theorem 3.2). So after passing this test, Q is a unitary whose coefficients (in absolute value) are equal or nearly equal at some points and zero or nearly zero at all other points. To ensure that Q is a Clifford, i.e., ${W}^{{\prime} }={E}^{{\dagger} }{C}_{0}$ for some ${C}_{0}\in {{{{\mathcal{C}}}}}_{n}$, we perform the conjugation test (Algorithm 3) for further verification.
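The amplitude test described above can be sketched as follows. This is a hypothetical illustration based on the bounds of Theorem 3.1, not a transcription of Algorithm 2: the names `paulis` and `amplitude_test`, the tolerance `tol` and the example inputs are assumptions, and NumPy is assumed to be available.

```python
import itertools
from functools import reduce
import numpy as np

SINGLE = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def paulis(n):
    """Yield (label, matrix) for all 4^n n-qubit Paulis."""
    for labels in itertools.product("IXYZ", repeat=n):
        yield "".join(labels), reduce(np.kron, (SINGLE[c] for c in labels))

def amplitude_test(W_prime, eps, tol=1e-9):
    """Return M if the coefficients split as in Eqs. (17)-(18), else None."""
    N = W_prime.shape[0]
    n = N.bit_length() - 1
    coeffs = sorted((abs(np.trace(W_prime @ P)) / N for _, P in paulis(n)),
                    reverse=True)
    root = np.sqrt(2 * eps**2 - eps**4)
    for M in range(1, N * N + 1):
        slack = np.sqrt(M) * root
        low = (1 - eps**2) / np.sqrt(M) - slack
        high = 1 / np.sqrt(M) + slack
        big, small = coeffs[:M], coeffs[M:]
        if (all(low - tol <= c <= high + tol for c in big)
                and all(c <= slack + tol for c in small)):
            return M
    return None

# Example inputs: a Clifford (Hadamard) and a non-Clifford (T), both perturbed
# by a unitary E that is close to the identity.
H = (SINGLE["X"] + SINGLE["Z"]) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])
theta = 0.05
axis = (SINGLE["X"] + 2 * SINGLE["Y"] + 2 * SINGLE["Z"]) / 3
E = np.cos(theta / 2) * SINGLE["I"] - 1j * np.sin(theta / 2) * axis

M_clifford = amplitude_test(E.conj().T @ H, eps=0.05)
M_t_gate = amplitude_test(E.conj().T @ T, eps=0.05)
```

In this example the perturbed Hadamard passes with M = 2, while the perturbed T gate fails for every M, because its two dominant coefficients have clearly different magnitudes.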

Theorem 3.3. Let $E,Q\in {{{{\mathcal{U}}}}}_{n}$ be such that $d(E,{\mathbb{I}})\le \epsilon$, and let ${P}^{{\prime}}\in {{{{\mathcal{P}}}}}_{n}\setminus \{{\mathbb{I}}\}$ be such that $Q{P}^{{\prime}}{Q}^{{\dagger}}={\sum}_{P}{\alpha}_{P}P$, where ${\alpha}_{P}\in {\mathbb{C}}$. Then for each ${P}^{{\prime}{\prime}}\in {{{{\mathcal{P}}}}}_{n}$,

$$\begin{array}{l}\min \left\{\right.0,| {\alpha }_{{P}^{{\prime}{\prime} }}| (1-4{\epsilon }^{2}+2{\epsilon }^{4})\\-2\epsilon \mathop{\sum}\limits_{P\ne {P}^{{\prime}{\prime} }}| {\alpha }_{P}| \left.\right\}\, \le \,\left\vert Tr\left(({E}^{{\dagger} }Q{P}^{{\prime} }{Q}^{{\dagger} }E){P}^{{\prime}{\prime} }\right)\right\vert /N\\ \qquad\qquad\qquad\quad\le \max \{| {\alpha }_{{P}^{{\prime}{\prime} }}| +2\epsilon \mathop{\sum}\limits_{P\ne {P}^{{\prime}{\prime} }}| {\alpha }_{P}| ,1\}\end{array}$$

Proof. We have the following.

$$\begin{array}{l}{E}^{{\dagger} }(Q{P}^{{\prime} }{Q}^{{\dagger} })E\,=\,\mathop{\sum}\limits_{\hat{P}}| {e}_{\hat{P}}{| }^{2}\hat{P}\left(\mathop{\sum}\limits_{P}{\alpha }_{P}P\right)\hat{P}+\mathop{\sum}\limits_{\hat{P}\ne \tilde{P}}\overline{{e}_{\hat{P}}}{e}_{\tilde{P}}\hat{P}\left(\mathop{\sum}\limits_{P}{\alpha }_{P}P\right)\tilde{P}\\ \qquad \qquad \qquad\,=\,| {e}_{{\mathbb{I}}}{| }^{2}\mathop{\sum}\limits_{P}{\alpha }_{P}P+\mathop{\sum}\limits_{\hat{P}\ne {\mathbb{I}},P}| {e}_{\hat{P}}{| }^{2}{\alpha }_{P}\hat{P}P\hat{P}+\mathop{\sum}\limits_{\hat{P}\ne \tilde{P},P}\overline{{e}_{\hat{P}}}{e}_{\tilde{P}}{\alpha }_{P}\hat{P}P\tilde{P}\end{array}$$

(21)

Multiplication by P^″ gives us the following.

$$\begin{array}{l}({E}^{{\dagger} }Q{P}^{{\prime} }{Q}^{{\dagger} }E){P}^{{\prime}{\prime} }\,=\,| {e}_{{\mathbb{I}}}{| }^{2}{\alpha }_{{P}^{{\prime}{\prime} }}{\mathbb{I}}+| {e}_{{\mathbb{I}}}{| }^{2}\mathop{\sum}\limits_{P\ne {P}^{{\prime}{\prime} }}{\alpha }_{P}P{P}^{{\prime}{\prime} }+\mathop{\sum}\limits_{\hat{P}\ne {\mathbb{I}}}| {e}_{\hat{P}}{| }^{2}{\alpha }_{{P}^{{\prime}{\prime} }}(\pm {\mathbb{I}})\\ \qquad\qquad\qquad\qquad +\mathop{\sum}\limits_{\hat{P}\ne {\mathbb{I}},P\ne {P}^{{\prime}{\prime} }}| {e}_{\hat{P}}{| }^{2}{\alpha }_{P}\hat{P}P\hat{P}{P}^{{\prime}{\prime} }+\mathop{\sum}\limits_{\hat{P}\ne \tilde{P}}\overline{{e}_{\hat{P}}}{e}_{\tilde{P}}{\alpha }_{{P}^{{\prime}{\prime} }}\hat{P}{P}^{{\prime}{\prime} }\tilde{P}{P}^{{\prime}{\prime} }\\ \qquad\qquad\qquad\qquad +\mathop{\sum}\limits_{\hat{P}\ne \tilde{P},P\ne {P}^{{\prime}{\prime} }}\overline{{e}_{\hat{P}}}{e}_{\tilde{P}}{\alpha }_{P}\hat{P}P\tilde{P}{P}^{{\prime}{\prime} }\end{array}$$

So,

$$\begin{array}{l}\left\vert Tr\left(({E}^{{\dagger} }Q{P}^{{\prime} }{Q}^{{\dagger} }E){P}^{{\prime}{\prime} }\right)/N\right\vert \,\le \,| {\alpha }_{{P}^{{\prime}{\prime} }}| \mathop{\sum}\limits_{\hat{P}}| {e}_{\hat{P}}{| }^{2}\\ \qquad \qquad\qquad\qquad\qquad\qquad +\mathop{\sum}\limits_{\hat{P}\ne \tilde{P},P\ne {P}^{{\prime}{\prime} }}| \overline{{e}_{\hat{P}}}{e}_{\tilde{P}}{\alpha }_{P}| \left\vert Tr(\hat{P}P\tilde{P}{P}^{{\prime}{\prime} })/N\right\vert \\ \qquad \qquad\qquad\qquad\qquad\qquad =\,| {\alpha }_{{P}^{{\prime}{\prime} }}| +\mathop{\sum}\limits_{P\ne {P}^{{\prime}{\prime} }}| {\alpha }_{P}| \mathop{\sum}\limits_{\hat{P}\ne \tilde{P}}| \overline{{e}_{\hat{P}}}{e}_{\tilde{P}}| \left\vert Tr(\hat{P}P\tilde{P}{P}^{{\prime}{\prime} })/N\right\vert \qquad [\,{{\mbox{Fact}}}\,\,2.2]\end{array}$$

Given ${P}^{{\prime}{\prime} },P,\hat{P}$, we can have $\hat{P}P\tilde{P}{P}^{{\prime}{\prime} }=\pm {\mathbb{I}}$ for one particular value of $\tilde{P}$.

$$\begin{array}{l}\left\vert Tr\left(({E}^{{\dagger} }Q{P}^{{\prime} }{Q}^{{\dagger} }E){P}^{{\prime}{\prime} }\right)/N\right\vert \le | {\alpha }_{{P}^{{\prime}{\prime} }}| +\mathop{\sum}\limits_{P\ne {P}^{{\prime}{\prime} }}| {\alpha }_{P}| \mathop{\sum}\limits_{\hat{P}}| \overline{{e}_{\hat{P}}}| | {e}_{\hat{{P}^{{\prime} }}}| \\ \qquad [\hat{{P}^{{\prime} }}\,\ne\, \hat{P}\,\,{{\mbox{is such that}}}\,\,\hat{P}P\hat{{P}^{{\prime} }}{P}^{{\prime}{\prime} }=\pm \,{\mathbb{I}}]\end{array}$$

Let ${\mathbb{I}}P\hat{{P}_{0}^{{\prime} }}{P}^{{\prime}{\prime} }=\pm {\mathbb{I}}$ and $\hat{{P}_{0}}P{\mathbb{I}}{P}^{{\prime}{\prime} }=\pm {\mathbb{I}}$, for some Paulis $\hat{{P}_{0}^{{\prime} }},\hat{{P}_{0}}\in {{{{\mathcal{P}}}}}_{n}\setminus \{{\mathbb{I}}\}$. Then we can write

$$\begin{array}{l}\left\vert Tr\left(({E}^{{\dagger} }Q{P}^{{\prime} }{Q}^{{\dagger} }E){P}^{{\prime}{\prime} }\right)/N\right\vert \,\le \,| {\alpha }_{{P}^{{\prime}{\prime} }}| +\mathop{\sum}\limits_{P\ne {P}^{{\prime}{\prime} }}| {\alpha }_{P}| \left(| {e}_{{\Bbb{I}}}| | {e}_{\hat{{P}_{0}^{{\prime} }}}| +| {e}_{\hat{{P}_{0}}}| | {e}_{{\mathbb{I}}}| +\mathop{\sum}\limits_{\hat{P}\ne {\mathbb{I}},\hat{{P}_{0}}}| {e}_{\hat{P}}| | {e}_{\hat{{P}^{{\prime} }}}| \right)\\ \qquad\qquad\qquad\qquad\qquad\qquad\,\le \,| {\alpha }_{{P}^{{\prime}{\prime} }}| +\mathop{\sum}\limits_{P\ne {P}^{{\prime}{\prime} }}| {\alpha }_{P}| \left(| {e}_{\hat{{P}_{0}^{{\prime} }}}| +| {e}_{\hat{{P}_{0}}}| +\mathop{\sum}\limits_{\hat{P}\ne {\mathbb{I}},\hat{{P}_{0}}}| {e}_{\hat{P}}| | {e}_{\hat{{P}^{{\prime} }}}| \right)\quad [\,{{\mbox{Equation 11}}}]\end{array}$$

In Supplementary Note 2 we show that ${\sum }_{\hat{P}\ne {\mathbb{I}},\hat{{P}_{0}}}| {e}_{\hat{P}}| | {e}_{\hat{{P}^{{\prime} }}}| \le (2{\epsilon }^{2}-{\epsilon }^{4})$. We observe that in the above inequality we have taken $| {e}_{{\mathbb{I}}}| \le 1$, but if $| {e}_{{\mathbb{I}}}| =1$ then ∣e_P∣ = 0 for any $P\,\ne \, {\mathbb{I}}$, since ∑_P∣e_P∣² = 1. So for the sum within the brackets to be non-zero we need $| {e}_{{\mathbb{I}}}| < 1$. To maximize $| {e}_{\hat{{P}_{0}^{{\prime} }}}| +| {e}_{\hat{{P}_{0}}}|$ subject to Eq. (12), consider an optimization problem over these two variables only; it is not difficult to see that the maximum occurs when they have the same value. That is, $| {e}_{\hat{{P}_{0}}}| +| {e}_{\hat{{P}_{0}^{{\prime} }}}| \le 2\sqrt{\frac{2{\epsilon }^{2}-{\epsilon }^{4}}{2}}\approx 2\epsilon$. Ignoring higher-order terms in ϵ, we can write the following.

$$\left\vert Tr\left(({E}^{{\dagger} }Q{P}^{{\prime} }{Q}^{{\dagger} }E){P}^{{\prime}{\prime} }\right)/N\right\vert \le | {\alpha }_{{P}^{{\prime}{\prime} }}| +2\epsilon \mathop{\sum}\limits_{P\ne {P}^{{\prime}{\prime} }}| {\alpha }_{P}|$$

We also have the following lower bound using similar reasoning as above.

$$\begin{array}{l}\left\vert Tr\left(({E}^{{\dagger} }Q{P}^{{\prime} }{Q}^{{\dagger} }E){P}^{{\prime}{\prime} }\right)/N\right\vert \,\ge \,| {\alpha }_{{P}^{{\prime}{\prime} }}| \left(| {e}_{{\mathbb{I}}}{| }^{2}-\mathop{\sum}\limits_{\hat{P}\ne {\mathbb{I}}}| {e}_{\hat{P}}{| }^{2}\right)\\\qquad\qquad\qquad\qquad\qquad\qquad-\mathop{\sum}\limits_{P\ne {P}^{{\prime}{\prime} }}| {\alpha }_{P}| \mathop{\sum}\limits_{\hat{P}\ne \tilde{P}}| \overline{{e}_{\hat{P}}}{e}_{\tilde{P}}| \left\vert Tr(\hat{P}P\tilde{P}{P}^{{\prime}{\prime} })/N\right\vert \\ \qquad\qquad\qquad\qquad\qquad\qquad \ge \,| {\alpha }_{{P}^{{\prime}{\prime} }}| (1-4{\epsilon }^{2}+2{\epsilon }^{4})-2\epsilon \mathop{\sum}\limits_{P\ne {P}^{{\prime}{\prime} }}| {\alpha }_{P}| \end{array}$$

□

And we have the following corollary.

Corollary 3.1. Let ${C}_{0}\in {{{{\mathcal{C}}}}}_{n}$ and ${P}^{{\prime} }\in {{{{\mathcal{P}}}}}_{n}$ such that ${C}_{0}{P}^{{\prime} }{C}_{0}^{{\dagger} }=\tilde{P}\in {{{{\mathcal{P}}}}}_{n}$. If $E\in {{{{\mathcal{U}}}}}_{n}$ such that $d(E,{\Bbb{I}})\le \epsilon$, then

$$\begin{array}{l}(1-4{\epsilon }^{2}+2{\epsilon }^{4})\,\,\le \,\,\left\vert Tr\left(({E}^{{\dagger} }{C}_{0}{P}^{{\prime} }{C}_{0}^{{\dagger} }E){P}^{{\prime}{\prime} }\right)\right\vert /N\le 1\qquad if\,{P}^{{\prime}{\prime} }=\tilde{P}\\ \qquad\qquad\qquad 0\,\,\le \,\,\left\vert Tr\left(({E}^{{\dagger} }{C}_{0}{P}^{{\prime} }{C}_{0}^{{\dagger} }E){P}^{{\prime}{\prime} }\right)\right\vert /N\le 2\epsilon \qquad else.\end{array}$$

The above theorem and corollary essentially say that EC₀ (or E^†C₀) approximately inherits the conjugation property of C₀. For each ${P}^{{\prime} }\in {{{{\mathcal{P}}}}}_{n}$, if we expand ${C}_{0}{P}^{{\prime} }{C}_{0}^{{\dagger} }$ in the Pauli basis, then the absolute value of the coefficients is 1 at one point and 0 at the rest. If we expand $E{C}_{0}{P}^{{\prime} }{C}_{0}^{{\dagger} }{E}^{{\dagger} }$ in the Pauli basis, then one of the coefficients (in absolute value) will be almost 1 and the rest will be almost 0. From Theorem 3.3, this pattern will not show for at least one Pauli ${P}^{{\prime}{\prime}{\prime} }\in {{{{\mathcal{P}}}}}_{n}$ if we have E^†Q, where $Q\,\notin \, {{{{\mathcal{C}}}}}_{n}$. If we expand $EQ{P}^{{\prime}{\prime}{\prime} }{Q}^{{\dagger} }{E}^{{\dagger} }$ or ${E}^{{\dagger} }Q{P}^{{\prime}{\prime}{\prime} }{Q}^{{\dagger} }E$ in the Pauli basis, then the amplitudes will spike at two or more points. We also observe that 2ϵ < (1 − 4ϵ² + 2ϵ⁴) for any ϵ ≤ 0.31. Thus there is a distinguishable gap between the two cases of Corollary 3.1. For all practical purposes ϵ is much smaller than this value.
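The gap between the two cases of Corollary 3.1 suggests a simple numerical check. The following sketch is an assumption-laden illustration rather than the authors' Algorithm 3: the function name `conjugation_test`, the tolerance `tol`, the thresholds taken directly from Corollary 3.1 and the example inputs are all hypothetical, and NumPy is assumed.

```python
import itertools
from functools import reduce
import numpy as np

SINGLE = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def paulis(n):
    """Return the list of all 4^n n-qubit Pauli matrices (identity first)."""
    return [reduce(np.kron, (SINGLE[c] for c in labels))
            for labels in itertools.product("IXYZ", repeat=n)]

def conjugation_test(W_prime, eps, tol=1e-9):
    """True if W' P W'^dagger has exactly one spike for every non-identity P."""
    N = W_prime.shape[0]
    all_paulis = paulis(N.bit_length() - 1)
    spike = 1 - 4 * eps**2 + 2 * eps**4   # lower bound for the spike
    floor = 2 * eps                       # upper bound for all other coefficients
    for P_out in all_paulis[1:]:
        V = W_prime @ P_out @ W_prime.conj().T
        spikes = 0
        for P_in in all_paulis:
            c = abs(np.trace(V @ P_in)) / N
            if c >= spike - tol:
                spikes += 1
            elif c > floor + tol:
                return False              # a non-spike is too far from 0
        if spikes != 1:
            return False
    return True

# Example inputs: Hadamard (Clifford) and T (non-Clifford), perturbed by E close to I.
H = (SINGLE["X"] + SINGLE["Z"]) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])
theta = 0.05
axis = (SINGLE["X"] + 2 * SINGLE["Y"] + 2 * SINGLE["Z"]) / 3
E = np.cos(theta / 2) * SINGLE["I"] - 1j * np.sin(theta / 2) * axis

passes_clifford = conjugation_test(E.conj().T @ H, eps=0.05)
passes_t_gate = conjugation_test(E.conj().T @ T, eps=0.05)
```

Here the perturbed Hadamard passes, whereas the perturbed T gate is rejected because TXT^† has two coefficients of magnitude about 1/√2.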

Synthesizing T-count-optimal circuits

So far we have been able to determine ${{{{\mathcal{T}}}}}_{\epsilon }(W)$ for any $W\in {{{{\mathcal{U}}}}}_{n}$. We now describe how we can synthesize an ϵ-T-count-optimal circuit for W, using the above algorithms. It is easy to see that ${{{{\mathcal{A}}}}}_{DECIDE}$ can return a sequence {U_m, …, U₁} of unitaries such that $U=\left(\mathop{\prod }\nolimits_{i = {{{\mathcal{T}}}}(U)}^{1}{U}_{i}\right){C}_{0}{e}^{i\phi }$ (for some ${C}_{0}\in {{{{\mathcal{C}}}}}_{n}$) and ${{{\mathcal{T}}}}(U)={{{{\mathcal{T}}}}}_{\epsilon }(W)$. We can efficiently construct circuits for each ${U}_{i}\in \{R(P):P\,\ne \, {\mathbb{I}}\}$ using Fact 1 of Supplementary Note 1. What remains is to determine C₀. Once it is known, we can efficiently construct a circuit for it, for example, by using the results in ref. ⁵³.

If W = UE then at step 3 of Algorithm 2 we calculate ${W}^{{\prime} }={W}^{{\dagger} }\widetilde{U}={e}^{-i\phi }{E}^{{\dagger} }{C}_{0}^{{\dagger} }$, where $\widetilde{U}=\mathop{\prod }\nolimits_{i = m}^{1}{U}_{i}$. From Algorithm 2 we can also obtain the following information: (1) set ${{{{\mathcal{S}}}}}_{1}$, as defined in Eq. (17), (2) $M=| {{{{\mathcal{S}}}}}_{1}|$. Thus we can calculate $r=\frac{1}{\sqrt{M}}$ and from step 4 we can actually calculate the set $\widetilde{{{{{\mathcal{S}}}}}_{1}}=\left\{({t}_{P},P):| {t}_{P}| =| Tr({W}^{{\prime} }P)/N| \in {{{{\mathcal{S}}}}}_{1}\right\}$. From Eq. (14) we can say that for small enough ϵ (say $\ll \frac{1}{M}$) we have

$$\frac{Tr\left({E}^{{\dagger} }{C}_{0}^{{\dagger} }{P}_{1}\right)}{Tr\left({E}^{{\dagger} }{C}_{0}^{{\dagger} }{P}_{2}\right)}\approx \frac{\overline{{r}_{{P}_{1}}}}{\overline{{r}_{{P}_{2}}}}$$

(22)

We perform the following steps.

1.
Calculate ${a}_{P}=\frac{{t}_{P}}{{t}_{{\mathbb{I}}}}=\frac{Tr({W}^{{\prime} }P)}{Tr({W}^{{\prime} })}$, where $({t}_{P},P)\in \widetilde{{{{{\mathcal{S}}}}}_{1}}$ (or equivalently $| {t}_{P}| \in {{{{\mathcal{S}}}}}_{1}$). We must remember that $({t}_{{\mathbb{I}}},{\mathbb{I}})\in \widetilde{{{{{\mathcal{S}}}}}_{1}}$.

As explained, from Eq. (14) we have ${a}_{P}\approx \frac{\overline{{r}_{P}}}{\overline{{r}_{{\Bbb{I}}}}}$. From Fact 2.1 we know that $\frac{| \overline{{r}_{P}}| }{| \overline{{r}_{{\Bbb{I}}}}| }=\frac{| {r}_{P}| }{| {r}_{{\Bbb{I}}}| }=1$. So we adjust the fractions such that their absolute value is 1. For small enough ϵ this adjustment is small, and so with a slight abuse of notation we use the same symbols for the adjusted values.
2.
Select $c,d\in {\mathbb{R}}$ such that c² + d² = r². Let $\widetilde{{r}_{{\mathbb{I}}}}=c+di$. Then we claim that the Clifford $\widetilde{{C}_{0}}=\widetilde{{r}_{{\mathbb{I}}}}{\sum }_{P:{r}_{P}\ne 0}\overline{{a}_{P}}P$ is sufficient for our purpose.

It is not hard to see that $\widetilde{{C}_{0}}={e}^{i{\phi }^{{\prime} }}{C}_{0}$ for some ${\phi }^{{\prime} }\in [0,2\pi )$. Thus if ${U}^{{\prime} }=\left(\mathop{\prod }\nolimits_{i = m}^{1}{U}_{i}\right)\widetilde{{C}_{0}}$, then ${{{\mathcal{T}}}}({U}^{{\prime} })={{{\mathcal{T}}}}(U)$ and $d({U}^{{\prime} },W)\le \epsilon$.
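The two steps above can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `paulis` and `recover_clifford` are hypothetical, the support of C₀ is taken to be the M largest coefficients of W′, the choice c = r, d = 0 is one arbitrary option satisfying c² + d² = r², and the example uses the phase gate S as C₀ because its identity coefficient is non-zero, as the procedure requires. NumPy is assumed.

```python
import itertools
from functools import reduce
import numpy as np

SINGLE = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def paulis(n):
    """Yield (label, matrix) for all 4^n n-qubit Paulis."""
    for labels in itertools.product("IXYZ", repeat=n):
        yield "".join(labels), reduce(np.kron, (SINGLE[c] for c in labels))

def recover_clifford(W_prime, M):
    """Rebuild C_0 up to a global phase from W' ~ e^{-i phi} E^dagger C_0^dagger."""
    N = W_prime.shape[0]
    n = N.bit_length() - 1
    t = {label: (np.trace(W_prime @ P) / N, P) for label, P in paulis(n)}
    # Assumed support: the M Paulis with the largest |t_P| (the set S_1).
    support = sorted(t, key=lambda k: abs(t[k][0]), reverse=True)[:M]
    identity = "I" * n
    if identity not in support:
        raise ValueError("this sketch assumes a non-zero identity coefficient")
    t_id = t[identity][0]
    r = 1 / np.sqrt(M)          # choose c = r, d = 0
    C = np.zeros((N, N), dtype=complex)
    for label in support:
        a = t[label][0] / t_id  # step 1: a_P = t_P / t_I
        a /= abs(a)             # adjust so that |a_P| = 1
        C += r * np.conj(a) * t[label][1]   # step 2
    return C

# Example: C_0 = S (phase gate), with a small error E and a global phase.
S = np.diag([1, 1j])
theta = 0.02
axis = (SINGLE["X"] + 2 * SINGLE["Y"] + 2 * SINGLE["Z"]) / 3
E = np.cos(theta / 2) * SINGLE["I"] - 1j * np.sin(theta / 2) * axis
W_prime = np.exp(-0.7j) * E.conj().T @ S.conj().T

C_tilde = recover_clifford(W_prime, M=2)
overlap = abs(np.trace(S.conj().T @ C_tilde)) / 2
```

In this example `overlap` is 1 up to round-off, meaning that the reconstructed matrix equals S up to a global phase.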

Time complexity: worst case analysis

We first analyse the time complexity of ${{{{\mathcal{A}}}}}_{CONJ}$. The outer and inner loops at steps 2 and 6, respectively, can together run at most N² ⋅ N² = N⁴ times, where N = 2ⁿ. At step 7, the multiplication of four N × N matrices takes O(N²) time and calculating the trace takes N time steps, so the overall complexity of step 7 is O(N²). We note that at step 13 we do not need to calculate the product and trace again. In the worst case every iteration of the loops at steps 2 and 6 is executed, incurring an overall time complexity of O(N⁶).

Now we analyse the time complexity of ${{{{\mathcal{A}}}}}_{DECIDE}$, when testing for a particular T-count m. The algorithm loops over all possible products of m unitaries U_j, each of which is an R(P) in the case of T-count decision. Since there are N² − 1 non-identity Paulis P, this loop runs at most ${N}^{2m}$ times. In each iteration we do m matrix multiplications at steps 2 and 3, which has time complexity O(mN²). At step 4 we make a list of N² real numbers, each obtained by multiplying two N × N matrices and then taking the trace, so the time complexity for making this list is O(N⁴). Sorting this list takes time O(nN²). The inner loop 5-13 runs N² times. Each of the list elements is checked and so step 8 has complexity O(N²). Suppose the conjugation test is called k₁ times within the inner loop. Then the loop 5-13 incurs a complexity O(k₁ ⋅ N⁶ + (N² − k₁)N²) when k₁ > 0, else it is O(N⁴). Let k be the number of outer-loop iterations (steps 2–14) in which the conjugation test is invoked in the inner loop 5-13, and let k₁ be the maximum number of times this test is called within any such inner loop. Then the overall complexity of ${{{{\mathcal{A}}}}}_{DECIDE}$ is $O(({N}^{2m}-k)\cdot (m{N}^{2}+{N}^{4}+n{N}^{2}+{N}^{4})+k\cdot (m{N}^{2}+{N}^{4}+n{N}^{2}+{k}_{1}{N}^{6}+({N}^{2}-{k}_{1}){N}^{2}))\in O(({N}^{2m}-k)\cdot {N}^{4}+k\cdot ({k}_{1}{N}^{6}+({N}^{2}-{k}_{1}){N}^{2}))$, assuming m < N².

The conjugation test is invoked only if a unitary passes the amplitude test. We assume that non-Clifford unitaries with equal amplitudes occur rarely enough that $k{k}_{1} < {N}^{2m}-k$. (We did check this in our implementations.) Thus ${{{{\mathcal{A}}}}}_{DECIDE}$ has a complexity of $O({N}^{2m+4})$, for one particular value of m. Hence, the overall algorithm ${{{{\mathcal{A}}}}}_{MIN}$ has a time complexity $O({N}^{2{{{{\mathcal{T}}}}}_{\epsilon }(W)+4})\in O({2}^{2n{{{{\mathcal{T}}}}}_{\epsilon }(W)+4n})$, with the given assumption.
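To get a feeling for how quickly the outer loop grows, the number of candidate products can be counted directly. The short sketch below is only illustrative; the function name `candidate_products` is hypothetical, and it counts (N² − 1)^m ordered products of m non-identity Paulis, which is bounded above by the N^{2m} used in the analysis.

```python
def candidate_products(n, m):
    """Number of ordered m-fold products of R(P) over non-identity n-qubit Paulis."""
    N = 2 ** n
    return (N * N - 1) ** m

# Growth of the search space for small n and m.
table = {(n, m): candidate_products(n, m) for n in (1, 2, 3) for m in (1, 2, 3)}
```

For example, two qubits and m = 2 give 15² = 225 candidates, while three qubits and m = 3 already give 63³ = 250047.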

Time complexity: practical considerations

1.
It is not hard to see that if [P₁, P₂] = P₁P₂ − P₂P₁ = 0 then $R({P}_{1})R({P}_{2})=(\alpha {\mathbb{I}}+\beta {P}_{1})(\alpha {\mathbb{I}}+\beta {P}_{2})=R({P}_{2})R({P}_{1})$, where $\alpha =\frac{1}{2}\left(1+{e}^{i\pi /4}\right)$ and $\beta =\frac{1}{2}\left(1-{e}^{i\pi /4}\right)$. Thus
$$\left(\mathop{\prod}\limits_{i}R({P}_{i})\right)R({P}_{1})R({P}_{2})\left(\mathop{\prod}\limits_{j}R({P}_{j})\right)=\left(\mathop{\prod}\limits_{i}R({P}_{i})\right)R({P}_{2})R({P}_{1})\left(\mathop{\prod}\limits_{j}R({P}_{j})\right).$$

So in step 2 of Algorithm 2 we need not loop over all m-length products of R(P). It is easy to check whether two n-qubit Paulis commute: they commute if and only if there is an even number of positions where the respective 1-qubit Paulis are non-identity and different. We need not perform actual matrix multiplications. This can speed up the actual running time by orders of magnitude. For example, for the unitaries considered by us, we obtained a speed-up of 5–10 times. In fact, it may be possible to show that the asymptotic complexity also decreases. One can work out more such symmetries in order to prune the search space.
2.
We already assumed that the number of non-Cliffords that pass the amplitude test is small. Even if such a unitary is tested in ${{{{\mathcal{A}}}}}_{CONJ}$, we need not run all N⁴ iterations. As soon as there are 2 spikes for any outer-loop Pauli P_out, the program exits (step 10). If a non-spike is “far enough” from 0, the program also exits (step 14). So in most cases testing a non-Clifford with equal amplitudes takes less time. If the unitary is a Clifford then all N⁴ iterations have to run, but this implies that ${{{{\mathcal{A}}}}}_{DECIDE}$ has obtained the T-count.
3.
Most of the matrix multiplications, especially those by Paulis, involve sparse matrices, so the practical run time is lower. In step 2 of Algorithm 2 one has to repeatedly multiply a unitary U by $R(P)=\alpha {\mathbb{I}}+\beta P$. Since P is sparse, we can first compute PU, which only permutes the rows of U and multiplies them by phases, then scale the result by β and finally add αU. This can reduce the practical running time.
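Two of the practical points above can be illustrated with a short sketch: the string-level commutation check, and applying R(P) as αU + β(PU) without forming R(P). The names `paulis_commute`, `pauli_matrix` and `apply_R` are hypothetical, NumPy is assumed, and the dense product `P @ U` is used only for brevity (a permutation-based or sparse product could replace it).

```python
from functools import reduce
import numpy as np

SINGLE = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}
ALPHA = 0.5 * (1 + np.exp(1j * np.pi / 4))
BETA = 0.5 * (1 - np.exp(1j * np.pi / 4))

def paulis_commute(p, q):
    """p, q: equal-length strings over 'IXYZ'. No matrices are built."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0

def pauli_matrix(label):
    """Dense matrix of the n-qubit Pauli named by `label`."""
    return reduce(np.kron, (SINGLE[c] for c in label))

def apply_R(label, U):
    """Return R(P) U = alpha * U + beta * (P U)."""
    return ALPHA * U + BETA * (pauli_matrix(label) @ U)

I4 = np.eye(4, dtype=complex)
# XX and ZZ differ at two positions, so they commute and the order is irrelevant.
same_order = np.allclose(apply_R("XX", apply_R("ZZ", I4)),
                         apply_R("ZZ", apply_R("XX", I4)))
# XI and ZI differ at one position, so the two orders give different unitaries.
diff_order = np.allclose(apply_R("XI", apply_R("ZI", I4)),
                         apply_R("ZI", apply_R("XI", I4)))
```

In a search over m-length products, a check such as `paulis_commute` could be used to skip orderings that differ only by swapping adjacent commuting factors.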

Space complexity

The input to our algorithm is an N × N unitary, with space complexity O(N²). In step 1 of ${{{{\mathcal{A}}}}}_{DECIDE}$, we can store the single-qubit Paulis and calculate R(P) whenever required. We require O(N²) space to perform matrix multiplication of two N × N matrices. In ${{{{\mathcal{A}}}}}_{DECIDE}$, we either store an N × N matrix or a list of N² real numbers (step 4). Even in ${{{{\mathcal{A}}}}}_{CONJ}$ we store one N × N matrix. Hence the overall space complexity is O(N²) ∈ O(2²ⁿ), without storing R(P). This increases the running time because we have to calculate the n-qubit Paulis and R(P) repeatedly, but the asymptotic time complexity remains unchanged.

If we store the n-qubit Paulis or R(P), then we require O(N⁴) space. This factor dominates and the overall space complexity is O(N⁴) ∈ O(2⁴ⁿ). In this approach, the actual running time reduces but the asymptotic time complexity remains the same.

T-depth-optimal synthesis

Algorithms 1 (${{{{\mathcal{A}}}}}_{MIN}$) and 2 (${{{{\mathcal{A}}}}}_{DECIDE}$) can be used for T-depth-optimal synthesis of any multi-qubit unitary, since we know there is a finite generating set²⁷ such that any exactly implementable unitary can be written as a product of elements from this set and a Clifford. We first give some definitions.

Definition 3.1 (T-depth of circuits). Suppose the unitary U implemented by a circuit is written as a product $U={U}_{m}{U}_{m-1}\ldots {U}_{1}$ such that each U_i can be implemented by a circuit in which all the gates can act in parallel or simultaneously. We say U_i has depth 1 and m is the depth of the circuit. The T-depth of a circuit is the number of unitaries U_i where the T/T^† gate is the only non-Clifford gate and all the T/T^† gates can act in parallel. (The remaining Clifford gates within each U_i may not act in parallel.)

Definition 3.2 (T-depth of exactly implementable unitaries). The T-depth or min-T-depth of an exactly synthesizable unitary $U\in {{{{\mathcal{J}}}}}_{n}$, denoted by ${{{{\mathcal{T}}}}}_{d}(U)$, is the minimum T-depth of a Clifford+T circuit that implements it (up to a global phase).

In ref. ²⁷ a subset, ${{\mathbb{V}}}_{n}\subset \{{\prod }_{i\in [n]}C{\overline{T}}_{(i)}{C}^{{\dagger} },C\in {{{{\mathcal{C}}}}}_{n},\overline{T}\in \{{{{\rm{T}}}},{{{{\rm{T}}}}}^{{\dagger} },{\mathbb{I}}\}\}$, of T-depth-1 unitaries has been defined. It has been shown that $| {{\mathbb{V}}}_{n}| \le n\cdot {2}^{5.6n}$ and any T-depth-1 unitary ${U}_{1}\in {{{{\mathcal{J}}}}}_{n}$ can be written as

$${U}_{1}={e}^{i\phi }\left(\mathop{\prod}\limits_{i\ge 1}{V}_{i}\right){C}_{0}\qquad \,{{\mbox{where}}}\,\,{V}_{i}\in {{\mathbb{V}}}_{n}\,\,{{\mbox{and}}}\,\,{C}_{0}\in {{{{\mathcal{C}}}}}_{n}$$

(23)

We call each V_i, with T-depth 1, as a (parallel) block and it can be written as product of R(P) or R^†(P), where $P\in \pm\, {{{{\mathcal{P}}}}}_{n}$. It is possible to multiply consecutive T-depth-1 unitaries to get another T-depth-1 unitary (conditions given in ref. ²⁷). Thus ${{\mathbb{V}}}_{n}$ can be regarded as a generating set (modulo Clifford) for the set of T-depth-1 unitaries, and hence for the complete group ${{{{\mathcal{J}}}}}_{n}$. A decomposition which has the minimum number of T-depth-1 unitaries is called a T-depth-optimal decomposition. A circuit implementing $U\in {{{{\mathcal{J}}}}}_{n}$ with the minimum T-depth is called a T-depth-optimal circuit.

Definition 3.3 (ϵ-T-depth of approximately implementable unitaries). The ϵ-T-depth of an approximately implementable unitary $W\in {{{{\mathcal{U}}}}}_{n}$, denoted by ${{{{\mathcal{T}}}}}_{d\epsilon }(W)$, is equal to ${{{{\mathcal{T}}}}}_{d}(U)$, the T-depth of an exactly implementable unitary $U\in {{{{\mathcal{J}}}}}_{n}$ such that d(U,W) ≤ ϵ and ${{{{\mathcal{T}}}}}_{d}(U)\le {{{{\mathcal{T}}}}}_{d}({U}^{{\prime} })$ for any ${U}^{{\prime} }\in {{{{\mathcal{J}}}}}_{n}$ and $d({U}^{{\prime} },W)\le \epsilon$.

We call a T-count-optimal (or T-depth-optimal) circuit for any such U as the ϵ-T-count-optimal (or ϵ-T-depth-optimal, respectively) circuit for W.

Modification of ${{{{\mathcal{A}}}}}_{MIN}$

Since the set ${{\mathbb{V}}}_{n}$ is finite, it is not hard to see that algorithms ${{{{\mathcal{A}}}}}_{MIN}$ and ${{{{\mathcal{A}}}}}_{DECIDE}$ can be applied to find a T-depth-optimal decomposition of any unitary $W\in {{{{\mathcal{U}}}}}_{n}$. Replace step 1 of Algorithm 2 by ${{{\mathcal{S}}}}\leftarrow {{\mathbb{V}}}_{n}$. Suppose W = UE for some exactly implementable unitary U such that ${{{{\mathcal{T}}}}}_{d\epsilon }(W)={{{{\mathcal{T}}}}}_{d}(U)$ and d(W,U) ≤ ϵ. Then we can decompose U as in Eq. (23). If we have guessed the correct V₁, …, V_d then after multiplying ∏_iV_i with W^† we are left with EC₀. Now the amplitude test and conjugation test can be applied to check if we have the correct guess. We have said before that it is possible to multiply consecutive V_i such that the product has T-depth 1. In that case the ϵ-T-depth is less than d. So to find the minimum possible T-depth we might have to iterate more than ${{{{\mathcal{T}}}}}_{d\epsilon }(W)$ times.

Time complexity

The time complexity of the conjugation test ${{{{\mathcal{A}}}}}_{CONJ}$ is the same as before. The analysis of the complexity of ${{{{\mathcal{A}}}}}_{DECIDE}$ is also similar, but here $| {{{\mathcal{S}}}}| =| {{\Bbb{V}}}_{n}|$, so if we take all possible $m^{\prime}$-length products at step 2, then the number of iterations for the outer loop 2-14 is at most $n{2}^{5.6nm^{\prime} }$. The complexity of the remaining steps is the same, so the overall complexity of ${{{{\mathcal{A}}}}}_{DECIDE}$ is $O(n{2}^{5.6nm^{\prime} +4n})$. We explained before that it is possible to combine several consecutive unitaries ${V}_{i}\in {{\mathbb{V}}}_{n}$ such that we get one T-depth-1 unitary. Thus this procedure gives a T-depth $m\le m^{\prime}$. We do not know how large the difference $m^{\prime} -m$ is.

Alternatively, we can do a pessimistic analysis of ${{{{\mathcal{A}}}}}_{DECIDE}$. This algorithm is basically an exhaustive search procedure to test for a certain T-depth m. Suppose that in step 2 we make sure that we have a T-depth-m unitary $\tilde{U}$, i.e., it is not possible to combine any further. Basically, this means $\tilde{U}$ is from the set of T-depth-1 unitaries modulo Clifford. Now there can be at most $O({4}^{{n}^{2}})$ of these. This is because there can be at most $O({4}^{{n}^{2}})$ n-length products of R(P). This is a naive bound and more explanations can be found in ref. ²⁷. So this time the outer loop can occur at most $O({4}^{{n}^{2}m})$ times. Arguing in the same way, the complexity of ${{{{\mathcal{A}}}}}_{DECIDE}$ is $O({2}^{2{n}^{2}m+4n})$, and hence the complexity of ${{{{\mathcal{A}}}}}_{MIN}$ is $O({2}^{2{n}^{2}{{{{\mathcal{T}}}}}_{d\epsilon }(W)+4n})$.

Space complexity

In step 1 of ${{{{\mathcal{A}}}}}_{DECIDE}$ we can store ${{\mathbb{V}}}_{n}$ symbolically: for each ${V}_{i}={\prod }_{j}R({P}_{j})\in {{\mathbb{V}}}_{n}$, we simply store the Paulis in the product. We can then compute the required matrices on demand by taking products of the corresponding R(P)s. In all other steps we store N × N matrices, each taking at most O(N²) ∈ O(2²ⁿ) space. Thus the space complexity is $O(n{2}^{5.6n})$. As explained before, this approach increases the running time without affecting the asymptotic time complexity.
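The symbolic storage scheme can be sketched as follows, assuming each element is kept as a tuple of Pauli strings and that R(P) = ½(1+e^{iπ/4})I + ½(1−e^{iπ/4})P. The helper names (`pauli_matrix`, `materialize`) and the string encoding are illustrative choices, not taken from the authors' code.

```python
from functools import reduce
import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_matrix(label):
    """Build the n-qubit Pauli for a string such as 'ZI' via Kronecker products."""
    return reduce(np.kron, (PAULI[c] for c in label))

def R(label):
    """R(P) = (1 + w)/2 * I + (1 - w)/2 * P with w = exp(i*pi/4)."""
    w = np.exp(1j * np.pi / 4)
    P = pauli_matrix(label)
    return 0.5 * (1 + w) * np.eye(P.shape[0]) + 0.5 * (1 - w) * P

def materialize(labels):
    """Multiply out the stored R(P_j) factors only when the matrix is needed."""
    n = len(labels[0])
    V = np.eye(2 ** n, dtype=complex)
    for label in labels:
        V = V @ R(label)
    return V

# Stored form: two short strings instead of a 4 x 4 complex matrix.
V_symbolic = ("ZI", "IZ")
V_matrix = materialize(V_symbolic)
```

Here `V_symbolic` represents R(Z⊗I)R(I⊗Z), which equals T⊗T; the dense matrix exists only while it is being used, at the price of recomputing it on each access.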

In this paper, we do not implement our algorithm to determine the ϵ-T-depth. For small enough ϵ, we can use the procedure outlined in the “An exponential time and polynomial space algorithm” section to synthesize a T-depth-optimal circuit.

Data availability

Numerical results, together with instructions on how to reproduce them, are available online at https://github.com/vsoftco/approx-t.

Code availability

The code is available from the corresponding author on request.

References

Feynman, R. P. Simulating physics with computers. Int. J. Theor. Phys. 21, 467–488 (1982).
Shor, P. W. Algorithms for quantum computation: discrete logarithms and factoring. in Proc. of the 35th Ann. Symp. on Foundations of Computer Science 124–134 (IEEE,1994).
Shor, P. W. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Rev. 41, 303–332 (1999).
Grover, L. K. A fast quantum mechanical algorithm for database search. in Proc. of the 28th Ann. ACM Symp. on Theory of Computing 212–219 (1996).
Zhou, X., Leung, D. W. & Chuang, I. L. Methodology for quantum logic gate construction. Phys. Rev. A 62, 052316 (2000).
Bravyi, S. & Kitaev, A. Universal quantum computation with ideal Clifford gates and noisy ancillas. Phys. Rev. A 71, 022316 (2005).
Fowler, A. G., Stephens, A. M. & Groszkowski, P. High-threshold universal quantum computation on the surface code. Phys. Rev. A 80, 052312 (2009).
Aliferis, P., Gottesman, D. & Preskill, J. Quantum accuracy threshold for concatenated distance-3 codes. Quantum Inf. Comput. 6, 97–165 (2006).
Bravyi, S. & Gosset, D. Improved classical simulation of quantum circuits dominated by Clifford gates. Phys. Rev. Lett. 116, 250501 (2016).
Bravyi, S., Smith, G. & Smolin, J. A. Trading classical and quantum computational resources. Phys. Rev. X 6, 021043 (2016).
Paetznick, A. & Reichardt, B. W. Universal fault-tolerant quantum computation with only transversal gates and error correction. Phys. Rev. Lett. 111, 090505 (2013).
Kitaev, A. Y. Fault-tolerant quantum computation by anyons. Ann. Phys. 303, 2–30 (2003).
Fowler, A. G. Time-optimal quantum computation. Preprint at https://arXiv.org/quant-ph/1210.4626 (2012).
Amy, M. et al. Estimating the cost of generic quantum pre-image attacks on SHA-2 and SHA-3. in Int. Conf. on Selected Areas in Cryptography 317–337 (Springer, 2016).
Di Matteo, O., Gheorghiu, V. & Mosca, M. Fault-tolerant resource estimation of quantum random-access memories. IEEE Trans. Quantum Eng. 1, 1–13 (2020).
Kitaev, A. Y. Quantum computations: algorithms and error correction. Russ. Math. Surv. 52, 1191 (1997).
Dawson, C. M. & Nielsen, M. A. The Solovay-Kitaev algorithm. Quantum Inf. Comput. 6, 81–95 (2006).
Nielsen, M. A. & Chuang, I. L. Quantum Computation and Quantum Information (Cambridge University Press, 2010).
Kliuchnikov, V., Maslov, D. & Mosca, M. Practical approximation of single-qubit unitaries by single-qubit quantum Clifford and T circuits. IEEE Trans. Comput. 65, 161–172 (2015).
Selinger, P. Efficient Clifford+T approximation of single-qubit operators. Quantum Inf. Comput. 15, 159–180 (2015).
Ross, N. J. & Selinger, P. Optimal ancilla-free Clifford+T approximation of Z-rotations. Quantum Inf. Comput. 16, 901–953 (2016).
Mukhopadhyay, P. Composability of global phase invariant distance and its application to approximation error management. J. Phys. Commun. 5, 115017 (2021).
Kliuchnikov, V., Bocharov, A. & Svore, K. M. Asymptotically optimal topological quantum compiling. Phys. Rev. Lett. 112, 140504 (2014).
Johansen, E. G. & Simula, T. Fibonacci anyons versus Majorana fermions: a Monte Carlo approach to the compilation of braid circuits in SU(2)_k anyon models. PRX Quantum 2, 010334 (2021).
Gosset, D., Kliuchnikov, V., Mosca, M. & Russo, V. An algorithm for the T-count. Quantum Inf. Comput. 14, 1261–1276 (2014).
Mosca, M. & Mukhopadhyay, P. A polynomial time and space heuristic algorithm for T-count. Quantum Sci. Technol. 7, 015003 (2021).
Gheorghiu, V., Mosca, M. & Mukhopadhyay, P. A (quasi-) polynomial time heuristic algorithm for synthesizing T-depth optimal circuits. NPJ Quantum Inf. 8, 1–11 (2022).
Amy, M., Maslov, D. & Mosca, M. Polynomial-time T-depth optimization of Clifford+T circuits via matroid partitioning. IEEE Trans. Computer-Aided Design Integr. Circuits Syst. 33, 1476–1489 (2014).
Gheorghiu, V., Huang, J., Li, S. M., Mosca, M. & Mukhopadhyay, P. Reducing the CNOT count for Clifford+T circuits on NISQ architectures. in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2022).
Häner, T. & Soeken, M. Lowering the T-depth of quantum circuits by reducing the multiplicative depth of logic networks. Preprint at https://arXiv.org/quant-ph/2006.03845 (2020).
Häner, T., Roetteler, M. & Svore, K. M. Managing approximation errors in quantum programs. Preprint at https://arXiv.org/quant-ph/1807.02336 (2018).
Meuli, G., Soeken, M., Roetteler, M. & Häner, T. Enabling accuracy-aware quantum compilers using symbolic resource estimation. Proc. ACM Program. Lang. 4, 1–26 (2020).
Amy, M., Maslov, D., Mosca, M. & Roetteler, M. A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits. IEEE Trans. Computer-Aided Design of Integr. Circuits Syst. 32, 818–830 (2013).
Glaudell, A. N., Ross, N. J. & Taylor, J. M. Optimal two-qubit circuits for universal fault-tolerant quantum computation. NPJ Quantum Inf. 7, 1–11 (2021).
Calderbank, A. R., Rains, E. M., Shor, P. W. & Sloane, N. J. A. Quantum error correction via codes over GF(4). IEEE Trans. Inf. Theory 44, 1369–1387 (1998).
Ozols, M. Clifford group. Essays at University of Waterloo, Spring (2008).
Kitaev, A. Y., Shen, A. & Vyalyi, M. N. Classical and Quantum Computation Number 47 (American Mathematical Soc., 2002).
Fowler, A. G. Constructing arbitrary Steane code single logical qubit fault-tolerant gates. Quantum Inf. Comput. 11, 867–873 (2011).
Kliuchnikov, V., Maslov, D. & Mosca, M. Asymptotically optimal approximation of single qubit unitaries by Clifford and T circuits using a constant number of ancillary qubits. Phys. Rev. Lett. 110, 190502 (2013).
de Brugière, T. G., Baboulin, M., Valiron, B. & Allouche, C. Quantum circuits synthesis using Householder transformations. Comput. Phys. Commun. 248, 107001 (2020).
Malvetti, E., Iten, R. & Colbeck, R. Quantum circuits for sparse isometries. Quantum 5, 412 (2021).
Ross, N. J. Optimal ancilla-free Clifford+V approximation of Z-rotations. Quantum Inf. Comput. 15, 932–950 (2015).
Bocharov, A., Gurevich, Y. & Svore, K. M. Efficient decomposition of single-qubit gates into V basis circuits. Phys. Rev. A 88, 012313 (2013).
Blass, A., Bocharov, A. & Gurevich, Y. Optimal ancilla-free Pauli+V circuits for axial rotations. J. Math. Phys. 56, 122201 (2015).
Kliuchnikov, V., Bocharov, A., Roetteler, M. & Yard, J. A framework for approximating qubit unitaries. Preprint at https://arXiv.org/quant-ph/1510.03888 (2015).
Beigi, S. & Shor, P. W. C3, semi-Clifford and generalized semi-Clifford operations. Quantum Inf. Comput. 10, 41–59 (2010).
The OpenMP API Specification for Parallel Programming. https://www.openmp.org/.
Eigen: a C++ Template Library for Linear Algebra. http://eigen.tuxfamily.org.
Gheorghiu, V. Quantum++: a modern C++ quantum computing library. PLoS ONE 13, e0208073 (2018).
Jones, N. C. et al. Faster quantum chemistry simulation on fault-tolerant quantum computers. New J. Phys. 14, 115023 (2012).
Arrazola, J. M. et al. Universal quantum circuits for quantum chemistry. Quantum 6, 742 (2022).
Kliuchnikov, V., Maslov, D. & Mosca, M. Fast and efficient exact synthesis of single-qubit unitaries generated by Clifford and T gates. Quantum Inf. Comput. 13, 607–630 (2013).
Aaronson, S. & Gottesman, D. Improved simulation of stabilizer circuits. Phys. Rev. A 70, 052328 (2004).

Acknowledgements

We thank Vern I. Paulsen and Adina Goldberg for useful discussions. We thank Earl Campbell and Nathan Wiebe for pointing out the (previous) implementations of cR_z(θ) gate (Fig. 1). We thank the anonymous reviewers for many helpful comments that helped us improve the manuscript and also for pointing out a mistake in the pseudocode (Algorithm 3). We also thank Jiaxin Huang and Hong Tao Zhang for pointing out mistakes in the pseudocode, as well as running some of the tests on their laptops. The authors wish to thank NTT Research for their financial and technical support. This work was supported in part by Canada’s NSERC. Research at IQC is supported in part by the Government of Canada through Innovation, Science and Economic Development Canada. The Perimeter Institute (PI) is supported in part by the Government of Canada and Province of Ontario (PI).

Author information

Authors and Affiliations

Institute for Quantum Computing, University of Waterloo, Waterloo, ON, Canada
Vlad Gheorghiu, Michele Mosca & Priyanka Mukhopadhyay
softwareQ Inc., Kitchener, ON, Canada
Vlad Gheorghiu & Michele Mosca
Department of Combinatorics and Optimization, University of Waterloo, Waterloo, ON, Canada
Michele Mosca & Priyanka Mukhopadhyay
Perimeter Institute for Theoretical Physics, Waterloo, ON, Canada
Michele Mosca

Authors

Vlad Gheorghiu
Michele Mosca
Priyanka Mukhopadhyay

Contributions

The ideas were given by P.M. The software implementations were done by V.G. All the authors contributed to the preparation of the manuscript.

Corresponding author

Correspondence to Priyanka Mukhopadhyay.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information: T-count and T-depth of any multi-qubit unitary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Gheorghiu, V., Mosca, M. & Mukhopadhyay, P. T-count and T-depth of any multi-qubit unitary. npj Quantum Inf 8, 141 (2022). https://doi.org/10.1038/s41534-022-00651-y

Received: 06 March 2022
Accepted: 10 November 2022
Published: 30 November 2022
DOI: https://doi.org/10.1038/s41534-022-00651-y
