Training deep neural networks for binary communication with the Whetstone method

A preprint version of the article is available at arXiv.

Abstract

The computational cost of deep neural networks presents challenges to broadly deploying these algorithms. Low-power and embedded neuromorphic processors offer potentially dramatic performance-per-watt improvements over traditional processors. However, programming these brain-inspired platforms generally requires platform-specific expertise. It is therefore difficult to achieve state-of-the-art performance on these platforms, limiting their applicability. Here we present Whetstone, a method to bridge this gap by converting deep neural networks to have discrete, binary communication. During the training process, the activation function at each layer is progressively sharpened towards a threshold activation, with limited loss in performance. Whetstone-sharpened networks do not require a rate code or other spike-based coding scheme, thus producing networks comparable in timing and size to conventional artificial neural networks. We demonstrate Whetstone on a number of architectures and tasks such as image classification, autoencoders and semantic segmentation. Whetstone is currently implemented within the Keras wrapper for TensorFlow and is widely extendable.
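
As a rough illustration of the sharpening idea described in the abstract, the sketch below uses plain tf.keras: a bounded ramp activation whose slope is gradually increased by a training callback until it approximates a 0/1 threshold. The class and callback names (SharpenedActivation, GradualSharpener) and the particular schedule are hypothetical and are not the released Whetstone API.

import tensorflow as tf
from tensorflow import keras

class SharpenedActivation(keras.layers.Layer):
    """Bounded ramp on [0, 1] that approaches a 0/1 step as `sharpness` approaches 1."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.sharpness = tf.Variable(0.0, trainable=False, name='sharpness')

    def call(self, inputs):
        # The ramp narrows as sharpness rises; at full sharpness it is nearly a step at 0.5.
        width = tf.maximum(1.0 - self.sharpness, 1e-3)
        return tf.clip_by_value((inputs - 0.5) / width + 0.5, 0.0, 1.0)

class GradualSharpener(keras.callbacks.Callback):
    """After a warm-up period, raise each SharpenedActivation's sharpness a little every epoch."""

    def __init__(self, start_epoch=5, step=0.05):
        super().__init__()
        self.start_epoch = start_epoch
        self.step = step

    def on_epoch_end(self, epoch, logs=None):
        if epoch < self.start_epoch:
            return
        for layer in self.model.layers:
            if isinstance(layer, SharpenedActivation):
                layer.sharpness.assign(min(1.0, float(layer.sharpness.numpy()) + self.step))

model = keras.Sequential([
    keras.layers.Dense(256, input_shape=(784,)),
    SharpenedActivation(),
    keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=30, callbacks=[GradualSharpener()])

The released package at https://github.com/SNL-NERL/Whetstone provides its own Keras layers and sharpening schedules; the sketch above only conveys the overall shape of the approach, not the schedule used in the paper.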

Fig. 1: Overview of the Whetstone process.
Fig. 2: Training a single network through the Whetstone process.
Fig. 3: How Whetstone training influences the performance of different network topologies and tasks.
Fig. 4: Whetstone training requires N-hot output encodings (see the sketch following this figure list).
Fig. 5: Whetstone has the ability to sharpen diverse networks.
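
Fig. 4 refers to N-hot output encodings, in which each class is represented by a small group of binary output units rather than a single unit. The snippet below is a generic, illustrative implementation of such an encoding; the contiguous block-per-class layout and the helper names (n_hot_encode, n_hot_decode) are assumptions for illustration, not the exact scheme used in the paper.

import numpy as np

def n_hot_encode(labels, num_classes, n):
    """Map integer labels to redundant N-hot targets: class c activates the n units in block [c*n, (c+1)*n)."""
    targets = np.zeros((len(labels), num_classes * n), dtype=np.float32)
    for i, c in enumerate(labels):
        targets[i, c * n:(c + 1) * n] = 1.0
    return targets

def n_hot_decode(outputs, num_classes, n):
    """Predict the class whose block of n binary outputs has the largest sum."""
    block_sums = outputs.reshape(len(outputs), num_classes, n).sum(axis=-1)
    return block_sums.argmax(axis=-1)

# Example: 10 classes with 5 binary output units per class gives a 50-unit output layer.
targets = n_hot_encode(np.array([3, 7]), num_classes=10, n=5)
print(n_hot_decode(targets, num_classes=10, n=5))  # -> [3 7]

Decoding by summing each class's block tolerates a few output units giving the wrong binary value, which is one common motivation for redundant output codes when every output is constrained to be 0 or 1.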

Data availability

All data used come from publicly available datasets: MNIST [34], Fashion-MNIST [35], CIFAR [36] and COCO [19]. Whetstone is available at https://github.com/SNL-NERL/Whetstone, licensed under the GPL.

References

  1. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).

  2. Pinheiro, P. O., Collobert, R. & Dollár, P. Learning to segment object candidates. In Proc. 28th International Conference on Neural Information Processing Systems 2, 1990–1998 (2015).

  3. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  4. Yang, T.-J., Chen, Y.-H. & Sze, V. Designing energy-efficient convolutional neural networks using energy-aware pruning. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 6071–6079 (IEEE, 2017).

  5. Coppola, G. & Dey, E. Driverless cars are giving engineers a fuel economy headache. Bloomberg.com https://www.bloomberg.com/news/articles/2017-10-11/driverless-cars-are-giving-engineers-a-fuel-economy-headache (2017).

  6. Horowitz, M. 1.1 Computing’s energy problem (and what we can do about it). In 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC) 10–14 (IEEE, 2014).

  7. Jouppi, N. P. et al. In-datacenter performance analysis of a tensor processing unit. In 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) 1–12 (IEEE, 2017).

  8. Rao, N. Intel® Nervana™ neural network processors (NNP) redefine AI silicon. Intel https://ai.intel.com/intel-nervana-neural-network-processors-nnp-redefine-ai-silicon/ (2018).

  9. Hemsoth, N. Intel, Nervana shed light on deep learning chip architecture. The Next Platform https://www.nextplatform.com/2018/01/11/intel-nervana-shed-light-deep-learning-chip-architecture/ (2018).

  10. Markidis, S. et al. NVIDIA tensor core programmability, performance & precision. Preprint at https://arxiv.org/abs/1803.04014 (2018).

  11. Merolla, P. A. et al. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345, 668–673 (2014).

  12. Khan, M. M. et al. SpiNNaker: mapping neural networks onto a massively-parallel chip multiprocessor. In IEEE International Joint Conference on Neural Networks, 2008, IJCNN 2008 (IEEE World Congress on Computational Intelligence) 2849–2856 (IEEE, 2008).

  13. Schuman, C. D. et al. A survey of neuromorphic computing and neural networks in hardware. Preprint at https://arxiv.org/abs/1705.06963 (2017).

  14. James, C. D. et al. A historical survey of algorithms and hardware architectures for neural-inspired and neuromorphic computing applications. Biol. Inspired Cogn. Archit. 19, 49–64 (2017).

  15. Knight, J. C., Tully, P. J., Kaplan, B. A., Lansner, A. & Furber, S. B. Large-scale simulations of plastic neural networks on neuromorphic hardware. Front. Neuroanat. 10, 37 (2016).

  16. Sze, V., Chen, Y.-H., Yang, T.-J. & Emer, J. S. Efficient processing of deep neural networks: a tutorial and survey. Proc. IEEE 105, 2295–2329 (2017).

  17. Bergstra, J., Yamins, D. & Cox, D. D. Hyperopt: a Python library for optimizing the hyperparameters of machine learning algorithms. In Proceedings of the 12th Python in Science Conference 13–20 (Citeseer, 2013).

  18. Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A. & Talwalkar, A. Hyperband: a novel bandit-based approach to hyperparameter optimization. J. Mach. Learn. Res. 18, 6765–6816 (2017).

  19. Lin, T.-Y. et al. Microsoft COCO: common objects in context. In European Conference on Computer Vision 740–755 (Springer, 2014).

  20. Hunsberger, E. & Eliasmith, C. Training spiking deep networks for neuromorphic hardware. Preprint at https://arxiv.org/abs/1611.05141 (2016).

  21. Esser, S. K., Appuswamy, R., Merolla, P., Arthur, J. V. & Modha, D. S. Backpropagation for energy-efficient neuromorphic computing. In Advances in Neural Information Processing Systems 28 (eds Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M. & Garnett, R.) 1117–1125 (Curran Associates, Red Hook, 2015).

  22. Esser, S. et al. Convolutional networks for fast, energy-efficient neuromorphic computing. Preprint at http://arxiv.org/abs/1603.08270 (2016).

  23. Rueckauer, B., Lungu, I.-A., Hu, Y., Pfeiffer, M. & Liu, S.-C. Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Front. Neurosci. 11, 682 (2017).

  24. Bohte, S. M., Kok, J. N. & La Poutré, J. A. SpikeProp: backpropagation for networks of spiking neurons. In European Symposium on Artificial Neural Networks 419–424 (ELEN, London, 2000).

  25. Huh, D. & Sejnowski, T. J. Gradient descent for spiking neural networks. Preprint at https://arxiv.org/abs/1706.04698 (2017).

  26. Cao, Y., Chen, Y. & Khosla, D. Spiking deep convolutional neural networks for energy-efficient object recognition. Int. J. Comput. Vis. 113, 54–66 (2015).

  27. Hunsberger, E. & Eliasmith, C. Spiking deep networks with LIF neurons. Preprint at https://arxiv.org/abs/1510.08829 (2015).

  28. Liew, S. S., Khalil-Hani, M. & Bakhteri, R. Bounded activation functions for enhanced training stability of deep neural networks on visual pattern recognition problems. Neurocomputing 216, 718–734 (2016).

  29. Nise, N. S. Control Systems Engineering, 5th edn (Wiley, New York, NY, 2008).

  30. Chollet, F. et al. Keras https://github.com/fchollet/keras (2015).

  31. Rothganger, F., Warrender, C. E., Trumbo, D. & Aimone, J. B. N2A: a computational tool for modeling from neurons to algorithms. Front. Neural Circuits 8, 1 (2014).

  32. Davison, A. P. et al. PyNN: a common interface for neuronal network simulators. Front. Neuroinform. 2, 11 (2009).

  33. Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R. & Bengio, Y. Binarized neural networks. In Proceedings of Advances in Neural Information Processing Systems 4107–4115 (Curran Associates, Red Hook, 2016).

  34. LeCun, Y., Cortes, C. & Burges, C. MNIST handwritten digit database. AT&T Labs http://yann.lecun.com/exdb/mnist (2010).

  35. Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. Preprint at https://arxiv.org/abs/1708.07747 (2017).

  36. Krizhevsky, A. & Hinton, G. Learning Multiple Layers of Features from Tiny Images. Technical Report, Univ. Toronto (2009).

Acknowledgements

This work was supported by Sandia National Laboratories’ Laboratory Directed Research and Development (LDRD) Program under the Hardware Acceleration of Adaptive Neural Algorithms Grand Challenge project and the DOE Advanced Simulation and Computing program. Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, a wholly owned subsidiary of Honeywell International, for the US Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

This Article describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the US Department of Energy or the US Government.

Author information

Contributions

All authors contributed to Whetstone algorithm theory and design. W.S. and R.D. implemented code and performed experiments. W.S., C.M.V., R.D. and J.B.A. analysed results. W.S., C.M.V. and J.B.A. wrote the manuscript.

Corresponding authors

Correspondence to William Severa or James B. Aimone.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary notes and figures

About this article

Cite this article

Severa, W., Vineyard, C.M., Dellana, R. et al. Training deep neural networks for binary communication with the Whetstone method. Nat Mach Intell 1, 86–94 (2019). https://doi.org/10.1038/s42256-018-0015-y

