Chao Zhang successfully defended his Ph.D. Thesis at CECA on June 5, 2017. Congratulations!
For more information (in Chinese), please refer to: http://ceca.pku.edu.cn/news.php?action=detail&article_id=690
Abstract: The design of storage hardware is challenged by rapidly growing big data technologies. Emerging computing systems require ever larger and faster storage hardware, yet traditional storage hardware faces slowing growth and rising energy consumption. Compared with traditional memories, non-volatile memory (NVM) offers great advantages in access speed, non-volatility, static energy, and storage density. NVMs such as Spin-Transfer Torque RAM (STT-RAM), Phase-Change Memory (PCM), and Resistive Random Access Memory (RRAM) have been shown to have enormous potential to replace traditional memories. Racetrack memory (RM) stores multiple bits inside tape-like nanowires, which amplifies the benefits in access speed and storage density. To access the data stored in a nanowire, racetrack memory introduces a "shift" operation. This operation incurs extra overhead that is sensitive to the access pattern. Prior research focuses on two aspects: reducing shift operations to improve energy efficiency or performance, and accelerating special calculations using the shift operation that racetrack memory provides. However, previous work has not provided a comprehensive solution for reliable and efficient racetrack memory design.
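To make the access-pattern sensitivity of the shift overhead concrete, here is a minimal sketch. It assumes a single nanowire with one access port that stays wherever the last access left it, so reaching bit `index` costs as many shift steps as its distance from the current position. The function name `total_shifts` and this cost model are illustrative assumptions, not the model used in the thesis.

```python
def total_shifts(accesses, start=0):
    """Count shift steps on one track with a single access port.

    Assumption: the port stays at the position of the last access, and
    moving from position a to position b costs abs(b - a) shift steps.
    """
    position = start
    shifts = 0
    for index in accesses:
        shifts += abs(index - position)
        position = index
    return shifts


# The same eight bits, visited in two different orders.
sequential = [0, 1, 2, 3, 4, 5, 6, 7]
alternating = [0, 7, 1, 6, 2, 5, 3, 4]

print(total_shifts(sequential))   # 7
print(total_shifts(alternating))  # 28
```

Under this simplified model, both orders read the same data, but the alternating order needs four times as many shift steps, which is the kind of pattern dependence the abstract refers to.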
Based on a thorough study of both circuit-level and architecture-level technologies, this thesis proposes a set of solutions for reliable and energy-efficient racetrack memory, targeting problems such as circuit-level modeling, position errors, thermal issues of the shift operation, read-write asymmetry, and case-specific optimizations.
The major contributions of this work fall into five aspects:
1. Circuit-level models for racetrack memory. Because racetrack memory devices involve an enormous number of configuration parameters, the settings selected by previous studies differ significantly from one another. The lack of a unified abstraction of the memory puts architecture-level research at risk. Based on the physical mechanisms of RM, a circuit layout model and a reliability model are explored and built for racetrack memory. These models give the subsequent studies a unified basis, enabling further optimizations of reliability and energy efficiency.
2. Detection and correction of position errors. An unreliable shift operation misaligns the data in RM, which leads to errors when the memory is accessed. This kind of error is fundamentally different from the bit-flip errors of traditional memories and constitutes a new error type, the "position error". The Hi-Fi playback technique is proposed, which uses a coding mechanism and an adaptive architecture to ensure the reliability of the shift operation. Using sub-threshold shift and position error correction codes, Hi-Fi playback raises the mean time to failure of RM from several microseconds to hundreds of years, while adding only negligible performance overhead.
3. Thermal control method for the shift operation. The shift operation in domain-wall-based RM requires a high current, so rapid shifting may create hot spots and compromise circuit reliability. A quota-based shift control mechanism protects the memory from overheating.
The quota-based shift control mechanism spreads the shift operations over both space and time. It costs only 3.5% of performance while keeping the memory chip below 100 degrees Celsius under normal working conditions.
4. Cache bypassing method for asymmetric access characteristics. Since the read and write mechanisms in RM are generally independent, their costs are asymmetric. This reduces the efficiency of existing cache optimizations, which are designed for symmetric-access memories. Statistics-based cache bypassing is proposed for general asymmetric-access memories, such as racetrack memory, to further improve energy efficiency. With statistical cache bypassing, both the energy efficiency and the performance of caches improve by about 10%.
5. Use-case-specific energy/performance optimizations. The extra shift operation required by RM makes the energy/performance overhead depend on the use case, so different scenarios have different sweet spots. This thesis also explores different application scenarios for racetrack memory, with solutions to improve performance and energy efficiency. Using hybrid RM-SRAM cells, cache performance can be improved by 14%, reducing the overhead of context switches.
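The quota idea in contribution 3 can be pictured with a small sketch. It assumes that the memory is divided into regions (the space dimension) and that time is divided into fixed windows, with each region allowed a fixed number of shift steps per window; steps beyond the budget wait for the next window. The class `ShiftQuota`, its parameters, and the scheduling policy are hypothetical and only illustrate the general principle, not the mechanism evaluated in the thesis.

```python
from collections import defaultdict


class ShiftQuota:
    """Hypothetical per-region, per-window budget of shift steps."""

    def __init__(self, quota_per_window, window_cycles):
        if quota_per_window <= 0 or window_cycles <= 0:
            raise ValueError("quota and window length must be positive")
        self.quota = quota_per_window
        self.window = window_cycles
        # (region, window index) -> shift steps already granted
        self.used = defaultdict(int)

    def schedule(self, region, cycle, shifts):
        """Grant `shifts` steps to `region`, starting at `cycle`.

        Returns the cycle at which the final steps can be issued: the
        requested cycle if the budget suffices, otherwise the start of
        the later window that absorbs the remainder.
        """
        window_index = cycle // self.window
        remaining = shifts
        while True:
            free = self.quota - self.used[(region, window_index)]
            granted = min(free, remaining)
            self.used[(region, window_index)] += granted
            remaining -= granted
            if remaining == 0:
                return max(cycle, window_index * self.window)
            window_index += 1  # budget exhausted: spill into next window


controller = ShiftQuota(quota_per_window=4, window_cycles=10)
print(controller.schedule(region=0, cycle=0, shifts=3))  # 0
print(controller.schedule(region=0, cycle=2, shifts=3))  # 10
print(controller.schedule(region=1, cycle=2, shifts=3))  # 2
```

In this sketch the second request to region 0 is delayed because that region has only one step of budget left in the current window, while region 1 is unaffected. Limiting the shift rate per region and per window is one simple way to bound local heating at a modest cost in latency.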
Research on reliable and energy-efficient racetrack memory requires knowledge from multiple domains. The thesis finds that combining domain knowledge from different levels generally reduces the difficulty of hard problems at specific levels. Leveraging architecture-level designs, several circuit-level reliability issues can be solved easily. Building on special functions provided by circuits, architecture-level performance can be improved significantly. For different application scenarios, combined optimization across the architecture and circuit levels can significantly improve efficiency. Following this research philosophy, this thesis explores and designs optimizations for reliability and energy efficiency. The research in this thesis has been published in multiple important conferences and journals in the computer architecture field, providing references for future work on racetrack memory.