Dr Md Najrul Islam

Assistant Professor

Department of Electronics and Communication Engineering

Contact Details

najrulislam.h@srmap.edu.in

Office Location

Desk No: 45, Level 4, Admin Block

Education

  • 2024 - PhD, IIT Mandi
  • 2018 - M.Tech, NIT Meghalaya
  • 2016 - B.E., Tripura Institute of Technology, India

Research Interests

  • VLSI-algorithm design of convolutional neural network (CNN) hardware accelerators for edge applications.
  • Digital VLSI-architecture design of efficient neural-network inference engines for edge applications.
  • Application-specific integrated circuit (ASIC) chip design, implementation, and testing, and field-programmable gate array (FPGA) prototyping for digital processing tasks in artificial intelligence, post-quantum cryptography, encryption accelerators, high-efficiency video coding, wireless communication systems, power electronics, and computer arithmetic.
  • Testing of neural-network inference engines by developing a test-bed for practical object-classification scenarios on FPGA and ASIC platforms and in a software environment.

Awards

  • 2024 - VLSID Fellowship: awarded the IEEE VLSID-2024 fellowship to attend the tutorials and main conference at the 37th IEEE International Conference on VLSI Design and the 23rd International Conference on Embedded Systems, Kolkata, India, Jan 5-10, 2024.
  • 2023 - VLSID Fellowship: awarded the IEEE VLSID-2023 fellowship to attend the tutorials and main conference at the 36th IEEE International Conference on VLSI Design and the 22nd International Conference on Embedded Systems, Hyderabad, India, Jan 8-12, 2023.
  • 2019 - MHRD HTRA Fellowship: awarded to pursue a PhD at IIT Mandi.
  • 2016 - GATE Fellowship: received a full fellowship to pursue an M.Tech at NIT Meghalaya.

Memberships

  • Member, Institute of Electrical and Electronics Engineers (IEEE).

Publications

  • Energy-Efficient and High-Throughput CNN Inference Engine Based on Memory-Sharing and Data-Reusing for Edge Applications

    Islam M.N., Shrestha R., Chowdhury S.R.

    IEEE Transactions on Circuits and Systems I: Regular Papers, 2024, DOI Link

    This paper proposes an implementation-friendly and dynamically reconfigurable VLSI algorithm for a convolutional neural network (CNN) inference engine. Based on this algorithm and additional suggested techniques, a high-throughput and hardware-efficient architecture of the kernel processing unit (KPU) for the CNN inference engine is presented. It specifically enables the proposed CNN inference engine to achieve efficient local data reuse for all the computations of state-of-the-art CNN models. A hardware implementation of this KPU on the Zynq UltraScale+ ZCU102 FPGA board delivers 3.68× higher throughput and 3.40× higher energy efficiency than contemporary designs in the literature. Furthermore, this work suggests a hardware-efficient architecture of the classify unit for the CNN inference engine, which delivers 79.44% better hardware efficiency than the state-of-the-art work when implemented on an FPGA platform. These efficient architectures of the KPU and classify unit are then aggregated into an energy-efficient and high-throughput design of a complete CNN inference engine, which has been implemented on an FPGA and ASIC-synthesized as well as post-layout simulated in a 28 nm FD-SOI technology node. The suggested CNN inference engine with 864 processing elements delivers a peak throughput of 6.65 TOPs while operating at a maximum clock frequency of 3.85 GHz. To the best of the authors' knowledge, such a unified design and implementation of a CNN inference engine (including both the KPU and the classify unit), specifically capable of supporting efficient reuse of local data for all computations while processing state-of-the-art CNN models, is reported for the first time in this paper. Finally, the proposed CNN inference engine has been functionally validated in a real-world test scenario for the object-classification application, using contemporary CNN models.
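
    As a generic illustration of the local data reuse idea mentioned above, the following weight-stationary loop nest fetches each kernel weight once and reuses it across every output position. This is a minimal sketch of the reuse principle only, not the paper's actual KPU dataflow, and the function name is ours:

        import numpy as np

        # Minimal sketch: weight-stationary 1-D convolution. Each weight is
        # read once from "memory" and then reused for all output positions,
        # the generic principle behind local data reuse in CNN accelerators.
        def conv1d_weight_stationary(x: np.ndarray, w: np.ndarray) -> np.ndarray:
            out = np.zeros(len(x) - len(w) + 1)
            for kw, weight in enumerate(w):   # fetch each weight exactly once
                for i in range(len(out)):     # ...then reuse it everywhere
                    out[i] += weight * x[i + kw]
            return out

        x = np.arange(10, dtype=float)
        w = np.array([1.0, 0.5, 0.25])
        print(conv1d_weight_stationary(x, w))  # matches a direct convolution
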
  • Low-Complexity Classification Technique and Hardware-Efficient Classify-Unit Architecture for CNN Accelerator

    Islam M.N., Shrestha R., Chowdhury S.R.

    Proceedings of the IEEE International Conference on VLSI Design, 2024, DOI Link

    This paper proposes a simplified classification technique to reduce the complexity of softmax-based classification in the convolutional neural network (CNN) inference engine/accelerator. It primarily allows the CNN accelerator to classify the object directly from the activations of the fully connected (FC) layer, avoiding complex exponential and division operations. Corresponding to the suggested technique, this work also presents a hardware-efficient VLSI architecture of the classify unit for the CNN accelerator. The proposed classify-unit architecture has been ASIC-synthesized and post-layout simulated in a 28 nm FD-SOI technology node. As a result, our design delivers a peak throughput of 2.5 GIPS with a hardware efficiency of 5.05×10³ GIPS/mW/mm². Comparison of these results with the relevant reported works indicates that the proposed classify unit occupies 24.1× less area and achieves 12.5× better hardware efficiency than the state-of-the-art implementations. Finally, the complete CNN accelerator integrated with the proposed classify unit has been functionally validated with the aid of a Zynq UltraScale+ ZCU102 FPGA board in a real-world scenario, using the MobileNet-V2 CNN model.
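
    Because softmax is strictly monotonic, the winning class index can be read directly from the FC-layer activations with a plain argmax, which is the essence of the softmax-free simplification described above. A minimal NumPy sketch of that equivalence (illustrative only; the paper's fixed-point classify-unit datapath is not reproduced here):

        import numpy as np

        def softmax_classify(fc: np.ndarray) -> int:
            """Conventional classification: exponentials plus a division."""
            e = np.exp(fc - fc.max())      # subtract max for numerical stability
            return int(np.argmax(e / e.sum()))

        def simplified_classify(fc: np.ndarray) -> int:
            """Softmax-free classification: softmax is monotonic, so an argmax
            over the raw FC activations picks the same class."""
            return int(np.argmax(fc))

        fc = np.random.randn(1000)         # e.g., a 1000-class FC-layer output
        assert softmax_classify(fc) == simplified_classify(fc)
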
  • An Uninterrupted Processing Technique-Based High-Throughput and Energy-Efficient Hardware Accelerator for Convolutional Neural Networks

    Islam M.N., Shrestha R., Chowdhury S.R.

    IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2022, DOI Link

    This article proposes an uninterrupted processing technique for the convolutional neural network (CNN) accelerator. It primarily allows the CNN accelerator to perform processing-element (PE) operations and data fetching simultaneously, which reduces latency and enhances the achievable throughput. Corresponding to the suggested technique, this work also presents a low-latency VLSI architecture of the CNN accelerator using a new random-access line-buffer (RALB)-based design of the PE array. The proposed CNN-accelerator architecture has been further optimized by reusing local data in the PE array, yielding better energy conservation. Our CNN accelerator has been implemented on a Zynq UltraScale+ MPSoC ZCU102 FPGA board, where it operates at a maximum clock frequency of 340 MHz and consumes 4.11 W of total power. The suggested CNN accelerator with 864 PEs delivers a peak throughput of 587.52 GOPs and an energy efficiency of 142.95 GOPs/W. Comparison of these implementation results with the literature shows that our CNN accelerator delivers 33.42% higher throughput and 6.24× better energy efficiency than the state-of-the-art work. Finally, the field-programmable gate array (FPGA) prototype of the proposed CNN accelerator has been functionally validated using a real-world test setup for the detection of objects in an input image, using the GoogLeNet neural network.
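
    The overlap of PE computation with data fetching can be pictured as double buffering: while the PE array processes one tile, the next tile is fetched in the background. The sketch below illustrates that generic overlap in software; fetch_tile and pe_process are placeholder names, and the paper's RALB design is not reproduced:

        from concurrent.futures import ThreadPoolExecutor

        def fetch_tile(i: int) -> list:
            """Placeholder for fetching input tile i from external memory."""
            return [float(i)] * 4

        def pe_process(tile: list) -> float:
            """Placeholder for the PE-array computation on one tile."""
            return sum(tile)

        def run_pipelined(num_tiles: int) -> list:
            """Overlap the fetch of tile i+1 with the processing of tile i."""
            results = []
            with ThreadPoolExecutor(max_workers=1) as fetcher:
                pending = fetcher.submit(fetch_tile, 0)   # prefetch first tile
                for i in range(num_tiles):
                    tile = pending.result()               # wait for current tile
                    if i + 1 < num_tiles:
                        pending = fetcher.submit(fetch_tile, i + 1)  # fetch ahead...
                    results.append(pe_process(tile))      # ...while "PEs" compute
            return results

        print(run_pipelined(8))
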
  • A New Hardware-Efficient VLSI-Architecture of GoogLeNet CNN-Model Based Hardware Accelerator for Edge Computing Applications

    Islam M.N., Shrestha R., Chowdhury S.R.

    Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI, 2022, DOI Link

    In the current trend, deep neural network (DNN) models that use only one type of convolution filter (mostly 3×3 convolution) are widely adopted for hardware implementation. However, inception-type convolutional neural network (CNN) models such as GoogLeNet showed that by combining different types of convolutions, like 1×1, 3×3, and 7×7, a higher level of accuracy is achievable at a lower computational cost. This paper proposes the first VLSI architecture of a hardware accelerator for such GoogLeNet CNN models and a versatile CNN accelerator architecture that is capable of performing three different types of convolution tasks with approximately equal hardware efficiency. The hardware accelerator has been synthesized and implemented on a Zynq UltraScale+ ZCU102 FPGA board with 16-bit brain-float quantization; it consumes only 13.7k LUTs and delivers a throughput of 13.5 GFLOPS, making our accelerator 16.45% more hardware-efficient than the state-of-the-art implementations.
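
    The inception idea referenced above, running several kernel sizes in parallel and stacking the resulting feature maps, can be sketched functionally in a few lines. This illustrates the GoogLeNet-style branch structure only, not the accelerator's datapath, and the averaging kernels are placeholders:

        import numpy as np

        def conv2d_same(x: np.ndarray, k: np.ndarray) -> np.ndarray:
            """Naive single-channel 2-D convolution with 'same' zero padding."""
            p = k.shape[0] // 2
            xp = np.pad(x, p)
            out = np.zeros_like(x, dtype=float)
            for i in range(x.shape[0]):
                for j in range(x.shape[1]):
                    out[i, j] = np.sum(xp[i:i + k.shape[0], j:j + k.shape[1]] * k)
            return out

        def inception_branches(x: np.ndarray) -> np.ndarray:
            """Apply 1x1, 3x3, and 7x7 kernels in parallel, then stack the maps."""
            kernels = [np.ones((n, n)) / (n * n) for n in (1, 3, 7)]
            return np.stack([conv2d_same(x, k) for k in kernels])

        feat = inception_branches(np.random.randn(16, 16))
        print(feat.shape)  # (3, 16, 16)
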
  • A 10-bit 2.33 fJ/conv. SAR-ADC with high speed capacitive DAC switching using a novel effective asynchronous control circuitry

    Begum F., Mishra S., Islam M.N., Dandapat A.

    Analog Integrated Circuits and Signal Processing, 2019, DOI Link

    The successive-approximation-register (SAR) ADC has gained popularity in the growing field of ADC development owing to its low power consumption. This work describes such a structure using a novel low-offset comparator, thereby reducing non-linearity and significantly improving the energy-delay metric. A high-speed control circuit is introduced to improve the overall operating frequency of the SAR-ADC, minimizing its speed limitation. A capacitive digital-to-analog converter that switches in alternate cycles is used to reduce static power dissipation. The ADC architecture is designed in 45 nm CMOS technology with a layout area of 0.0139 mm². The extracted results show that the proposed design is a reliable and fast SAR-ADC framework, demonstrating an improvement of 47.75% in the figure of merit. The SNDR and SFDR are 57.2 dB and 61.4 dB, respectively, at an input frequency of 10 MHz, with a sampling frequency of 1 GHz and a 1 V power supply.
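
    For context, fJ/conversion-step figures of this kind are conventionally quoted as the Walden figure of merit, FoM = P / (2^ENOB x fs) with ENOB = (SNDR - 1.76)/6.02. Assuming that definition holds here (the power figure is not quoted above, so it is solved for instead), a quick back-of-the-envelope check from the reported numbers:

        # Back-of-the-envelope Walden FoM check from the reported numbers.
        # Assumes the standard Walden definition: FoM = P / (2**ENOB * fs).
        sndr_db = 57.2              # reported SNDR
        fs = 1e9                    # reported sampling frequency, 1 GHz
        fom = 2.33e-15              # reported 2.33 fJ/conversion-step

        enob = (sndr_db - 1.76) / 6.02      # effective number of bits
        power = fom * (2 ** enob) * fs      # implied power consumption

        print(f"ENOB  = {enob:.2f} bits")       # ~9.21 bits
        print(f"Power = {power * 1e3:.2f} mW")  # ~1.38 mW implied
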
  • Analysis and Proposal of a Flash Subranging ADC Architecture

    Begum F., Mishra S., Najrul Islam M., Dandapat A.

    Lecture Notes in Electrical Engineering, 2019, DOI Link

    This paper describes the architectures of various subranging flash analog-to-digital converters (ADCs) and proposes a novel subranging algorithm. A comparative study of the state-of-the-art designs is made with respect to the figure of merit (FoM) and the excess-delay (EXD) parameter. Simulations are carried out in a generic process design kit (GPDK) 45 nm technology in the SPECTRE environment. The proposed subranging ADC shows an overall power improvement of 94% per conversion cycle compared to a flash ADC. Its EXD shows an improvement of 74% compared to a binary flash ADC, although it is slower than the flash ADC by 34%.
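
    For background, a subranging converter resolves the input in two steps: a coarse flash stage picks the subrange, and a fine stage resolves the residue within it, requiring far fewer comparators than a full flash ADC. The sketch below shows the generic two-step principle only, not the paper's proposed algorithm:

        # Generic two-step subranging conversion: coarse flash stage, then a
        # fine stage on the residue. Illustrates the principle only.
        def subranging_adc(vin: float, vref: float = 1.0,
                           coarse_bits: int = 4, fine_bits: int = 4) -> int:
            lsb_coarse = vref / (1 << coarse_bits)
            coarse = min(int(vin / lsb_coarse), (1 << coarse_bits) - 1)
            residue = vin - coarse * lsb_coarse       # what the coarse step missed
            lsb_fine = lsb_coarse / (1 << fine_bits)
            fine = min(int(residue / lsb_fine), (1 << fine_bits) - 1)
            return (coarse << fine_bits) | fine       # 8-bit output code

        code = subranging_adc(0.618)
        print(code, code / 256)  # 158, i.e. ~0.617 of full scale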

Interests

  • Hardware accelerator for AI applications
  • Low power VLSI circuits
  • Post quantum cryptography
