The numerical solution of large sparse linear systems is fundamental to many large-scale scientific and engineering computations. The multifrontal method, a widely used direct solver, currently has no implementation on domestic digital signal processor (DSP) platforms. With the development of the domestic FT series of high-performance DSPs, there is an urgent need for efficient numerical solution of large sparse linear systems in practical engineering applications. To address this, we implement and optimize the multifrontal method for the FT-M6678 platform. Based on an analysis of the FT-M6678 hardware architecture and the characteristics of the multifrontal algorithm, we apply compiler optimization, loop unrolling, and single-instruction, multiple-data (SIMD) vectorization. These techniques exploit the platform's independent functional units and register resources to achieve instruction-level and data-level parallelism. To account for the memory hierarchy of the FT-M6678, we configure the first- and second-level caches, set the cacheability attributes of external memory, and design the memory layout according to memory bandwidth. We place different data and code segments in distinct memory regions to optimize data storage and access. The test matrices are taken from the University of Florida Sparse Matrix Collection. The speedup of the optimized algorithm on the FT-M6678 platform ranges from 16.0 to 38.9, and its performance is up to 2.3 times that achieved on the TMS320C6678 platform.
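The abstract does not list the optimized kernels, so the following is only a minimal Python sketch of the kind of dense inner loop that dominates a multifrontal solver: the partial elimination of a frontal matrix, with the row update written in a 4-way unrolled form. The names `axpy_unrolled4` and `partial_factor` are hypothetical, no pivoting is performed, and Python itself gains nothing from the unrolling; the structure simply mirrors what a C kernel on a DSP would expose to independent functional units.

```python
def axpy_unrolled4(dst, src, m, start):
    """Compute dst[j] -= m * src[j] for j >= start, unrolled four ways.

    The four statements in the main loop are independent of each other,
    which is what lets a VLIW/SIMD target schedule them in parallel.
    """
    n = len(dst)
    j = start
    while j + 4 <= n:
        dst[j] -= m * src[j]
        dst[j + 1] -= m * src[j + 1]
        dst[j + 2] -= m * src[j + 2]
        dst[j + 3] -= m * src[j + 3]
        j += 4
    while j < n:  # remainder loop for lengths not divisible by 4
        dst[j] -= m * src[j]
        j += 1


def partial_factor(front, k):
    """Eliminate the first k pivots of a dense frontal matrix (no pivoting).

    Returns the partially factored matrix and the trailing Schur complement,
    which a multifrontal solver would pass up to the parent front.
    """
    n = len(front)
    a = [list(row) for row in front]  # work on a copy
    for p in range(k):
        pivot = a[p][p]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        for i in range(p + 1, n):
            a[i][p] /= pivot  # multiplier stored in place of the eliminated entry
            axpy_unrolled4(a[i], a[p], a[i][p], p + 1)
    schur = [row[k:] for row in a[k:]]
    return a, schur
```

For a 3×3 front with one fully summed pivot, `partial_factor(front, 1)` returns a 2×2 Schur complement that would be assembled into the parent front.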
Due to climate warming and increased precipitation, the permafrost of the Tibetan Plateau (TP) has degraded seriously in recent decades, accompanied by obvious lake expansion. Model simulations are often used to analyze the contribution of permafrost melting to lake expansion, but they can have many limitations. Taking the Hohxil Lake (HL) basin in the northern TP as an example, this study applies an improved small baseline subset interferometric technique (SBAS-InSAR) to Sentinel-1 images, monitors permafrost deformation from 2015 to 2020, and estimates its contribution to the lake expansion. The results show that permafrost settlement occurs mainly in the flat terrain around HL. The average line-of-sight (LOS) deformation rate of the permafrost is -3.59 ± 0.001 cm/yr, and many obvious funnel-shaped thawing areas exist around the lake, indicating a close relationship between lake expansion and the melting of permafrost underground ice. The long-term linear deformation rate of the underground ice is inverted with the traditional linear model, and the melting rate is estimated to be (31.17 ± 0.0054) × 10⁶ m³/yr, contributing 9.3% of the HL expansion. This study takes full advantage of Interferometric Synthetic Aperture Radar (InSAR) to quantify the contribution of permafrost to lake expansion, providing new insight into permafrost hydrological processes. The proposed method can readily be extended to analyze the contribution of underground ice to the lake water budget in other watersheds of the TP.
KEYWORDS: Optimization (mathematics), Transplantation, Data transmission, Parallel computing, Field programmable gate arrays, Data storage, Data processing, Computer programming, Associative arrays, Algorithm development
To further improve the execution efficiency of the SM3 cryptographic hash algorithm and exploit the advantages of mainstream CPU+GPU heterogeneous platforms, the message padding, message expansion, and iterative compression stages of SM3 are executed on the device (GPU) side. Thread blocks are partitioned so that thread-block resources are fully used, loop unrolling is applied to increase the number of independent memory operations, and shared memory is used to hide the latency of memory access instructions. Experiments on files of different sizes on an NVIDIA Tesla P100 show that the GPU implementation of SM3 achieves a speedup of 0.593611 to 1.207481 over the serial CPU implementation, and that the speedup increases with file size. The results provide a reference for implementing national cryptographic algorithms on GPU platforms.
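The padding and expansion stages named above can be sketched on the host side as follows. This is a minimal Python illustration of the SM3 padding and message expansion rules, not the GPU kernels described in the abstract; the function names `sm3_pad` and `sm3_expand` are hypothetical, and the compression stage is omitted.

```python
import struct

MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def p1(x):
    """SM3 permutation P1, used only in message expansion."""
    return x ^ rotl32(x, 15) ^ rotl32(x, 23)


def sm3_pad(message: bytes) -> bytes:
    """Append a 1 bit, zero bits, and the 64-bit big-endian bit length,
    so that the result is a multiple of 64 bytes (512 bits)."""
    bit_len = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack(">Q", bit_len)


def sm3_expand(block: bytes):
    """Expand one 64-byte block into 68 words W and 64 words W'."""
    if len(block) != 64:
        raise ValueError("SM3 blocks are exactly 64 bytes")
    w = list(struct.unpack(">16I", block))
    for j in range(16, 68):
        x = w[j - 16] ^ w[j - 9] ^ rotl32(w[j - 3], 15)
        w.append(p1(x) ^ rotl32(w[j - 13], 7) ^ w[j - 6])
    w_prime = [w[j] ^ w[j + 4] for j in range(64)]
    return w, w_prime
```

Each 64-byte block of `sm3_pad(data)` is expanded independently of the others, so expansion can be distributed across threads, whereas the compression of successive blocks within one message is inherently sequential.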