|
1.INTRODUCTIONData access time is a critical parameter in radio-electronic systems. Examples of solutions for radio-electronic systems in which access to data is a critical parameter have been described in the articles [9,10,16,17]. These solutions are associated with the creation of microwave signal patterns in radio-electronic reconnaissance devices. This applies primarily to radio-electronic reconnaissance systems and devices, where measurement results are compared with the patterns stored in the database. That comparison is expected to take place in a time similar to real time. This requires searching for efficient methods of accessing data stored in databases. One of the solutions is to use parallel processing techniques which utilize the computing power of available hardware resources more efficiently. The problem of parallel processing in databases is raised in many publications. In paper [12] the problem of classification for Architecture of Parallel Databases was raised. Other papers presented query optimization for parallel processing in databases. [11,13,14,15]. 2.METHODS AFFECTING THE DATABASE SERVER PERFORMANCEMethods of performance improvement can be divided into software and hardware. Programming methods include the possibility of using indexes on tables, partitioning tables and indexes, maintaining statistics for data contained in database objects, automatic tuning of query plans, parallel processing, buffering. Hardware methods include the use of many physical computational clusters, skilful distribution of data files, and fine-tuning of the RDBMS system operation to maximize the processor’s capabilities and appropriately allocated memory [2]. 2.1Software methodsIn order to speed up data retrieval in tables, RDBMS uses various types of indexes [1]. The author’s work [8] discusses the basic indexes and the principles of their use. Partitioning involves splitting the table into partitions according to system requirements, e.g. partitioning of the microwave patterns database into partitions according to the adopted frequency range. Partitioning allows to increase the performance of queries that reference one partition instead of searching the entire table [1]. Tuning in the RDBMS system is done using the native Procedural Structured Query Language (PL / SQL). In the DBMS_PROFILER package RDBMS stores procedures, which offer a number of actions to analyze and suggest improvement of the application code. With this package, one can easily identify the most expensive operations in the code and try to optimize their operation. 2.2Parallel processingEach query in the database is analysed by the optimizer (parsing). If the execution plan includes parallel processing, the following steps occur [5]:
After arranging the query plan, the parallel execution coordinator - (PX coordinator) decides how the subtasks will be executed. The number of processes that can be started for one task is defined by Degree Of Parallelism (DOP). The DOP level can be defined by the user or automatically by the RDBMS system. 3RESEARCH ENVIRONMENTThe purpose of the studies was to assess DOP that should be used to achieve maximum query performance in the test conditions. Furthermore, there has been made an attempt to answer the questions: - whether the use of parallel queries significantly affects the performance of the database management system, - in what hardware and software conditions the use of parallel processing is beneficial from the point of view of RDBMS system performance. The research has been carried out for various measurement conditions that included: the operating system used (Linux and Windows), a different number of processor cores, a different number of parallel sessions, the use of efficient SSDs, the use of the index. Two test environments have been prepared for the purposes of the study, Research Environment no. 1
Research Environment no. 2
Oracle Enterprise Edition 11.2.0.4 64-bit has been selected for the research. The test database environment has been prepared on the basis of scripts and tips, available in the article “How Many Slaves?” by well-known Oracle database administration specialist Doug Burns [6]. The test database was prepared based on the scripts for the RDBMS study published by Oracle [7]. Modifications have been introduced, especially in the scope of script preparation for the Windows environment. The parameters of the database instance are shown below, The database instance parameters that apply to parallel processing are listed below. It is worth paying attention to the following parameters:
The script for creating the TEST_TAB1 table and filling them with rows with random data - universal for both operating systems is presented below: The script below creates the TEST_TAB3 test tables: Four tests have been developed for research:
4RESEARCH RESULTSThe level of parallelization has been tested in the range from 1 to 128 with the following step: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 16, 24, 32, 48, 64, 96, 128. This allowed the identification of differences in query performance at a low level of parallelism, where these differences were most visible, and the observation of the impact of the overhead associated with running a very large number of processes at a level of parallelization of over 100. The level of parallelism recorded on the horizontal axis of charts as 1 means no parallel processing applications, and thus no additional subprocesses (Parallel Execution Slaves - PX Slaves) have been created. 4.1Test No. 1 - CentOS systemThe test results are presented in Fig. 1. The data series in the graph determine the application of the server configuration with a given number of processor cores in the range from 1 to 8. Conclusions resulting from the conducted research:
It is important to notice that during hash joining of tables with grouping and sorting commands, twice as many subprocesses - PX Slaves have run than in the case of full scan. That results in much longer response times, as shown in Figure 2. Conclusions resulting from the conducted research:
Figure 3 shows the CPU usage (s) during the test. The start of testing is marked with a vertical black line. The first range of processor activity (to the right of the vertical line) covers the testing time using a single-core processor. One core was added successively. It can be seen that only for the configuration with a single-core processor, the resources have been used in 100%, which basically means that at that time any other work on the server would have been practically impossible. In other cases, the processor’s idle time has been greater than 0, so it can be assumed that the server would have been able to handle more parallel sessions. The red horizontal line in Figure 4 indicates the number of processor cores available, and above it the processes running in the queue have been marked for each configuration, which in practice means that each processor core must perform several dozen operations in series. Analysing Figure 5, it can be concluded that increasing the number of processor cores does not affect the operation of disks, which must be used equally to return the desired result. 4.2Test No. 1 - Windows systemThe test results are shown in Figure 6. Figure 6 shows query times for both the full table scan operation - the PST series and the hash join - the ZH series. Conclusions resulting from the conducted research:
4.3Test No. 2 - CentOS systemThe results of tests for test No. 2 are presented below. During the tests, the processor configuration was changed in the range of 1 to 8 cores. Query times for different sessions in most cases have been identical or have not differed significantly, so that only single times have been selected for each of the series of tests in the DOP range from 1 to 8. With a view to run several sessions simultaneously, each with a level of parallelism above 1, it was reasonable to expect that the server would start to work much worse. However, as it can be seen in the chart in Figure 7, raising the level of parallelism in the range of 1 to 8 has brought benefits. The curves begin to overlap only from a DOP level of 5, with the number of simultaneous sessions reaching ten. This can be explained by the fact that slow resources of a fast processor can fully cope with the processing of more simultaneous processes. Only the launch of 64 concurrent sessions brought a significant decrease in performance for all DOP levels. Only in Figure 8 one can observe the problems which appeared while the server was running. Up to the right of the black vertical line, the characteristics of CPU consumption are outlined when performing multisession tests. The 64-parallel session test describes it on the right side of the figure. They show how the server’s idle time has decreased. For 64 simultaneous sessions, work on the server was possible, but the slowdown was strongly felt. The waiting time for any operations in the system shell has increased several times. The attempt to perform the test for 128 simultaneous sessions ended with the server to be blocked and restarted. Figure 9 shows that only the work of 64 parallel sessions that the HDD was heavily charged. Tests carried out in the range of 1 to 10 sessions showed that disk consumption was about 4,000 operations per second. 4.4Test No. 3 - CentOS systemThe test has been run once in the most efficient configuration with eight processor cores. The purpose of the test was to check whether the search for a given row of the TEST_TAB1 table could be done faster by using parallel processing or by creating an index on the STR_PAD column and searching with its help. According to Figure 10, the use of an optimal processor configuration for server cannot compete using an efficient index. The best time when using a high DOP level of 32 is 42.7 seconds, which is sometimes almost 27 times slower than searching by index. It should be noted that searching by index does not use parallel processing, which can be seen in Figure 11. Figure 11 presents a plan of the performed query along with the times of individual operations. Forcing the lack of use of the index (NO_INDEX) caused the optimizer to return to the query plan using parallel processing. As can be seen in Figure 12, there are waiting times for parallel processing operations: PX Deq: Execute Reply, PX Deq: Execution Msg. In addition, using parallel processing, the reading of data from the disk takes place after direct path read, which further introduces a delay in the use of the index. 4.5Test 4–Windows systemBoth the TEST_TAB1 full table scan operation and the hash join of two large TEST_TAB1 and TEST_TAB2 tables were tested for all selected levels of DOP parallelization. The change from test 1 was the use of one of the variants of a fast SSD. The test results are shown in Figure 13. Using a faster SSD drive caused that the worst query time at full table scan was better than the lowest time on a powerful Linux server. It is also worth noting that in the case of such large data access times (reading at a level of over 508 MB / s), the role of the processor is limited to a minimum, and thus the level of parallelism is not of practical importance here, which can be seen in the graph above - for full table scanning, the transition from the “weakest” time 21.3 s to the “best” 20.8 s is only 0.5 seconds. Comparing Figures 14 and 15, we can see that the use of SSD significantly increased the reading of data from the disk in a unit of time compared to the HDD, which RDBMS was able to use and translate into a short time of query execution. 5.CONCLUSIONSThe results presented above indicate that, under certain conditions, the use of parallel processing can significantly improve the response time of an RDBMS system to an inquiry. The biggest benefit is the introduction of parallel processing itself, i.e. the transition from a DOP level of 1 to 2. However, in the tested cases, the advantage of other hardware and software techniques over the parallel processing technique is clearly visible. Particularly, the use of efficient SSDs, as well as the use of a binary tree index. Contemporary database systems are forced to deal with an increasing amount of data to be processed. This is solved by investing in newer and more efficient hardware infrastructure. However, it is more important to be able to use your resources using a number of hardware and software solutions to maximize database system performance. This is how to optimize the operation of this system and shorten the time of access to data. The database system consists of a very complex structure. In the relational system, the data contained in the tables are interrelated, which further hinders the rapid implementation of complex queries. An important issue is therefore the proper database design, optimal configuration of your equipment, fine-tuning of the application code. If the above are complete, one can proceed to the implementation and testing of further methods, such as parallel processing. The main problems we aimed to address in this paper were: “how to tune the database and the queries addressed to it?”, and “how to choose the DOP parallelization factor so that the use of parallel processing would be profitable”. Various hardware configurations, application complexity, time required to perform a given operation, number of users - these and many other variables should be taken into account when using parallel processing. First of all, it should be assessed whether its use makes sense. REFERENCESORACLE Database Concepts, https://docs.oracle.com/cd/E11882_01/index.htm Google Scholar
Kevin Loney,
“Oracle Database 11g The Complete Reference,”
Google Scholar
ORACLE Database Performance Tuning Guide, https://docs.oracle.com/cd/E11882_01/server.112/e41573/toc.htm Google Scholar
ORACLE Database Reference, PARALLEL ADAPTIVE MULTI USER, https://docs.oracle.com/en/database/oracle/oracle-database/19/refrn/PARALLEL_ADAPTIVE_MULTI_USER.html#GUID-6A4F12CA-E6EA-4D13-A725-80B86404ECFA Google Scholar
Parallel Execution with Oracle Database, WHITE PAPER/FEBRUARY 20,
(2019) https://www.oracle.com/technetwork/database/bi-datawarehousing/twp-parallel-execution-fundamentals-133639.pdf Google Scholar
Douglas Ian Burns,
“How Many Slaves?,”
http://oracledoug.com/serendipity/index.php?/archives/840-How-Many-Slaves.html Google Scholar
SQL trace, 10046, trcsess and tkprof in Oracle, https://oracle-base.com/articles/misc/sql-trace-10046-trcsess-and-tkprof Google Scholar
Wajszczyk Bronislaw Ignacy, Biernacki Konrad,
“Optimization of the efficiency of search operations in the relational database of radio electronic systems,”
in 2017 Radioelectronic Systems Conference,
(2018). Google Scholar
Matuszewski J., Paradowski L.,
“The Knowledge Based Approach for Emitter Identification,”
in 12th International Conference on Microwaves and Radar (MIKON),
810
–814
(1998). https://doi.org/10.1109/MIKON.1998.742832 Google Scholar
Matuszewski J.,
“The Radar Signature in Recognition System Database,”
in 19th International Conference on Microwaves, Radar and Wireless Communications MIKON-2012Warsaw,
617
–622
(2012). https://doi.org/10.1109/MIKON.2012.6233565 Google Scholar
Li, J. & Zhang,
“Cluster based parallel database management system for data intensive computing,”
W. Front. Comput. Sci. China, 3 302
(2009) https://doi.org/10.1007/s11704-009-0031-5 Google Scholar
Pushpa Rani Suri, Sudesh Rani,
“A New Classification for Architecture of Parallel Databases,”
Information Technology Journal, 7 983
(2008). https://doi.org/10.3923/itj.2008.983.991 Google Scholar
Todd Eavis, Ahmad Taleb,
“Query Optimization and Execution in a Parallel Analytics DBMS,”
in Parallel & Distributed Processing Symposium (IPDPS) 2012 IEEE 26th International,
897
–908
(2012). Google Scholar
N.TomovE, Dempster M.H, Williams.A, Burger H., Taylor P.J.B., King. P, Broughton,
“Analytical response time estimation in parallel relational database systems,”
School of Mathematical and Computer Sciences, Heriot-Watt University, Riccarton, Edinburgh EH14 4AS, UK https://doi.org/10.1016/j.parco.2003.11.003 Google Scholar
C. S. Pan, M. L. Zymbler,
“Development of a parallel Database Management System on the Basis of Open-Source POSTGRESQL DBMS,”
Vestnik YuUrGU. Ser. Mat. Model. Progr.,
(12), 112
–120
(2012). Google Scholar
Pietkiewicz, T., Wajszczyk, B.,
“Fusion of identification information from ELINT-ESM sensors,”
in 2017 Radioelectronic Systems conference,
(2018). Google Scholar
Pietkiewicz, T.,
“Removal of conflicts in fusion of identification information from ELINT-ESM sensors,”
in XII Conference on Reconnaissance and Electronic Warfare Systems,
(2019). Google Scholar
|