meanings of Dictionary of Algorithms and Data Structures encyclopedia of Dictionary of Algorithms and Data Structures dictionary of Dictionary of Algorithms and Data Structures thesaurus on Dictionary of Algorithms and Data Structures books about Dictionary of Algorithms and Data Structures dreams about Dictionary of Algorithms and Data Structures
 Dictionary of Algorithms and Data Structures - Definition 

Parallel External Algorithms External sorting performance is very dependent on assumptions about the problem, such as data size, I/O system, organization of input and output files, etc. The performance estimate must also take into account the I/O time for input and output to be useful. In many cases the I/O time is large compared to the compare and sort overhead times, and the problem changes from one of multiprocessor sorting to one of overall system optimization. Lack of time prevented me from doing anything but a few theoretical investigations. Useless Algorithms Many different external sorting algorithms have been proposed for multi-computer environments. Most of them are not suitable for multi-processor computers. For example, in [5] two classes of external algorithms are given, the External Tree sort, and the Pipelined External sort. The External Tree sort assumes that the total main memory available to all processors is equal to the input data size. In this case, a shared memory computer could as well use an internal sorting algorithm. The Pipelined External sort assumes that there are log2 (n) + 1 processors (processes) available. This is not unrealistic, but the algorithm also uses 2+4log2 n files, all of which must be read and written in parallel. The file requirement disqualified it from consideration. Simple External Sort The assumption for ``Simple External sort is that I/O is done sequentially with no parallelism (ie. the input and output files are read and written sequentially, one element at a time), and that the time to compare two elements is small compared to the time to read/write an element. The goal is, of course, to produce a sorted version of the input file as quickly as possible.

Table 5: Simple External sort.


In the algorithm in table 5, m is the number of elements that will fit in main memory at one time. This should be a relatively large number in most cases. For example, if the main memory is 50 MB and the element size is 1000 B, m will be approximately 50000. There are two steps in the algorithm. First, n/m sorted subfiles are produced by reading m elements at a time from the input file, sorting them using the Parallel Quicksort internal sort, and writing the sorted elements. In the second step the subfiles are merged, all at the same time, producing the output file. If the subfiles are to reside on the same physical device, the merge step must assume adequate buffering, for example m / (n/m + 1) element buffers for each file in order to reduce seek overhead. The time to sort n elements, assuming sequential I/O, is

\begin{eqnarray*}

t(n) & = & \frac{n}{m} \left( m t_r + t_\alpha \frac{m}{p} \log_2 m + m t_w \right) +\\

    		& & + n \left( t_w + t_r + \frac{n}{m} t_\beta \right),

\end{eqnarray*} where tr, tw, tα, tβ are the read, write, sort, and merge time per element respectively. Assuming that n/m ≤log2 n, the sorting time is O(n log2 n) . If we further assume that m= n/log2 n and n >> 0 then the sorting time can be approximated by t(n) ≈ 2n (tr + tw) + n log2 n (tα/p + tβ), which is close to optimal given the assumption about I/O and comparison times. A further refinement would be to do the merge step in parallel, although n/m would have to be relatively large in order to get any noticeable performance benefit. External Index Sort The External Index sort is an algorithm which produces a sorted index to the input file, instead of producing a sorted output file. It only works for problems with large elements, stored on a random access device, and who are relatively random (ie. most of the elements should differ in their first bytes). In the algorithm in table 6 r is the number of bytes stored in main memory from each element, for example four. Table 6: External Index sort.


If ν is the probability for two elements to be identical in their first r bytes; tr is the read, tsr is the seek and read, and ts is the sort time per element respectively; the sorting time will be t(n)= n tr + {1p} (ts + νtsr) n log2 n. This is close to optimal, provided that ν is close to zero. Improved Block Bitonic Sort If the merge step in the Simple External sort described above is replaced with a Block Bitonic Merge, merging n/m sorted subfiles, where b is the number of pages (input buffers) per processor, the result is the Improved Block Bitonic sort as given in [7, pages 304-307]. The execution time given for this algorithm is given as where b=n/(m p) is number of memory pages per processor (at least two), k is the number of elements per page, tm , tc is the time to move an element in main memory, and compare two elements, respectively. The other notation is the same as in ``Simple External sort above. The elements are assumed to be read and written a page at a time. Of course, bpk times the element size is equal to the amount of main memory used. The algorithm also assumes that the I/O is distributed between several devices so that there is no I/O contention between the different processors. Simple External Sort Compared to Improved Block Bitonic Sort With the time approximations given above, it is possible to do theoretical comparisons of the Simple External and Improved Block Bitonic sorts. Example 1. Assuming available main memory to be 80 MB, n= 107 , and the element size to be 100 B, and p=4 we get m= 800000 , and b=3 . Reasonable assumption for the times are tα=10 μ s, tβ=5μ s, tr=tw=100 μ s, tm=10 μ s, and tc= 2 μ s. The page-size will be k= 1 element for the Improved Block Bitonic sort, in order to get reasonable performance. The time for Simple External sort is then 5115 s, and for Improved Block Bitonic sort sort 4089 s, an improvement of about 20%. Example 2. If only one processor is used, p=1 , the sorting time is 6586 s for the Simple External sort, and 10356 s for the Improved Block Bitonic sort sort, which is far worse. Using only bk=3 elements per processor for the Improved Block Bitonic sort might degrade I/O performance. However, with adequate buffering and one I/O device per subfile, the performance should be within the limits given above. The examples above show that the Improved Block Bitonic sort has higher parallelism, but worse worst-case performance than the Simple External sort.


Procedure External_Merge_Sort (FILE : FileData) Begin

          Do  
  Begin

No  := 0 ;

           OpenNewFile (File1, File2);

OpenFile (FileData);

           Read (temp from FileData);
           Write (temp  to File2);
                While (Read (Var  from FileData)  not  EOF )   Do
                          Begin

If (Var ณ temp) Then

                                        Write (Var to File2);
                                   Else
                                      Begin

Write (Var to File1);

                                                No := No+1;
                                     End;
             	                     temp := Var;
                    	  End;
       If (No น  0)    Then    Mergesort (File1 , File2 , FileData);
  End;
         While (No >0);
      	   CloseFile (File2);
         	   MoveFile(File2 to  FileData);	

End.











Review External Sorting--This term is used to refer to sorting methods that are employed when the data to be sorted is too large to fit in primary memory. Characteristics of External Sorting 1. During the sort, some of the data must be stored externally. Typically the data will be stored on tape or disk. 2. The cost of accessing data is significantly greater than either bookkeeping or comparison costs. 3. There may be severe restrictions on access. For example, if tape is used, items must be accessed sequentially. Criteria for Developing an External Sorting Algorithm 1. Minimize number of times an item is accessed. 2. Access items in sequential order

Sort Merge Sort merge is the strategy of choice for external sorting because it: 1. Accesses records sequentially 2. Minimizes block accesses 3. Gives a stable sort

Sort Merge Strategy 1. Divide the file into runs such that the size of a run is small enough to fit into main memory 2. Sort each run in main memory using a fast in-memory sorting algorithm 3. Merge the resulting runs together into successively bigger runs, until the file is sorted.

Balanced Multiway Merging Strategy 1. Select an equal number of I/O units (e.g., tapes, disks) 2. Sort Step o Select half of the I/O units to be output files o Using an in-memory sorting algorithm, create the initial runs and divide them evenly among the output files 3. Merge Step o (a) On each merge step, alternate the I/O units so that on one step, a unit is an input file and on the next step it is an output file o (b)Read one run from each of the input files, merge the runs, and store the resulting run on an output file. Alternate output files so that the runs are evenly distributed on the output files. o (c)Repeat step b until all the runs on the input files have been merged. o (d)Repeat steps a-c until the file has been sorted Example Sort the keys A S O R T I N G A N D M E R G I N G E X A M P L E assuming that initial run sizes are 3 and that 6 I/O units are used. Pseudo-Code The pseudo-code in this section assumes that the user provides an input file and an output file. It also assumes that two parameters for the balanced multi-way algorithm are determined by the user via switches: 1. initial run-size: The size, in number of records, of the initial runs 2. num-ways: The parameter P. In other words, P-way merging will be used. The Main Procedure main(argc, argv) {

   assign default values for run_size and num_ways
   parse the switches and, if appropriate, assign the user-defined
        values to run-size and num_ways
   open the output file 
   create two arrays of scratch files
   read the input file, create the initial runs, and assign the runs
       to the scratch output files (create_initial_runs)
   sort the runs using the sort_merge algorithm (sort_merge)
   close the output file

} Creating the Initial Runs create_initial_runs(input_file_name, run_size, num_ways) {

   allocate a dynamic array, a, large enough to accommodate runs of 
       size run_size
   open the input file 
   for i = 0 to NUM_WAYS-1 {
       open output_scratch_file i 
   }
   more_input = true
   next_output_file = 0
   num_runs_per_output_file = 0
   while (more_input) {
       for i = 1 to run_size {   /* entry 0 in the array is reserved for

a possible sentinel value. If your

                                     sorting algorithm does not require a

sentinel value, the for loop could start at i = 0 and go to (run_size - 1) */

           if (not end_of_input_file) 

read a record into a[i] else { more_input = false break }

        }
        sort array a using an in-memory algorithm like quicksort

/* write the records to the appropriate scratch output file for j = 1 to i { /* can't assume that the loop runs to run_size since the last run's length may be less than run_size */

            write a[i] to scratch_output_file[next_output_file]
        }

output the sentinel value to scratch_output_file[next_output_file]

/* everytime we get back to the first output file, increment the number of runs per output file by 1 */ if (next_output_file == 0) num_runs_per_output_file = num_runs_per_output_file + 1 next_output_file = (next_output_file + 1) % num_ways

   }
   /* make sure the same number of runs are assigned to each scratch 
      output file */
   if (next_output_file != 0) {
       for i = next_output_file to (num_ways - 1) {

output the sentinel value to scratch_output_file[i]

       }     
   }
   for i = 0 to (num_ways -1)
       close scratch_output_file[i] 
   close the input file 

   return num_runs_per_output_file

} The Sort-Merge Procedure sortmerge(output_file, num_runs_per_scratch_output_file, num_ways) {

 for (N = num_runs_per_scratch_output_file; N > 1; 
       N = ceiling(N / num_ways) /* ceiling is a function that rounds up

to the nearest integer. You need to write this function yourself */

 {
   open_scratch_files()  /* open input and output scratch files */
   for i = 0 to (N-1) {
     create_run(output_scratch_files[i % num_ways], true, num_ways);
   }
   /* make sure the same number of runs are assigned to each scratch 
      output file */
   if ((i % num_ways) != 0) {
       for j = (i % num_ways) to (num_ways - 1) {

output the sentinel value to scratch_output_file[j]

       }     
   }
   close_scratch_files();  /* close input and output scratch files */
 }
 /* make the last run write into the output file */
 open_scratch_files();
 create_run(output_file, false, num_ways);
 close_scratch_files();

}

create_run(output_file, generate_sentinel_value_flag, num_ways) {

 /* initialize the merge_array */
 for i = 0 to (num_ways -1) {
   read a record from input_scratch_file i to merge_array[i]
 /* create the run */
 while (true) {
   find the minimum key and the index of that minimum key (min_index) 
        in merge_array
   if (min == SENTINEL_VALUE)  /* the run is complete if the sentinel value
     break;                         is reached */
   write the record with the minimum key in merge_array to output_file
   read the next record from the input_scratch_file with index min_index
       into merge_array[min_index]
 }
 if (generate_sentinel_value_flag == true)
   write the sentinal value to the output file

}

Opening Scratch Files If the sort is a num_ways sort, then your program will need num_ways scratch input files and num_ways scratch output files. Your program will need to alternate between using a scratch file as an input file and as an output file. On one run it will be an input file, then on the next run an output file, then an input file, etc. The easiest way to handle this alternation is to maintain a flag that keeps track of which of your two file arrays is currently the input array. For example, you might declare a flag called input1, which if true indicates that your first array is the current input array and if false indicates that your second array is the current input array. As an example of the use of this flag, here is the code for open_scratch_files: void open_scratch_files () {

 // if input1 is true, open the first bank of files for input and
 // the second bank for output
 if (input1) {
   for (i = 0; i < num_ways; i++) {
     filearray1[i].Open('i');
     filearray2[i].Open('o');
   }
 }
 // if input1 is false, open the first bank of files for output and
 // the second bank for input
 else {
   for (i = 0; i < num_ways; i++) {
     filearray1[i].Open('o');
     filearray2[i].Open('i');
   }
 }

}

Performance 


1. If M records can be sorted in-memory and the file consists of 2. N records, then the number of initial runs is N / M. 3. 4. 5. If there are 2P I/O units available, then the number of 6. subsequent passes is ceiling(logP(N / M)) since each 7. pass reduces the number of runs by P. Here ceiling(x) means 8. the smallest integer greater than or equal to x. External Sorting Introduction External sorting refers to the sorting of a file that is on disk (or tape). Internal sorting refers to the sorting of an array of data that is in RAM. The main concern with external sorting is to minimize disk access since reading a disk block takes about a million times longer than accessing an item in RAM (according to Shaffer -- see the reference at the end of this document). Perhaps the simplest form of external sorting is to use a fast internal sort with good locality of reference (which means that it tends to reference nearby items, not widely scattered items) and hope that your operating system's virtual memory can handle it. (Quicksort is one sort algorithm that is generally very fast and has good locality of reference.) If the file is too huge, however, even virtual memory might be unable to fit it. Also, the performance may not be too great due to the large amount of time it takes to access data on disk. Methods Most external sort routines are based on mergesort. They typically break a large data file into a number of shorter, sorted "runs". These can be produced by repeatedly reading a section of the data file into RAM, sorting it with ordinary quicksort, and writing the sorted data to disk. After the sorted runs have been generated, a merge algorithm is used to combine sorted files into longer sorted files. The simplest scheme is to use a 2-way merge: merge 2 sorted files into one sorted file, then merge 2 more, and so on until there is just one large sorted file. A better scheme is a multiway merge algorithm: it might merge perhaps 128 shorter runs together. Analysis According to Shaffer, a multiway merge using half a megabyte of RAM and a disk block size of 4 KB could hold 128 disk blocks in RAM at once. This would allow 128 runs to be merged together in one pass. The average initial run size would be 1 MB. (See Shaffer on how that can be obtained with only 1/2 MB of RAM.) A file of size 128 MB could be sorted in 2 passes (one to build the initial runs and one to merge them). A file of size 16 gigabytes could be sorted in just 3 passes. Note that you do not want to jump back and forth between 2 or more files in trying to merge them (while writing to a third file). This would likely produce a lot of time-consuming disk seeks. Instead, on a single-user PC, it is better to read a block of each of the 2 (or more) files into RAM and carry out the merge algorithm there, with only the output being written (sequentially) to disk. When the merge algorithm exhausts one of the blocks of data, refill it by reading from disk another block of the associated file. This is called buffering. On a larger machine where the disk drive is being shared among many users, it may not make sense to worry about this as the read/write head is going to be seeking all over the place anyway. Practical Data Shaffer presents the following practical data concerning external sorting. In this experiment a 4 MB file was sorted on a particular computer. A simple mergesort that did not build initial sorted runs took 451 seconds. A 2-way mergesort that used initial runs of 128 KB took only 160 seconds. A multiway mergesort that used the same initial runs took only 103 seconds. Clearly, using initial sorted runs dramatically speeds up the sorting. References See the references below for more complete information and more advanced methods. Then try writing your own external sort! • A Practical Introduction to Data Structures and Algorithm Analysis. Clifford A. Shaffer. Prentice-Hall (1997). See chapter 9. • Data Structures with C++. William Ford, William Topp. Prentice-Hall (1996). See pages 830 and following. • Data Structures: Form and Function. Harry F. Smith. Harcourt Brace Jovanovich (1987). See pages 712 and following.

Copyright 2008 WordIQ.com - Privacy Policy  ::  Terms of Use  :: Contact Us  :: About Us
This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Dictionary of Algorithms and Data Structures".