Creating Your Own File Differentiator and Similarity Finder

The code for this blog has already been uploaded to my GitHub account. Please read the code there! All the code was written in the following configuration: Abstract: File difference and similarity find is a cross-platform C++ program developed to give information about all the lines that are similar and all the lines that are different! The sequential algorithm is itself is very fast and … Continue reading Creating Your Own File Differentiator and Similarity Finder

chintamanibharath19 Apr 201720 Apr 2017Leave a comment

Uncategorized

The Cache/Memory Model in C based Languages for Matrices

Introduction: A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have different independent caches, including instruction … Continue reading The Cache/Memory Model in C based Languages for Matrices

chintamanibharath19 Apr 2017Leave a comment

Uncategorized

Matrix Multiplication – Optimizations and Speed Up!

This blog entry is how about how you can make a naive matrix multiplication cache friendly, improve the speed of divide and Conquer Matrix Multiplication using C’s OpenMP API and Java’s Executor class. All of the code present in this blog has been uploaded to my GitHub account. The link for Naive Matrix Multiplication (with and without parallel) is : Naive Matrix Multiplication in C … Continue reading Matrix Multiplication – Optimizations and Speed Up!

chintamanibharath18 Apr 201720 Apr 2017Leave a comment

Uncategorized

How does OpenMP work?

This blog post is my understanding of how OpenMP works! OpenMP API is one of the easilest ways to improve performance of the code. This post is mostly inspired by the content from OpenMP 3.1. Execution Model: OpenMP API uses the fork-join model. In OpenMP, the program begins as a single thread. When the program thread or the initial thread reaches or encounters a parallel … Continue reading How does OpenMP work?

chintamanibharath8 Mar 201716 Apr 2017Leave a comment

Matrix Multiplication, Parallel Code, Parallel Divide and Conquer Matrix Multiplication

Matrix Multiplication – Using Java

This blog entry is my explanation to my code uploaded on my GitHub account. The link for the code for this article is Link to Matrix Multiplication Folder.
Note: I am explaning how I speed up the code and not the Matrix Multiplication Algorithms.
Please read about Matrix Multiplication algorithms here!
The configuration of the system used to run the codes were as follows :

The notion of parallelism rhymes well with the notation of Divide and Conquer Approach for Algorithms provided each sub-part/task in the corresponding Divide and Conquer Algorithm have no shared data which could possible cause race conditions.

First I would like to compare the running times of all the implemented algorithms and then go on to explain how I speed up the non-parallel implementations.

Running Times for these algorithms:

As we can see, the winner of this race is Brute Force followed by Parallel Strassen’s Matrix Multiplication algorithm closely followed by Parallel Divide and Conquer Matrix Multiplication and Strassen’s Matrix Multiplication (in the specified order). Divide and Conquer Matrix Multiplication Algorithm came last taking over 3 plus hours on the specifications specified.

Orders Of the Algorithms Used:

Brute Force Algorithm: θ(n³)

Divide and Conquer Algorithm: θ(n³)

Strassen’s Algorithm: θ(n^2.80)

So why did Brute Force Come on top?

The justification I came up with was that there are lesser memory checks in the Brute Force implementation of Matrix Multiplication than that of others. Furthermore, perhaps, as in the two Divide and Conquer Algorithm, each time the 2d array is divided into subparts, the Java JVM does memory checks and Java References are pass by value add overhead.

Implementation:

Libraries Used:

import java.util.*;
import java.text.DecimalFormat;
import java.util.concurrent.*;

The java.util.* library contains all the basic utilities such as Scanner Class, System Class, Maps, Queues, etc. Read more about this library here.

The java.text.DecimalFormat library was used to round the value of time with the precision of 6 decimal digits. Read more about this library here.

The java.util.concurrent.* library contains the basic utilities for concurrent programming. Read more about this library here.

Code Explanation:

For Parallel Divide and Conquer Algorithm:

Note: As posting the full code here would make the blog article really big, I am only posting the snippets of my code here. Please refer to the folder or click on this link specified before for the full code.

Assuming that most of us know how to create a class, write the main method, create a 2d array and taking input through scanner object. I will explain few non-trivial parts of my code.

In class Parallel Matrix Multiplication:

private static final int POOL_SIZE = Runtime.getRuntime().availableProcessors();
private final ExecutorService exec = Executors.newFixedThreadPool(POOL_SIZE);

The java.lang.Runtime class allows the application to interface with the environment in which the application is running. This class has a method called availableProcessors() which returns the number of processors available to the Java virtual machine.

The newFixedThreadPool(int numberOfthreads) method of the class Executors creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue. At most numberOfthreads will actively process tasks. The ExecutorService interface allows asynchronous execution of tasks in the background.

public void multiply() 
{
      Future f = exec.submit(new Multiplier(a, b, c, 0, 0, 0, 0, 0, 0, a.length));
      try 
      {
          f.get();
          exec.shutdown();
      } catch (Exception e) 
      {
          System.out.println(e);
      }
}

The submit(Callable task) of the ExecutorService interface takes a task for execution and return a Future reference which represents the pending results of the task.

The get() method of the Future interface waits until the task is completed. It throws an InterrupedException or an ExecutionException which needs to be handled.

Let me now explain about my inner class Multiplier:

Note: As this class is a little large, it better to open this link for a better read.

class Multiplier implements Runnable
{
     private double[][] a;
     private double[][] b;
     private double[][] c;
     private int a_i, a_j, b_i, b_j, c_i, c_j, size;

     Multiplier(double[][] a, double[][] b, double[][] c, int a_i, int a_j, int b_i, int b_j, int c_i, int c_j, int size)
     {
         this.a = a;
         this.b = b;
         this.c = c;
         this.a_i = a_i;
         this.a_j = a_j;
         this.b_i = b_i;
         this.b_j = b_j;
         this.c_i = c_i;
         this.c_j = c_j;
         this.size = size;
     }
     public void multiply(double[][] a, double[][] b, double[][] c, int a_i, int a_j, int b_i, int b_j, int c_i, int c_j, int size)
     {
         if(size==1)
         {
             c[c_i][c_j]+=a[a_i][a_j]*b[b_i][b_j];
         }
         else 
         {
             int m=size/2;
             double a11[][]=copy(a,a_i,a_i+m,a_j,a_j+m);
             double a12[][]=copy(a,a_i,a_i+m,a_j+m,a_j+size);
             double a21[][]=copy(a,a_i+m,a_i+size,a_j,a_j+m);
             double a22[][]=copy(a,a_i+m,a_i+size,a_j+m,a_j+size);
             double b11[][]=copy(b,b_i,b_i+m,b_j,b_j+m);
             double b12[][]=copy(b,b_i,b_i+m,b_j+m,b_j+size);
             double b21[][]=copy(b,b_i+m,b_i+size,b_j+0,b_j+m);
             double b22[][]=copy(b,b_i+m,b_i+size,b_j+m,b_j+size);
             double x1[][]=new double[m][m];
             double x2[][]=new double[m][m];
             double x3[][]=new double[m][m];
             double x4[][]=new double[m][m];
             double x5[][]=new double[m][m];
             double x6[][]=new double[m][m];
             double x7[][]=new double[m][m];
             double x8[][]=new double[m][m];
             multiply(a11,b11,x1,0,0,0,0,0,0,m);
             multiply(a12,b21,x2,0,0,0,0,0,0,m);
             multiply(a11,b12,x3,0,0,0,0,0,0,m);
             multiply(a12,b22,x4,0,0,0,0,0,0,m);
             multiply(a21,b11,x5,0,0,0,0,0,0,m);
             multiply(a22,b21,x6,0,0,0,0,0,0,m);
             multiply(a21,b12,x7,0,0,0,0,0,0,m);
             multiply(a22,b22,x8,0,0,0,0,0,0,m);
             combine(c,add(x1,x2,m),add(x3,x4,m),add(x5,x6,m),add(x7,x8,m),c_i,c_j,size);
         }
     }
     public void combine(double c[][],double c11[][],double c12[][],double c21[][],double c22[][],int c_i,int c_j,int n)
     {
        int m=n/2;
        for(int i=0;i<m;i++)
        {
            for(int j=0;j<m;j++)
            {
                c[c_i+i][c_j+j]+=c11[i][j];
            }
        }
        for(int i=0;i<m;i++)
        {
            for(int j=m;j<2*m;j++)
            { 
                c[c_i+i][c_j+j]+=c12[i][j-m];
            }
        }
        for(int i=m;i<2*m;i++)
        {
            for(int j=0;j<m;j++)
            {
                c[c_i+i][c_j+j]+=c21[i-m][j];
            }
        }
        for(int i=m;i<2*m;i++)
        {
            for(int j=m;j<2*m;j++)
            {
                c[c_i+i][c_j+j]+=c22[i-m][j-m];
            }
        }
    } 
    public double[][] add(double a[][],double b[][],int n)
    {
        double c[][]=new double[n][n];
        for(int i=0;i<n;i++)
        {
            for(int j=0;j<n;j++)
            {
                c[i][j]=a[i][j]+b[i][j];
            }
        }
        return c;
    }
    public double[][] copy(double [][]a,int n1,int n2,int m1,int m2 )
    {
        double c[][]=new double[n2-n1][m2-m1];
        for(int i=n1;i<n2;i++)
        {
            for(int j=m1;j<m2;j++)
            {
                c[i-n1][j-m1]=a[i][j];
            }
        }
        return c;
    }
    public void run() 
    {
        int h = size/2;
        if (size <= 64) 
        {
            multiply(a,b,c,a_i,a_j,b_i,b_j,c_i,c_j,size);
        } 
        else 
        {
            Multiplier[] tasks = {
              new Multiplier(a, b, c, a_i, a_j, b_i, b_j, c_i, c_j, h),
              new Multiplier(a, b, c, a_i, a_j+h, b_i+h, b_j, c_i, c_j, h),
              new Multiplier(a, b, c, a_i, a_j, b_i, b_j+h, c_i, c_j+h, h),
              new Multiplier(a, b, c, a_i, a_j+h, b_i+h, b_j+h, c_i, c_j+h, h),
              new Multiplier(a, b, c, a_i+h, a_j, b_i, b_j, c_i+h, c_j, h),
              new Multiplier(a, b, c, a_i+h, a_j+h, b_i+h, b_j, c_i+h, c_j, h),
              new Multiplier(a, b, c, a_i+h, a_j, b_i, b_j+h, c_i+h, c_j+h, h),
              new Multiplier(a, b, c, a_i+h, a_j+h, b_i+h, b_j+h, c_i+h, c_j+h, h)
           };
           FutureTask[] fs = new FutureTask[tasks.length/2];
           for (int i = 0; i < tasks.length; i+=2) 
           {
               fs[i/2] = new FutureTask(new Sequentializer(tasks[i], tasks[i+1]), null);
               exec.execute(fs[i/2]);
           }
           for (int i = 0; i < fs.length; ++i) 
           {
               fs[i].run();
           }
           try 
           {
               for (int i = 0; i < fs.length; ++i) 
               {
                   fs[i].get();
               }
           } 
           catch (Exception e) 
           {
               System.out.println(e);
           }
       }
   } 
}

Continue reading “Matrix Multiplication – Using Java”

chintamanibharath23 Feb 201718 Apr 2017Leave a comment

Bharath's Blog

See the world in the parallel way!

Creating Your Own File Differentiator and Similarity Finder

The Cache/Memory Model in C based Languages for Matrices

Matrix Multiplication – Optimizations and Speed Up!

How does OpenMP work?

Matrix Multiplication – Using Java

Running Times for these algorithms:

Orders Of the Algorithms Used:

So why did Brute Force Come on top?

Implementation:

Libraries Used:

Code Explanation:

For Parallel Divide and Conquer Algorithm:

Note: As posting the full code here would make the blog article really big, I am only posting the snippets of my code here. Please refer to the folder or click on this link specified before for the full code.

In class Parallel Matrix Multiplication:

Note: As this class is a little large, it better to open this link for a better read.