Monday

"Exploration vs. Exploitation" & "Performance vs. Convenience"

If you use optimization in your research, you have probably heard of "exploration vs. exploitation". You can't maximize both at the same time. The goal of exploration is to select samples that cover and stretch across the search space as much as possible. In the exploitation phase, by contrast, we narrow the search and focus on selecting samples near the best point found so far (the current estimate of the optimum). Evaluating the objective function is often expensive, so in many cases we need a trade-off between the two.
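To make the trade-off concrete, here is a minimal Python sketch of a budget-limited search. It is only an illustration, not a method from any particular library, and the function name `explore_exploit_search` and its parameters are hypothetical. Each objective evaluation is spent either exploring (a uniform sample anywhere in the interval) or exploiting (a small random step around the best point so far), and `explore_prob` sets the balance between the two.

```python
import random
from typing import Callable, Tuple


def explore_exploit_search(
    f: Callable[[float], float],
    lower: float,
    upper: float,
    budget: int = 200,
    explore_prob: float = 0.3,
    step: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Minimize f on [lower, upper] using at most `budget` evaluations.

    Hypothetical illustration: each evaluation either explores (uniform
    sample over the whole interval) or exploits (Gaussian step around the
    best point found so far).
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    rng = random.Random(seed)  # seeded so runs are reproducible
    width = upper - lower

    # The first evaluation is always exploratory: nothing is known yet.
    best_x = rng.uniform(lower, upper)
    best_y = f(best_x)

    for _ in range(budget - 1):
        if rng.random() < explore_prob:
            # Exploration: ignore what we know and sample anywhere.
            x = rng.uniform(lower, upper)
        else:
            # Exploitation: perturb the current best, clipped to the bounds.
            x = best_x + rng.gauss(0.0, step * width)
            x = min(max(x, lower), upper)
        y = f(x)  # the (possibly expensive) objective evaluation
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y


if __name__ == "__main__":
    x, y = explore_exploit_search(lambda t: (t - 2.0) ** 2, -10.0, 10.0)
    print(x, y)
```

Setting `explore_prob` to 1.0 turns this into pure random search, while 0.0 gives pure local search, which on a function with several minima can get stuck near whichever one it finds first. With a fixed `budget`, every evaluation spent on one side is an evaluation not spent on the other.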

The same idea applies to programming. In a lot of research we want higher performance, which means programming in a low-level language (like C or Fortran). However, implementing and testing different ideas in these languages is not easy, and the research can turn into more of a computer-science exercise in low-level coding. This pushes many researchers toward convenience, looking for a trade-off between performance and convenience.

Many researchers use Matlab and R: often Matlab for linear algebra and R for statistics. You can use Matlab for statistics too, but to be honest it's frustrating (at least in my experience), because R has ready-to-go packages for almost every statistical algorithm, function, distribution, etc. You can find code online for all of them, which makes it very easy to reuse that code and adapt it to your own custom needs.

Matlab and R have some drawbacks. The most important one for both is speed. Besides that, Matlab is commercial software, and with R you will want an IDE (like RStudio) for comfortable development and debugging.
So you may say: OK, why not try some tricks in Matlab, like creating MEX files, vectorization, and parallelization? The truth is, you can do all of that, and the benefit differs from case to case; in some instances, however, it will not help much.

If you are a Matlab user and would like to keep programming in a familiar syntax, you can try Armadillo, a C++ linear algebra library with Matlab-like syntax. It's free and it can use LAPACK, so it should be fast. Another option is ROOT, developed at CERN (Switzerland), the well-known particle physics laboratory. I don't know how much ROOT would speed up your program, and note that ROOT is written in C++ and its syntax is not like Matlab's. What about DylanGOO and NewtonScript?
Python with NumPy could be a good option, combining much of the functionality of Matlab and R.
But which one is better?
The bottom line is, at some point you will need a third language (like C or Fortran) to speed things up and get higher performance.

In the next post I will talk about a new language that seems promising for scientists and researchers.
See you soon!  :)