optimum spectrum frequency : December 2012

Monday

because we are greedy

So, in one of the previous posts I was talking about "Performance vs. Convenience" and the need for a new programming language, but WHY?

because we are greedy.

We are power Matlab users. Some of us are Lisp hackers. Some are Pythonistas, others Rubyists, still others Perl hackers. There are those of us who used Mathematica before we could grow facial hair. There are those who still can’t grow facial hair. We’ve generated more R plots than any sane person should. C is our desert island programming language.
We love all of these languages; they are wonderful and powerful. For the work we do — scientific computing, machine learning, data mining, large-scale linear algebra, distributed and parallel computing — each one is perfect for some aspects of the work and terrible for others. Each one is a trade-off.
We are greedy: we want more.

We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

(Did we mention it should be as fast as C?)

While we’re being demanding, we want something that provides the distributed power of Hadoop — without the kilobytes of boilerplate Java and XML; without being forced to sift through gigabytes of log files on hundreds of machines to find our bugs. We want the power without the layers of impenetrable complexity. We want to write simple scalar loops that compile down to tight machine code using just the registers on a single CPU. We want to write A*B and launch a thousand computations on a thousand machines, calculating a vast matrix product together.

We never want to mention types when we don’t feel like it. But when we need polymorphic functions, we want to use generic programming to write an algorithm just once and apply it to an infinite lattice of types; we want to use multiple dispatch to efficiently pick the best method for all of a function’s arguments, from dozens of method definitions, providing common functionality across drastically different types. Despite all this power, we want the language to be simple and clean.
All this doesn’t seem like too much to ask for, does it?
Even though we recognize that we are inexcusably greedy, we still want to have it all.

What you just read is the answer to "Why was Julia created?".
You can read an interview with one of the developers here and see a presentation here.

Julia seems to be a promising language for scientists and researchers.
Time will show whether performance and convenience can finally meet each other.

SVD film (1976)

"Exploration vs. Exploitation" & "Performance vs. Convenience"

If you are using optimization in your research, you have probably heard about "Exploration vs. Exploitation". You can't increase both of them at the same time. The goal of exploration is to select samples that cover and stretch the search space as much as possible. In the exploitation phase, however, we try to narrow the search space and focus on selecting samples near the optimizer. Often the cost of evaluating the objective function is high, so in many cases we need a trade-off between the two.
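To make the trade-off concrete, here is a minimal Python sketch. It is only an illustration, not a method from any particular paper: the function name explore_exploit_minimize, the parameter explore_prob and the toy objective are all made up for this example. With probability explore_prob the next sample is drawn from the whole search space (exploration); otherwise it is drawn close to the best point found so far (exploitation).

```python
import random


def explore_exploit_minimize(f, lo, hi, n_evals=500, explore_prob=0.3,
                             step=0.1, seed=0):
    """Minimize f on [lo, hi] by mixing exploration and exploitation.

    All names and default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    best_x = rng.uniform(lo, hi)
    best_y = f(best_x)
    for _ in range(n_evals - 1):
        if rng.random() < explore_prob:
            # Exploration: sample anywhere in the search space.
            x = rng.uniform(lo, hi)
        else:
            # Exploitation: sample close to the best point so far,
            # clipped so that it stays inside [lo, hi].
            x = min(hi, max(lo, best_x + rng.gauss(0.0, step)))
        y = f(x)
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y


def objective(x):
    # Toy objective with its minimum at x = 2.
    return (x - 2.0) ** 2


best_x, best_y = explore_exploit_minimize(objective, -10.0, 10.0)
print(best_x, best_y)
```

Setting explore_prob close to 1 turns this into pure random search, and setting it close to 0 turns it into a local search that can get stuck near its starting point. When each evaluation of the objective is expensive, n_evals is the budget you are trading between the two.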

This idea holds in programming too. In much research we are looking for higher performance, so we need to program in a low-level language (like C or Fortran). However, implementing, working and testing different ideas in these languages is not easy, and the research turns out to be more about computer science and the ability to code in low-level languages. This makes many researchers go for convenience and try to find a trade-off between performance and convenience.

Many researchers are using Matlab and R. In many cases people use Matlab for linear algebra and R for statistics. You can use Matlab for statistics too, but to be honest it's frustrating (at least in my experience), because R has ready-to-go packages for almost every statistical algorithm, function, distribution, etc. You can find code online for all of them, which makes it very easy to adapt that code into your own custom code.

Matlab and R have some drawbacks. The most important impediment of both is speed. Besides that, Matlab is commercial software, and for R you need to install an IDE (like RStudio) for better development and debugging.

So you may say: OK, why not try some tricks in Matlab, like creating MEX files, vectorization and parallelization? The truth is, you can do all of that, and how much it helps differs from case to case; in some instances it will not help that much.

If you are a Matlab user and you would like to keep programming in a familiar syntax, you can try Armadillo, which is a C++ linear algebra library with Matlab-like syntax. It's free and it uses LAPACK (so it should be fast). Another option could be ROOT, which is developed at CERN (Switzerland), famous for its particle physics laboratory. I don't know how much you can speed up your program by using ROOT; its syntax is not like Matlab's, and it is written in C++. What about Dylan, GOO and NewtonScript?

SciPy or NumPy?

Python with NumPy and SciPy could be a good option which combines much of the functionality of both Matlab and R.

But which one is better?

The bottom line is, at some point you will need a third language (like C or Fortran) for speed-up and higher performance.

In the next post I will talk about a new language which seems to be promising for scientists and researchers.

See you soon! :)

Sunday

The UNIX Operating System (1982)

Thursday

Matlab "out of memory" problem

If you are working and coding with Matlab, you have probably encountered this problem, especially when your code has many iterations or when you are calling external libraries inside your code.

Here are some suggestions:

1) The interface of Matlab is written in Java, so one option could be increasing the Java heap size (File->Preferences->Java Heap Memory).

2) Try "-nodesktop" mode. Open cmd, go to the directory where you want to run your program, and start Matlab with the -nodesktop option (that is, type: matlab -nodesktop). You will get just a window (like cmd) in which to type the code you want to run.

3) If the problem you are encountering is related to a library you are calling or using in your Matlab code, another option could be using a MEX file. You need to create a MEX file in Matlab using that library. After compiling and creating the MEX file, you can use it like other built-in functions in Matlab. Creating the MEX file is sometimes a little bit tricky, but I think it's completely worth it to have your own custom function instead of loading the library each time.

4) Use machines with more RAM. Many universities have an HPC (High Performance Computing) lab. The amount of RAM you can get over there is usually huge. At our university I tried to see how much RAM I could get by creating a very big matrix (5e4*5e4), and I could get almost 21 GB of RAM.
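Before trying any of the suggestions above, it can help to estimate how much memory a dense matrix needs at all. Below is a small back-of-the-envelope sketch in Python; the helper name dense_matrix_bytes is made up for this example. It assumes 8 bytes per element, which is the size of Matlab's default double type, and it ignores any overhead Matlab itself adds.

```python
def dense_matrix_bytes(rows, cols, bytes_per_element=8):
    """Memory needed for a dense rows x cols matrix.

    Assumes 8-byte elements (double precision); overhead is ignored.
    """
    return rows * cols * bytes_per_element


n = 50_000  # 5e4, the matrix size used in point 4
size_bytes = dense_matrix_bytes(n, n)
size_gb = size_bytes / 1e9       # decimal gigabytes
size_gib = size_bytes / 2**30    # binary gibibytes
print(f"{size_gb:.1f} GB, {size_gib:.1f} GiB")  # prints "20.0 GB, 18.6 GiB"
```

So a single 5e4*5e4 matrix of doubles already takes about 20 GB, which is in the same range as the figure reported in point 4. Any operation that makes a temporary copy of such a matrix will need that much memory again.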

I would be happy if you let me know about other methods to deal with this problem.