
Monday, August 11, 2008

Genome Work Log

  • Made a few minor changes to the DNA programming language. It now closely mirrors ARB GPU assembly language (sans dot product and some other things which aren't terribly useful for audio). I'm leaving the door open to someday running plugins on the GPU. I may also do some naive tests to see how well it works and what CPU usage looks like (even though it won't be terribly optimized).
  • Refactored my 'Plugin' classes a bit. Eliminated about a dozen classes and replaced them with one class that uses C++ templates.
  • Restructured my plugin directories to make more sense in light of my new scripting languages.
  • Next step will be to automatically read .EEL and .DNA programs and create plugins for them.
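
The refactored class isn't shown here, so the following is only a guess at the general shape of "one templated class instead of a dozen": a single class template parameterized on a small per-sample processor. The names `Plugin`, `Gain` and `HardClip` are hypothetical and are not taken from Genome.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-sample processors; each only needs a process() method.
struct Gain {
    float amount = 0.5f;
    float process(float x) const { return x * amount; }
};

struct HardClip {
    float limit = 1.0f;
    float process(float x) const {
        if (x > limit) return limit;
        if (x < -limit) return -limit;
        return x;
    }
};

// One class template stands in for what would otherwise be one class per effect.
template <typename Processor>
class Plugin {
public:
    explicit Plugin(Processor p = Processor()) : proc_(p) {}

    // Run the processor over a whole buffer, in place.
    void render(std::vector<float>& buffer) const {
        for (std::size_t i = 0; i < buffer.size(); ++i)
            buffer[i] = proc_.process(buffer[i]);
    }

private:
    Processor proc_;
};
```

With this layout, `Plugin<Gain>` and `Plugin<HardClip>` share all of the buffer handling, and adding an effect means writing only a new `process()` function.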

Saturday, August 09, 2008

General Purpose GPU roundup

I spent quite a few hours yesterday researching what's currently available for doing general-purpose processing on graphics cards. Graphics cards are essentially stream processors - meaning they are well suited to doing repeated operations on arrays of data. Audio processing is a stream process and could benefit from being run on a GPU. Since GPUs are tailored for graphics, we would essentially be using textures as our audio buffers. Currently there are several options available for exploiting GPUs that make it more like regular programming and hide the complexities of dealing with textures and other strictly graphics-based idioms. Here's what I've found:

    • Brook. Brook takes programs written in a special language and converts them into C++ code. That code generates custom GPU instructions at runtime, which are tailored to your specific hardware.
    • CUDA. Specific to NVidia GPUs, but a very nice package overall. Uses a custom compiler to compile special programs which are linked into your code. You can call these functions right from your code.
    • Sh. A library that allows you to create GPU programs by interacting with objects just as you would in a regular C++ program (adding, multiplying, etc). It uses overloaded operators and other tricks in order to leverage C++ as a language, rather than making its own custom language like CUDA and Brook do. The advantage of Sh is that it's all done at runtime - there is no need to compile special programs ahead of time (as with Brook and CUDA). Sh also does graphics programming if you want to use it for that.
    • And while not really a general purpose library, there's also NVidia Cg, which can compile programs at runtime.
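
To make the overloaded-operator idea concrete, here is a minimal sketch of the general trick. It is NOT the Sh API: `Node`, `Expr`, `input`, `constant` and `evaluate` are all hypothetical names. The point is that `+` and `*` do no arithmetic when they run; they only record what was asked for, and a backend decides later what to do with the recording.

```cpp
#include <cassert>
#include <memory>

// Hypothetical expression node (not taken from Sh).
struct Node {
    char op;                         // 'v' = input, 'c' = constant, '+' or '*'
    float value;                     // only meaningful when op == 'c'
    std::shared_ptr<Node> lhs, rhs;  // children, only set for '+' and '*'
};

struct Expr {
    std::shared_ptr<Node> node;
};

inline Expr make(char op, float v, std::shared_ptr<Node> l, std::shared_ptr<Node> r) {
    return Expr{std::make_shared<Node>(Node{op, v, l, r})};
}

inline Expr input()           { return make('v', 0.0f, nullptr, nullptr); }
inline Expr constant(float v) { return make('c', v, nullptr, nullptr); }

// The operators build a tree instead of computing a result.
inline Expr operator+(const Expr& a, const Expr& b) { return make('+', 0.0f, a.node, b.node); }
inline Expr operator*(const Expr& a, const Expr& b) { return make('*', 0.0f, a.node, b.node); }

// One possible backend: interpret the recorded tree on the CPU.
// A GPU backend would walk the same tree and emit shader instructions instead.
inline float evaluate(const Node& n, float x) {
    switch (n.op) {
        case 'v': return x;
        case 'c': return n.value;
        case '+': return evaluate(*n.lhs, x) + evaluate(*n.rhs, x);
        case '*': return evaluate(*n.lhs, x) * evaluate(*n.rhs, x);
    }
    assert(false && "unknown node type");
    return 0.0f;
}
```

Writing `Expr y = input() * constant(0.5f) + constant(0.25f);` looks like ordinary C++ arithmetic, but what you end up with is a small program that can be handed to whichever backend is available at runtime.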

Now for my needs, I was looking for something that doesn't require me to compile programs ahead of time. I'd like to be able to take my intermediate audio language (DNA) and either run it on the CPU or on the GPU at runtime. This leaves out Brook and CUDA. Sh might kind of be OK for this, but isn't really aimed at this sort of thing and I would probably be circumventing a lot of stuff to get it to work how I want. Plus it's probably not worth the extra overhead and dependencies if I am just stripping out a lot of the functionality. So, that really leaves me with writing my own backend.

Now the other big downside to using the GPU is that there is a lot of overhead getting data on and off the graphics card. It's best to use the GPU when you can offload all or most of your processing to the card. It's not worth it to process little chunks, one at a time. So, for an audio system what would be cool is if I could offload the entire song (all modules) to the GPU. This would require a significant reworking of Genome's internals, so I am not sure I want to go this route just yet. Plus, it remains to be seen how many people have GPUs worth taking advantage of and how many people would be interested in buying such a product. It might be better suited to a research project some day. For now, it might be better to target multi-core CPUs as opposed to GPUs.

In the near term though, I can think of a couple audio applications that might be suited to using CUDA (or another of the above libs):

  • My planned Doppelmangler rewrite. Doppelmangler uses FFTs for analysis. CUDA already has an optimized FFT written for it. The Doppelmangler of my dreams is able to take audio samples and manipulate them in fantastic ways without loss of quality. If it can do that in realtime, with low CPU usage, then it would be awesome.
  • Physical modelling. True 'physical' modelling using actual 3D models might be interesting (and might be possible to do using vertex programs), in addition to the more traditional numerical models using waveguides and such (fragment programs). I think it would be awesome if someday I could actually 'build' an instrument in 3D (rather than just dealing with DSP processes). In theory, anyway.
  • Reverbs and convolution effects (already been done, I know)

Looking ahead, stream processing promises to make its way into everyone's machines eventually. Intel is working on the Larrabee processor, which will perform GPU functions in addition to general purpose streaming functions. Multicore is great for the average user who is running several programs at once, but it is difficult to program efficiently for. Stream processing can be orders of magnitude faster for the kinds of tasks I'm interested in. Future processors will probably include multiple cores and multiple stream processors - essentially everyone will have a supercomputer on their desk.

Thursday, August 07, 2008

Audio Scripting Experiments

I got the first run of my DSP scripting language (which I am calling DNA) up and running. Initial results were pretty encouraging. The same script that took 15% CPU in EEL only took a couple percent in DNA script. The code is very similar to assembly language, so it would be possible to make a higher level version of the language that compiles down to DNA assembly. Assembly code is very efficient and is also easy to parse/compile - it only took me two days to get the parser and the core VM up and running. It should also be possible to take the same code and run it on a GPU, which would be very cool. I'll be working on optimizing the DNA VM and investigating GPU stuff this weekend.
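
To illustrate why assembly-style code is so easy to parse, here is a rough sketch. The DNA syntax isn't shown in this post, so the `MUL r0, r1, r2;` form below is an assumption loosely modelled on ARB-style assembly, and `Instruction`, `parseLine` and `parseProgram` are hypothetical names. With one instruction per line and a fixed shape, parsing is little more than splitting each line into tokens.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One parsed instruction: an opcode mnemonic plus its operand names.
struct Instruction {
    std::string opcode;
    std::vector<std::string> operands;
};

// Parse a single line such as "MUL r0, r1, r2;" (assumed syntax).
// Commas and the trailing semicolon are turned into spaces, so the rest
// is plain whitespace tokenizing. A blank line yields an empty opcode.
inline Instruction parseLine(std::string line) {
    for (char& c : line)
        if (c == ',' || c == ';') c = ' ';

    std::istringstream in(line);
    Instruction ins;
    in >> ins.opcode;
    for (std::string tok; in >> tok; )
        ins.operands.push_back(tok);
    return ins;
}

// Parse a whole program, one instruction per line, skipping blank lines.
inline std::vector<Instruction> parseProgram(const std::string& source) {
    std::vector<Instruction> program;
    std::istringstream lines(source);
    for (std::string line; std::getline(lines, line); ) {
        Instruction ins = parseLine(line);
        if (!ins.opcode.empty())
            program.push_back(ins);
    }
    return program;
}
```

A real parser would also validate mnemonics and operand counts, but there is no grammar, precedence or nesting to deal with, which is where most of the work goes in a higher level language.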

Wednesday, August 06, 2008

Elgg 1.0 Due August 18th

Elgg, one of the social networking platforms I've been watching, is due to be released August 18th. Elgg is one of the few free / open source options out there for building a social networking site, and it promises to offer a much improved interface and a very nice API for developing plugins and widgets for new types of user-generated content. The 18th will be the first day that the new code is available to download, and I expect that lots of user-made plugins and mods will follow. There are still a lot of features lacking, but I think the base that has been laid down with this new version is pretty strong, and it promises to be a good platform to build on. Plus it's free. :) I'll post my impressions of the API once the 18th rolls around.

Monday, August 04, 2008

EEL Scripting for audio

I did my first test getting an EEL DSP script up and running under Genome. Only a few minor syntax changes were necessary to get EEL to compile under Visual Studio (though it did take the better part of a day to get it working). Overall I am happy with the features of the language and the fact that it has so few dependencies (and is generally written in ANSI C, which hopefully means it's fairly fast). Initial speed tests were a bit disappointing. A simple FM synth (one operator, one modulator) ended up taking about 15-20% of my CPU time. Ideally I'd like a simple EEL module to take a percent or two. A C++ version of the same synth would take 0.1% or less of the CPU. So, I am currently looking into optimizing EEL to tailor it to my needs a bit more. The EEL core is fairly easy to understand (despite its overly aggressive use of macros).

These days I have been using AMD CodeAnalyst, which is a nice, free alternative to Intel's VTune, for my optimization needs. The latest version runs right inside Visual Studio and doesn't require any special instrumentation. You just run the app and it starts collecting results. For those unfamiliar with these kinds of applications, basically they can tell you which functions are consuming CPU resources and then allow you to go line by line and see which lines are taking up the most time. This is essential for helping decide how to go about optimizing a large and complicated program. Obviously it doesn't make sense to optimize everything. Your code will become a mess. You want to target your optimizations to the inner loops - the things that are done the most often.
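
For reference, this is roughly what a "one operator, one modulator" FM synth looks like in plain C++. It is an assumed sketch, not the code used in the test above: the `FmVoice` name, the default frequencies and the modulation index are all made up for illustration. The whole inner loop is two sine calls and a couple of additions per sample, which is why the native version costs so little.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Simple FM voice: a modulator sine wave drives the phase of a carrier sine wave.
// All default values are illustrative assumptions.
struct FmVoice {
    float sampleRate     = 44100.0f;
    float carrierHz      = 440.0f;
    float modulatorHz    = 220.0f;
    float modIndex       = 2.0f;   // modulation depth, in radians
    float carrierPhase   = 0.0f;
    float modulatorPhase = 0.0f;

    // Fill `out` with the next out.size() samples, each in [-1, 1].
    void render(std::vector<float>& out) {
        const float twoPi         = 6.28318530718f;
        const float carrierStep   = twoPi * carrierHz / sampleRate;
        const float modulatorStep = twoPi * modulatorHz / sampleRate;

        for (std::size_t i = 0; i < out.size(); ++i) {
            out[i] = std::sin(carrierPhase + modIndex * std::sin(modulatorPhase));

            carrierPhase   += carrierStep;
            modulatorPhase += modulatorStep;

            // Wrap the phases so they stay small and keep float precision.
            if (carrierPhase >= twoPi)   carrierPhase   -= twoPi;
            if (modulatorPhase >= twoPi) modulatorPhase -= twoPi;
        }
    }
};
```

Setting `modIndex` to zero turns the voice into a plain sine oscillator, which is a handy sanity check when comparing a scripted version against a native one.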

Most of the EEL execution time is spent looping through instructions, computing arithmetic operations and copying variables. I spotted some areas in the arithmetic function that could see huge gains. Basically there is a gigantic case statement which checks every possible combination of operation and variable type. Instead of one monolithic case statement, I could see this being broken down into several nested comparisons, with at most 5 or 6 comparisons per operation instead of the dozens that an average operation would take now. I also plan on moving up common operations and moving down uncommon ones. I am debating whether the instruction set can be simplified. The smaller the instruction set, the less time I'll be spending figuring out which instruction to execute. For my audio/DSP needs I am going to be operating on floating point numbers most of the time. Some variable types or operations might be removable. So, it should be interesting work to try to speed this up. Already I am learning a lot about VMs and the internals of scripting languages just by going through the code.

UPDATE:
In my searches I found that some compilers will convert a case statement into what is known as a 'jump table' or 'branch table', which is basically a table of jump locations keyed by the variable you are evaluating in your switch statement. This is much faster than evaluating a huge case statement one comparison at a time. Not sure if my compiler is doing that - if it is, then my optimization hopes are dimmed.
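
Here is a small hand-written version of the same idea, as a sketch: a table of function pointers indexed directly by opcode. The opcode names and the `execute` function are hypothetical and are not taken from EEL. A compiler that turns a dense switch into a jump table does something similar behind the scenes, except that it jumps to code labels rather than calling functions.

```cpp
#include <array>
#include <cstddef>

// Hypothetical opcodes, numbered densely from zero so they can index a table.
enum Opcode : unsigned char { OP_ADD, OP_SUB, OP_MUL, OP_COUNT };

using BinaryOp = float (*)(float, float);

inline float opAdd(float a, float b) { return a + b; }
inline float opSub(float a, float b) { return a - b; }
inline float opMul(float a, float b) { return a * b; }

// Entry i holds the handler for opcode i.
static const std::array<BinaryOp, OP_COUNT> kDispatch = {{ opAdd, opSub, opMul }};

// Dispatch is one indexed load plus one indirect call, no matter how many
// opcodes exist, instead of a chain of comparisons.
inline float execute(Opcode op, float a, float b) {
    return kDispatch[static_cast<std::size_t>(op)](a, b);
}
```

The cost of finding the handler stays the same whether there are three opcodes or three hundred, which is why reordering the cases only helps if the compiler is not already generating a table.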

Right now I am toying around with making my own ASM-style scripting language with a very reduced set of opcodes and abilities. The language would be very specifically focused on processing streams of floating point audio data. I am hoping that these two things will make it a lot faster than EEL, which is fairly high level and provides a lot of functionality that isn't important for processing a stream of audio. It might be possible to combine EEL and this ASM language to cover both the high-level and low-level bases. The language follows what I posted previously. I coded the 'compiler' last night and I'll work on the VM tonight. I'll post when I get some results.
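
As a rough picture of what such a VM could look like (a sketch only; the instruction set, the `Instr` layout and the eight-register limit are all assumptions made for illustration, not the actual design): every value is a float, every instruction has the same fixed shape, and the program is simply run once per sample.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reduced instruction set: float registers only.
enum class Op { LoadInput, LoadConst, Add, Mul, StoreOutput };

struct Instr {
    Op    op;
    int   dst;  // destination register (source register for StoreOutput)
    int   a;    // first source register
    int   b;    // second source register
    float imm;  // immediate value, used only by LoadConst
};

// Run `program` once for every input sample and collect the output samples.
// Register indices are assumed to be in the range 0..7; nothing checks them here.
inline std::vector<float> run(const std::vector<Instr>& program,
                              const std::vector<float>& input) {
    std::vector<float> output(input.size(), 0.0f);
    float reg[8] = {0.0f};

    for (std::size_t n = 0; n < input.size(); ++n) {
        for (const Instr& i : program) {
            switch (i.op) {
                case Op::LoadInput:   reg[i.dst] = input[n];            break;
                case Op::LoadConst:   reg[i.dst] = i.imm;               break;
                case Op::Add:         reg[i.dst] = reg[i.a] + reg[i.b]; break;
                case Op::Mul:         reg[i.dst] = reg[i.a] * reg[i.b]; break;
                case Op::StoreOutput: output[n]  = reg[i.dst];          break;
            }
        }
    }
    return output;
}
```

A four-instruction program (load the input, load the constant 0.5, multiply, store) is already a working gain effect, and because there is only one data type, the interpreter never has to check what kind of value it is looking at.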
