Improve Common Lisp matrix multiplication #1

snunez1 · 2024-10-08T11:39:56Z

As an academic exercise, it would be interesting to see how fast we could get the mm function to run in Common Lisp. This function is where 85% of the application spends its time, and any improvement there will greatly speed up the tokens / second rate.

There's a few ways this could be done. There was a recent reddit thread discussing optimising matrix multiplication, both from an algorithmic perspective and using hardware acceleration. One user (Robert Mayer) implemented some of this and achieved results close to Numpy, and that was without threading.

A second method was suggested by Tim Bradshaw in a CCL mailing list thread when I was implementing pinned arrays in LLA for CCL. His approach uses CFFI to do the heavy lifting. Here's the thread, quoting Tim:

An alternative approach to this is to go the other way: given a blob of memory which you know is an array in the foreign language, write accessors for it in Lisp. That avoids the GC-moving-the-array problem, but gives you in exchange all the problems the GC solves for you. I don't know if CCL has mechanisms which allow dynamically-scoped allocation and freeing of blocks of foreign memory: if it does that makes it less fraught in many cases, I think.

And of course these things aren't arrays to Lisp. But if what you want is matrices and you're using BLAS, you may not really need arrays.

I wrote some (very non-production-ready) code to experiment with this using CFFI, mostly to test whether access to elements of such blobs from Lisp as doubles is fast: it is. In LW it's competitive with arrays, in SBCL it's not stupidly slower. I have not tried CCL.

I asked about lisp-side array operations, to which Tim responded:

For the array operation thing: The things you get aren't arrays in the sense that they are subtypes of ARRAY, but they have accessors which are pretty much the same. So you can happily multiply them by arrays and so on: you just need to write the code in Lisp.

The allocation / freeing thing is a pain. What I've done in my experiments is (a) forbid copying of the objects, and (b) teach the GC to free the foreign memory when it sees one, if it's not already free and if the object 'belongs to Lisp'.

That needs the ability for the GC to do that, and of course adds overhead to GC. MY assumption is that since there will likely be few matrices (or few being GCd) the overhead will be pretty small but i haven't measured it. It also depends on the GC being able to do that: I can't find any mention that the CCL one has this feature.

And a final addition by Matt Emerson:

ccl::make-gcable-record can do that.

https://ccl.clozure.com/docs/build/ccl.html#m_make-gcable-record

CL-USER> (ccl::make-gcable-record (array double-float 16))
#<A Foreign Pointer [gcable] (:* (ARRAY DOUBLE-FLOAT 16)) #x600001EA5980>

There’s also ccl:make-heap-ivector (which gives you both a foreign pointer and lisp reference to the same memory, which must be manually freed with dispose-heap-ivector).

The manual has a section called “allocating foreign data on the lisp heap” that talks about it in some detail.

https://ccl.clozure.com/docs/build/ccl.html#tutorial-allocating-foreign-data-on-the-lisp-heap

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve Common Lisp matrix multiplication #1

Improve Common Lisp matrix multiplication #1

snunez1 commented Oct 8, 2024

Improve Common Lisp matrix multiplication #1

Improve Common Lisp matrix multiplication #1

Comments

snunez1 commented Oct 8, 2024