Well, on my PowerBook G4 the C version of the brute-force solver took about 40 seconds on the second puzzle.
My Python solver got more than 50x faster through optimisation, but it still didn't come close to solving the second puzzle: memory usage was so high that it never got more than two levels deep.
All I can recommend is:
1. To further improve your current solution, minimise memory usage as much as you can.
2. Think about radically different algorithms for solving the problem - there is one which will solve it in seconds, even in Python.
3. If all else fails, do what I did and write a highly optimised version in a different language.
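
To illustrate point 1, here is a rough sketch of one common way to cut per-state memory in a Python search: store each state as a packed `bytes` object rather than a list of lists. The puzzle's real state layout isn't given here, so the 3x3 grid of small integers and the helper names `pack`, `unpack` and `deep_size` are assumptions for illustration only, not what my solver did.

```python
import sys

def pack(grid):
    """Flatten a grid of small ints (0-255) into one immutable bytes object."""
    return bytes(cell for row in grid for cell in row)

def unpack(packed, width):
    """Rebuild the list-of-lists grid from its packed form."""
    return [list(packed[i:i + width]) for i in range(0, len(packed), width)]

def deep_size(grid):
    """Rough size of a list-of-lists grid: outer list plus each row list."""
    return sys.getsizeof(grid) + sum(sys.getsizeof(row) for row in grid)

# Hypothetical state: the real puzzle's representation may differ.
grid = [[1, 2, 3], [4, 5, 6], [7, 8, 0]]
packed = pack(grid)

# bytes is hashable, so packed states can go straight into a visited set.
seen = set()
seen.add(packed)

print(deep_size(grid), "bytes as nested lists (approx.)")
print(sys.getsizeof(packed), "bytes when packed")
```

The exact numbers depend on your Python build, but the packed form should come out much smaller, and across a large number of stored states that difference adds up. It still won't rescue a search that is blowing up exponentially, which is where point 2 comes in.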