C3PO is doing laps upon laps in our Robot Lab this afternoon in hopes of finding a walk faster than all previous ones.
The poor guy is trying really hard. We’ve experimented with policy-gradient algorithms, but now we’re running a hill-climber algorithm, which is essentially the easiest machine learning algorithm to digest and write.
Machine Learning, at least the kind we’re doing, works like this: you want to find an optimal solution to a problem, and you want the computer (or dog) to find it for you. Computers are very good at doing things over and over again, so we can leverage this ability and have the computer try many variations of one solution until we find one that is as good as we can get.
Machine Learning algorithms guide these solutions and dictate how and when these variations occur. Flashy algorithms these days have flashy titles like Genetic Algorithms and Downhill Simplex Methods. But ours is much simpler.
Hill Climber algorithms start with a base solution, vary this solution a certain number of times, gauge each variation’s success, and then simply choose the best solution so far and repeat. For us, the ‘base solution’ is a walk (a set of inverse kinematic parameters) that we hope to optimize.
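For the curious, here is roughly what that loop looks like in Python. This is only a minimal sketch, not the code running on C3PO: the function names, the Gaussian nudging of parameters, the step size, and the toy scoring function are all assumptions made up for illustration. On the real robot, the score would come from timing an actual lap.

```python
import random


def hill_climb(base, score, n_rounds=50, n_variations=8, step=0.1, rng=None):
    """Generic hill climber: higher score is better.

    base         -- starting list of parameters (hypothetical walk parameters)
    score        -- function mapping a parameter list to a number
    n_rounds     -- how many times to repeat the vary-and-pick cycle
    n_variations -- how many variations to try per round
    step         -- standard deviation of the random nudge (an assumption)
    """
    if rng is None:
        rng = random.Random(0)
    best = list(base)
    best_score = score(best)
    for _ in range(n_rounds):
        # Vary the current best solution a fixed number of times.
        candidates = [
            [p + rng.gauss(0.0, step) for p in best]
            for _ in range(n_variations)
        ]
        # Gauge each variation and keep the best solution seen so far.
        for cand in candidates:
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
    return best, best_score


def toy_score(params):
    """Stand-in for 'how fast was that lap?': closer to target is better."""
    target = [1.0, -2.0, 0.5]  # made-up numbers, not real walk parameters
    return -sum((p - t) ** 2 for p, t in zip(params, target))


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0]
    best, best_score = hill_climb(start, toy_score)
    print("start score:", toy_score(start))
    print("best score: ", best_score)
    print("best params:", best)
```

Because a variation only replaces the current best when it scores higher, the score can never get worse from round to round. The flip side is that a hill climber can get stuck on a solution that is merely better than its neighbors.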
Right now C3PO is on lap 258 of a possible 800. Come on, C3PO!!