 
Model-Free Probabilistic Movement Primitives for Physical Interaction
Alexandros Paraschos¹, Elmar Rueckert¹, Jan Peters¹,² and Gerhard Neumann¹
Abstract— Physical interaction in robotics is a complex problem that requires not only accurate reproduction of the kinematic trajectories but also of the forces and torques exhibited during the movement. We base our approach on Movement Primitives (MPs), as MPs provide a framework for modelling complex movements and introduce useful operations on the movements, such as generalization to novel situations, time scaling, and others. Usually, MPs are trained with imitation learning, where an expert demonstrates the trajectories. However, MPs used in physical interaction either require additional learning approaches, e.g., reinforcement learning, or are based on handcrafted solutions. Our goal is to learn and generate movements for physical interaction that are learned with imitation learning, from a small set of demonstrated trajectories. The Probabilistic Movement Primitives (ProMPs) framework is a recent MP approach that introduces beneficial properties, such as combination and blending of MPs, and represents the correlations present in the movement. The ProMPs provide a variable stiffness controller that reproduces the movement, but it requires a dynamics model of the system. Learning such a model is not a trivial task and, therefore, we introduce the model-free ProMPs, which jointly learn the movement and the necessary actions from a few demonstrations. We derive a variable stiffness controller analytically. We further extend the ProMPs to include force and torque signals, which are necessary for physical interaction. We evaluate our approach in simulated and real robot tasks.
I. INTRODUCTION
Developing robots that can operate in the same environment with humans and physically interact with everyday objects requires accurate control of the contact forces that occur during the interaction. While non-compliant robots can achieve great accuracy, the uncertainty of complex and less-structured environments prohibits physical interaction. In this paper, we focus on providing a compliant control scheme that enables robots to manipulate their environment without damaging it. Typically, force control requires an accurate dynamics model of the robot and its environment, which is not easy to obtain. Other approaches suggest learning a dynamics model; however, this process can be time-consuming and is prone to model errors. We present an approach that can jointly learn the desired movement of the robot and the contact forces from human demonstrations, without relying on a learned forward or inverse model.
*The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007–2013) under grant agreements #600716 (CoDyCo) and #270327 (CompLACS).
¹ Intelligent Autonomous Systems, TU Darmstadt, 64289 Darmstadt, Germany, {paraschos,neumann,rueckert}@ias.tu-darmstadt.de
² Robot Learning Group, Max Planck Institute for Intelligent Systems, Germany, mail@jan-peters.net
Fig. 1. The iCub robot is taught by imitation how to tilt a grate that we use during the experimental evaluation of our approach. We demonstrated how to lift the grate from three different positions. Grasping from different positions changes the dynamics of the task. Our method provides online adaptation and generalizes in the area of the grasps.
Existing approaches for motor skill learning that are based on movement primitives [1], [2], [3], [4], [5] often incorporate into the movement primitive representation the forces needed for the physical interactions [6], [7], [8]. However, such approaches model a single successful reproduction of the task. Multiple demonstrations are typically averaged, even though they actually represent similar, but different, solutions of the task. Thus, the applied contact forces are correlated neither with the state of the robot nor with sensory values that indicate the state of the environment, e.g., how heavy an object is.

In this paper, we propose learning the coordination of the interaction forces with the kinematic state of the system, as well as the control actions needed to reproduce the movement, exclusively from demonstration. Motor skill learning for such interaction tasks on high-dimensional redundant robots is challenging. This task requires real-time feedback control laws that process sensory data, including joint encoders, tactile feedback, and force-torque readings. We present a model-free version of the Probabilistic Movement Primitives (ProMPs) [9] that enables robots to acquire complex motor skills from demonstrations, while it can coordinate the movement with force, torque, or tactile sensing. The ProMPs have several beneficial properties, such as generalization to novel situations, combination of primitives, and time scaling, which we inherit in our approach.
 
ProMPs assume locally linearizable dynamics models to compute time-varying feedback control laws. However, such dynamics models are hard to obtain for physical interaction tasks. Therefore, we obtain a time-varying feedback controller directly from the demonstrations without requiring such a model. In the model-free extension of the ProMPs, we condition the joint distribution over states and controls on the current state of the system and obtain a distribution over the controls. We show that this distribution represents a time-varying stochastic linear feedback controller. Due to the time-varying feedback gains, the controller can exhibit behavior with variable stiffness and, thus, it is safe to use in physical interaction. A similar control approach has recently been presented in [10].

Our approach inherits many beneficial properties of the original ProMP formulation. We can reproduce the variability in the demonstrations and use probabilistic operators for generalization to new tasks or the co-activation of learned primitives. The resulting feedback controller shows similar properties to the model-based ProMP approach: it can reproduce optimal behavior for stochastic systems and exactly follow the learned trajectory distribution, at least if the real system dynamics are approximately linear at each time step. For non-linear systems, the estimated variable stiffness controller can become unstable if the robot reaches configurations that are far away from the set of demonstrations. To avoid this problem, we smoothly switch between a stable PD-controller and the ProMP controller if the support of the learned distribution for the current situation is small. We show that this extension allows us to track trajectory distributions accurately even for non-linear systems.

The model-free ProMP approach is evaluated in simulation with linear and non-linear dynamical systems in Section V. In a real task, we tilt a grate which is grasped at different positions. We show that the model-free ProMPs can generalize to different grasping locations on the grate through exploiting the correlations between motor commands and force feedback.
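To make the conditioning step above concrete, the following is a minimal illustrative sketch, not the paper's exact derivation: assuming the learned state-action distribution at one time step is Gaussian over [q, q̇, u], conditioning on the measured state yields a Gaussian over u whose mean is affine in the state, i.e., a time-varying linear feedback controller. All function and variable names and shapes are our own assumptions.

```python
# Illustrative sketch only: conditioning a Gaussian over the extended state
# [q, qdot, u] at one time step on the measured state [q, qdot] yields a
# Gaussian over u whose mean is affine in the state, i.e. a time-varying
# linear feedback controller.
import numpy as np

def condition_on_state(mu_y, Sigma_y, x):
    """mu_y (D,), Sigma_y (D, D): mean/covariance over [q, qdot, u] at time t.
    x (d,): measured state [q, qdot]. Returns mean and covariance of u | x."""
    d = x.shape[0]
    mu_x, mu_u = mu_y[:d], mu_y[d:]
    S_xx = Sigma_y[:d, :d]
    S_ux = Sigma_y[d:, :d]
    S_uu = Sigma_y[d:, d:]
    K = S_ux @ np.linalg.inv(S_xx)      # time-varying feedback gains
    mean_u = mu_u + K @ (x - mu_x)      # feed-forward term plus state feedback
    cov_u = S_uu - K @ S_ux.T           # remaining (exploration) covariance
    return mean_u, cov_u
```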
II. RELATED WORK
In this section, we review related work on movement primitives for imitation learning that combine position and force tracking, model the coupling between kinematics and forces, and are able to capture the correlations between these two quantities.

The benefit of an additional feedback controller to track desired reference forces was demonstrated in grasping tasks in [6]. Individual dynamical systems (DMPs) [5] were trained for both position and force profiles in imitation learning. The force feedback controller substantially improved the success rate of grasps in tracking demonstrated contact forces under changing conditions. For manipulation tasks like opening a door, the authors showed that the learned force profiles can be further improved through reinforcement learning [7].

For many tasks, such as bi-manual manipulation, the feedback controller needs to be coupled. Gams et al. [11] proposed cooperative dynamical systems, where deviations from desired forces modulate the velocity forcing term in the DMPs for position control. This approach was tested on two independently operating robot arms solving cooperative tasks like lifting a stick [8]. Deviations in the sensed contact forces of one robot were used to adapt the DMP of the other robot, and the coupling parameters were obtained through iterative learning control. A related probabilistic imitation learning approach to capture the couplings in time was proposed in [12]. In this approach, Gaussian mixture models were used to represent the variance of the demonstrations. The approach was evaluated successfully on complex physical interaction tasks such as ironing, opening a door, or pushing against a wall.

Adapting Gaussian Mixture Models (GMMs) [13], [14], [15], [16] has been proposed for use in physical interaction tasks. The major difference to the dynamical systems approach is that GMMs can represent the variance of the movement. Closely related to our approach, Evrard et al. [17] used GMMs to learn joint distributions of positions and forces. Joint distributions capture the correlations between positions and forces and were used to improve adaptation to perturbations in cooperative human-robot tasks for object lifting. In this approach, the control gains were fixed to track the mean of the demonstrated trajectories. In [18], it was shown that, by assuming known forward dynamics, variable stiffness control gains can be derived in closed form to match the demonstrations. We address here the important related question of how these gains can be learned in a model-free manner from the demonstrations.
III. MODEL-FREE PROBABILISTIC MOVEMENT PRIMITIVES
We propose a novel framework for robot control which can be employed in physical interaction scenarios. In our approach, we jointly learn the desired trajectory distribution of the robot's joints or end-effectors and the corresponding control signals. We train our approach from a limited set of demonstrations. We refer to the joint distribution as the state-action distribution. Further, we incorporate proprioceptive sensing, such as force or tactile sensing, into our state representation. The additional sensing capabilities are of high importance for physical interaction as they can disambiguate kinematically similar states. We present our approach by, first, extending the Probabilistic Movement Primitives (ProMPs) framework [9] to encode the state-action distribution and, second, deriving a stochastic feedback controller without the use of a given system dynamics model. Finally, we extend our control approach for states which are relatively far from the vicinity of the learned state-action distribution. In that case, our control approach can no longer produce correcting actions, and an additional backup controller with high gains is needed, as sketched below. Our framework inherits most of the beneficial properties introduced by the ProMPs, which significantly improve generalization to novel situations and enable the generation of primitives that concurrently solve multiple tasks [9].
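As an illustration of such a backup strategy, the sketch below blends a high-gain PD command with the learned controller's command, using a weight based on how well the current state is supported by the learned state marginal. This Gaussian weighting rule and all names are assumptions made for illustration, not necessarily the exact formulation used in the paper.

```python
# Hedged illustration of smoothly switching to a backup PD controller when the
# current state has little support under the learned state marginal.
import numpy as np

def blended_command(x, mu_x, Sigma_x, u_learned, u_pd, scale=3.0):
    """x: current state; mu_x, Sigma_x: learned state marginal at this time step;
    u_learned, u_pd: commands of the learned and the backup PD controller."""
    diff = x - mu_x
    maha_sq = diff @ np.linalg.solve(Sigma_x, diff)   # squared Mahalanobis distance
    alpha = np.exp(-0.5 * maha_sq / scale ** 2)       # ~1 inside the support, -> 0 far away
    return alpha * u_learned + (1.0 - alpha) * u_pd
```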
 
 A. Encoding the Time-Varying State-Action Distribution of the Movement 
We avoid explicitly learning robot and environment models by directly learning the appropriate control inputs, while keeping the beneficial properties of the ProMP approach, such as generalization and concurrent execution.

In order to simplify the illustration of our approach, we first discuss the special case of a single Degree of Freedom (DoF) and, subsequently, we expand our description to the generic case of multiple DoFs. The description is based on [9], but modified appropriately to clarify how the actions can be modelled. First, we define the extended state of the system as

y_t = [ q_t, \dot{q}_t, u_t ]^T ,    (1)

where q_t is the position of the joint, \dot{q}_t the velocity, and u_t the control applied at time step t. Similar to ProMPs, we use a linear basis function model to encode the trajectory of the extended state y_t. The feature matrix and the weight vector of the non-linear function approximation model become
y_t = \begin{bmatrix} q_t \\ \dot{q}_t \\ u_t \end{bmatrix} = \tilde{\Phi}_t w , \qquad \tilde{\Phi}_t = \begin{bmatrix} \phi_t^T & 0 \\ \dot{\phi}_t^T & 0 \\ 0 & \psi_t^T \end{bmatrix} , \qquad w = \begin{bmatrix} w_q \\ w_u \end{bmatrix} ,    (2)

where the vectors \phi_t and \psi_t represent the feature vectors for the position q_t and the control u_t, respectively. The derivative of the position feature vector \dot{\phi}_t is used to compute the velocity of the joint \dot{q}_t. The weight vector w contains the weight vector for the position w_q and the weight vector for the control w_u. The dimensionality of the feature \phi_t and weight w_q vectors is n × 1, where n is the number of features used to encode the joint position. Similarly, the dimensionality of the \psi_t and w_u vectors is m × 1. The remaining entries of \tilde{\Phi}_t, denoted by 0, are zero matrices of the appropriate dimensionality. In our approach, we distinguish between the features used to encode the position and the features used to encode the control signal, due to the different properties of the two signals. The distinction allows us to use different types of basis functions, different parameters, or a different number of basis functions.
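For concreteness, here is a minimal sketch (our own illustrative rendering of Eqs. (1)–(2), not code from the paper) of how the extended feature matrix for a single DoF can be assembled; function and variable names are assumptions.

```python
# Single-DoF feature model of Eqs. (1)-(2): the extended state
# y_t = [q_t, qdot_t, u_t]^T is modelled as y_t = Phi_tilde_t @ w with
# separate position features (phi) and control features (psi).
import numpy as np

def extended_feature_matrix(phi_t, phi_dot_t, psi_t):
    """phi_t, phi_dot_t: (n,) position features and their time derivatives;
    psi_t: (m,) control features. Returns the 3 x (n + m) matrix of Eq. (2)."""
    n, m = phi_t.shape[0], psi_t.shape[0]
    Phi = np.zeros((3, n + m))
    Phi[0, :n] = phi_t        # row that maps w_q to the position q_t
    Phi[1, :n] = phi_dot_t    # row that maps w_q to the velocity qdot_t
    Phi[2, n:] = psi_t        # row that maps w_u to the control u_t
    return Phi

# Then y_t = extended_feature_matrix(phi_t, dphi_t, psi_t) @ np.concatenate([w_q, w_u]).
```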
We extend our description to the multidimensional case. First, we extend the state of the system from Equation (2) to

y_t = [ q_t^T, \dot{q}_t^T, u_t^T ]^T ,    (3)

where the vector q_t is a concatenation of the positions of all joints of the robot, the vector \dot{q}_t of the velocities of the joints, and u_t of the controls, respectively. The feature matrix \tilde{\Phi}_t now becomes a block matrix
\tilde{\Phi}_t = \begin{bmatrix} \Phi_t \\ \dot{\Phi}_t \\ \Psi_t \end{bmatrix} ,    (4)

where

\Phi_t = \begin{bmatrix} \phi_t^T & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \phi_t^T \end{bmatrix} ,    (5)

\Psi_t = \begin{bmatrix} \psi_t^T & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \psi_t^T \end{bmatrix} ,    (6)

define the features for the joint positions and the joint controls. Similarly to the single-DoF case, the features used for the joint velocities \dot{\Phi}_t are the time derivatives of the features of the joint positions \Phi_t. We use the same features for every DoF. The dimensionality of the feature matrices \Phi_t and \Psi_t is K × K·(n + m), where K denotes the number of DoFs. The weight vector w has a similar structure to Equation (2) and, for the multi-DoF case, is given by
w = \big[ \underbrace{{}^{1}w_q^T, \cdots, {}^{K}w_q^T}_{\text{weights for joint positions}} ,\ \underbrace{{}^{1}w_u^T, \cdots, {}^{K}w_u^T}_{\text{weights for joint controls}} \big]^T ,    (7)

where {}^{i}w denotes the weight vector for joint i ∈ [1, K].
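The multi-DoF block structure of Eqs. (4)–(7) can be assembled as in the short sketch below, assuming the weight vector stacks all position weights first and then all control weights, as in Eq. (7). The sketch is purely illustrative; names are our own.

```python
# Multi-DoF block feature matrix of Eqs. (4)-(7) for K joints that share the
# same basis functions.
import numpy as np

def multi_dof_feature_matrix(phi_t, phi_dot_t, psi_t, K):
    """phi_t, phi_dot_t: (n,) shared position features and their derivatives;
    psi_t: (m,) shared control features. Returns the 3K x K*(n+m) matrix."""
    n, m = phi_t.shape[0], psi_t.shape[0]
    Phi  = np.kron(np.eye(K), phi_t[None, :])      # K x K*n block diagonal, Eq. (5)
    Phid = np.kron(np.eye(K), phi_dot_t[None, :])  # K x K*n, velocity features
    Psi  = np.kron(np.eye(K), psi_t[None, :])      # K x K*m block diagonal, Eq. (6)
    zeros_q = np.zeros((K, K * m))                 # padding over the control weights
    zeros_u = np.zeros((K, K * n))                 # padding over the position weights
    return np.vstack([
        np.hstack([Phi,  zeros_q]),   # maps w to the joint positions q_t
        np.hstack([Phid, zeros_q]),   # maps w to the joint velocities qdot_t
        np.hstack([zeros_u, Psi]),    # maps w to the joint controls u_t
    ])
```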
The probability of a single trajectory \tau = \{ y_t, t ∈ [1 \cdots T] \}, composed from the states of T subsequent time steps, given the parameters w, is computed by

p(\tau \,|\, w) = \prod_t \mathcal{N}( y_t \,|\, \tilde{\Phi}_t w, \Sigma_y ) ,    (8)

where we assume i.i.d. Gaussian observation noise with zero mean and covariance \Sigma_y. Representing multiple trajectories would require a set of weights \{ w \}. Instead of explicitly maintaining such a set, we introduce a distribution over the weights p(w; \theta), where the parameter vector \theta defines the parameters of the distribution. Given the distribution parameters \theta, the probability of the trajectory becomes
p(\tau; \theta) = \int p(\tau \,|\, w)\, p(w; \theta)\, \mathrm{d}w ,    (9)

where we marginalize over the weights w. As in the ProMP approach, we use a Gaussian distribution to represent p(w; \theta), where \theta = \{ \mu_w, \Sigma_w \}. Using a Gaussian distribution enables the marginal to be computed analytically and facilitates learning.
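A common way to obtain \theta = \{ \mu_w, \Sigma_w \} from a handful of demonstrations, as done in the ProMP framework, is to fit one weight vector per demonstration by (ridge) regression on the linear model of Eq. (8) and then take the sample mean and covariance of the fitted weights. The sketch below illustrates this; the regularizer lam and all names are assumed parameters, not the paper's.

```python
# Illustrative sketch: estimate theta = {mu_w, Sigma_w} by per-demonstration
# ridge regression on y_t = Phi_tilde_t @ w, followed by the sample mean and
# covariance of the fitted weight vectors.
import numpy as np

def estimate_weight_distribution(demos, features, lam=1e-6):
    """demos: list of (T, D) arrays with the extended state y_t per time step.
    features: list of (T, D, W) arrays with the feature matrix per time step."""
    weights = []
    for Y, Phi in zip(demos, features):
        A = np.concatenate(list(Phi), axis=0)   # (T*D, W) stacked feature rows
        b = Y.reshape(-1)                       # (T*D,) stacked observations
        w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)
        weights.append(w)
    W = np.stack(weights)                       # one weight vector per demonstration
    return W.mean(axis=0), np.cov(W, rowvar=False)
```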
The distribution over the weight vector p(w; \theta) correlates (couples) the DoFs of the robot to the action vector at every time step t. The probability of the current state-action vector y_t given \theta is computed by

p(y_t; \theta) = \int \mathcal{N}( y_t \,|\, \tilde{\Phi}_t w, \Sigma_y )\, \mathcal{N}( w \,|\, \mu_w, \Sigma_w )\, \mathrm{d}w = \mathcal{N}\big( y_t \,\big|\, \tilde{\Phi}_t \mu_w ,\ \tilde{\Phi}_t \Sigma_w \tilde{\Phi}_t^T + \Sigma_y \big) ,    (10)

in closed form.
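The closed-form marginal of Eq. (10) translates directly into code; the brief sketch below (ours, for illustration) returns the mean and covariance of the state-action marginal at one time step.

```python
# Direct transcription of Eq. (10): the marginal over the extended state y_t is
# Gaussian with mean Phi_t @ mu_w and covariance Phi_t @ Sigma_w @ Phi_t.T + Sigma_y.
import numpy as np

def state_action_marginal(Phi_t, mu_w, Sigma_w, Sigma_y):
    """Phi_t: (D, W) feature matrix at time t; mu_w, Sigma_w: weight distribution;
    Sigma_y: (D, D) observation noise. Returns mean and covariance of p(y_t; theta)."""
    mean = Phi_t @ mu_w
    cov = Phi_t @ Sigma_w @ Phi_t.T + Sigma_y
    return mean, cov
```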
We use normalized Gaussian basis functions as features. Each basis function is defined in the time domain by

\phi_i(t) = \frac{ b_i(t) }{ \sum_{j=1}^{n} b_j(t) } ,    (11)

b_i(t) = \exp\left( - \frac{ (t - c_i)^2 }{ 2h } \right) .    (12)
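For completeness, a small sketch of the normalized Gaussian bases of Eqs. (11)–(12); the evenly spaced centres c_i and the bandwidth h chosen below are illustrative assumptions rather than the values used in the paper.

```python
# Normalized Gaussian basis functions of Eqs. (11)-(12).
import numpy as np

def normalized_gaussian_basis(t, centers, h):
    """t: scalar or (T,) array of time points; centers: (n,) basis centres;
    h: bandwidth. Returns a (T, n) array with phi_i(t) in each row."""
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]   # (T, 1)
    b = np.exp(-(t - centers[None, :]) ** 2 / (2.0 * h))     # Eq. (12)
    return b / b.sum(axis=1, keepdims=True)                  # Eq. (11)

# Example: 10 bases with evenly spaced centres over a one-second movement.
phi = normalized_gaussian_basis(np.linspace(0.0, 1.0, 100), np.linspace(0.0, 1.0, 10), h=0.01)
```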