A journal of IEEE and CAA , publishes high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 8 Issue 8
Aug.  2021

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 6.171, Top 11% (SCI Q1)
    CiteScore: 11.2, Top 5% (Q1)
    Google Scholar h5-index: 51, TOP 8
Turn off MathJax
Article Contents
Akshay Agrawal, Shane Barratt and Stephen Boyd, "Learning Convex Optimization Models," IEEE/CAA J. Autom. Sinica, vol. 8, no. 8, pp. 1355-1364, Aug. 2021. doi: 10.1109/JAS.2021.1004075
Citation: Akshay Agrawal, Shane Barratt and Stephen Boyd, "Learning Convex Optimization Models," IEEE/CAA J. Autom. Sinica, vol. 8, no. 8, pp. 1355-1364, Aug. 2021. doi: 10.1109/JAS.2021.1004075

Learning Convex Optimization Models

doi: 10.1109/JAS.2021.1004075
More Information
  • A convex optimization model predicts an output from an input by solving a convex optimization problem. The class of convex optimization models is large, and includes as special cases many well-known models like linear and logistic regression. We propose a heuristic for learning the parameters in a convex optimization model given a dataset of input-output pairs, using recently developed methods for differentiating the solution of a convex optimization problem with respect to its parameters. We describe three general classes of convex optimization models, maximum a posteriori (MAP) models, utility maximization models, and agent models, and present a numerical experiment for each.

     

  • loading
  • 1 redacted
  • [1]
    A. Agrawal, B. Amos, S. Barratt, S. Boyd, S. Diamond, and J. Z. Kolter, “Differentiable convex optimization layers,” in Proc. 33rd Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, pp. 9558–9570.
    [2]
    A. Agrawal, S. Barratt, S. Boyd, E. Busseti, and W. Moursi, “Differentiating through a cone program,” J. Appl. Numer. Optim., vol. 1, no. 2, pp. 107–115, Aug. 2019.
    [3]
    B. Amos, “Differentiable optimization-based modeling for machine learning,” Ph.D. dissertation, Carnegie Mellon Univ., Pittsburgh, USA, 2019.
    [4]
    E. Busseti, W. M. Moursi, and S. Boyd, “Solution refinement at regular points of conic problems,” Comput. Optim. Appl., vol. 74, no. 3, pp. 627–643, Aug. 2019. doi: 10.1007/s10589-019-00122-9
    [5]
    S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, USA: Cambridge University Press, 2004.
    [6]
    G. BakIr, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. V. Vishwanathan, Predicting Structured Data. Cambridge, USA: MIT Press, 2007.
    [7]
    Y. LeCun, S. Chopra, R. Hadsell, M. A. Ranzato, and F. J. Huang, “A tutorial on energy-based learning,” in Predicting Structured Data, G. Bakir, T. Hofman, B. Scholkopf, A. Smola, and B. Taskar, Eds. Cambridge, USA: MIT Press, 2006.
    [8]
    B. Taskar, V. Chatalbashev, D. Koller, and C. Guestrin, “Learning structured prediction models: A large margin approach,” in Proc. 22nd Int. Conf. Machine Learning, Bonn, Germany, 2005, pp. 896–903.
    [9]
    B. Taskar, C. Guestrin, and D. Koller, “Max-margin Markov networks,” in Proc. 16th Int. Conf. Neural Information Processing Systems, Vancouver and Whistler, British Columbia, Canada, 2004, pp. 25–32.
    [10]
    I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, “Large margin methods for structured and interdependent output variables,” J. Mach. Learn. Res., vol. 6, pp. 1453–1484, Dec. 2005.
    [11]
    S. Chopra, R. Hadsell, and Y. LeCun, “Learning a similarity metric discriminatively, with application to face verification,” in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, San Diego, USA, 2005, pp. 539–546.
    [12]
    D. Belanger and A. McCallum, “Structured prediction energy networks,” in Proc. 33rd Int. Conf. Machine Learning, New York, USA, 2016, pp. 983–992.
    [13]
    D. Belanger, B. S. Yang, and A. McCallum, “End-to-end learning for structured prediction energy networks,” in Proc. 34th Int. Conf. Machine Learning, Sydney, Australia, 2017, pp. 429–439.
    [14]
    B. Amos, L. Xu, and J. Z. Kolter, “Input convex neural networks,” in Proc. 34th Int. Conf. Machine Learning, Sydney, Australia, 2017, pp. 146–155.
    [15]
    J. Peng, L. F. Bo, and J. B. Xu, “Conditional neural fields,” in Proc. 22nd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2009, pp. 1419–1427.
    [16]
    S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Z. Su, D. L. Du, C. Huang, and P. H. S. Torr, “Conditional random fields as recurrent neural networks,” in Proc. IEEE Int. Conf. Computer Vision, Santiago, Chile, 2015, pp. 1529–1537.
    [17]
    L. C. Chen, A. G. Schwing, A. L. Yuille, and R. Urtasun, “Learning deep structured models,” in Proc. 32nd Int. Conf. Machine Learning, Lille, France, 2015, pp. 1785–1794.
    [18]
    Z. L. Geng, D. Johnson, and R. Fedkiw, “Coercing machine learning to output physically accurate results,” J. Comput. Phys., vol. 406, Article No. 109099, Apr. 2020. doi: 10.1016/j.jcp.2019.109099
    [19]
    R. K. Ahuja and J. B. Orlin, “Inverse optimization,” Oper. Res., vol. 49, no. 5, pp. 771–783, Oct. 2001. doi: 10.1287/opre.49.5.771.10607
    [20]
    C. Heuberger, “Inverse combinatorial optimization: A survey on problems, methods, and results,” J. Comb. Optim., vol. 8, no. 3, pp. 329–361, Sept. 2004. doi: 10.1023/B:JOCO.0000038914.26975.9b
    [21]
    V. Chatalbashev, “Inverse convex optimization,” M.S. thesis, Stanford Univ., Stanford, USA, 2005.
    [22]
    A. Keshavarz, Y. Wang, and S. Boyd, “Imputing a convex objective function,” in Proc. IEEE Int. Symp. Intelligent Control, Denver, USA, 2011, pp. 613–619.
    [23]
    B. Amos and J. Z. Kolter, “OptNet: Differentiable optimization as a layer in neural networks,” in Proc. 34th Int. Conf. Machine Learning, Sydney, Australia, 2017, pp. 136–145.
    [24]
    A. V. Fiacco and G. P. McCormick, Nonlinear Programming—Sequential Unconstrained Minimization Techniques. New York, USA: John Wiley & Sons Ltd, 1990.
    [25]
    A. V. Fiacco, “Sensitivity analysis for nonlinear programming using penalty methods,” Mathem. Program., vol. 10, no. 1, pp. 287–311, Dec. 1976. doi: 10.1007/BF01580677
    [26]
    B. Amos, I. D. Jimenez, J. Sacks, B. Boots, and J. Z. Kolter, “Differentiable MPC for end-to-end planning and control,” in Proc. 32nd Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2018, pp. 8299–8310.
    [27]
    F. de A. Belbute-Peres, K. Smith, K. R. Allen, J. B. Tenenbaum, and J. Z. Kolter, “End-to-end differentiable physics for learning and control,” in Proc. 32nd Conf. Neural Information Processing Systems, Montreal, Canada, 2018, pp. 7178–7189.
    [28]
    S. T. Barratt and S. P. Boyd, “Fitting a Kalman smoother to data,” in Proc. American Control Conf., Denver, USA, 2020.
    [29]
    A. Agrawal, S. Barratt, S. Boyd, and B. Stellato, “Learning convex optimization control policies,” in Proc. 2nd Annu. Conf. Learning for Dynamics and Control, 2020.
    [30]
    C. K. Ling, F. Fang, and J. Z. Kolter, “What game are we playing? end-to-end learning in normal and extensive form games,” in Proc. 27th Int. Joint Conf. Artificial Intelligence, Stockholm, Sweden, 2018.
    [31]
    C. K. Ling, F. Fang, and J. Z. Kolter, “Large scale learning of agent rationality in two-player zero-sum games,” in Proc. AAAI Conf. Artificial Intelligence, Palo Alto, USA, 2019, pp. 6104–6111.
    [32]
    Q. Berthet, M. Blondel, O. Teboul, M. Cuturi, J. P. Vert, and F. Bach, “Learning with differentiable perturbed optimizers,” arXiv preprint arXiv: 2002.08676, Jun. 2020.
    [33]
    S. Barratt, G. Angeris, and S. Boyd, “Automatic repair of convex optimization problems,” Optim. Eng., vol. 22, pp. 247–259, 2021. doi: 10.1007/s11081-020-09508-9
    [34]
    B. Amos and D. Yarats, “The differentiable cross-entropy method,” arXiv preprint arXiv: 1909.12830, Aug. 2020.
    [35]
    S. Barratt and S. Boyd, “Least squares auto-tuning,” Eng. Optim., vol. 53, no. 5, pp. 789–810, May 2021. doi: 10.1080/0305215X.2020.1754406
    [36]
    S. Barratt and R. Sharma, “Optimizing for generalization in machine learning with cross-validation gradients,” arXiv preprint arXiv: 1805.07072, May 2018.
    [37]
    S. Diamond, V. Sitzmann, F. Heide, and G. Wetzstein, “Unrolled optimization with deep priors,” arXiv preprint arXiv: 1705.08041, Dec. 2018.
    [38]
    J. Domke, “Generic methods for optimization-based modeling,” in Proc. 15th Int. Conf. Artificial Intelligence and Statistics, La Palma, Canary Islands, 2012, pp. 318–326.
    [39]
    C. Finn, “Learning to learn with gradients,” Univ. California, Berkeley, USA, Tech. Rep. No. UCB/EECS-2018–105, 2018.
    [40]
    D. Maclaurin, D. Duvenaud, and R. P. Adams, “Gradient-based hyperparameter optimization through reversible learning,” in Proc. 32nd Int. Conf. Machine Learning, Lille, France, 2015, pp. 2113–2122.
    [41]
    A. Agrawal and S. Boyd, “Differentiating through log-log convex programs,” arXiv preprint arXiv: 2004.12553, May 2020.
    [42]
    J. Lorraine, P. Vicol, and D. Duvenaud, “Optimizing millions of hyperparameters by implicit differentiation,” arXiv preprint arXiv: 1911.02590, Nov. 2019.
    [43]
    N. Hansen and A. Ostermeier, “Completely derandomized self-adaptation in evolution strategies,” Evol. Comput., vol. 9, no. 2, pp. 159–195, Jun. 2001. doi: 10.1162/106365601750190398
    [44]
    J. Močkus, “On Bayesian methods for seeking the extremum,” in Proc. IFIP Technical Conf. Optimization Techniques, Novosibirsk, Russia, 1974, pp. 400–404.
    [45]
    F. J. Solis and R. J. B. Wets, “Minimization by random search techniques,” Math. Oper. Res., vol. 6, no. 1, pp. 19–30, Feb. 1981. doi: 10.1287/moor.6.1.19
    [46]
    A. Beck and M. Teboulle, “Gradient-based algorithms with applications to signal-recovery,” in Convex Optimization in Signal Processing and Communications, D. P. Palomar and Y. C. Eldar, Eds. Cambridge, USA: Cambridge University Press, 2009, pp. 42–88.
    [47]
    L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proc. COMPSTAT, Y. Lechevallier and G. Saporta, Eds. Heidelberg: Physica-Verlag, 2010, pp. 177–186.
    [48]
    J. Nocedal and S. J. Wright, Numerical Optimization. New York, USA: Springer, 2006.
    [49]
    C. M. Bishop, Pattern Recognition and Machine Learning. New York, USA: Springer, 2006.
    [50]
    I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, USA: MIT Press, 2016.
    [51]
    R. E. Barlow and H. D. Brunk, “The isotonic regression problem and its dual,” J. Am. Stat. Assoc., vol. 41, no. 5, pp. 140–147, Mar. 1972.
    [52]
    M. J. Best and N. Chakravarti, “Active set algorithms for isotonic regression; A unifying framework,” Math. Program., vol. 47, no. 1–3, pp. 425–439, May 1990. doi: 10.1007/BF01580873
    [53]
    T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Tokyo, Japan: Springer Science & Business Media, 2009.
    [54]
    S. J. Kim, K. Koh, S. Boyd, and D. Gorinevsky, “ $\ell_1$ trend filtering” SIAM Rev., vol. 51, no. 2, pp. 339–360, May 2009. doi: 10.1137/070690274
    [55]
    D. Bertsekas, Dynamic Programming and Optimal Control, Vol. I. 4th ed. Belmont, USA: Athena Scientific, 2017.

Catalog

    通讯作者: 陈斌, bchen63@163.com
    • 1. 

      沈阳化工大学材料科学与工程学院 沈阳 110142

    1. 本站搜索
    2. 百度学术搜索
    3. 万方数据库搜索
    4. CNKI搜索

    Figures(4)

    Article Metrics

    Article views (792) PDF downloads(73) Cited by()

    Highlights

    • Convex optimization models predict outputs by solving an optimization problem
    • We show how many models are in fact convex optimization models
    • We give a general technique for learning convex optimization models

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return