IEEE/CAA Journal of Automatica Sinica
Citation: Akshay Agrawal, Shane Barratt and Stephen Boyd, "Learning Convex Optimization Models," IEEE/CAA J. Autom. Sinica, vol. 8, no. 8, pp. 1355–1364, Aug. 2021. doi: 10.1109/JAS.2021.1004075
[1] 
A. Agrawal, B. Amos, S. Barratt, S. Boyd, S. Diamond, and J. Z. Kolter, “Differentiable convex optimization layers,” in Proc. 33rd Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, pp. 9558–9570.

[2] 
A. Agrawal, S. Barratt, S. Boyd, E. Busseti, and W. Moursi, “Differentiating through a cone program,” J. Appl. Numer. Optim., vol. 1, no. 2, pp. 107–115, Aug. 2019.

[3] 
B. Amos, “Differentiable optimization-based modeling for machine learning,” Ph.D. dissertation, Carnegie Mellon Univ., Pittsburgh, USA, 2019.

[4] 
E. Busseti, W. M. Moursi, and S. Boyd, “Solution refinement at regular points of conic problems,” Comput. Optim. Appl., vol. 74, no. 3, pp. 627–643, Aug. 2019. doi: 10.1007/s10589-019-00122-9

[5] 
S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, USA: Cambridge University Press, 2004.

[6] 
G. BakIr, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. V. Vishwanathan, Predicting Structured Data. Cambridge, USA: MIT Press, 2007.

[7] 
Y. LeCun, S. Chopra, R. Hadsell, M. A. Ranzato, and F. J. Huang, “A tutorial on energy-based learning,” in Predicting Structured Data, G. BakIr, T. Hofmann, B. Schölkopf, A. Smola, and B. Taskar, Eds. Cambridge, USA: MIT Press, 2006.

[8] 
B. Taskar, V. Chatalbashev, D. Koller, and C. Guestrin, “Learning structured prediction models: A large margin approach,” in Proc. 22nd Int. Conf. Machine Learning, Bonn, Germany, 2005, pp. 896–903.

[9] 
B. Taskar, C. Guestrin, and D. Koller, “Max-margin Markov networks,” in Proc. 16th Int. Conf. Neural Information Processing Systems, Vancouver and Whistler, British Columbia, Canada, 2004, pp. 25–32.

[10] 
I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, “Large margin methods for structured and interdependent output variables,” J. Mach. Learn. Res., vol. 6, pp. 1453–1484, Dec. 2005.

[11] 
S. Chopra, R. Hadsell, and Y. LeCun, “Learning a similarity metric discriminatively, with application to face verification,” in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, San Diego, USA, 2005, pp. 539–546.

[12] 
D. Belanger and A. McCallum, “Structured prediction energy networks,” in Proc. 33rd Int. Conf. Machine Learning, New York, USA, 2016, pp. 983–992.

[13] 
D. Belanger, B. S. Yang, and A. McCallum, “End-to-end learning for structured prediction energy networks,” in Proc. 34th Int. Conf. Machine Learning, Sydney, Australia, 2017, pp. 429–439.

[14] 
B. Amos, L. Xu, and J. Z. Kolter, “Input convex neural networks,” in Proc. 34th Int. Conf. Machine Learning, Sydney, Australia, 2017, pp. 146–155.

[15] 
J. Peng, L. F. Bo, and J. B. Xu, “Conditional neural fields,” in Proc. 22nd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2009, pp. 1419–1427.

[16] 
S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Z. Su, D. L. Du, C. Huang, and P. H. S. Torr, “Conditional random fields as recurrent neural networks,” in Proc. IEEE Int. Conf. Computer Vision, Santiago, Chile, 2015, pp. 1529–1537.

[17] 
L. C. Chen, A. G. Schwing, A. L. Yuille, and R. Urtasun, “Learning deep structured models,” in Proc. 32nd Int. Conf. Machine Learning, Lille, France, 2015, pp. 1785–1794.

[18] 
Z. L. Geng, D. Johnson, and R. Fedkiw, “Coercing machine learning to output physically accurate results,” J. Comput. Phys., vol. 406, Article No. 109099, Apr. 2020. doi: 10.1016/j.jcp.2019.109099

[19] 
R. K. Ahuja and J. B. Orlin, “Inverse optimization,” Oper. Res., vol. 49, no. 5, pp. 771–783, Oct. 2001. doi: 10.1287/opre.49.5.771.10607

[20] 
C. Heuberger, “Inverse combinatorial optimization: A survey on problems, methods, and results,” J. Comb. Optim., vol. 8, no. 3, pp. 329–361, Sept. 2004. doi: 10.1023/B:JOCO.0000038914.26975.9b

[21] 
V. Chatalbashev, “Inverse convex optimization,” M.S. thesis, Stanford Univ., Stanford, USA, 2005.

[22] 
A. Keshavarz, Y. Wang, and S. Boyd, “Imputing a convex objective function,” in Proc. IEEE Int. Symp. Intelligent Control, Denver, USA, 2011, pp. 613–619.

[23] 
B. Amos and J. Z. Kolter, “OptNet: Differentiable optimization as a layer in neural networks,” in Proc. 34th Int. Conf. Machine Learning, Sydney, Australia, 2017, pp. 136–145.

[24] 
A. V. Fiacco and G. P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques. New York, USA: John Wiley & Sons Ltd, 1990.

[25] 
A. V. Fiacco, “Sensitivity analysis for nonlinear programming using penalty methods,” Math. Program., vol. 10, no. 1, pp. 287–311, Dec. 1976. doi: 10.1007/BF01580677

[26] 
B. Amos, I. D. Jimenez, J. Sacks, B. Boots, and J. Z. Kolter, “Differentiable MPC for end-to-end planning and control,” in Proc. 32nd Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2018, pp. 8299–8310.

[27] 
F. de A. Belbute-Peres, K. Smith, K. R. Allen, J. B. Tenenbaum, and J. Z. Kolter, “End-to-end differentiable physics for learning and control,” in Proc. 32nd Conf. Neural Information Processing Systems, Montreal, Canada, 2018, pp. 7178–7189.

[28] 
S. T. Barratt and S. P. Boyd, “Fitting a Kalman smoother to data,” in Proc. American Control Conf., Denver, USA, 2020.

[29] 
A. Agrawal, S. Barratt, S. Boyd, and B. Stellato, “Learning convex optimization control policies,” in Proc. 2nd Annu. Conf. Learning for Dynamics and Control, 2020.

[30] 
C. K. Ling, F. Fang, and J. Z. Kolter, “What game are we playing? End-to-end learning in normal and extensive form games,” in Proc. 27th Int. Joint Conf. Artificial Intelligence, Stockholm, Sweden, 2018.

[31] 
C. K. Ling, F. Fang, and J. Z. Kolter, “Large scale learning of agent rationality in two-player zero-sum games,” in Proc. AAAI Conf. Artificial Intelligence, Palo Alto, USA, 2019, pp. 6104–6111.

[32] 
Q. Berthet, M. Blondel, O. Teboul, M. Cuturi, J. P. Vert, and F. Bach, “Learning with differentiable perturbed optimizers,” arXiv preprint arXiv: 2002.08676, Jun. 2020.

[33] 
S. Barratt, G. Angeris, and S. Boyd, “Automatic repair of convex optimization problems,” Optim. Eng., vol. 22, pp. 247–259, 2021. doi: 10.1007/s11081-020-09508-9

[34] 
B. Amos and D. Yarats, “The differentiable cross-entropy method,” arXiv preprint arXiv: 1909.12830, Aug. 2020.

[35] 
S. Barratt and S. Boyd, “Least squares autotuning,” Eng. Optim., vol. 53, no. 5, pp. 789–810, May 2021. doi: 10.1080/0305215X.2020.1754406

[36] 
S. Barratt and R. Sharma, “Optimizing for generalization in machine learning with cross-validation gradients,” arXiv preprint arXiv: 1805.07072, May 2018.

[37] 
S. Diamond, V. Sitzmann, F. Heide, and G. Wetzstein, “Unrolled optimization with deep priors,” arXiv preprint arXiv: 1705.08041, Dec. 2018.

[38] 
J. Domke, “Generic methods for optimization-based modeling,” in Proc. 15th Int. Conf. Artificial Intelligence and Statistics, La Palma, Canary Islands, 2012, pp. 318–326.

[39] 
C. Finn, “Learning to learn with gradients,” Univ. California, Berkeley, USA, Tech. Rep. No. UCB/EECS-2018-105, 2018.

[40] 
D. Maclaurin, D. Duvenaud, and R. P. Adams, “Gradient-based hyperparameter optimization through reversible learning,” in Proc. 32nd Int. Conf. Machine Learning, Lille, France, 2015, pp. 2113–2122.

[41] 
A. Agrawal and S. Boyd, “Differentiating through log-log convex programs,” arXiv preprint arXiv: 2004.12553, May 2020.

[42] 
J. Lorraine, P. Vicol, and D. Duvenaud, “Optimizing millions of hyperparameters by implicit differentiation,” arXiv preprint arXiv: 1911.02590, Nov. 2019.

[43] 
N. Hansen and A. Ostermeier, “Completely derandomized self-adaptation in evolution strategies,” Evol. Comput., vol. 9, no. 2, pp. 159–195, Jun. 2001. doi: 10.1162/106365601750190398

[44] 
J. Močkus, “On Bayesian methods for seeking the extremum,” in Proc. IFIP Technical Conf. Optimization Techniques, Novosibirsk, Russia, 1974, pp. 400–404.

[45] 
F. J. Solis and R. J. B. Wets, “Minimization by random search techniques,” Math. Oper. Res., vol. 6, no. 1, pp. 19–30, Feb. 1981. doi: 10.1287/moor.6.1.19

[46] 
A. Beck and M. Teboulle, “Gradient-based algorithms with applications to signal recovery,” in Convex Optimization in Signal Processing and Communications, D. P. Palomar and Y. C. Eldar, Eds. Cambridge, USA: Cambridge University Press, 2009, pp. 42–88.

[47] 
L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proc. COMPSTAT, Y. Lechevallier and G. Saporta, Eds. Heidelberg, Germany: Physica-Verlag, 2010, pp. 177–186.

[48] 
J. Nocedal and S. J. Wright, Numerical Optimization. New York, USA: Springer, 2006.

[49] 
C. M. Bishop, Pattern Recognition and Machine Learning. New York, USA: Springer, 2006.

[50] 
I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, USA: MIT Press, 2016.

[51] 
R. E. Barlow and H. D. Brunk, “The isotonic regression problem and its dual,” J. Am. Stat. Assoc., vol. 67, no. 337, pp. 140–147, Mar. 1972.

[52] 
M. J. Best and N. Chakravarti, “Active set algorithms for isotonic regression: A unifying framework,” Math. Program., vol. 47, no. 1–3, pp. 425–439, May 1990. doi: 10.1007/BF01580873

[53] 
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York, USA: Springer Science & Business Media, 2009.

[54] 
S. J. Kim, K. Koh, S. Boyd, and D. Gorinevsky, “

[55] 
D. Bertsekas, Dynamic Programming and Optimal Control, Vol. I. 4th ed. Belmont, USA: Athena Scientific, 2017.
