Upserting (Update & Insert) With Pandas

Problem

Those who have been working with pandas and wanted to insert DataFrame values into Relational Database (Postgres, MySQL, etc.), most likely faced the problem of conflicting rows. This conflict occurs when statement tries to insert values with duplicated primary key column. Relational Database offers solution to this with its ON CONFLICT DO UPDATE SET column=EXCLUDED.column command that updates the rows with newly inserted data, while maintaining uniqueness constraint of primary key.

Unfortunately, the pandas's to_sql(...) method doesn't haven this capability built in by default. One solution to this problem is to insert full DataFrame into a temporal table and then execute a SQL statement to insert all rows from this temporal table into main one with ON CONFLICT DO UPDATE SET ... command to update conflicting rows with newly inserted values. This, however, is quite verbose solution which requires writing an additional SQL script. Thus, in this short blog-post I will show and explain how we can insert and update rows in SQL table using a native pandas's to_sql(...) method.

Solution

For simplicity and better readability, I decided to present the solution with self explanatory code and comments in a single GitHub Gist file.

NOTE: This solution was tested just in Postgres, but it should not differ too much from other relational databases.

References