Mixing Pandas with Odoo

This article describes the use of XML-RPC API provided by Odoo, a well-known ERP system. Upgrading to version 11.0 is the occasion to update my python scripts to reduce considerably the number of requests. The improvements were done with the help of pandas, the famous data structures and data analysis library.

Photo credit: Panda_3956, Ken_from_MD
Photo credit: Panda_3956, Ken_from_MD

Introduction

In my spare time, I help a small association, Les Compagnons du CEP, a joint buying organization who buys French wines for its members directly from producers. Since 2014, I set up an Odoo instance to manage the quotations, purchase order and the members. Odoo fulfills all their needs and we are happy to use it daily.

In addition to the user-friendly Web interface, the following management tasks are performed within a custom Django website.

  • Calculate the price of wines based on the seller price, taxes, transport cost, VAT, Fost+.
  • Import wines to database. The list of sellable wines is updated two times a year in order to adapt to the seasonal dishes1. Of course, the vintages are changed each year depending of the wines. The association sells approximately 600 different wines (up to 10000 bottles a year). As a result, I wrote a massive import script. This program reads large Excel files containing the several parameters (name, vintage, seller name, description, etc) and it uses the XML-RPC API offered by Odoo to create the items in the PostgreSQL database. This task is the main subject if this article.
  • Generate the price list based on the newly added wines. A price list Excel file is uploaded by the user. The file is transferred on another machine with the help of Errol. A LaTeX document is generated with python from the Excel file and it is compiled to PDF with TeX Live. Afterward, the PDF is automatically copied on the Django website with Errol.

The Django website is successfully used since 4 years. Unfortunately, I observed slowdown in the process since the implementation of product update from the Excel file.

In order to prepare the upgrade to Odoo 11.0, I decided to update the XML-RPC calls in order to reduce their number and therefore accelerate the import process.

Some code to get your teeth into

Photo credit: Panda, Sue Cantan https://www.flickr.com/photos/suecan/4349221370/
Photo credit: Panda, Sue Cantan

To chose the more suitable strategy, I decided to compare the current code base to a new scenario.

Description of the current code

Each row of the Excel file correspond to a wine. The name, seller code2 and default code are searched in the Odoo database in order to avoid duplicates. If no collision is found with the older products, a product template is created and the supplier information are updated. Since 2014, version 8.0 is used in production with product variant support. Unfortunately, the variant requires to perform an additional search to update the price of the product. In version 11.0, we will get rid of the product variant.

The python code can be summarized as follows.

import xmlrpc.client as xmlrpclib
import pandas as pd
username = 'user'
pwd = 'password'
dbname= 'database_name'

sock_common = xmlrpclib.ServerProxy('http://localhost:8069/xmlrpc/common')
uid = sock_common.login(database, username, pwd)
sock = xmlrpclib.ServerProxy('http://localhost:8069/xmlrpc/object')

def research(default_code, supplier_code, wine_name):
    supplier_code = int(float(supplier_code))
    args_p = [('default_code', '=', default_code)]
    args_s = [('function', '=', supplier_code)]
    args_n = [('name', '=', wine_name)]
    ids_product = None
    n_supplier = None
    n_name = None
    try:
        ids_product = sock.execute(dbname, uid, pwd, 'product.product', 'search', args_p)
        # Rather than retrieve a possibly gigantic list of records and count them, search_count()
        # can be used to retrieve only the number of records matching the query.
        # It takes the same domain filter as search() and no other parameter.
        # https://www.odoo.com/documentation/8.0/api_integration.html
        n_supplier = sock.execute(dbname, uid, pwd, 'res.partner', 'search_count', args_s)
        n_name = sock.execute(dbname, uid, pwd, 'product.product', 'search_count', args_n)
        n_supplier = int(n_supplier)
        n_name = int(n_name)

    except AttributeError:
        start_session()
        return "e_initialization", ids_product, n_supplier, n_name
    if len(ids_product) == 0 and n_name == 0 and n_supplier != 0:
        return 'success', ids_product, n_supplier, n_name
    if len(ids_product) != 0 and n_name != 0:
        # Le code est déjà utilisé et un produit du même nom existe.
        return 'e_code_used_same_name', ids_product, n_supplier, n_name
    if len(ids_product) != 0:  # Le code est déjà utilisé
        return 'e_code_used', ids_product, n_supplier, n_name
    # if len(ids_product) == 0:
    if n_supplier == 0:  # Le fournisseur n existe pas
        return 'e_missing_seller', ids_product, n_supplier, n_name
    if n_name != 0:  # un produit du même nom mais pas le même code existe
        return 'e_code_used_different_name', ids_product, n_supplier, n_name

def import2odoo():
    route_warehouse0_mto = 1
    route_warehouse0_manufacture = 5
    [...]
    # iterate over all rows, read the cells and assign the wine parameters to variables
    # each row correspond to one wine
    for row in rows:
        seller_name = row[0]
        default_code = row[1]
        name = row[2]
        do_import = row[2]
        comment = row[3]
        name = row[4]
        default_code = row[5]
        standard_price = row[6]
        list_price = row[7]
        seller_code = row[8]
        res_search, ids_product, ids_supplier, ids_name = research(default_code, seller_code, name)
        if res_search == 'success' and do_import == "1":
            # Success, the wine may be added
            results_dict['added_wines'].append([default_code, name, standard_price, list_price, comment])
            product_template = {
                'name': name,
                'active': True,
                'state': 'sellable',
                'standard_price': standard_price,
                'list_price': list_price,
                'description': comment ,
                'purchase_ok': 1,
                'sale_ok': 1,
                'uom_id': 1,
                'uom_po_id': 1,
                'type': 'product',
                'cost_method': 'standard',
                'route_ids': [(6, 0, [route_warehouse0_mto, route_warehouse0_manufacture])]
            }
            # For each wine, a template must be created
            template_id = sock.execute(dbname, uid, pwd, 'product.template',
                                            'create',
                                            product_template)


            # Create the supplier information for the wine
            product_supplierinfo = {
                'name': name,
                'product_code': default_code,  # code chez le fournisseur
                'product_name': name,  # name chez le producteur
                'min_qty': 1,
                'delay': 300,
                'product_tmpl_id': template_id,
            }
            # Create the supplier information for the wine
            product_supplierinfo_id = sock.execute(dbname, uid, pwd,
                                                        'product.supplierinfo',
                                                        'create', product_supplierinfo)

            # The product id must be obtained to set the default code
            product_product = {
                'product_tmpl_id': template_id,
                'default_code': default_code,
            }
            args = [('name', '=', name), ]
            product_id = sock.execute(dbname, uid, pwd, 'product.product', 'search',
                                           args)
            result = sock.execute(dbname, uid, pwd, 'product.product', 'write',
                                       product_id,
                                       product_product)

        [...]
        # Here we take into account the exceptions and several cases: the wine is already present, the seller is missing etc.

Metrics

If N is the number of wines to add, x, the number of milliseconds to research an item and y the number of milliseconds to write an item into the database, the previous code is made of:

  • 3 researches per wine : 3N * x
  • templates creation: N y
  • supplierinfo creation: N y
  • research products: N x
  • product update: N y

Total: 4N x + 3N y

This code is not optimized. It runs slowly even if it achieves what we expect from it. Now, we can try to do enhance it.

The new version

As previously mentioned, the product variant suppression will remove one search but there is more room for improvements.

The API call for each row is the main bottleneck of the script. I decided to replace them by a search of all product at the beginning of the script. The list if wine is return in a list of dictionary. Fortunately, a list of dicts is the easiest object to convert to a Pandas Dataframe.

import pandas as pd

def dataframes_generator():

    # Use a search function with empty args to get all ids :
    # https://www.odoo.com/fr_FR/forum/aide-1/question/is-it-possible-to-retrieve-2-fields-of-all-entry-within-a-model-thru-xml-rpc-6886
    # 1st => search all ids 2nd => read the selected fields on the whole list of ids.
    # Only 2 rpc-xml requests :o)
    #1
    args_product = [('default_code', '!=', 'foo')]
    ids_product = sock.execute(dbname, uid, pwd, 'product.product', 'search', args_product)
    # 2
    fields_products = ['name', 'id', 'default_code']
    recordset_products = sock.execute(dbname, uid, pwd, 'product.product', 'read', ids_product, fields_products)
    args_seller = [('name', '!=', 'foo')]
    fields_seller = ['name', 'id', 'function']
    ids_sellers = sock.execute(dbname, uid, pwd, 'res.partner', 'search', args_seller)
    recordset_sellers = sock.execute(dbname, uid, pwd, 'res.partner', 'read', ids_sellers,
                                           fields_seller)
    df_product = pd.DataFrame(recordset_products)
    df_sellers = pd.DataFrame(recordset_sellers)

    return True, ids_product, ids_sellers

For now, the ids list is generated by searching all product with a dummy code ('foo' in the example). My trials to get rid of the args_product variable failed.

As the dataframes are generated, it is easy to search the product and sellers among their respective dataframes:

code_to_find = 'A18'
df_product[df_product['default_code'].str.contains(code_to_find)]
    default_code    id  name
0   A18/999         21  First wine
1   A18/999         22  Second wine

A small function search_string_df can be written to facilitate the future searches in dataframes. It returns the dataframe and its lenght:

def search_df(df=None, col_name='', search_item=None, search_int=None):
    #  if we search an int.
    if search_int:
        df_res = df.loc[df[col_name] == search_item]
    # if we search a string. We must ignore NaN values.
    else:
        df_res = df[df[col_name].str.contains(search_item, na=False)]
    n_res = len(df_res)
    return n_res, df_res

n, df = search_string_df(df=df_product, col_name='default_code', search_str='A18')

Final cut

When the research function is modified, the whole code can also be adapted:

def research(default_code, supplier_code, wine_name):

    # 1) search if default code is used?)
    # 2) search if suppliers is in the database
    # 3) search if the name is already used

    # Retrieve the dataframes. This example comes from a jupyter nootebook.
    # df_product and df_sellers are already defined.
    # In a real case, we should use class variables.

    n_code, df_code = search_df(df=df_product, col_name='default_code', search_item=default_code,
                                     search_int=False)
    n_supplier, df_suppliers = search_df(df=df_suppliers, col_name='function',
                                              search_item=supplier_code, search_int=False)
    n_name, df_name = search_df(df=df_product, col_name='name', search_item=wine_name, search_int=False)

    try:
        # ids_product = list(df_code['id'])
        ids_product = df_code['id'].tolist()
    except AttributeError:
        ids_product = []

    if n_code == 0 and n_name == 0 and n_supplier != 0:
        return 'success', ids_product, n_supplier, n_name
    if n_code != 0 and n_name != 0:
        # Another product uses the same name with another code
        return 'e_code_used_same_name', ids_product, n_supplier, n_name
    if n_code != 0:
        # the code is already used
        return 'e_code_used', ids_product, n_supplier, n_name
    if n_supplier == 0:
        # Cannot find the supplier
        return 'e_missing_seller', ids_product, n_supplier, n_name
    if n_name != 0:
        # A product with the same name and another code exists.
        return 'e_code_used_different_name', ids_product, n_supplier, n_name


def import2odoo():
    route_warehouse0_mto = 1
    route_warehouse0_manufacture = 5
    [...]
    # iterate over all rows, read the cells and assign the wine parameters to variables
    # each row correspond to one wine
    for row in rows:
        seller_name = row[0]
        default_code = row[1]
        name = row[2]
        do_import = row[2]
        comment = row[3]
        name = row[4]
        default_code = row[5]
        standard_price = row[6]
        list_price = row[7]
        seller_code = row[8]
        res_search, ids_product, n_supplier, n_name = research(default_code, seller_code, name)
        if res_search == 'success' and do_import == "1":

            product_template = {
                'name': name,
                'active': True,
                'standard_price': standard_price,
                'list_price': list_price,
                'description': comment ,
                'default_code': default_code,
                'purchase_ok': 1,
                'sale_ok': 1,
                'uom_id': 1,
                'uom_po_id': 1,
                'type': 'product',
                'cost_method': 'standard',
                'route_ids': [(6, 0, [route_warehouse0_mto, route_warehouse0_manufacture])]
            }
            # For each wine, a template must be created
            template_id = sock.execute(dbname, uid, pwd, 'product.template',
                                            'create',
                                            product_template)

            # Create the supplier information for the wine
            product_supplierinfo = {
                'name': name,
                'product_code': default_code,  # code chez le fournisseur
                'product_name': name,  # name chez le producteur
                'min_qty': 1,
                'delay': 300,
                'product_tmpl_id': template_id,
            }
            # Create the supplier information for the wine
            product_supplierinfo_id = sock.execute(dbname, uid, pwd,
                                                        'product.supplierinfo',
                                                        'create', product_supplierinfo)
            logging.info("Wine {} : {} has been added".format(default_code, name))

        [...]
        # Here we take into account the exceptions and several cases: the wine is already present, the seller is missing etc.

Metrics

If we follow the same conventions than before:

The new code is made of:

  • 3 researches: 3 * x
  • templates creation: N y
  • supplierinfo creation: N y

Total: 3 x + 2N y

This result is a huge improvement. The data search in Dataframe is really fast and therefore, the impact of the bottleneck is decreased.

Conclusions

After several tests, the new implementation seems solid. It is not as fast as I would have expected but I am working on it. Some helpful resources are available on the web.

The import campaign of spring is already finished but the new algorithm will be tested with the aim to be ready for the wines coming in autumn. Nevertheless, reducing the number of request from 3N to 3 where N is the number of wines can only have beneficial effects.

The next step will be to watch and analyse the databases requests. The option log-level=debug_rpc will probably be crucial.

The results of these investigations will be shared here.

Stay tuned !

Photo credit: Panda in China, George Lu https://www.flickr.com/photos/gzlu/7708872342/
Photo credit: Panda in China, George Lu

Links

  1. The covered regions are: Champagne, Alsace, Loire, Bourgogne, Beaujolais - Maconnais, Rhône, Provence, Languedoc
  2. The seller code is saved in the field 'function'

blogroll

social