Tag: Python

SDE root takes forever to load? Check the Documentation field on the GDB_Items table (SDE workspace metadata).

THE PROBLEM
Have you ever experienced this? ArcCatalog takes forever to load the list of items from an SDE database. In my case, it could take up to 2 minutes.

THE BACKGROUND
We migrated our database from Oracle to SQL Server a couple of years ago. A new SDE was set up for data capture and desktop viewing. We copied about 1,500 feature classes, most of them sitting in feature datasets, and 30 raster datasets. About 10 percent of these feature classes were versioned. For performance reasons, we also ran a compress script on a daily basis, and Analyze Datasets and Rebuild Indexes on a weekly basis. After some time, we discovered the SDE was running slower and slower. It froze Catalog for about 90 to 120 seconds every time the workspace (SDE root) was selected. However, when a feature class or a raster dataset was added to ArcMap, it performed well.

THE CAUSE
It took me a long time to figure out what the issue was. Eventually I found that the XML stored in the Documentation field of the GDB_Items table for the Workspace row (Type: C673FE0F-7280-404F-8532-20755DD8FC06) was extremely large (27 MB). Investigating a little further, I found that the XML in the Documentation column keeps the geoprocessing history. Every time a geoprocessing tool is run against the SDE workspace, it appends the details of that run to the XML in the Documentation field. So it is now clear: a large record, listing all selected feature classes, is inserted every time the Analyze Datasets or Rebuild Indexes tool runs. Over time, the system builds up huge metadata in the Documentation field. When the SDE workspace is selected, ArcCatalog has to download this metadata along with the item list, which leads to the lag.
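To see what the lineage looks like, here is a minimal sketch that parses a trimmed-down Documentation XML. The element path (Esri/DataProperties/lineage/Process) and the Date attribute come from the cleanup script later in this post; the ToolSource attribute and sample values are illustrative.

```python
import xml.etree.ElementTree as ET

# A trimmed-down example of the workspace Documentation XML. Real records
# contain one <Process> entry per geoprocessing run and can grow to many
# megabytes. ToolSource and the dates here are made up for illustration.
sample_doc = """<metadata>
  <Esri>
    <DataProperties>
      <lineage>
        <Process Date="20230101" ToolSource="Analyze Datasets"/>
        <Process Date="20230108" ToolSource="Rebuild Indexes"/>
      </lineage>
    </DataProperties>
  </Esri>
</metadata>"""

root = ET.fromstring(sample_doc)
processes = root.findall("./Esri/DataProperties/lineage/Process")
for p in processes:
    print(p.attrib["Date"], p.attrib.get("ToolSource"))
```

Dumping the real field the same way (or just checking its length in SQL) quickly shows whether the history has ballooned.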

THE SOLUTION
The issue is easy to fix once the cause is found.
Solution 1. Remove the workspace metadata entirely. I use SQL Server, so the example is for SQL Server.

UPDATE sde.GDB_ITEMS SET Documentation = NULL WHERE [Type] = 'C673FE0F-7280-404F-8532-20755DD8FC06';

Solution 2. Remove only the old geoprocessing history. This code is for SQL Server as well.

import pymssql
import xml.etree.ElementTree as ET
import time
import datetime

def ClearSDEGeoprocessingHistory_administration(host, user, password, database, threshold_days=30):
    workspace_type = 'C673FE0F-7280-404F-8532-20755DD8FC06'
    conn = pymssql.connect(host=host, user=user, password=password, database=database)
    try:
        cur = conn.cursor()
        cur.execute(
            "SELECT Documentation FROM sde.GDB_ITEMS WHERE [Type] = %s",
            (workspace_type,))
        row = cur.fetchone()
        if row is None or row[0] is None:
            return
        root = ET.fromstring(row[0])
        lineage = root.find("./Esri/DataProperties/lineage")
        if lineage is None:
            return
        # drop every geoprocessing entry older than the threshold
        for process in lineage.findall("Process"):
            date_process = datetime.datetime.strptime(process.attrib["Date"], "%Y%m%d")
            if (datetime.datetime.now() - date_process).days > threshold_days:
                lineage.remove(process)
        # encoding="unicode" returns a str without an XML declaration,
        # so there is no declaration line to strip afterwards
        new_xml_string = ET.tostring(root, encoding="unicode")
        # parameterized query, so single quotes in the XML need no escaping
        cur.execute(
            "UPDATE sde.GDB_ITEMS SET Documentation = %s WHERE [Type] = %s",
            (new_xml_string, workspace_type))
        # the update only takes effect after commit
        conn.commit()
    except Exception as ex:
        print(ex)
    finally:
        conn.close()

host = "{SQL SERVER HOST}"
user = "{DB OWNER}"
password = "{DB OWNER PWD}"
database = "{DATABASE NAME}"
# geoprocessing history older than this many days will be removed
threshold_days = 30

ClearSDEGeoprocessingHistory_administration(host, user, password, database, threshold_days)
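If you want to verify the pruning rule before touching production, the same logic can be dry-run on a plain XML string with no database connection. This is a sketch under the same assumptions as the script above (Process entries under Esri/DataProperties/lineage, a Date attribute in YYYYMMDD format); the prune_lineage helper and the sample dates are mine, not part of the original script.

```python
import datetime
import xml.etree.ElementTree as ET

def prune_lineage(xml_string, threshold_days=30, today=None):
    """Remove <Process> entries older than threshold_days and return the
    pruned XML as a string. Same rule as the cleanup script, but it works
    on a plain string so it can be dry-run without a database."""
    today = today or datetime.datetime.now()
    root = ET.fromstring(xml_string)
    lineage = root.find("./Esri/DataProperties/lineage")
    if lineage is not None:
        for process in lineage.findall("Process"):
            date = datetime.datetime.strptime(process.attrib["Date"], "%Y%m%d")
            if (today - date).days > threshold_days:
                lineage.remove(process)
    return ET.tostring(root, encoding="unicode")

doc = ('<metadata><Esri><DataProperties><lineage>'
       '<Process Date="20240101"/>'
       '<Process Date="20240301"/>'
       '</lineage></DataProperties></Esri></metadata>')
# with "today" pinned to 2024-03-15, only the 20240301 entry is recent enough
pruned = prune_lineage(doc, threshold_days=30, today=datetime.datetime(2024, 3, 15))
print(pruned)
```

Pinning the today parameter makes the behaviour repeatable, which is handy for testing the threshold before running the real update.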

LESSONS LEARNED
An Esri geodatabase keeps track of geoprocessing history in the metadata of all item types. Consider the metadata size if you hit a performance issue but are not able to locate the cause.
An Esri geodatabase also keeps track of copy history.