A common question that we encounter is CRM's capability to scale to an enterprise level solution. Microsoft published a white paper on CRM 3.0 abilities to perform and scale to 6,000 concurrent users with 12.2 million records for a database size of 52GB. This test does show that CRM can scale, but what will happen when just a few tables need to grow to over a combined record count of 1 billion and a total size of over 1 terabyte? I had the pleasure of addressing this question in a recent proof-of-concept (POC).
Proof-of-concept goal
The objectives of this POC were:
- Load over a billion records into three core entities.
- Perform a basic test to see how the CRM application responded to standard operations (quick find, advanced find, etc).
- Stress CRM with 7,500 CRUD transactions per minute.
- Add a custom attribute to an entity with over 100 million records.
Loading the data
Before we could start our tests, we needed to load the data, and a lot of it. This was the goal:
- Contact - 170 million records
- Contract - 500 million records
- Customer Address - 340 million records
Note: For each contact record, two records are automatically created in the customer address entity.
If you have done a data migration or system integration before, you will already know that loading data into CRM can be a time consuming process especially when dealing with millions of records -- loading a billion records is a challenge all by itself. To aid in loading the data, we developed a multi-threaded tool able to generate, depending on hardware and network, roughly 4,000 to 8,000 records per minute using the standard CRM web service method calls. We ran three instances of this tool on different servers bringing the average rate of records created per hour to 800,000. This method was used for the contact entity, but to speed up the data load, we resorted to direct SQL inserts for the contract entity. This method is not supported, but at 800,000 records per hour, it would have taken roughly 26 days to load the contract data.
The final record counts were:
- Contact - 173,087,112
- Contract - 512,396,467
- Customer Address - 346,174,224
The total database size was roughly 1.2 terabytes.
Testing CRM
Our first test was to simply load the contact page with the full data load. Our first attempt timed out -- not a good sign. Digging into this a little bit, I found that the SQL query was taking too long to execute. It turned out that there was a missing index on the contactbase table. Under standard data volumes this index is not really needed, but with this record size it was exceptionally important. In this specific instance, the middle name attribute was being returned on the contact active view, but did not have an index on it. Once the index was added, the contact page loaded as normal and returned within a couple of seconds.
We continued to run various test such as:
- Using quick find and advanced find to find a contract, contact or address.
- Adding a contact, contract and contract item
- Updating an existing contact, contract and contract item
- Deleting a record
Our findings clearly indicate that CRM 4.0 is more than capable of managing at least 7,500 transactions per minute and multiple terabytes of data. Properly indexed, there was no discernable difference for entities that had relatively very little data and those that had hundreds of millions of records.
You can read more about the results of the full scale stress test, published by Microsoft, here.
Adding a custom attribute
In our final test, we wanted to see what would happen if we load an entity with large amounts of data and attempted to add a custom attribute. The results of this test revealed a potential issue.
When the data is loaded into an entity, if custom attribute hasn't been added, only the base table is populated. While this makes sense, it is important to consider when working with large data volumes. For the first custom attribute CRM adds the column to the extension table and then populates the extension table with all the records from the base table. The SQL alter table statement takes less than a second, but the population of the tables can take a couple of hours causing the web call to timeout. To get around this issue, we completed the following steps (note: this is unsupported):
- In SQL Server:
- In the base table of the entity, update the “IsCustomField” to “1” for a non-critical column.
- Alter the extension table adding the previously updated column (using SQL)
- Insert all the records from the base into the extension.
- Take the results of this procedure and execute them in the CRM database:
exec p_genSpecificViewAndTriggers @viewName=N'Contact'
- In CRM:
- Add the custom attribute like normal. Since CRM sees that there is a custom column already in the system, it will not try to insert the base records into the extension table.
- Return to SQL Server:
- Drop the custom added column in the extension table.
- Update the previously updated “IsCustomField” to “0”.
- Take the results of this procedure and execute them in the CRM database:
exec p_genSpecificViewAndTriggers @viewName=N'Contact'
While this is not an ideal approach (and not supported), we were able to get around the problem in the POC. A best practice and supported workaround, is simply to plan for and add any custom column prior to loading the data into the entities that will become very large. Note that the script above is only required if you do not create the dummy custom attribute first.
Testing conclusions
When working with VLDB's, I would normally use various techniques to increase overall performance such as horizontal / vertical partitioning and table filegroups. Although these items are not supported, we found that with some planning, just as with any db design, CRM can support multiple terabytes. Here are a few of the key points we learned:
- Have a solid index strategy. Understand the requirement for searches (quick find and advanced find) and column usage for forms and views.
- Plan for custom attributes. As discussed above, if you believe you are going to need a custom attribute, adding it early in the process can save a headache later on.
- Use index filegroups to help increase disk IO and to spread out the data. All CRM tables are on the primary filegroup, so using filegroups for the clustered indexes allows for better disk IO utilization increasing load times and query performance.
- Increasing the database file size during data loads helps decrease overhead of file growth.
Resources
This posting is provided "AS IS" with no warranties, and confers no rights.