Elasticsearch: Lessons on Migration from MSSQL
This post is a compilation of the (sometimes hard-learned) lessons we picked up while migrating from MSSQL to Elasticsearch in C# using the NEST client. The intent is to help others recognize potential pitfalls and traps during their own migrations.
Antares maintains a publicly searchable database of about 100 million records. It was originally built on MSSQL and performed well early on. Business needs changed, as they tend to do: a simple exists query grew into searching child records, returning the aggregate root, and analyzing the matches. As the record count grew, it soon became apparent that things were going south. We followed the typical paths of adding indexes where needed, analyzing query performance, and throwing hardware at the problem. That worked up to a certain point.
We had been discussing Elasticsearch internally at this time and were somewhere between awe and disbelief. Our relational data was by now shaped more like a document store than SQL, and we felt this made a good test case. One weekend, after a few frustrating emails about query timeouts, I decided this was my new side project. I provisioned a CentOS 7 VM on one of our hypervisors and followed the installation guide. After a few hours, I had migrated a few thousand records and successfully implemented our complex query using NEST. The query has many nested compound AND (must) and OR (should) clauses. This was the easiest part. The initial effort was simple, but the path to production wasn’t smooth.
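For illustration only (the `Person` class and its fields are hypothetical, not our real documents, and `client` is assumed to be a configured `ElasticClient`), a nested must/should combination in NEST looks roughly like this:

```csharp
// Hypothetical document type; our real mapping is considerably larger.
public class Person
{
    public string Name { get; set; }
    public string City { get; set; }
    public int Age { get; set; }
}

// AND (must) of a match clause with a nested OR (should) of two others.
var response = client.Search<Person>(s => s
    .Query(q => q
        .Bool(b => b
            .Must(m => m
                .Match(mt => mt.Field(f => f.Name).Query("smith")))
            .Should(
                sh => sh.Match(mt => mt.Field(f => f.City).Query("dayton")),
                sh => sh.Range(r => r.Field(f => f.Age).GreaterThan(40))))));
```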
At the same time, another of our senior developers had independently started his own testing with Elastic and was equally impressed. His volume was a few hundred thousand records, but with really high latency (7–20 seconds on average) due to a third-party data store. Historical data didn’t change, and not every document had the same shape. Elastic appeared to be a good fit, and it was.
Below are some of our combined notes on two very different migrations from SQL to Elasticsearch.
- Elasticsearch is crazy fast! Seriously fast! On a single-node test instance running in a VM with a single vCPU and only 12GB of RAM, we were seeing single-digit-millisecond queries in some instances. Most completed in under 200 ms. I crafted the most difficult query I could imagine for our documents, and it still only took 1200 ms, for a query that would typically take minutes on a SQL server with twice the resources. We didn’t even really optimize our index beyond the basic NEST automapping. With the full 100 million records indexed and active searching being performed, the VM was still registering over 8GB of free RAM. The developers over at Elastic are working with some powerful voodoo.
- Hadouken indenting. The tutorials warned me about this, and it is absolutely true! It may be easier, as it was in our case, to split the query into sections, building the sub-expressions first and combining them afterward. Even then, the combined query still ended up with that telltale staircase of closing parentheses.
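One way to tame the nesting (a sketch, assuming an `ElasticClient` named `client` and the same hypothetical `Person` fields) is to build each branch as a `QueryContainer` and combine them with NEST’s overloaded `&&`/`||` operators instead of one giant lambda:

```csharp
// Build each branch separately instead of nesting lambdas five levels deep.
QueryContainer nameQuery = new MatchQuery { Field = "name", Query = "smith" };
QueryContainer cityQuery = new MatchQuery { Field = "city", Query = "dayton" };
QueryContainer ageQuery  = new NumericRangeQuery { Field = "age", GreaterThan = 40 };

// NEST overloads && and || to produce bool must/should queries.
QueryContainer combined = nameQuery && (cityQuery || ageQuery);

var response = client.Search<Person>(s => s.Query(q => combined));
```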
- Index names must be lowercase. This is stated in the documentation but is easy to overlook, and the tutorials we followed didn’t call it out clearly. We wasted a few hours on this one.
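One way to avoid this class of mistake entirely (a convention, not a NEST requirement; `Person` and `client` are the same assumed type and client as above) is to derive index names from the CLR type and force lowercase:

```csharp
// Derive the index name from the type and lowercase it,
// since Elasticsearch rejects uppercase index names.
var indexName = typeof(Person).Name.ToLowerInvariant(); // "person"

var response = client.CreateIndex(indexName, c => c
    .Mappings(m => m.Map<Person>(mm => mm.AutoMap())));
```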
- Build a debug NEST client that catches and logs any failed request. This would have saved us from the index-name issue had we done it first, and it is the reason I was able to correct most of the following issues relatively quickly. The errors it surfaces are easily searchable, and someone else has most likely hit the same issue.
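A minimal sketch of what we mean (the connection URL and console logging are placeholders; you would swap in your real endpoint and logger):

```csharp
using System;
using System.Text;
using Elasticsearch.Net;
using Nest;

var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
    // Keep the raw request/response bytes so they can be logged.
    .DisableDirectStreaming()
    .OnRequestCompleted(call =>
    {
        // Log every failed call along with the body Elasticsearch returned.
        if (!call.Success)
        {
            Console.WriteLine($"FAILED {call.HttpMethod} {call.Uri}");
            if (call.ResponseBodyInBytes != null)
                Console.WriteLine(Encoding.UTF8.GetString(call.ResponseBodyInBytes));
        }
    });

var client = new ElasticClient(settings);
```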
- If the file system runs low on storage space, your indexes are automatically set to read-only (allow delete). Make sure your data path has sufficient storage. The default storage location for Elasticsearch on CentOS is on the root LVM volume, which was given 50GB by default on our 128GB VHD, with the remainder allocated to the home volume. During migration, I was missing 300k records and struggled with this for a while. Most of that time was spent trying to expand a dynamic VHD in CentOS.
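Note that once space is freed, the block is not lifted automatically. Clearing it looks something like this in NEST (the index name is a placeholder; setting the value to null resets it to the default):

```csharp
// Remove the read_only_allow_delete block after freeing disk space.
var response = client.UpdateIndexSettings("myindex", u => u
    .IndexSettings(s => s
        .Setting("index.blocks.read_only_allow_delete", null)));
```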
- You can’t just have a GUID on your class and expect it to work. The default text analysis splits the GUID on its dashes (-), so exact matches fail. You can either remove the dashes (C# `ToString("N")`) when storing and searching, or mark the field as not analyzed (a keyword mapping in current versions).
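A sketch of both options, using a hypothetical `Record` class:

```csharp
using System;
using Nest;

public class Record
{
    // Option 2: map the field as keyword so the value is stored verbatim.
    [Keyword]
    public string ExternalId { get; set; }
}

// Option 1: strip the dashes before storing and searching.
var id = Guid.NewGuid();
var flattened = id.ToString("N"); // 32 hex digits, no dashes
```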
- If you need to search for stop words or special characters, it’s not exactly intuitive, but it is easy. Honestly, we wasted more time than I’d like to admit on this one. The answer was simple and right in our face, but we went down a number of wrong paths first. I can’t stress enough how important and useful unit tests are; we could validate our changes almost immediately as we bounced around trying to figure this one out. Specify your keyword normalizer on the property (an attribute in our case, `[Keyword(Normalizer = "MyLowercase")]`), then add that normalizer in your create-index lambda: `.Normalizers(n => n.Custom("MyLowercase", l => l.Filters("lowercase")))`.
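Put together, the create-index call looks roughly like this (the index name, class, and field are placeholders, and `client` is an assumed `ElasticClient`):

```csharp
public class Record
{
    [Keyword(Normalizer = "MyLowercase")]
    public string Code { get; set; }
}

var create = client.CreateIndex("records", c => c
    .Settings(s => s
        .Analysis(a => a
            // Register the normalizer referenced by the attribute above.
            .Normalizers(n => n.Custom("MyLowercase", l => l.Filters("lowercase")))))
    .Mappings(m => m.Map<Record>(mm => mm.AutoMap())));
```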
- Sorting isn’t what you’re used to in SQL. This one only took a few minutes to figure out, thanks to the debug client. Elastic uses something it calls a multi-field: a text field can carry a `keyword` sub-field suitable for sorting and aggregations. We appended that sub-field to our field name using the NEST suffix extension method: `return f.Field(x => x.Description.Suffix("keyword"));`
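In context, a sorted search looks something like this (the `Record` class and `Description` field are illustrative, and `client` is an assumed `ElasticClient`):

```csharp
public class Record
{
    public string Description { get; set; }
}

var response = client.Search<Record>(s => s
    .Query(q => q.MatchAll())
    // Sort on the keyword sub-field; sorting on the analyzed
    // text field itself fails by default.
    .Sort(so => so
        .Ascending(f => f.Description.Suffix("keyword"))));
```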
- If you index a document and then immediately search for it, the search probably won’t find it. There’s a refresh interval between a document being indexed and it becoming searchable. You can force a refresh between calls and wait for the response. However, analyze whether you really need this before implementing it, as it will increase the load on your Elastic instance. We used it while writing unit tests; I would not suggest using it in production.
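The per-request form we mean looks something like this on recent NEST versions (index name and document are placeholders):

```csharp
using Elasticsearch.Net;
using Nest;

// Block until the document is visible to search; test code only.
var indexResponse = client.Index(record, i => i
    .Index("records")
    .Refresh(Refresh.WaitFor));
```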
Now that we have overcome the Elastic learning curve, we have many future plans for this amazing tool.