Database Query Search Time

Hey There,

Let’s say we have a database table with over 50 million entries in it.

We want to return the top 1000 results for some query with this table.

SELECT TOP 1000 attempts to return the top 1000 results for whatever query parameters are set against the table.

The query may execute fast for common entries such as “diamonds”, “shoes”, etc. However, for uncommon entries it may have to traverse the entire table looking for results. When this occurs, the user is setting there waiting for results that may return much later or in a worse case, perhaps there are no results.

To keep the user from waiting forever, I tried passing a “timeout” parameter to the query. When the timeout is exceeded, an exception is thrown and no results are returned.

Is there anyway to setup a query timeout and instead of an exception being thrown, return the results if there are any for that execution timeout duration?

I am going to have a huge product directory and I need to make my searches more efficient and user friendly. Thanks for any advice.

Sincerely,
Travis Walters

Have you considered using a different technology for the index? Not sure how structured your data is but SOLR (Welcome to Apache Solr - Apache Solr) might be an option if you’re doing free-text searches.

Instead of trying to deal with the symptoms, i’d work on the causes of them. But before anyone can help you, we’ll need a bit more info. What dbms are you using? Whats your table structure like (all info if possible)? What is the exact query you’re running to get the results?
I’ve seen some programmers try to create these crazy complex systems to attempt to deal with such slow queries (including a timeout period that kills the query) and usually the slow down is caused by an inefficient/incomplete setup on the database end. I should clarify that this is “usually” the case, at 50 million records you may have to start looking at advanced database management methods like horizontal partitioning/database sharding (though i’d need to see your database structure to see if this is applicable).

Hey Guys,

I am using MS SQL Server 2005 as my DBMS.

You can see the structure for this table here:
http://www.green-watch.org/temp/ProductTableStructure.png

I created a view to run queries against. That structure is here:
http://www.green-watch.org/temp/ProductTableView.png

On the PRODUCT table, I only have the PRODUCT_ID as the primary index.

On the PRODUCT_GREENEST_VIEW table, I have the following indexes:
PRODUCT_PRIORITY_ID (clustered)
BUSINESS_ID (non-unique, non-clustered)
PRODUCT_ID (unique, non-clustered)

As far as the query goes, here is part of the code:

<cfif #timeoutOccurred# EQ 0>

  <cftry>

	<cfset innerSelectRows = #currentPage# * 10>

	<cfif #currentPage# EQ #maxPage#>
      <cfset selectRows = #MAX_PRODUCT# - (#innerSelectRows#-10)>
    <cfelse>
      <cfset selectRows = 10>
    </cfif>
        
    <cfquery datasource="#dsnName#" name="PRODUCT_LIST" timeout="#searchTimeout#">
    SELECT * FROM (
        SELECT TOP #selectRows# * FROM (
            SELECT DISTINCT TOP #innerSelectRows# #URL.TABLE_NAME#.PRODUCT_ID, #URL.TABLE_NAME#.PRODUCT_PRICE, #URL.TABLE_NAME#.PRODUCT_PRIORITY_ID
            FROM #URL.TABLE_NAME#
            
            LEFT JOIN ASSIGN_SUBCAT_PRODUCT ON ASSIGN_SUBCAT_PRODUCT.PRODUCT_ID = #URL.TABLE_NAME#.PRODUCT_ID
            LEFT JOIN PRODUCT_SUBCATEGORY ON PRODUCT_SUBCATEGORY.PRODUCT_SUBCATEGORY_ID = ASSIGN_SUBCAT_PRODUCT.PRODUCT_SUBCATEGORY_ID
            LEFT JOIN PRODUCT_CATEGORY ON PRODUCT_SUBCATEGORY.PRODUCT_CATEGORY_ID = PRODUCT_CATEGORY.PRODUCT_CATEGORY_ID        
            
            WHERE #URL.TABLE_NAME#.PRODUCT_VALIDATED = <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="1">
    
            <cfif #URL.businessID# NEQ -1>
            AND #URL.TABLE_NAME#.BUSINESS_ID = <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="#URL.businessID#">
            </cfif>
    
            <cfif #URL.MIN_PRICE# NEQ -1>
            AND #URL.TABLE_NAME#.PRODUCT_PRICE >= <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="#URL.MIN_PRICE#">
            </cfif>
            
            <cfif #URL.MAX_PRICE# NEQ -1>
            AND #URL.TABLE_NAME#.PRODUCT_PRICE <= <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="#URL.MAX_PRICE#">
            </cfif>

			<cfif #URL.SEARCH_FOR# NEQ "">
            AND (CONTAINS(#URL.TABLE_NAME#.PRODUCT_NAME, <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#URL.SEARCH_FOR#">)
            OR CONTAINS(#URL.TABLE_NAME#.PRODUCT_DESCRIPTION, <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#URL.SEARCH_FOR#">)
            OR CONTAINS(#URL.TABLE_NAME#.PRODUCT_KEYWORD1, <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#URL.SEARCH_FOR#">)
            OR CONTAINS(#URL.TABLE_NAME#.PRODUCT_KEYWORD2, <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#URL.SEARCH_FOR#">)
            OR CONTAINS(#URL.TABLE_NAME#.PRODUCT_KEYWORD3, <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#URL.SEARCH_FOR#">)
            OR CONTAINS(#URL.TABLE_NAME#.PRODUCT_KEYWORD4, <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#URL.SEARCH_FOR#">))
            </cfif>
    
            <cfif #URL.CAT_ID# NEQ -1>
            AND PRODUCT_CATEGORY.PRODUCT_CATEGORY_ID IN (<cfqueryparam value="#URL.CAT_ID#" list="true" cfsqltype="CF_SQL_NUMERIC">)
            </cfif>
        
            <cfif #URL.SUBCAT_ID# NEQ -1>
            AND PRODUCT_SUBCATEGORY.PRODUCT_SUBCATEGORY_ID IN (<cfqueryparam value="#URL.SUBCAT_ID#" list="true" cfsqltype="CF_SQL_NUMERIC">)
            </cfif>
            
            AND (#URL.TABLE_NAME#.PRODUCT_PHOTO IS NOT NULL OR #URL.TABLE_NAME#.PRODUCT_GOOGLE_IMAGE_URL IS NOT NULL)
        
            ORDER BY #URL.ORDER_BY2#) AS FOO
        ORDER BY #URL.ORDER_BY3#) AS BAR
    ORDER BY #URL.ORDER_BY2#
    </cfquery>
    
    <cfcatch>

      <cfset maxPage = 0>
      <cfset URL.END_ROW = 0>
      <cfset timeoutOccurred = 1>
    
    </cfcatch>
    
  </cftry>
  
</cfif>

There is code before this that gets the max rows, sets the correct variables, checks input against sql injection, etc.

I am using a full-text index for the SEARCH_FOR field.

I am not familiar with SOLR. I am on a Windows Server currently.

Sincerely,
Travis Walters

SOLR is an XML REST interface into the lucene full-text indexer. It’s all on JAVA and should work fine on Windows. While you can use the full-text capabilities of a RDBMS I’ve generally found it a LOT faster to actually use a product designed for full-text indexing and searching.

We have an internal tool that runs on a postgres back-end (a pretty big one) that used to do it’s own full-text searching as well and it had a lot of the kinds of issues you’re talking about (in addition to just being generally slow even when it was successful). We moved the full-text searching to SOLR/Lucene and things got a LOT better.

Thanks,

-Pat

+1 for SOLR/Lucene

Usually I cant stand Java stuff… but i make an exception for SOLR and Lucene. the “Related stories” feature on my site is powered by SOLR using Lucene’s MoreLikeThis functionality.

Thanks once again for the advice Patrick.

Selling products should be one of my most profitable features on my website so I want to make sure everything is super fast.

I just wish everything was done. It feels like I have been working on my website forever :slight_smile:

I will look into the SOLR tomorrow. Think I am going to go enjoy the 4th. Have fun everyone![hr]
I did a quick search on SOLR with Coldfusion.

“Search engine. ColdFusion 9 now includes Apache Solr. Verity is still included, but so is Solr, which is actually even faster than Verity. So you may choose to upgrade to Solr! They even provide a Verity-to-Solr migration utility.”

I have never tried Verity either.

I know my server has coldfusion 8 installed on it so I guess I will have to figure out when our hosting provider plans to upgrade.

I may have found a way to get SOLR working with Coldfusion versions 7 and 8.

http://solcoldfusion.riaforge.org/
http://svn.riaforge.org/solcoldfusion/trunk/org/apache/solr/client/SolColdfusion.cfc

I have not looked at their library yet but since it interacts with the database, that is probably a good idea.

I am still hoping the hosting provider can upgrade us to coldfusion 9 because of the built-in interaction with SOLR that I have read about.

Also looks like coldfusion 9 has better cache control as well.

Looks like the hosting provider can upgrade us to coldfusion 9.

Can I still use MS SQL Server to store my data or does it all have to be transferred to SOLR some how?

Probably an easy question for some one that has used it before :slight_smile:

Not sure about the coldfusion integration but in general you would keep the data in your existing database and SOLR would just be a full-text index for searching against.

Hey Guys,

I am having a bit of trouble with SOLR. Here is my code:

<cfsetting requestTimeOut = "6000">

<cfcollection action="list" name="COLLECTION_LIST" engine="solr"> 

<cfquery name="PRODUCT_COLLECTION_CHECK" dbtype="query"> 
SELECT * FROM COLLECTION_LIST  
WHERE COLLECTION_LIST.NAME = <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="product_collection"> 
</cfquery>  

<cfif PRODUCT_COLLECTION_CHECK.RecordCount NEQ 0> 

  <p>Collection Exists</p>
  
  <cfquery datasource="#dsnName#" name="PRODUCT_LIST">
  SELECT TOP 100 PRODUCT.PRODUCT_ID, PRODUCT.PRODUCT_PRIORITY_ID, PRODUCT.BUSINESS_ID, PRODUCT.PRODUCT_NAME, 
  PRODUCT.PRODUCT_PRICE, PRODUCT.PRODUCT_DESCRIPTION, PRODUCT.PRODUCT_PHOTO, PRODUCT.PRODUCT_KEYWORD1, 
  PRODUCT.PRODUCT_KEYWORD2, PRODUCT.PRODUCT_KEYWORD3, PRODUCT.PRODUCT_KEYWORD4, PRODUCT.PRODUCT_GOOGLE_IMAGE_URL, 
  PRODUCT.PRODUCT_VALIDATED
  FROM PRODUCT
  WHERE PRODUCT.PRODUCT_INDEXED = <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="0"> 
  </cfquery>
  
  <cfset myColumns = 'PRODUCT_NAME,PRODUCT_DESCRIPTION,PRODUCT_KEYWORD1,PRODUCT_KEYWORD2,PRODUCT_KEYWORD3,PRODUCT_KEYWORD4'>
  
  <cfindex action="update" collection="product_collection" query="PRODUCT_LIST" type="custom" key="PRODUCT_ID"
  title="PRODUCT_NAME" body="#myColumns#" custom1="BUSINESS_ID" custom2="PRODUCT_PRICE" custom3="PRODUCT_PRIORITY_ID" 
  custom4="PRODUCT_VALIDATED">

  <cfsearch collection="product_collection" criteria="custom2: 0 TO 350" name="results" status="r" suggestions="always" 
  contextPassages="0">

  <cfdump var="#results#">

  <cfsearch collection="product_collection" criteria="" name="results" status="r" suggestions="always" contextPassages="0">

  <cfdump var="#results#">

<cfelse>

  <p>Collection Does Not Exist</p>

</cfif>

Here is my current testing page for it:
http://www.green-watch.org/test.cfm

How do I get it to return products where the product price is between 0 and 350 dollars? I am trying this in the code: criteria=“custom2: 0 TO 350” - but it is not working.

Thanks in advance for any suggestions.

Sincerely,
Travis Walters

I dont know anything abt cold fusion, so cant be much of a help. but in your code above I see SQL queries. are you sure you are running them by solr?

If you can PM me the url of the solr instalation, i can take a look and figure out the paramaters needed for this…

Hey Sajal,

I am positive that the “product_collection” is a SOLR collection. I verified this by looking in the coldfusion administrator. I am not sure what URL you would need? Our hosting company installed coldfusion 9 on the server and SOLR comes with it.

The “PRODUCT_COLLECTION_CHECK” cfquery at the beginning of the code checks to make sure a SOLR collection exists. The “PRODUCT_LIST” cfquery grabs the top 100 products in the database to populate the SOLR index through cfindex. Finally, the cfsearch call searches that index via SOLR.

I know I have to be close to having this right and it is probably something really simple that is needed to get it working correctly.

Sincerely,
Travis Walters

hmm… SOLR is usually an independent web service and applications connect to it using REST…

check out the docs for filter query : http://wiki.apache.org/solr/CommonQueryParameters#fq

Try changing

<cfsearch collection="product_collection" criteria="custom2: 0 TO 350" name="results" status="r" suggestions="always" contextPassages="0">

to

<cfsearch collection="product_collection" criteria="custom2:[0 TO 350]" name="results" status="r" suggestions="always" contextPassages="0">

Hey There,

I tried changing that value to what you said, but it is returning 69 out of the 100 results. It still has results over 350 dollars though as you can see here:

http://www.green-watch.org/test.cfm

Any other suggestions are greatly appreciated. Thanks in advance :slight_smile:

Sincerely,
Travis Walters

Ask the hosting company on how to access SOLR directly…

It should look something like this…

Then play in it… see the records in it… maybe the values ur setting are being stored as string?

I think this is a question for a programing site like http://stackoverflow.com/ or similar…

Hey Sajal,

I ended up finding the location of that utility on my own through the coldfusion administrator.

I found some code on the internet as well.

<cfif isdefined("url.create")>
<cfcollection action="create" categories="yes" collection="sitecontent2" engine="solr" path="#expandpath('./')#">
Collection created OK.
</cfif>
 
<cfif isdefined("url.index")>
<cfset mq = queryNew("id,camera,cameratype,price","integer,varchar,varchar,varchar")>
<cfset tmp = queryAddRow(mq,5)>
 
<cfset tmp = querySetCell(mq,"id",1,1)>
<cfset tmp = querySetCell(mq,"id",2,2)>
<cfset tmp = querySetCell(mq,"id",3,3)>
<cfset tmp = querySetCell(mq,"id",4,4)>
<cfset tmp = querySetCell(mq,"id",5,5)>
 
<cfset tmp = querySetCell(mq,"camera","Olympus 500",1)>
<cfset tmp = querySetCell(mq,"camera","Canon D30",2)>
<cfset tmp = querySetCell(mq,"camera","Konica Minolta Dimage A2",3)>
<cfset tmp = querySetCell(mq,"camera","Sony Cybershot 300X",4)>
<cfset tmp = querySetCell(mq,"camera","Sony Cybershot 50D2",5)>
 
<cfset tmp = querySetCell(mq,"price","1499",1)>
<cfset tmp = querySetCell(mq,"price","1500",2)>
<cfset tmp = querySetCell(mq,"price","700",3)>
<cfset tmp = querySetCell(mq,"price","200",4)>
<cfset tmp = querySetCell(mq,"price","1400",5)>
 
<cfset tmp = querySetCell(mq,"cameratype","Professional",1)>
<cfset tmp = querySetCell(mq,"cameratype","Pro",2)>
<cfset tmp = querySetCell(mq,"cameratype","SLR",3)>
<cfset tmp = querySetCell(mq,"cameratype","SLR",4)>
<cfset tmp = querySetCell(mq,"cameratype","Pocket",5)>
 
Query to be indexed:
<cfdump var="#mq#">
 
Indexing...
<cfindex action="update" collection="sitecontent2" body="camera,cameratype,price" key="id" type="custom" query="mq" title="camera" custom1="price" custom2="cameratype">
OK<BR>
</cfif>
 
<cfif isdefined("url.search")>
    <cfsearch collection="sitecontent2" name="mysearch1" criteria="custom1:700 TO 1500">
    Search 1:
    <cfdump var="#mysearch1#">
    <cfsearch collection="sitecontent2" name="mysearch2" criteria="custom2:pro">
    Search 2:
    <cfdump var="#mysearch2#">
    <cfsearch collection="sitecontent2" name="mysearch3" criteria="custom1:700 TO 1500 AND custom2:pro">
    Search 3:
    <cfdump var="#mysearch3#">
</cfif>

You can see how the “key” is set to an “integer” here. However, under that SOLR utility, it says “Field Type: STRING”. For custom1, it says the field type is TEXT. I am not sure if there is any way to change a field to have an integer type?

I found it strange how the code example’s price column was set to varchar instead of integer.

I noticed “custom1:700 TO 1500” will return keys 3 and 2. However, “custom1:700 TO 1499” will return keys 3 and 1. Obviously the 1499 key should be returned when the range is 700 to 1500 but it’s not. Why is this?

Thanks for anymore information. I would really love to figure this out so I can finish my product section.

By the way, I am willing to pay for assistance with this:
http://www.vworker.com/RentACoder/misc/BidRequests/ShowBidRequest.asp?lngBidRequestId=1474266

Sincerely,
Travis Walters

Hey There,

Many somebody will know these answers to these questions:

  1. When I am indexing a collection of data for SOLR, for the product price field, should I input price ranges instead of the actual product price? For example, if a product costs 393.54, the product price could be [100_400][100_500][100_600][200_400][200_500] etc. Then when I use a product search I could look for the range within that list of strings. The problem I see with this is there are so many different combinations of price ranges that the collection may get bloated?

  2. Look at this solr query string:

http://localhost:8983/solr/sitecontent2/select/?q=custom1:700%20TO%201500&version=2.2&start=0&rows=10&indent=on

If I have a product priced 1499, why is that product not returned in the results?
http://www.green-watch.org/test2.cfm?index=yes (thats whats in this index)

Thanks for any suggestions.

Sincerely,
Travis Walters

Hey Guys,

I figured out how to do price ranges.

After I finish up the script, I will write a little tutorial on here to help other people out that come across this thread.

Sincerely,
Travis Walters

Awesome. How does the end result look for performance compared to when you started the thread?

I’m in the process of setting up a performance knowledgebase here: http://www.webperformancecentral.com/wiki/Main_Page so if you want to document it there and link to it from here that would be great (or I can clone it over there if you’d like). There are a lot of gems of knowledge that are starting to turn up as people work through issues and I thought it would be good to have an actual home for them.

Thanks,

-Pat