Database Query Search Time

Hey Patrick,

I have only indexed a handful of products. I wanted to be sure I set the index up correctly before I indexed millions and millions of products. After I index all the products, and get the script up and running I will compare the performances and let everyone know. I have to remember though my old script was searching a database with 2.6 million products and now I have 5.6 million products and growing. I have my fingers crossed that all this work will be worth it. Will let you guys know when its all done :slight_smile:

Sincerely,
Travis Walters

Greetings,

At this time, I am going to elaborate a bit on how I setup the price facet. My solution was inspired by an article I found on the net:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr

I am taking a quote out of the article:

[b]If we request field faceting on the “price” field, we get back counts for individual prices. However, we want price ranges, not individual prices. One workaround is to index another field that contains the ranges that the prices fall into (for example 100_200, 200_300, 300_400) and use field faceting on that field. A more flexible solution is to utilize query facets that provide the ability to retrieve facet counts for arbitrary queries.

Let’s assume that we have an indexed “price” field and we want to get the facet counts for the following ranges of prices: $100 or less, $100-$200, $200-$300, $300-$400, $400-$500, and $500 or over. We simply add a facet.query command to our query request for each desired range[/b]

I want two drop down lists of prices on our product search page containing the following values for minimum and maximum price:
0,10,20,30,40,50,75,100,150,200,250,300,350,400,450,500,600,700,800,900,1000,1250,1500,1750,2000,3000,4000,5000,1000000000

Let’s say we have a product whose price is $549.99. With the custom2 field, I can setup price ranges that this product falls into:

02p 102p 202p 302p 402p 502p 752p 1002p 1502p 2002p 2502p 3002p 3502p 4002p 4502p 5002p p2600 p2700 p2800 p2900 p21000 p21250 p21500 p21750 p22000 p23000 p24000 p25000

The value “2” separates the price and p. If I want to search for products with prices between 300 and 750 dollars, I can set the criteria like this: criteria=“custom2:p2750 and custom2:3002p”

I tried different delimiters in place of the “2” above only to find that all results were returned to me instead of filtered results. I even tried to leave a delimiter out to save some space in the cfcollection. I am not sure if something was being cached or what but I did not retrieve the results I expected. However, when I leave the “2” in there, I get exactly what I want. I have tested this many times on a cfcollection of 100 products. I have the mindset on this case if it is not broke do not fix it or improve it for that matter.

I believe that this method is more difficult than it should be. I do not see why there can not be a way we can just store prices in a custom field and have some sort of range search within the coldfusion cfsearch wrapper. I am a bit disappointed with that.

My initials thoughts on SOLR so far is that it seems very touchy when using cfsearch. I find documentation hard to find when it comes to coldfusion and SOLR. Adobe has some basic tutorials but nothing advanced like price facets. I found that the skillset for coldfusion and SOLR development really hard to find.

I am not sure if anyone else has this problem. For cfsearch, if I try to set the “type” parameter to any value thats in the documentation, coldfusion throws an error. I am not sure if this is a problem with coldfusion in general or if an error occurred when updating my server. I intend to look into this more.

I am going to play around with categories / subcategories next. After that, I should be able to import all products into the cfcollection. Then I can see how fast SOLR is compared to just using CFQUERY.

I will release a bit of code as well to help people that come across this. Will do that after the script is completed.

Sincerely,
Travis Walters

Hello Again,

One thing about the SOLR searches is that there are only four custom fields. The Coldfusion documentation states that custom1 through custom4 do not work with the SOLR engine. However, that is not true.

I have more than four fields I want to do narrow down the results with. With custom4, I am setting this field to do boolean searches on different variables.

PRODUCT_VALIDATED = Has the product been validated by the admin staff?
PRODUCT_PHOTO = Does this product have a local image?
PRODUCT_GOOGLE_IMAGE_URL = Does this product have an external image?

All I need is one sequence of numbers for each boolean variable I want to do searches with. Because I have three boolean variables, I need three sequences of numbers.

The first number in the sequence will be 122 or 022. The second number in the sequence will be 212 or 202. The last number in the sequence will be 221 or 220.

To find products that have been validated and have photos on the local server I can set the criteria to this:

criteria=“custom4:122 AND custom4:212”

To find products that have been validated and have photos on an external server I can set the criteria to this:

criteria=“custom4:122 AND custom4:221”

More to come…

Sincerely,
Travis Walters

Hey Guys,

The following code uses cfcollection and cfindex to create and populate a SOLR collection of data. 10,000 products are added to the collection every 90 seconds. The script is called from a scheduled task setup using the coldfusion administrator.

<cfsilent>

  <cfset myOutput = '<p>START: #NOW()#</p>'>

  <cfsetting requestTimeOut = "6000">

  <cfset collectionName = 'products_id'>
  <cfset collectionPath = 'F:\collections\'>

  <cfcollection action="list" name="COLLECTION_LIST" engine="solr"> 

  <cfquery name="COLLECTION_CHECK" dbtype="query"> 
  SELECT * FROM COLLECTION_LIST  
  WHERE COLLECTION_LIST.NAME = <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#collectionName#"> 
  </cfquery>  

  <cfif #COLLECTION_CHECK.RecordCount# NEQ 0> 

    <cfset myOutput = '#myOutput#<p>Collection Exists</p>'>

    <cfquery datasource="#dsnName#" name="PRODUCT_LIST">
    SELECT TOP 10000 PRODUCT.PRODUCT_ID, PRODUCT.PRODUCT_PRIORITY_ID, PRODUCT.BUSINESS_ID, PRODUCT.PRODUCT_NAME, 
    PRODUCT.PRODUCT_PRICE, PRODUCT.PRODUCT_DESCRIPTION, PRODUCT.PRODUCT_PHOTO, PRODUCT.PRODUCT_KEYWORD1, 
    PRODUCT.PRODUCT_KEYWORD2, PRODUCT.PRODUCT_KEYWORD3, PRODUCT.PRODUCT_KEYWORD4, PRODUCT.PRODUCT_GOOGLE_IMAGE_URL,
    PRODUCT.PRODUCT_VALIDATED
    FROM PRODUCT
    WHERE PRODUCT.PRODUCT_INDEXED = <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="0"> 
    </cfquery>
    
    <cfset myOutput = '#myOutput#<p>#PRODUCT_LIST.RecordCount# Product(s) Need Updated</p>'>
    
    <cfset priceRangeArray = ArrayNew(1)>
    <cfset productBoolArray = ArrayNew(1)>
    <cfset productSubcatIDArray = ArrayNew(1)>
  
    <cfif #PRODUCT_LIST.RecordCount# NEQ 0>

	  <cfset priceList = '0,10,20,30,40,50,75,100,150,200,250,300,350,400,450,500,600,700,800,900,1000'>
      <cfset priceList = '#priceList#,1250,1500,1750,2000,3000,4000,5000,1000000000'>
    
      <cfset priceArray = '#ListToArray(priceList)#'>
      <cfset loopLength = #ArrayLen(priceArray)# - 1>
  
      <cfloop index="i" from="1" to="#PRODUCT_LIST.RecordCount#">
            
        <cfset productGooglePhoto = '#PRODUCT_LIST.PRODUCT_GOOGLE_IMAGE_URL[i]#'>
        <cfset productValidated = '#PRODUCT_LIST.PRODUCT_VALIDATED[i]#'>
        <cfset productPhoto = '#PRODUCT_LIST.PRODUCT_PHOTO[i]#'>
        <cfset productPrice = '#PRODUCT_LIST.PRODUCT_PRICE[i]#'>  
        <cfset productID = '#PRODUCT_LIST.PRODUCT_ID[i]#'>

        <cfset priceRangeArray[i] = ''>      
        <cfset productBoolArray[i] = ''>
        <cfset productSubcatIDArray[i] = ''>
      
        <cfquery datasource="#dsnName#" name="SUBCAT_LIST">
        SELECT ASSIGN_SUBCAT_PRODUCT.PRODUCT_SUBCATEGORY_ID
        FROM ASSIGN_SUBCAT_PRODUCT
        WHERE ASSIGN_SUBCAT_PRODUCT.PRODUCT_ID = <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="#productID#"> 
        </cfquery>      
        
        <cfif #SUBCAT_LIST.RecordCount# NEQ 0>
        
          <cfloop index="j" from="1" to="#SUBCAT_LIST.RecordCount#">
          
            <cfset subcatID = '#SUBCAT_LIST.PRODUCT_SUBCATEGORY_ID[j]#'>
          
            <cfif #ListLen(productSubcatIDArray[i])# NEQ 0>
              <cfset productSubcatIDArray[i] = '#productSubcatIDArray[i]#,#subcatID#'>
            <cfelse>
              <cfset productSubcatIDArray[i] = '#subcatID#'>
            </cfif>
          
          </cfloop>
        
        </cfif>
         
		<cfif #productValidated# EQ 1>        
          <cfset productBoolArray[i] = '122'>                
        <cfelse>
          <cfset productBoolArray[i] = '022'> 
		</cfif>

		<cfif #productGooglePhoto# NEQ "">        
          <cfset productBoolArray[i] = '#productBoolArray[i]# 212'>                
        <cfelse>
          <cfset productBoolArray[i] = '#productBoolArray[i]# 202'> 
		</cfif>

		<cfif #productPhoto# NEQ "">        
          <cfset productBoolArray[i] = '#productBoolArray[i]# 221'>                
        <cfelse>
          <cfset productBoolArray[i] = '#productBoolArray[i]# 220'> 
		</cfif>

        <cfloop index="j" from="1" to="#loopLength#">
        
          <cfset myPrice = '#priceArray[j]#'>
          <cfset loopFrom = #i#+j>
          
          <cfif #productPrice# GTE #myPrice#>

			<cfif #ListLen(priceRangeArray[i])# NEQ 0>
              <cfset priceRangeArray[i] = '#priceRangeArray[i]# #myPrice#2p'>
            <cfelse>
              <cfset priceRangeArray[i] = ' #myPrice#2p'>
            </cfif>
          
          </cfif>
          
          <cfif #productPrice# LTE #myPrice#>

			<cfif #ListLen(priceRangeArray[i])# NEQ 0>
              <cfset priceRangeArray[i] = '#priceRangeArray[i]# p2#myPrice#'>
            <cfelse>
              <cfset priceRangeArray[i] = ' p2#myPrice#'>
            </cfif>
          
          </cfif>          
                                     
        </cfloop>
                
      </cfloop>
    
    </cfif>
  
    <cfset nColumnNumber = QueryAddColumn(PRODUCT_LIST,"PRICE_RANGE","VarChar",priceRangeArray)>    
    <cfset nColumnNumber2 = QueryAddColumn(PRODUCT_LIST,"BOOL_VALUES","VarChar",productBoolArray)>  
    <cfset nColumnNumber3 = QueryAddColumn(PRODUCT_LIST,"PRODUCT_SUBCATEGORIES","VarChar",productSubcatIDArray)>  
    
    <cfset myColumns = 'PRODUCT_NAME,PRODUCT_DESCRIPTION,PRODUCT_KEYWORD1,PRODUCT_KEYWORD2'>
    <cfset myColumns = '#myColumns#,PRODUCT_KEYWORD3,PRODUCT_KEYWORD4'>  

    <cfindex action="update" collection="#collectionName#" query="PRODUCT_LIST" type="custom" key="PRODUCT_ID" 
    body="#myColumns#" title="PRODUCT_NAME" custom1="BUSINESS_ID" custom2="PRICE_RANGE" custom3="PRODUCT_PRIORITY_ID" 
    custom4="BOOL_VALUES" category="PRODUCT_SUBCATEGORIES">

    <cfif #PRODUCT_LIST.RecordCount# NEQ 0>
  
      <cfloop index="i" from="1" to="#PRODUCT_LIST.RecordCount#">
  
        <cfset productID = '#PRODUCT_LIST.PRODUCT_ID[i]#'>
      
        <cfquery datasource="#dsnName#" maxrows="1">
        UPDATE PRODUCT
        SET PRODUCT.PRODUCT_INDEXED = <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="1">
        WHERE PRODUCT.PRODUCT_ID = <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="#productID#">
        </cfquery>
    
      </cfloop>
  
    </cfif>
      
  <cfelse>

    <cfset myOutput = '#myOutput#<p>Collection Does Not Exist</p>'>
    
    <cfcollection action="create" categories="yes" collection="#collectionName#" engine="solr" path="#collectionPath#">
    
    <cfset myOutput = '#myOutput#<p>Collection Created</p>'>

  </cfif>

  <cfset myOutput = '#myOutput#<p>END: #NOW()#</p>'>

</cfsilent>

<cfoutput>#myOutput#</cfoutput>

I plan to add some code so that the script can not be called while one is already executing. Besides that, I think the script should be pretty solid. This should take about 19 hours to execute for about 5.6 million products. After that, I should be able to tell you how well the cfsearch performs in comparison with the cfquery search.

Sincerely,
Travis Walters

since you are adding 10,000 products in a batch, i recommend you optimize after each batch… UpdateXmlMessages - Solr - Apache Software Foundation

Hey Guys,

If anyone is working with MS SQL Server, I recommend creating a view like this:

SELECT     PRODUCT_ID, PRODUCT_INDEXED
FROM         dbo.PRODUCT
WHERE     (PRODUCT_INDEXED = 0)

For faster execution, the PRODUCT_LIST cfquery can be updated to this:

<cfquery datasource="#dsnName#" name="PRODUCT_LIST">
SELECT PRODUCT.PRODUCT_ID, PRODUCT.PRODUCT_PRIORITY_ID, PRODUCT.BUSINESS_ID, PRODUCT.PRODUCT_NAME, PRODUCT.PRODUCT_PRICE, 
PRODUCT.PRODUCT_DESCRIPTION, PRODUCT.PRODUCT_PHOTO, PRODUCT.PRODUCT_KEYWORD1, PRODUCT.PRODUCT_KEYWORD2, PRODUCT.PRODUCT_KEYWORD3, PRODUCT.PRODUCT_KEYWORD4, PRODUCT.PRODUCT_GOOGLE_IMAGE_URL, PRODUCT.PRODUCT_VALIDATED
FROM PRODUCT
WHERE PRODUCT.PRODUCT_ID IN (SELECT TOP 5000 PRODUCT_ID_COLLECTION.PRODUCT_ID FROM PRODUCT_ID_COLLECTION)
</cfquery>

The change above allows the products that need indexed to be gathered more quickly. Without this change, performance degrades as more and more products are indexed.

To optimize a collection as suggested above, the following code can be used:

<cfcollection action="optimize" collection="#collectionName#">

I think performance suffers a little as the collection grows larger as well. Not sure if anyone else has experienced this while indexing?

Sincerely,
Travis Walters

Hey There,

Has anybody come across this error before:

[b]Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers4_try_again_later

request:
http://localhost:8983/solr/products_id/update?commit=true&waitFlush=false&waitSearcher=false&wt=javabin
MIME-Version: 1.0
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Mailer: ColdFusion 9 Application Server[/b]

Seems the server is throwing a lot of these errors as it indexes the collection. Indexing is going a lot slower than expected.

Sincerely,
Travis Walters

Hello Again,

I found an important security bulletin about coldfusion 9 and SOLR:

Without taking care of this update, the SOLR collections are exposed and risk being downloaded through external URLs. Basically, the data could be stolen right from your collections.

Update: I found some information on the error I found listed above concerning the “maxWarmingSearchers”:

The guy claims its due to a faulty collection. Possible reasons could include:

#1 - The script could have executed while the same script was already executing. This could have occurred because the script got called every 90 seconds. When the view was not being used for that one query, it caused the script performance to degrade. In my next attempt, this issue will be taken care of.

#2 - Sajal mentioned he would optimize the collection frequently after batch inserts. I am not sure if this would help or if it is something I could do once all products have been inserted. I am going to avoid this for now.

Sincerely,
Travis Walters

Hey Guys,

I think it is an important note for SOLR users to understand when they add, edit, or delete products in their database, they should also update the collection.

To delete a product from the collection, you can do something similar to this:

<cfindex collection="#collectionName#" action="delete" key="#URL.productID#" type="custom">

To add or edit products in the collection, the script I posted in this thread will do that. However, when you edit a product in the source database, you should set the PRODUCT_INDEXED value to 0. When you add a product, the PRODUCT_INDEXED should also be set to 0.

The “maxWarmingSearchers” error is no longer appearing now that I am repopulating the cfcollection.

After the mass product add takes place, it may take several minutes for the cfcollection doccount to update. I set the time interval between massive adds to 4 minutes. This is a bit slower but the script is executing better. I have about 2 million products in the collection right now and 3.6 million to go.

Sincerely,
Travis Walters

Hello Again,

To improve performance with SOLR, there is a small tweak that can be made that is described here:
http://help.adobe.com/en_US/ColdFusion/9.0/Developing/WS9c725d03408c25f4-7613171712234bed95d-8000.html

This increases the amount of memory that SOLR can use, which leads to better performance if you have a system with a bit of available memory.

One of my next goals is as follows: My custom3 field contains a “PRODUCT_PRIORITY_ID”, which indicates how well the product ranks in terms of “greenness”. If I can use SOLR to order by the custom3 field, then results should be sorted by greenness.

Edit: It would also be nice if there were a way to specify which columns should be returned in the query when using cfsearch. I find the summary field useless and would rather display my own description. Having it return this column wastes memory that I could be using to retrieve even more results.

Sincerely,
Travis Walters

Hey Guys,

I know sorting has to be possible. There is documentation to sorting here:

This query here works for sorting:
http://localhost:8983/solr/products_id2/select/?q=gold&version=2.2&start=0&rows=10&indent=on&sort=custom3+desc

The cfsearch wrapper that is used to communicate with the SOLR interface does not look like it supports sorting, which is very odd - at least i can not find any documentation on this.

#1 - I am wondering if it is possible to use the value in a cfcollection custom3 column to influence the SCORE value. If this is possible, I might be able to force sorting the score a particular way instead of by relevancy.

#2 - I wonder if anybody has created a custom wrapper that communicates with the SOLR interface - an alternative to cfsearch.

My problems with cfsearch right now are sorting and returning columns in the result query that are not needed. I wonder if I can file suggestions for the next version of coldfusion.

Sincerely,
Travis Walters

Hey Everyone,

I have spent the last couple days creating my own function that communicates with the SOLR interface. This tag can be used as an alternative to cfsearch. Here is version 1.0 of my SOLR_SEARCH code:

<cfsetting requestTimeOut = "60">

<cfset SOLR_SERVER_HOST = 'localhost'>
<cfset SOLR_SERVER_PORT = '8983'>
<cfset SOLR_SERVER_VERSION = '2.2'>

<cffunction name="SOLR_SEARCH" returnType="struct">

  <cfargument name="collection" type="string" required="true" default="">
  <cfargument name="criteria" type="string" required="false" default="*:*">
  <cfargument name="startrow" type="string" required="false" default="0">
  <cfargument name="maxrows" type="string" required="false" default="10">
  <cfargument name="sort" type="string" required="false" default="">
  <cfargument name="columns" type="string" required="false" 
  default="category,custom1,custom2,custom3,custom4,key,score,size,summary,title">
  <cfargument name="facet" type="string" required="false" default="">
  <cfargument name="timeout" type="string" required="false" default="10">
  <cfargument name="timeAllowed" type="string" required="false" default="10">
  
  <cfset returnStruct = StructNew()>
  
  <cfif NOT IsDefined("timeout") OR NOT IsNumeric(timeout) OR #timeout# LT 1>
    <cfset timeout = 10>
  </cfif>

  <cfif NOT IsDefined("timeAllowed") OR NOT IsNumeric(timeAllowed) OR #timeAllowed# LT 1>
    <cfset timeAllowed = 10>
  </cfif>

  <cfif NOT IsDefined("startrow") OR NOT IsNumeric(startrow) OR #startrow# LT 0>
    <cfset startrow = 0>
  </cfif>

  <cfif NOT IsDefined("maxrows") OR NOT IsNumeric(maxrows) OR #maxrows# LT 1>
    <cfset maxrows = 1>
  </cfif>

  <cfif NOT IsDefined("criteria")>
    <cfset criteria = "*:*">
  </cfif>

  <cfset myURL = 'http://#SOLR_SERVER_HOST#:#SOLR_SERVER_PORT#/solr/#collection#/select/'>
  <cfset myURL = '#myURL#?q=#criteria#&version=#SOLR_SERVER_VERSION#&start=#startrow#&rows=#maxrows#&indent=off'>

  <cfif IsDefined("sort") AND #sort# NEQ "">
    <cfset myURL = '#myURL#&sort=#sort#'>
  </cfif>

  <cfif IsDefined("columns") AND #columns# NEQ "">
    <cfset myURL = '#myURL#&fl=#columns#'>
  </cfif>

  <cfif IsDefined("facet") AND #facet# NEQ "">
    <cfset myURL = '#myURL#&facet=true&#facet#'>
  </cfif>  

  <cfif IsDefined("timeAllowed") AND #timeAllowed# NEQ "">
    <cfset myURL = '#myURL#&timeAllowed=#timeAllowed#'>
  </cfif>  

  <cfhttp method="Get" url="#myURL#" resolveurl="Yes" timeout="#timeout#" throwonerror="yes" />
 
  <cfset xmlfile=XMLParse(cfhttp.FileContent)>

  <cfset numResultsFound = #xmlfile.XmlRoot.XmlChildren[2].XmlAttributes.numFound#>
  <cfset queryStatus = #xmlfile.XmlRoot.XmlChildren[1].XmlChildren[1].XmlText#>
  <cfset queryTime = #xmlfile.XmlRoot.XmlChildren[1].XmlChildren[2].XmlText#>
  <cfset numResultsReturned = #ArrayLen(xmlfile.XmlRoot.XmlChildren[2].XmlChildren)#>

  <cfset queryFieldList = ''>

  <cfloop index="i" from="1" to="#ArrayLen(xmlfile.XmlRoot.XmlChildren[1].XmlChildren)#">  
    <cfif #xmlfile.XmlRoot.XmlChildren[1].XmlChildren[i].XmlAttributes.name# EQ "params">
      <cfloop index="j" from="1" to="#ArrayLen(xmlfile.XmlRoot.XmlChildren[1].XmlChildren[i].XmlChildren)#">
        <cfif #xmlfile.XmlRoot.XmlChildren[1].XmlChildren[i].XmlChildren[j].XmlAttributes.name# EQ "fl">
          <cfset queryFieldList = #xmlfile.XmlRoot.XmlChildren[1].XmlChildren[i].XmlChildren[j].XmlText#>
        </cfif>
      </cfloop>
    </cfif>  
  </cfloop>

  <cfif #queryFieldList# NEQ "">     
       
    <cfset xmlqry = QueryNew("#queryFieldList#")>  
        
    <cfif #numResultsReturned# NEQ 0>

      <cfset xmlqryRows = QueryAddRow(xmlqry,numResultsReturned)>

	  <cfloop index="i" from="1" to="#numResultsReturned#">
          
        <cfif #ArrayLen(xmlfile.XmlRoot.XmlChildren[2].XmlChildren[i].XmlChildren)# NEQ 0>    
          
          <cfloop index="j" from="1" to="#ArrayLen(xmlfile.XmlRoot.XmlChildren[2].XmlChildren[i].XmlChildren)#">
            
            <cfloop list="#queryFieldList#" index="k">
             
              <cfset myElement = '#xmlfile.XmlRoot.XmlChildren[2].XmlChildren[i].XmlChildren[j].XmlAttributes#'>
              
   		      <cfif #ArrayLen(StructFindValue(myElement,k))# NEQ 0>
            
                <cfif #k# NEQ "category">
                  <cfset myElement = '#xmlfile.XmlRoot.XmlChildren[2].XmlChildren[i].XmlChildren[j].XmlText#'>
                  <cfset xmlqryCell = QuerySetCell(xmlqry,"#k#",myElement,"#i#")>
                <cfelse>
                  <cfset catList = '#xmlfile.XmlRoot.XmlChildren[2].XmlChildren[i].XmlChildren[j].XmlChildren[1].XmlText#'>
                  <cfset xmlqryCell = QuerySetCell(xmlqry,"#k#",catList,"#i#")>
                </cfif>
              
              </cfif>         
      
            </cfloop>
      
          </cfloop>
        
        </cfif>
    
      </cfloop>
      
    </cfif>
  
    <cfset xmlqry2 = QueryNew("RESULTS_FOUND,QUERY_STATUS,QUERY_TIME,RESULTS_RETURNED")>
    <cfset xmlqryRows = QueryAddRow(xmlqry2,1)> 
    <cfset xmlqryCell = QuerySetCell(xmlqry2,"RESULTS_FOUND","#numResultsFound#",1)>
    <cfset xmlqryCell = QuerySetCell(xmlqry2,"QUERY_STATUS","#queryStatus#",1)>
    <cfset xmlqryCell = QuerySetCell(xmlqry2,"QUERY_TIME","#queryTime#",1)>
    <cfset xmlqryCell = QuerySetCell(xmlqry2,"RESULTS_RETURNED","#numResultsReturned#",1)>

    <cfset xmlqry3 = QueryNew("QUERY_TIMEOUT,NULL_FIELDLIST")>
    <cfset xmlqryRows = QueryAddRow(xmlqry3,1)> 
    <cfset xmlqryCell = QuerySetCell(xmlqry3,"QUERY_TIMEOUT","0",1)>
    <cfset xmlqryCell = QuerySetCell(xmlqry3,"NULL_FIELDLIST","0",1)>
     
    <cfset returnStruct[0] = xmlqry3>
    <cfset returnStruct[1] = xmlqry2>
    <cfset returnStruct[2] = xmlqry> 
     
  <cfelse> 

    <cfset xmlqry3 = QueryNew("QUERY_TIMEOUT,NULL_FIELDLIST")>
    <cfset xmlqryRows = QueryAddRow(xmlqry3,1)> 
    <cfset xmlqryCell = QuerySetCell(xmlqry3,"QUERY_TIMEOUT","0",1)>    
    <cfset xmlqryCell = QuerySetCell(xmlqry3,"NULL_FIELDLIST","1",1)> 
    <cfset returnStruct[0] = xmlqry3>
        
  </cfif> 
  
  <cfreturn returnStruct>
  
</cffunction>

<cfparam name="URL.SEARCH_FOR" default="" type="string">

<cfset args = StructNew()>
<cfset args.collection = "products_id2">
<cfset args.startrow = 0>
<cfset args.maxrows = 10>
<cfset args.columns = 'key,custom1,custom2,custom3,custom4,summary,title'>
<cfset args.sort = 'score+desc'>
<cfset args.criteria = '#URL.SEARCH_FOR#'>
<cfset args.timeout = 5>
<cfset args.timeAllowed = 3000>

<cfinvoke method="SOLR_SEARCH" argumentCollection="#args#" returnVariable="res">

<cfdump var="#res#">

This code has a few advantages over cfsearch:

  • You can specify which columns should be in the result query
  • You can specify a maximum time to search for results
  • The results can be sorted numerous ways

Please note, I never looked for documentation on the exact format SOLR returns its results with. Thus, the XML document interpretation may have a few kinks in it. However, what I have is a great starting basis. Coders can feel free to use my code above. It must remain free. Use the code at your own risk; you are passing URL variables to your SOLR Server so there are security implications that could arise. I suggest making sure that you use HTMLEditFormat before passing any variables to the function that are retrieved from user input. If by any chance a coldfusion developer comes across employed by Adobe, I strongly urge you to expand your cfsearch tag using the same concepts provided by my code. Lastly, I am using SOLR version 2.2. The code may not work with other versions of SOLR.

I am estimating a working product search by the beginning of September. I will post a link to the feature at that time. I am very excited to get everything up and running for green products.

Sincerely,
Travis Walters

Hey Guys,

Be careful when sorting by a column when dealing with numbers. I tried to sort by custom3+desc and got these results:

10002,10004,10006,10008,10010,10016,10019,1002141,10022,10023

The incorrect sort in this sequence is obviously the 1002141. It appears to be reading left to right as if it were comparing strings. I believe adding “00” in front of all products except the 1002141 term and then sorting will yield correct results. I wonder if there is anyway to tell SOLR that a field is suppose to be numeric as opposed to text?

One thing I like most about SOLR is the timeAllowed parameter, which I started discussing a little in the last post.

To search for “gold” for no longer than 2 seconds, you can do this:
http://localhost:8983/solr/products_id2/select/?q=gold&version=2.2&start=0&rows=10&indent=on&timeAllowed=2000

With CFQUERY, you can specify a timeout. However, if the search time exceeds the timeout value, it throws an error instead of returning the results it found in the maximum time specified. CFQUERY is rather common with coldfusion. I really think they should add a TIMEALLOWED variable to CFQUERY. As soon as I can figure out where to make suggestions to the Adobe Coldfusion team, I have a few ideas I think they would like.

Update: I found that I can make suggestions to Adobe here:
http://cfbugs.adobe.com/cfbugreport/flexbugui/cfbugtracker/main.html

I opened reports 83899 and 83898.

Sincerely,
Travis Walters

Hey Everyone,

In my last post, I questioned if it was possible to tell SOLR that a field should be numeric as opposed to text. The answer is yes and here is how:

You need to find the path where you store the collection. For example, mine is:
F:\collections\products_id2\conf

In this directory, there is a file called schema.xml. Open that with notepad. Do a search for custom3 for example and you will see where the value is set to text. I changed this to integer. I also changed indexed=“false” to indexed=“true” for the key value. It appears SOLR only allows sorting by indexed fields. Once I changed the index to true, then I could sort by the key.

I remember earlier I was having trouble with the custom2 field where I was storing the price. I bet if I changed that to integer and reindexed my products I could use those ranges I was playing around with that would not work before. I plan to create a second collection later this week with prices instead of price ranges and see if it will work.

Update: It appears if you change the schema values, your collection may need reindexed before fields are sorted correctly. For example, I could sort by key ascending, but not key descending. Once I indexed a test collection, it sorted correctly.

Sincerely,
Travis Walters

For the prices, unless everything is whole numbers you probably want to use a float type of some kind instead of integer (not sure what SOLR refers to it as) or even better would be a currency type if it supports one natively. Just wanted to point that out quickly so you don’t have to re-index too often as you sort it out.

Hey Guys,

From what I have seen SOLR has float, double, integer, text, string, and other types. For product prices, I am now using custom3 to store price ranges and the title tag to store the actual prices. If I want to sort by price, I can just sort the title field.

It would be interesting to see if you can create your own fields in SOLR such as custom5, custom6, custom7, etc. However, if it is possible, CFINDEX only allows up to custom4. I suppose it would be possible to create a wrapper to do custom indexing.

I have written Adobe and said it would be nice if the CFCOLLECTION tag allowed more control over how the schema.xml file is setup. It would be nice if coldfusion could set field types, whether they are indexable, and create new fields if necessary. Instead of CFINDEX having a parameter for each field type, they could just have one parameter that accepts a CFQUERY that contains field name x value.

Product searches are usually pretty advanced. Coldfusion is only allowing enough fields to do searching on instead of fields that will be displayed in the search results. For example, I do not have enough variables to store a photo URL, the actual product description vs the summary, and the product name. When I am using coldfusion, I am focusing on returning only the KEY field which contain PRODUCT_ID numbers. From there, I can use CFQUERY to retrieve the product info I want displayed directly from the database. If coldfusion allowed more variables, that would reduce the number of round trips to the database.

On another note, it is probably a good idea to have two cfcollections of products. While one is getting updating using CFINDEX, the other one is searchable and will not get time out errors quite as easily.

I find that when using CFINDEX, it is best to have the indexing script called every 5 to 10 minutes. This supercedes what I said earlier in this thread. When I had times set less than 5 minutes, some problems occurred on the server. It is running more efficiently between 5 to 10 minutes. This would depend on your server setup.

Sincerely,
Travis Walters

Hey Guys,

I made a few updates to the SOLR_SEARCH function. Here is version 1.2 of my code:

<cfsetting requestTimeOut = "60">

<cfset SOLR_SERVER_HOST = 'localhost'>
<cfset SOLR_SERVER_PORT = '8983'>
<cfset SOLR_SERVER_VERSION = '2.2'>

<cffunction name="SOLR_SEARCH" returnType="struct">

  <cfargument name="collection" type="string" required="true" default="">
  <cfargument name="criteria" type="string" required="false" default="*:*">
  <cfargument name="startrow" type="string" required="false" default="0">
  <cfargument name="maxrows" type="string" required="false" default="10">
  <cfargument name="sort" type="string" required="false" default="">
  <cfargument name="columns" type="string" required="false" 
  default="category,custom1,custom2,custom3,custom4,key,score,size,summary,title">
  <cfargument name="facet" type="string" required="false" default="">
  <cfargument name="timeout" type="string" required="false" default="10">
  <cfargument name="timeAllowed" type="string" required="false" default="10">
  
  <cfset returnStruct = StructNew()>
  
  <cfif NOT IsDefined("timeout") OR NOT IsNumeric(timeout) OR #timeout# LT 1>
    <cfset timeout = 10>
  </cfif>

  <cfif NOT IsDefined("timeAllowed") OR NOT IsNumeric(timeAllowed) OR #timeAllowed# LT 1>
    <cfset timeAllowed = 10>
  </cfif>

  <cfif NOT IsDefined("startrow") OR NOT IsNumeric(startrow) OR #startrow# LT 0>
    <cfset startrow = 0>
  </cfif>

  <cfif NOT IsDefined("maxrows") OR NOT IsNumeric(maxrows) OR #maxrows# LT 1>
    <cfset maxrows = 1>
  </cfif>

  <cfif NOT IsDefined("criteria")>
    <cfset criteria = "*:*">
  </cfif>

  <cfset myURL = 'http://#SOLR_SERVER_HOST#:#SOLR_SERVER_PORT#/solr/#collection#/select/'>
  <cfset myURL = '#myURL#?q=#criteria#&version=#SOLR_SERVER_VERSION#&start=#startrow#&rows=#maxrows#&indent=off'>

  <cfif IsDefined("sort") AND #sort# NEQ "">
    <cfset myURL = '#myURL#&sort=#sort#'>
  </cfif>

  <cfif IsDefined("columns") AND #columns# NEQ "">
    <cfset myURL = '#myURL#&fl=#columns#'>
  </cfif>

  <cfif IsDefined("facet") AND #facet# NEQ "">
    <cfset myURL = '#myURL#&facet=true&#facet#'>
  </cfif>  

  <cfif IsDefined("timeAllowed") AND #timeAllowed# NEQ "">
    <cfset myURL = '#myURL#&timeAllowed=#timeAllowed#'>
  </cfif>  

  <cfhttp method="Get" url="#myURL#" resolveurl="Yes" timeout="#timeout#" throwonerror="yes" />

  <cfset xmlfile=XMLParse(cfhttp.FileContent)>

  <cfset numResultsFound = #xmlfile.XmlRoot.XmlChildren[2].XmlAttributes.numFound#>
  <cfset numResultsReturned = #ArrayLen(xmlfile.XmlRoot.XmlChildren[2].XmlChildren)#>

  <cfset partialResults = 'false'>
  <cfset queryFieldList = ''>
  <cfset queryStatus = ''>
  <cfset queryTime = ''>

  <cfloop index="i" from="1" to="#ArrayLen(xmlfile.XmlRoot.XmlChildren[1].XmlChildren)#">  
    <cfif #xmlfile.XmlRoot.XmlChildren[1].XmlChildren[i].XmlAttributes.name# EQ "params">
      <cfloop index="j" from="1" to="#ArrayLen(xmlfile.XmlRoot.XmlChildren[1].XmlChildren[i].XmlChildren)#">
        <cfif #xmlfile.XmlRoot.XmlChildren[1].XmlChildren[i].XmlChildren[j].XmlAttributes.name# EQ "fl">
          <cfset queryFieldList = '#xmlfile.XmlRoot.XmlChildren[1].XmlChildren[i].XmlChildren[j].XmlText#'>
        </cfif>
      </cfloop>
    <cfelseif #xmlfile.XmlRoot.XmlChildren[1].XmlChildren[i].XmlAttributes.name# EQ "status">
      <cfset queryStatus = '#xmlfile.XmlRoot.XmlChildren[1].XmlChildren[i].XmlText#'>
    <cfelseif #xmlfile.XmlRoot.XmlChildren[1].XmlChildren[i].XmlAttributes.name# EQ "QTime">
      <cfset queryTime = '#xmlfile.XmlRoot.XmlChildren[1].XmlChildren[i].XmlText#'>
    <cfelseif #xmlfile.XmlRoot.XmlChildren[1].XmlChildren[i].XmlAttributes.name# EQ "partialResults">
      <cfset partialResults = '#xmlfile.XmlRoot.XmlChildren[1].XmlChildren[i].XmlText#'>      
    </cfif>       
  </cfloop>

  <cfif #queryFieldList# NEQ "">     
       
    <cfset xmlqry = QueryNew("#queryFieldList#")>  
        
    <cfif #numResultsReturned# NEQ 0>

      <cfset xmlqryRows = QueryAddRow(xmlqry,numResultsReturned)>

      <cfloop index="i" from="1" to="#numResultsReturned#">
          
        <cfif #ArrayLen(xmlfile.XmlRoot.XmlChildren[2].XmlChildren[i].XmlChildren)# NEQ 0>    
          
          <cfloop index="j" from="1" to="#ArrayLen(xmlfile.XmlRoot.XmlChildren[2].XmlChildren[i].XmlChildren)#">
            
            <cfloop list="#queryFieldList#" index="k">
             
              <cfset myElement = '#xmlfile.XmlRoot.XmlChildren[2].XmlChildren[i].XmlChildren[j].XmlAttributes#'>
              
                 <cfif #ArrayLen(StructFindValue(myElement,k))# NEQ 0>
            
                <cfif #k# NEQ "category">
                  <cfset myElement = '#xmlfile.XmlRoot.XmlChildren[2].XmlChildren[i].XmlChildren[j].XmlText#'>
                  <cfset xmlqryCell = QuerySetCell(xmlqry,"#k#",myElement,"#i#")>
                <cfelse>
                  <cfset catList = '#xmlfile.XmlRoot.XmlChildren[2].XmlChildren[i].XmlChildren[j].XmlChildren[1].XmlText#'>
                  <cfset xmlqryCell = QuerySetCell(xmlqry,"#k#",catList,"#i#")>
                </cfif>
              
              </cfif>         
      
            </cfloop>
      
          </cfloop>
        
        </cfif>
    
      </cfloop>
      
    </cfif>
  
    <cfset xmlqry2 = QueryNew("RESULTS_FOUND,QUERY_STATUS,QUERY_TIME,RESULTS_RETURNED,PARTIAL_RESULTS")>
    <cfset xmlqryRows = QueryAddRow(xmlqry2,1)> 
    <cfset xmlqryCell = QuerySetCell(xmlqry2,"RESULTS_FOUND","#numResultsFound#",1)>
    <cfset xmlqryCell = QuerySetCell(xmlqry2,"QUERY_STATUS","#queryStatus#",1)>
    <cfset xmlqryCell = QuerySetCell(xmlqry2,"QUERY_TIME","#queryTime#",1)>
    <cfset xmlqryCell = QuerySetCell(xmlqry2,"RESULTS_RETURNED","#numResultsReturned#",1)>
    <cfset xmlqryCell = QuerySetCell(xmlqry2,"PARTIAL_RESULTS","#partialResults#",1)>

    <cfset xmlqry3 = QueryNew("QUERY_TIMEOUT,NULL_FIELDLIST")>
    <cfset xmlqryRows = QueryAddRow(xmlqry3,1)> 
    <cfset xmlqryCell = QuerySetCell(xmlqry3,"QUERY_TIMEOUT","0",1)>
    <cfset xmlqryCell = QuerySetCell(xmlqry3,"NULL_FIELDLIST","0",1)>
     
    <cfset returnStruct[0] = xmlqry3>
    <cfset returnStruct[1] = xmlqry2>
    <cfset returnStruct[2] = xmlqry> 
     
  <cfelse> 

    <cfset xmlqry3 = QueryNew("QUERY_TIMEOUT,NULL_FIELDLIST")>
    <cfset xmlqryRows = QueryAddRow(xmlqry3,1)> 
    <cfset xmlqryCell = QuerySetCell(xmlqry3,"QUERY_TIMEOUT","0",1)>    
    <cfset xmlqryCell = QuerySetCell(xmlqry3,"NULL_FIELDLIST","1",1)> 
    <cfset returnStruct[0] = xmlqry3>
        
  </cfif> 
  
  <cfreturn returnStruct>
  
</cffunction>

<cfparam name="URL.SEARCH_FOR" default="" type="string">

<cfset args = StructNew()>
<cfset args.collection = "products_id2">
<cfset args.startrow = 0>
<cfset args.maxrows = 10>
<cfset args.columns = 'key,custom1,custom2,custom3,custom4,summary,title'>
<cfset args.sort = 'score+desc'>
<cfset args.criteria = '#URL.SEARCH_FOR#'>
<cfset args.timeout = 5>
<cfset args.timeAllowed = 3000>

<cfinvoke method="SOLR_SEARCH" argumentCollection="#args#" returnVariable="res">

<cfdump var="#res#">

SOLR returns a variable called “partialResults” when the “timeAllowed” is used up before the search reaches the end of the collection. This is useful if you wish to inform the user that increasing the search time may lead to more and / or better search results.

Sincerely,
Travis Walters

SOLR 1.4 has native replication support: SolrReplication - Solr - Apache Software Foundation and you can munge something together for earlier releases with ssh and rsync (if you are looking to separate the indexing and reading collections). That may be getting more advanced than your out-of-the-box configuration warrants though.

Hey Guys,

I want to thank you all for recommending SOLR to me. Most searches are returning results to me within a fraction of a second and that is with searching millions of records. It is so much more advanced than CFQUERY that really there is no comparison.

I have read about so many more features that SOLR supports that I have not talked about. For example, here are a few documents from the (Resources - Apache Solr) documentation:

http://wiki.apache.org/solr/DistributedSearch
http://wiki.apache.org/solr/FunctionQuery
http://wiki.apache.org/solr/StatsComponent
http://wiki.apache.org/solr/ClusteringComponent

I doubt that I will ever need to distribute my collections among different servers. It is nice knowing that SOLR has this functionality just in case. I have checked back on the Adobe recommendations I have submitted. Thanks in advance to all those who vote to expand Coldfusion SOLR capabilities. It is nice hearing about people coming from there to read this thread. I have submitted report 83909 to Adobe just now that is a suggestion for coldfusion to add distributed search support. There is a lot that coldfusion needs before this with their SOLR integration, but this support should be there as well.

Sincerely,
Travis Walters

Hey Guys,

Good news for any coldfusion SOLR developers out there - Adobe verified the bugs / enhancements that I submitted to them. It looks like they are targeting the release of the improvements for Coldfusion 10 Alpha.

It is great to see something nice to come out of all this work.

Sincerely,
Travis Walters