Whether a particular number is a S-number, in base b, for functions f(x) and g(x,y) can be ascertained by dividing it into digits, and recursively applying the substitutions x®f(x) and x,y®g(x,y) until the value of the expression exceeds the number, whereupon we can conclude from f(x) ³ x and g(x,y)=x+y, that further substitutions must exceed the number. If the value of the expression equals the number we have found a solution, and we can either abandon the search of the implicit tree, or continue on other branches to find other solutions. (For some S-numbers there is more than one solution.)
If f(x)=x we also must terminate the recursion at this point, as otherwise we would have an infinite regress. With the functions I have investigated this only occurs for x=0,1,2.
To avoid the risk of arithmetic overflow, one should precalculate the largest number v, f(v) less than target, and check against this for terminating the recursion, rather than applying f(v) and then comparing with the target. (This is also marginally faster.)
The practical limit of this algorithm depends on how fast f(x) grows, and for typical function, for a search of all numbers up to a limit, the limit is ~104.
It may be observed that given the sequence x,y by substitution we obtain the sequences (x),y and x,(y), and from both of these sequence we obtain the sequence (x),(y). Similar x,y,z leads us to x·y·z via x·y,z and x,y·z. That is the candidate lists, related by substitution operations, form a directed acyclic graph, rather than a tree. We can modify the algorithm to use a hash function to determine when we are revisting a node, and terminate the recursion when we find we are revisting a node.
This modification extends the practical limit of the algorithm to ~105.
It may also be observed that any solution may be expressed as fn(g(l,r)), where l is derived from the left end of the sequences of digits, and r from the right end of the sequence of digits. Furthermore, when investigation numbers in sequence, the same left end sequences are repeatedly evaluated. From these observation is can be seen the caching the results for left end sequences would improve performance.
This is exploited by calculating for each possible pair l,r for a number (a n-digit number has n-1 possible such pairs) a list of expression values less that the target number, and then evaluating fn(g(l,r)) for each pair of values. Let li be the list of values for the i leftmost digits of the target number. This need only be recalculated (with appropriate increase in the maximum expression value excepted to cover all possible target numbers, of that length, with the same i leftmost digits) when these digits change.
This revision of the algorithm extends the practical range to ~106.
Range filtering: 8! is 40,320. The largest sum not using 8! therefore must be 5*7!, or noting that the number must be less than 40,320, 3!+4*7!, which is 20,166. Therefore no number between 20,167 and 40,319 can be a S-number for f(x)=x!, g(x,y)=x+y, in base 10. The omitted range can be widened by further analysis. In general, for fast growing functions, ranges which cannot contain S-numbers can be identified analytically, drastically reducing the search space.