Markov algorithm to compute f(x)=x/2

I want to write a Markov algorithm to compute f(x)=x/2 with remainder, over the alphabet A = {|, *, =, /}. For example, if the input is |||||/||= the output should be |||||/||=||*|.
Best I could get was a simple algorithm that shows the result and the remainder, but it's missing the first part where the numerator should be.
Input is |||||/||= and the output is /||=||*|

Do not delete the | symbols; instead, mark them and then move their information to the right:
The rules | -> 1#, #| -> |#, #1 -> 1# can take your initial |||||/||= to 11111#####/||=
Now use your rules, but consume # instead of |. (I do not fully understand your notation, though.)
If the output 11111/||=||*| is good enough for you, you can stop here. Otherwise the question is quite complicated, because the initial |||||.../|| part can become arbitrarily long, and a Markov algorithm does not terminate while any rule is still applicable to the string.
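To experiment with rule sets like these, a tiny Markov-algorithm interpreter helps: try the rules in order, apply the first one that matches at its leftmost occurrence, and repeat until no rule applies. This is only a sketch; the rule list below is illustrative, and note that on a full |||||/||= input the bars after the / get rewritten too, so extra markers would be needed to protect them:

```haskell
import Data.List (isPrefixOf)

-- Replace the leftmost occurrence of `from` with `to`, if any.
replaceFirst :: String -> String -> String -> Maybe String
replaceFirst from to s
  | from `isPrefixOf` s = Just (to ++ drop (length from) s)
  | null s              = Nothing
  | otherwise           = (head s :) <$> replaceFirst from to (tail s)

-- One step: apply the first rule (in list order) that matches anywhere.
step :: [(String, String)] -> String -> Maybe String
step rules s = foldr try Nothing rules
  where try (from, to) acc = maybe acc Just (replaceFirst from to s)

-- Run to termination (no rule applicable).
run :: [(String, String)] -> String -> String
run rules s = maybe s (run rules) (step rules s)

-- The marking rules from the answer, ordered so existing #s bubble
-- to the right before new ones are created.
markRules :: [(String, String)]
markRules = [("#|", "|#"), ("#1", "1#"), ("|", "1#")]
```

For a bar-only string this converts each | to 1 and collects the #s on the right, e.g. run markRules "|||" yields "111###".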


Guidelines for applying DRY in Haskell function definitions

I have a question about whether a specific way of applying the DRY principle is considered good practice in Haskell. I'm going to present an example, and then ask whether the approach I'm taking is considered good Haskell style. In a nutshell, the question is this: when you have a long formula, and then you find yourself needing to repeat some small subsets of that formula elsewhere, do you always put that repeated subset of the formula into a variable so you can stay DRY? Why or why not?
The Example:
Imagine we're taking a string of digits, and converting that string into its corresponding Int value. (BTW, this is an exercise from "Real World Haskell").
Here's a solution that works except that it ignores edge cases:
asInt_fold string = fst (foldr helper (0, 0) string)
  where
    helper char (sum, place) = (newValue, newPlace)
      where
        newValue = (10 ^ place) * digitToInt char + sum
        newPlace = place + 1
It uses foldr, and the accumulator is a tuple of the next place value and the sum so far.
So far so good. Now, when I went to implement the edge case checks, I found that I needed little portions of the "newValue" formula in different places to check for errors. For example, on my machine, there would be an Int overflow if the input was larger than (2^31 - 1), so the max value I could handle is 2,147,483,647. Therefore, I put in 2 checks:
If the place value is 9 (the billions place) and the digit value is > 2, there's an error.
If sum + (10 ^ place) * (digitToInt char) > maxInt, there's an error.
Those 2 checks caused me to repeat part of the formula, so I introduced the following new variables:
digitValue = digitToInt char
newPlaceComponent = (10^place) * digitValue
The reason I introduced those variables is merely an automatic application of the DRY principle: I found myself repeating those portions of the formula, so I defined them once and only once.
However, I wonder if this is considered good Haskell style. There are obvious advantages, but I see disadvantages as well. It definitely makes the code longer, whereas much of the Haskell code I've seen is pretty terse.
So, do you consider this good Haskell style, and do you follow this practice, or not? Why / why not?
And for what it's worth, here's my final solution that deals with a number of edge cases and therefore has quite a large where block. You can see how large the block became due to my application of the DRY principle.
import Data.Char (digitToInt)
import Data.List (isInfixOf)

asInt_fold "" = error "You can't be giving me an empty string now"
asInt_fold "-" = error "I need a little more than just a dash"
asInt_fold string | isInfixOf "." string = error "I can't handle decimal points"
asInt_fold ('-':xs) = -1 * asInt_fold xs
asInt_fold string = fst (foldr helper (0, 0) string)
  where
    helper char (sum, place)
      | place == 9 && digitValue > 2     = throwMaxIntError
      | maxInt - sum < newPlaceComponent = throwMaxIntError
      | otherwise                        = (newValue, newPlace)
      where
        digitValue        = digitToInt char
        placeMultiplier   = 10 ^ place
        newPlaceComponent = placeMultiplier * digitValue
        newValue          = newPlaceComponent + sum
        newPlace          = place + 1
        maxInt            = 2147483647
        throwMaxIntError  =
          error "The value is larger than max, which is 2147483647"
As noted by bdonlan, your algorithm could be cleaner; it's especially useful to let the language itself detect the overflow. As for your code and its style, I think the main tradeoff is that each new name imposes a small cognitive burden on the reader, so when to name an intermediate result becomes a judgment call.
I personally would not have chosen to name placeMultiplier, as I think the intent of 10 ^ place is much clearer inline. And I would look for maxInt in the Prelude, as you run the risk of being terribly wrong if the code runs on 64-bit hardware. Otherwise, the only thing I find objectionable in your code is the redundant parentheses. So what you have is an acceptable style.
(My credentials: At this point I have written on the order of 10,000 to 20,000 lines of Haskell code, and I have read perhaps two or three times that. I also have ten times that much experience with the ML family of languages, which require the programmer to make similar decisions.)
DRY is just as good a principle in Haskell as it is anywhere else :)
A lot of the reason behind the terseness you speak of in Haskell is that many idioms are lifted out into libraries, and that the examples you look at have often been considered very carefully to make them terse :)
For example, here's an alternate way to implement your string-to-Int algorithm:
import Data.Char (digitToInt)
import Data.List (foldl')

asInt_fold ('-':n) = negate (asInt_fold n)
asInt_fold ""      = error "Need some actual digits!"
asInt_fold str     = foldl' step 0 str
  where
    step _ x
      | x < '0' || x > '9' = error "Bad character somewhere!"
    step sum dig =
      case sum * 10 + digitToInt dig of
        n | n < 0 -> error "Overflow!"
        n         -> n
A few things to note:
We detect overflow when it happens, rather than by imposing arbitrary-ish limits on which digits we allow. This significantly simplifies the overflow-detection logic, and makes it work on any integer type from Int8 to Integer [as long as overflow results in wraparound, doesn't occur, or raises an error from the addition operator itself].
By using a different fold, we don't need two separate pieces of state.
No repeating ourselves, even without going out of our way to lift things out - it falls naturally out of re-stating what we're trying to say.
Now, it's not always possible to just reword the algorithm and make the duplication go away, but it's always useful to take a step back and reconsider how you've been thinking about the problem :)
I think the way you've done it makes sense.
You should certainly always break repeated computations out into separately defined values if avoiding repeated computation is important, but in this case that doesn't look necessary. Nevertheless, the broken out values have easy to understand names, so they make your code easier to follow. I don't think the fact that your code is a bit longer as a result is a bad thing.
BTW, instead of hardcoding the maximum Int, you can use (maxBound :: Int), which avoids the risk of you making a mistake, or of another implementation with a different maximum Int breaking your code.
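A minimal sketch of that idea, assuming the usual fixed-width Int (the helper names here are mine, not from the exercise):

```haskell
import Data.Char (digitToInt)

-- Overflow-safe accumulation step using maxBound instead of a magic number:
-- acc * 10 + d would overflow exactly when acc > (maxBound - d) `div` 10.
safeStep :: Int -> Char -> Int
safeStep acc c
  | acc > (maxBound - d) `div` 10 = error "overflow"
  | otherwise                     = acc * 10 + d
  where
    d = digitToInt c

asIntSafe :: String -> Int
asIntSafe = foldl safeStep 0
```

The same check then works unchanged on 32-bit and 64-bit builds, since maxBound tracks the platform's Int.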

Are there any O(1/n) algorithms?

Are there any O(1/n) algorithms?
Or anything else which is less than O(1)?
This question isn't as stupid as it might seem. At least theoretically, something such as O(1/n) is completely sensible when we take the mathematical definition of the Big O notation: f ∈ O(g) if and only if there exist constants C > 0 and x0 such that |f(x)| ≤ C·g(x) for all x > x0.
Now you can easily substitute g(x) with 1/x … it's obvious that the above definition still holds for some f.
For the purpose of estimating asymptotic run-time growth, this is less viable … a meaningful algorithm cannot get faster as the input grows. Sure, you can construct an arbitrary algorithm to fulfill this, e.g. the following one:
from time import sleep

def get_faster(list):
    how_long = (1.0 / len(list)) * 100000
    sleep(how_long)
Clearly, this function spends less time as the input size grows … at least until some limit, enforced by the hardware (precision of the numbers, minimum time that sleep can wait, time to process arguments etc.): this limit would then be a constant lower bound, so in fact the above function still has runtime O(1).
But there are in fact real-world algorithms where the runtime can decrease (at least partially) when the input size increases. Note that these algorithms will not exhibit runtime behaviour below O(1), though. Still, they are interesting. For example, take the very simple text search algorithm by Horspool. Here, the expected runtime will decrease as the length of the search pattern increases (but increasing length of the haystack will once again increase runtime).
There is precisely one algorithm with runtime O(1/n), the "empty" algorithm.
For an algorithm to be O(1/n) means that it executes asymptotically in fewer steps than the algorithm consisting of a single instruction. If it executes in fewer than one step for all n > n0, it must consist of precisely no instructions at all for those n. Since checking 'if n > n0' costs at least 1 instruction, it must consist of no instructions for all n.
Summing up:
The only algorithm which is O(1/n) is the empty algorithm, consisting of no instruction.
That's not possible. The definition of Big-O is a "not greater than" relation:
A(n) = O(B(n))
<=> there exist constants C and n0, C > 0, n0 > 0 such that
for all n > n0, A(n) <= C * B(n)
So B(n) is in fact an upper bound; therefore if it decreases as n increases, the estimation does not change.
sharptooth is correct, O(1) is the best possible performance. However, it does not imply a fast solution, just a fixed time solution.
An interesting variant, and perhaps what is really being suggested, is which problems get easier as the population grows. I can think of one, albeit contrived and tongue-in-cheek, answer:
Do any two people in a set have the same birthday? When n exceeds 365, return true. Although for fewer than 365, this is O(n ln n). Perhaps not a great answer, since the problem doesn't slowly get easier but just becomes O(1) for n > 365.
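A sketch of that pigeonhole shortcut (the function name is mine, and birthdays are encoded as day numbers for brevity):

```haskell
import qualified Data.Set as Set

-- Once there are more than 365 birthdays, a duplicate is guaranteed by the
-- pigeonhole principle, so no scan is needed; below that, actually check.
hasSharedBirthday :: [Int] -> Bool
hasSharedBirthday bs
  | length bs > 365 = True
  | otherwise       = go Set.empty bs
  where
    go _ []        = False
    go seen (x:xs) = x `Set.member` seen || go (Set.insert x seen) xs
```

The interesting property is that the expensive duplicate check only ever runs on at most 365 elements, which is what makes the whole thing O(1) for large n.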
From my previous learning of big O notation, even if you need 1 step (such as checking a variable, doing an assignment), that is O(1).
Note that O(1) is the same as O(6), because the "constant" doesn't matter. That's why we say O(n) is the same as O(3n).
So if you need even 1 step, that's O(1)... and since your program needs at least 1 step, the minimum an algorithm can go is O(1). Unless we do nothing at all, in which case it is O(0), I think? If we do anything at all, then it is O(1), and that's the minimum it can go.
(If we choose not to do it, then it may become a Zen or Tao question... in the realm of programming, O(1) is still the minimum).
Or how about this:
programmer: boss, I found a way to do it in O(1) time!
boss: no need to do it, we are bankrupt this morning.
programmer: oh then, it becomes O(0).
No, this is not possible:
As n tends to infinity in 1/n we eventually achieve 1/(inf), which is effectively 0.
Thus, the big-oh class of the problem would be O(0) with a massive n, but closer to constant time with a low n. This is not sensible, as the only thing that can be done in faster than constant time is:
void nothing() {};
And even this is arguable!
As soon as you execute a command, you're in at least O(1), so no, we cannot have a big-oh class of O(1/n)!
What about not running the function at all (NOOP)? or using a fixed value. Does that count?
I often use O(1/n) to describe probabilities that get smaller as the inputs get larger -- for example, the probability that a fair coin comes up tails on log2(n) flips is O(1/n).
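As a quick numerical check of that claim (the helper name is mine): (1/2)^(log2 n) equals 1/n exactly, so the probability really does shrink like 1/n.

```haskell
-- Probability that log2(n) fair-coin flips all come up tails:
-- (1/2)^(log2 n) = 1/n.
pAllTails :: Double -> Double
pAllTails n = 0.5 ** logBase 2 n
```

For n = 1024 this is (1/2)^10 = 1/1024, up to floating-point rounding.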
O(1) simply means "constant time".
When you add an early exit to a loop[1] you're (in big-O notation) turning an O(1) algorithm into O(n), but making it faster.
The trick is that in general the constant-time algorithm is the best, and linear is better than exponential, but for small amounts of n, the exponential algorithm might actually be faster.
1: Assuming a static list length for this example
I believe quantum algorithms can do multiple computations "at once" via superposition...
I doubt this is a useful answer.
For anyone who's reading this question and wants to understand what the conversation is about, this might help:
| n  | constant | logarithmic | linear | N-log-N    | quadratic | cubic   | exponential   |
|    | O(1)     | O(log n)    | O(n)   | O(n log n) | O(n^2)    | O(n^3)  | O(2^n)        |
| 1  | 1        | 1           | 1      | 1          | 1         | 1       | 2             |
| 2  | 1        | 1           | 2      | 2          | 4         | 8       | 4             |
| 4  | 1        | 2           | 4      | 8          | 16        | 64      | 16            |
| 8  | 1        | 3           | 8      | 24         | 64        | 512     | 256           |
| 16 | 1        | 4           | 16     | 64         | 256       | 4,096   | 65,536        |
| 32 | 1        | 5           | 32     | 160        | 1,024     | 32,768  | 4,294,967,296 |
| 64 | 1        | 6           | 64     | 384        | 4,096     | 262,144 | 1.8 x 10^19   |
Many people have had the correct answer (No). Here's another way to prove it: in order to have a function, you have to call the function, and it has to return an answer. This takes a certain constant amount of time. Even if the rest of the processing took less time for larger inputs, printing out the answer (which we can assume to be a single bit) takes at least constant time.
If a solution exists, it can be prepared and accessed in constant time, i.e. immediately. For instance, using a LIFO data structure when you know the sorting query will be for reverse order: the data is then already sorted, given that the appropriate model (LIFO) was chosen.
Which problems get easier as the population grows? One answer is a thing like bittorrent, where download speed is an inverse function of the number of nodes. Contrary to a car, which slows down the more you load it, a file-sharing network like bittorrent speeds up the more nodes are connected.
You can't go below O(1), but O(k) where k is less than n is possible. These are called sublinear time algorithms. For some problems, a sublinear time algorithm can only give an approximate solution. However, sometimes an approximate solution is just fine, perhaps because the dataset is too large, or because it's way too computationally expensive to compute it all.
O(1/n) is not less than O(1); it basically means that the more data you have, the faster the algorithm goes. Say you get an array and always fill it up to 10100 elements if it has fewer than that, and do nothing if there's more. This one is not O(1/n) of course, but something like O(-n) :) Too bad big-O notation does not allow negative values.
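That fill-up routine can be sketched directly (the 10100 cut-off is just the number from the joke above):

```haskell
-- Pad a list up to a fixed size; the longer the input already is,
-- the less work is left to do, and past the cut-off, none at all.
-- (replicate with a non-positive count yields the empty list.)
padTo :: Int -> a -> [a] -> [a]
padTo limit pad xs = xs ++ replicate (limit - length xs) pad
```

Of course the length check itself is linear, which is part of why this is a joke rather than a counterexample.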
As has been pointed out, apart from the possible exception of the null function, there can be no O(1/n) functions, as the time taken will have to approach 0.
Of course, there are some algorithms, like that defined by Konrad, which seem like they should be less than O(1) in at least some sense.
from time import sleep

def get_faster(list):
    how_long = 1.0 / len(list)
    sleep(how_long)
If you want to investigate these algorithms, you should either define your own asymptotic measurement, or your own notion of time. For example, in the above algorithm, I could allow the use of a number of "free" operations a set amount of times. In the above algorithm, if I define t' by excluding the time for everything but the sleep, then t'=1/n, which is O(1/n). There are probably better examples, as the asymptotic behavior is trivial. In fact, I am sure that someone out there can come up with senses that give non-trivial results.
Most of the rest of the answers interpret big-O to be exclusively about the running time of an algorithm. But since the question didn't mention it, I thought it's worth mentioning the other application of big-O in numerical analysis, which is about error.
Many algorithms can be O(h^p) or O(n^{-p}) depending on whether you're talking about the step size (h) or the number of divisions (n). For example, in Euler's method, you look for an estimate of y(h) given that you know y(0) and dy/dx (the derivative of y). Your estimate of y(h) is more accurate the closer h is to 0. So in order to find y(x) for some arbitrary x, one takes the interval 0 to x, splits it into n pieces, and runs Euler's method at each point to get from y(0) to y(x/n) to y(2x/n), and so on.
So Euler's method is then an O(h) or O(1/n) algorithm, where h is typically interpreted as a step size and n is interpreted as the number of times you divide an interval.
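A minimal Euler's method sketch illustrating that O(1/n) error behaviour, tried on y' = y with y(0) = 1, whose exact value at x = 1 is e:

```haskell
-- Integrate y' = f(x, y) from 0 to xEnd in n steps with Euler's method.
-- The global error shrinks roughly like O(1/n), i.e. O(h) with h = xEnd / n.
euler :: (Double -> Double -> Double) -> Double -> Double -> Int -> Double
euler f y0 xEnd n = go y0 0
  where
    h = xEnd / fromIntegral n
    go y i
      | i >= n    = y
      | otherwise = go (y + h * f (fromIntegral i * h) y) (i + 1)
```

Doubling n roughly halves the error against exp 1, which is exactly the O(1/n) convergence being described.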
You can also have O(1/h) in real numerical-analysis applications, because of floating-point rounding errors: the smaller you make your interval, the more cancellation occurs in the implementation of certain algorithms, the more loss of significant digits, and therefore the more error gets propagated through the algorithm.
For Euler's method, with floating point and a small enough step, cancellation appears: you're adding a small number to a big number, leaving the big number unchanged. For algorithms that estimate the derivative by subtracting two evaluations of a function at very close positions, approximating y'(x) with (y(x+h) - y(x)) / h, for smooth functions y(x+h) gets close to y(x), resulting in large cancellation and an estimate for the derivative with fewer significant figures. This in turn propagates to whatever algorithm you need the derivative for (e.g., a boundary value problem).
OK, I did a bit of thinking about it, and perhaps there exists an algorithm that could follow this general form:
You need to compute the traveling salesman problem for a 1000 node graph, however, you are also given a list of nodes which you cannot visit. As the list of unvisitable nodes grows larger, the problem becomes easier to solve.
What about this:
void FindRandomInList(list l)
{
    while (true)
    {
        int rand = GetRandomNumber();  // some RNG; this is pseudocode
        if (l.contains(rand))
            return;
    }
}
As the size of the list grows, the expected runtime of the program decreases.
I see an algorithm that is O(1/n), admittedly only up to an upper bound:
You have a large series of inputs which are changing due to something external to the routine (maybe they reflect hardware, or it could even be some other core in the processor doing it), and you must select a random but valid one.
Now, if it wasn't changing you would simply make a list of items, pick one randomly and get O(1) time. However, the dynamic nature of the data precludes making a list, you simply have to probe randomly and test the validity of the probe. (And note that inherently there is no guarantee the answer is still valid when it's returned. This still could have uses--say, the AI for a unit in a game. It could shoot at a target that dropped out of sight while it was pulling the trigger.)
This has a worst-case performance of infinity but an average case performance that goes down as the data space fills up.
In numerical analysis, approximation algorithms should have sub-constant asymptotic complexity in the approximation tolerance.
class Function
{
    public double[] ApproximateSolution(double tolerance)
    {
        // if this isn't sub-constant in the tolerance parameter, it's rather useless
    }
}
I guess less than O(1) is not possible. Any time taken by an algorithm is termed O(1). But for O(1/n), how about the function below? (I know there are many variants already presented in this thread, but I guess they all have some flaws; not major ones, they explain the concept well. So here is one, just for the sake of argument:)
from time import sleep

def one_by_n(n, C = 10):    # n could be a float. C could be any positive number
    if n <= 0.0:            # if input is actually 0, loop forever
        while True:
            sleep(1)        # or pass
        return              # this line is not needed and is unreachable
    delta = 0.0001
    itr = delta
    while itr < C / n:
        itr += delta
Thus as n increases, the function takes less and less time. It is also ensured that if the input actually is 0, then the function takes forever to return.
One might argue that it will be bounded by the precision of the machine; thus, since it has an upper bound, it is O(1). But we can bypass that as well, by taking the inputs n and C as strings, with addition and comparison done on strings. The idea is that this way we can make n arbitrarily small, so the upper limit of the function is not bounded, even when we ignore n = 0.
I also believe that we can't just say the run time is O(1/n); we should say something like O(1 + 1/n).
I had this very doubt back in 2007; nice to see this thread. I came to it from a reddit thread of mine.
It may be possible to construct an algorithm that is O(1/n). One example would be a loop that iterates some multiple of f(n)-n times, where f(n) is some function whose value is guaranteed to be greater than n, and the limit of f(n)-n as n approaches infinity is zero. The calculation of f(n) would also need to be constant for all n. I do not know offhand what f(n) would look like, or what application such an algorithm would have; in my opinion, however, such a function could exist, but the resulting algorithm would have no purpose other than to prove the possibility of an algorithm with O(1/n).
I don't know about algorithms but complexities less than O(1) appear in randomized algorithms. Actually, o(1) (little o) is less than O(1). This kind of complexity usually appears in randomized algorithms. For example, as you said, when the probability of some event is of order 1/n they denote it with o(1). Or when they want to say that something happens with high probability (e.g. 1 - 1/n) they denote it with 1 - o(1).
If the answer is the same regardless of the input data, then you have an O(0) algorithm.
Or in other words: the answer is known before the input data is submitted.
The function could be optimised out, so O(0).
Big-O notation represents the worst-case scenario for an algorithm, which is not the same thing as its typical run time. It is simple to prove that an O(1/n) algorithm is an O(1) algorithm. By definition,
O(1/n) --> T(n) <= 1/n, for all n >= C > 0
O(1/n) --> T(n) <= 1/C, since 1/n <= 1/C for all n >= C
O(1/n) --> O(1), since Big-O notation ignores constants (i.e. the value of C doesn't matter)
Nothing is smaller than O(1)
Big-O notation implies the largest order of complexity for an algorithm
If an algorithm has a runtime of n^3 + n^2 + n + 5 then it is O(n^3)
The lower powers don't matter here at all, because as n -> Inf, n^2 will be irrelevant compared to n^3.
Likewise, as n -> Inf, O(1/n) will be irrelevant compared to O(1); hence 3 + O(1/n) will be the same as O(1), making O(1) the smallest possible computational complexity.
inline void O0Algorithm() {}

Why use “or” instead of “xor” in definitions?

This might be a trivial question but I really can't find the answer anywhere. There is a convention in computer science which I find peculiar.
In Haskell, datatypes can be defined like this:
data Bool = False | True
In XML, qualified names are defined like this:
QName ::= PrefixedName | UnprefixedName
There are probably more similar examples but this should suffice.
Usually it is well understood that | (pipe or bar) should be read as "or". But this seems strange: A or B is true also when both A and B are true. While that makes sense in the first example (something could conceivably be True and False at the same time, but we implicitly assume the law of non-contradiction), it doesn't in the second: something is either a PrefixedName or an UnprefixedName; it can't be both.
So why is this often put like this? Why not use exclusive or? Are there any non-conventional reasons?
This data X = A | B notation should not really be understood as a logical OR at all (though that corresponds quite well to the intuitive meaning). What it really means is that X is a sum type of A and B, i.e. a coproduct. Now, the product operation on booleans is in fact AND, and so the dual would quite naturally be OR.
Though then again, the sum operation on a vector space of booleans is actually XOR so we're turning circles a bit...
I just wouldn't read too much into this. | is simply a symbol; in C-like languages it happens to also mean bitwise OR, but the actual logical OR is generally denoted differently, be it || or ∨.
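For illustration, the sum-type reading of | in a data declaration (the example types here are mine, not from the question):

```haskell
-- A value of Shape is *either* a Circle or a Rect, never both at once;
-- the | separates the mutually exclusive alternatives of the sum type.
data Shape = Circle Double | Rect Double Double

area :: Shape -> Double
area (Circle r) = pi * r * r
area (Rect w h) = w * h
```

Pattern matching makes the exclusivity concrete: exactly one equation of area fires for any given Shape.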
| has a long history of being used to separate items in a list of mutually exclusive choices:
Regular expressions: a* | b* means a string consists either of zero or more a's or of zero or more b's, but not a mixture of the two.
Backus-Naur form for representing context-free grammars:
Expr ::= Term | Expr AddOp Term
in which an expression can either be a single term, or another expression combined with a term with an addition operator. (It cannot be both simultaneously.)
Usage messages for command-line programs:
git branch (-d | -D) [-r] <branchname>...
Here the git branch command can take either the -d or the -D option, but not both in the same invocation.
The data statement in Haskell continues this tradition; it is unrelated to the use of | as a logical or bit-wise operator.
(If anything, it is possible that the use of | for bitwise OR was inspired by Backus-Naur form, in which case you could be asking why | is used for OR instead of XOR.)
I think the meaning of "or" here is in a literal sense as opposed to a mathematical sense. Using the term "or" means that the value can have one value, or another. It is not an operator which aims to determine a truth value.
While it may not be entirely logical to use an "or" sign for this, it does a good enough job of getting the point across to a reader, and that clarity is ultimately what a language strives for. As long as most readers interpret it as the literal usage rather than the mathematical operator, "or" does a better job of conveying the intent than something such as XOR would.

PushDown Automaton (PDA) for L={a^(n)b^(n)c^(n)|n>=1}

I am on a fool's errand trying to construct a Pushdown automaton for the non-context-free language L={a^(n)b^(n)c^(n)|n>=1} and thought of two approaches.
First approach:-
I thought that for every 'a' in the string I would push 3 'a's onto the stack, and for every 'b' in the string I would pop 2 'a's from the stack, so that for every 'c' in the string I would still have 1 'a' left on the stack.
Problem with the First approach:- the language accepted becomes something like L={a^(p)b^(m)c^(n) | p>=1, and I could not determine how m and n can be defined}
Second approach:-
We know that L={ a^(n)b^(m)c^(m)d^(n) | n>=0 } is a context-free language and L={ wxw | w∈(a,b)* } is also context-free language.
So, I thought L={ a^(n)b^(m)b^(m)c^(n) | n>=1 and m=floor((n+1)/2) }
Problem with the Second approach:- I don't know if we can calculate floor((n+1)/2) in the PDA without disturbing the elements of the stack.
Please help in determining how m and n can be defined in the first approach, and how I can find floor((n+1)/2) in the PDA.
JFLAP files available for both if needed.
As Ami Tavory points out, there is no PDA for this language because this language is not context-free. It is easy to recognize this language if you use a queue instead of a stack, use two stacks, or use a Turing machine (all equivalent).
Queue machine:
Enqueue a's as long as you see a's, until you see a b.
Dequeue a's and enqueue b's as long as you see b's, until you see a c.
Dequeue b's as long as you see c's.
Accept if you end this process with no additional input and an empty queue.
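A direct transcription of those four steps, using a plain list as the queue (dequeue at the front, enqueue at the back); this is a sketch of the idea, not a formal queue-machine definition:

```haskell
-- Recognize a^n b^n c^n (n >= 1) with a queue, phase by phase.
queueAccepts :: String -> Bool
queueAccepts input = not (null input) && phaseA input []
  where
    phaseA ('a':xs) q = phaseA xs (q ++ "a")        -- enqueue an 'a' per 'a' read
    phaseA xs q       = phaseB xs q                 -- first non-'a' starts phase B
    phaseB ('b':xs) ('a':q) = phaseB xs (q ++ "b")  -- dequeue 'a', enqueue 'b' per 'b'
    phaseB xs q             = phaseC xs q
    phaseC ('c':xs) ('b':q) = phaseC xs q           -- dequeue 'b' per 'c'
    phaseC [] []            = True                  -- input consumed, queue empty
    phaseC _  _             = False                 -- anything else: reject
```

The leading null-check rejects the empty string, since the language requires n >= 1.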
Two-stack PDA:
Use the first stack to make sure a^n b^n by pushing a when you see an a and popping a when you see a b;
Use the second stack to make sure b^n c^n by pushing b when you see a b and popping b when you see a c;
Accept if both stacks are empty at the end of this process.
Turing machine:
Ensure a^n ... c^n by replacing each a with A and erasing a matching c;
Ensure A^n b^n by erasing matching pairs of A and b;
Accept if at the end of this process you have no more A and no more b, i.e., the tape has been completely cleared.
One reason you've not managed to construct a pushdown automaton for this language is that there isn't one: the Bar-Hillel pumping lemma shows this.
To outline the proof, suppose it can be done. Then, for some p, each accepted string longer than p can be partitioned as uvwxy, s.t.
|vwx| <= p
|vx| >= 1
u v^n w x^n y is also accepted by the automaton, for any n.
The first rule implies that vwx can't span all three regions, only at most two (for large enough strings). The second and third rules now imply that you can pump so that an un-spanned region stays smaller than at least one of the other regions.

Count negative numbers in list using list comprehension

Working through the first edition of "Introduction to Functional Programming", by Bird & Wadler, which uses a theoretical lazy language with Haskell-ish syntax.
Exercise 3.2.3 asks:
Using a list comprehension, define a function for counting the number
of negative numbers in a list
Now, at this point we're still scratching the surface of lists. I would assume the intention is that only concepts that have been introduced at that point should be used, and the following have not been introduced yet:
A function for computing list length
List indexing
Pattern matching i.e. f (x:xs) = ...
Infinite lists
All the functions and operators that act on lists - with one exception - e.g. ++, head, tail, map, filter, zip, foldr, etc
What tools are available?
A maximum function that returns the maximal element of a numeric list
List comprehensions, with possibly multiple generator expressions and predicates
The notion that the output of the comprehension need not depend on the generator expression, implying the generator expression can be used for controlling the size of the generated list
Finite arithmetic sequence lists i.e. [a..b] or [a, a + step..b]
I'll admit, I'm stumped. Obviously one can extract the negative numbers from the original list fairly easily with a comprehension, but how does one then count them, with no notion of length or indexing?
The availability of the maximum function would suggest the end game is to construct a list whose maximal element is the number of negative numbers, with the final result of the function being the application of maximum to said list.
I'm either missing something blindingly obvious, or a smart trick, with a horrible feeling it may be the former. Tell me SO, how do you solve this?
My old -- and very yellowed -- copy of the first edition has a note attached to Exercise 3.2.3: "This question needs # (length), which appears only later". The moral of the story is to be more careful when setting exercises. I am currently finishing a third edition, which contains answers to every question.
By the way, did you answer Exercise 1.2.1, which asks you to write down all the ways that
square (square (3 + 7)) can be reduced to normal form? It turns out that there are 547 ways!
I think you may be assuming too many restrictions - taking the length of the filtered list seems like the blindingly obvious solution to me.
A couple of alternatives, but both involve using some other function that you say wasn't introduced:
sum [1 | x <- xs, x < 0]
maximum (0:[index | (index, ()) <- zip [1..] [() | x <- xs, x < 0]])
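Both comprehension-based definitions can be sanity-checked side by side (the function names here are mine):

```haskell
-- Count negatives by summing a list of 1's, one per negative element.
countNeg :: [Int] -> Int
countNeg xs = sum [1 | x <- xs, x < 0]

-- Count negatives without sum or length: number the filtered elements
-- with a generator and take the largest index (0 if there are none).
countNeg' :: [Int] -> Int
countNeg' xs = maximum (0 : [i | (i, _) <- zip [1 ..] [() | x <- xs, x < 0]])
```

The second version is closer to the chapter's toolkit, since it only uses comprehensions, an arithmetic sequence, and maximum; the prepended 0 handles the empty case, where maximum alone would fail.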