Turing machine that, for every word w in {a,b}*, changes every a to b and every b to a, and then halts

I need to describe formally (by means of a transition function) a Turing machine such that, for every word w in {a,b}*, it will change every a to b, and every b to a.
I have had a go, and this is my solution:
(s,a) -> (s,b,R)
(s,b) -> (s,a,R)
(s,blank) -> (n,blank)
where n is the halting state and s is the starting state
Does this work?
Thanks!

The usual approach to questions of this kind is either "test" or "proof". Here, I show how you can easily test if your approach succeeds:
GHCi, version 8.2.2: http://www.haskell.org/ghc/
:? for help
Prelude> :{
Prelude| cnv ('a':xs) = 'b':cnv xs
Prelude| cnv ('b':xs) = 'a':cnv xs
Prelude| cnv [] = []
Prelude| :}
Prelude> cnv "abaaab"
"babbba"
Prelude>
At least in my eyes, this Haskell code looks similar enough to your transition specification. The [] case in the third line of the definition of the cnv function stands for "empty list", i.e. it is your halting state. For the recursion of this function, it is the base case where the recursion stops.
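If you want to go a step beyond a single hand-picked input, a property-based test with the QuickCheck library is a natural next step. This is just an extra sketch of mine, not part of the original test: the property says that flipping twice gives the original word back, for random words over {a,b}:
import Test.QuickCheck

cnv :: String -> String
cnv ('a':xs) = 'b' : cnv xs
cnv ('b':xs) = 'a' : cnv xs
cnv []       = []

-- Flipping twice should be the identity on words over {a,b}.
prop_doubleFlip :: Property
prop_doubleFlip = forAll (listOf (elements "ab")) (\w -> cnv (cnv w) == w)
Running quickCheck prop_doubleFlip in GHCi should report that the property passed 100 random tests.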
As for how to formally prove that your automaton halts, I am not enough of a computer science guy to help you with that. Someone else might.

Your TM works correctly. If you are comfortable with the notation of TMs it's almost obvious that yours works, so a formal proof seems like overkill, but if you really insist, a proof by mathematical induction on the length of the input is straightforward.
Claim: the TM given performs the function described.
Proof: the proof is by strong mathematical induction on the length m of the input string initially recorded on the TM's tape.
Base case: if the input tape contains the empty string then the TM executes transition (s,blank) -> (n,blank) and therefore halts without changing any tape symbols. Because the resulting string left on the tape is unchanged, it is the empty string. It is vacuously true that the empty string is equivalent to the empty string after replacing all a's with b's and vice versa; there are no symbols in the string to contradict this.
Induction hypothesis: assume that the TM correctly processes all input strings of length up to and including k.
Induction step: we must show that the TM correctly processes all input strings of length k + 1. Note that any input string of length k + 1 over the alphabet {a, b} is equal to some input string of length k with either the symbol a or the symbol b appended to it. By the induction hypothesis, the TM processes that length-k prefix correctly, flipping all a's to b's and vice versa; after reading it, the machine is still in state s with its head on the (k + 1)th tape cell. On the length-k input it would now see blank and execute (s,blank) -> (n,blank); here it instead sees a or b, so it is forced to execute (s,a) -> (s,b,R) or (s,b) -> (s,a,R), depending on whether the (k + 1)th symbol is a or b. This correctly flips the symbol at position k + 1, leaving a string of length k + 1 in which all symbols have been correctly exchanged. The next transition sees blank and moves to the halting state. This concludes the proof.
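If you prefer to check the table mechanically rather than by hand, here is a rough Haskell sketch of mine that executes the three transitions directly; the configuration encoding (state, reversed tape to the left of the head, tape from the head rightwards) and all names are my own choices:
data TMState = S | N deriving (Eq, Show)

-- One application of the transition function.
stepTM :: (TMState, String, String) -> (TMState, String, String)
stepTM (S, left, 'a':rest) = (S, 'b':left, rest)   -- (s,a) -> (s,b,R)
stepTM (S, left, 'b':rest) = (S, 'a':left, rest)   -- (s,b) -> (s,a,R)
stepTM (S, left, rest)     = (N, left, rest)       -- (s,blank) -> (n,blank)
stepTM cfg                 = cfg                   -- already halted

-- Run from the start state on input w and read the tape back after halting.
runTM :: String -> String
runTM w = go (S, [], w)
  where
    go (N, left, rest) = reverse left ++ rest
    go cfg             = go (stepTM cfg)
For example, runTM "abaaab" evaluates to "babbba", matching the test above.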

Split string on multiple delimiters of any length in Haskell

I am attempting a Haskell coding challenge where, given a certain string with a prefix indicating which substrings are delimiting markers, a list needs to be built from the input.
I have already solved the problem for multiple single-length delimiters, but I am stuck with the problem where the delimiters can be any length. I use splitOneOf from Data.List.Split, but this works for character (length 1) delimiters only.
For example, given
input ";,\n1;2,3,4;10",
delimiters are ';' and ','
splitting the input on the above delivers
output [1,2,3,4,10]
The problem I'm facing has two parts:
Firstly, a single delimiter of any length, e.g.
"****\n1****2****3****4****10" should result in the list [1,2,3,4,10].
Secondly, more than one delimiter can be specified, e.g.
input "[***][||]\n1***2||3||4***10",
delimiters are "***" and "||"
splitting the input on the above delivers
output [1,2,3,4,10]
My code for retrieving the delimiter in the case of character delimiters:
import Data.List.Split (splitOn, splitOneOf)

--This gives the delimiters as a list of characters, i.e. a String.
getDelimiter::String->[Char]
getDelimiter text = head . splitOn "\n" $ text
--drop "[delimiters]\n" from the input
body::String->String
body text = drop ((length . getDelimiter $ text) + 1) text
--returns tuple with fst being the delimiters, snd the body of the input
doc::String->(String,String)
doc text = (getDelimiter text, body text)
--given the delimiters and the body of the input, return a list of strings
numbers::(String,String)->[String]
numbers (delim, rest) = splitOneOf delim rest
--input ",##\n1,2#3#4" gives output ["1","2","3","4"]
getList::String->[String]
getList text = numbers . doc $ text
So my question is, how do I do the processing for when the delimiters are e.g. "***" and "||"?
Any hints are welcome, especially in a functional programming context.
If you don't mind making multiple passes over the input string, you can use splitOn from Data.List.Split, and gradually split the input string using one delimiter at a time.
You can write this fairly succinctly using foldl':
import Data.List
import Data.List.Split
splitOnAnyOf :: Eq a => [[a]] -> [a] -> [[a]]
splitOnAnyOf ds xs = foldl' (\ys d -> ys >>= splitOn d) [xs] ds
Here, the accumulator for the fold operation is a list of strings, or more generally [[a]], so you have to 'lift' xs into a list, using [xs].
Then you fold over the delimiters ds - not the input string to be parsed. For each delimiter d, you split the accumulated list of strings with splitOn, and concatenate them. You could also have used concatMap, but here I arbitrarily chose to use the more general >>= (bind) operator.
This seems to do what is required in the OP:
*Q49228467> splitOnAnyOf [";", ","] "1;2,3,4;10"
["1","2","3","4","10"]
*Q49228467> splitOnAnyOf ["***", "||"] "1***2||3||4***10"
["1","2","3","4","10"]
Since this makes multiple passes over temporary lists, it's most likely not the fastest implementation you can make, but if you don't have too many delimiters, or extremely long lists, this may be good enough.
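For completeness, the concatMap spelling mentioned above looks like this (my variant; it behaves the same on the examples, and the prime in the name is only there to distinguish it):
import Data.List (foldl')
import Data.List.Split (splitOn)

splitOnAnyOf' :: Eq a => [[a]] -> [a] -> [[a]]
splitOnAnyOf' ds xs = foldl' (\ys d -> concatMap (splitOn d) ys) [xs] ds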
This problem has two kinds of solutions: the simple, and the efficient. I will not cover the efficient (because it is not simple), though I will hint at it.
But first, the part where you extract the delimiter and body parts of the input may be simplified with Data.List.break:
delims = splitOn "/" . fst . break (== '\n')   -- Presuming the delimiters are themselves
                                               -- separated by a slash.
body = drop 1 . snd . break (== '\n')          -- Drop the newline itself as well.
In any way, we may reduce this problem to finding the positions of all the given patterns in a given string. (By saying "string", I do not mean the Haskell String. Rather, I mean an arbitrarily long sequence (or even an infinite stream) of any symbols for which an equality relation is defined, which is typed in Haskell as Eq a => [a]. I hope this is not too confusing.) As soon as we have the positions, we may slice the string to our heart's content. If we want to deal with an infinite stream, we must obtain the positions incrementally and yield the results as we go, which is a restriction that must be kept in mind. Haskell is equipped well enough to handle the stream case as well as the finite string.
A simple approach is to cast isPrefixOf on the string, for each of the patterns.
If some of them matches, we replace it with a Nothing.
Otherwise we mark the first symbol as Just and move to the next position.
Thus, we will have replaced all the different delimiters by a single one: Nothing. We may then readily slice the string by it.
This is fairly idiomatic, and I will bring the code to your judgement shortly. The problem with this approach is that it is inefficient: ideally, when a pattern fails to match, we would advance by more than one symbol at a time.
It would be more efficient to base our work on the research that has been made into finding patterns in a string; this problem is well known, and there are great, intricate algorithms that solve it an order of magnitude faster. These algorithms are designed to work with a single pattern, so some work must be put into adapting them to our case; however, I believe they are adaptable. The simplest and oldest of such algorithms is KMP, and it is already encoded in Haskell. You may wish to take up arms and generalize it: a quick path to some amount of fame.
Here is the code:
module SplitSubstr where
-- stackoverflow.com/questions/49228467
import Data.List (unfoldr, isPrefixOf, elemIndex)
import Data.List.Split (splitWhen) -- Package `split`.
import Data.Maybe (catMaybes, isNothing)
-- | Split a (possibly infinite) string at the occurrences of any of the given delimiters.
--
-- λ take 10 $ splitOnSubstrs ["||", "***"] "la||la***fa"
-- ["la","la","fa"]
--
-- λ take 10 $ splitOnSubstrs ["||", "***"] (cycle "la||la***fa||")
-- ["la","la","fa","la","la","fa","la","la","fa","la"]
--
splitOnSubstrs :: [String] -> String -> [String]
splitOnSubstrs delims
    = fmap catMaybes        -- At this point, there will be only `Just` elements left.
    . splitWhen isNothing   -- Now we may split at nothings.
    . unfoldr f             -- Replace the occurrences of delimiters with a `Nothing`.
  where
    -- | This is the base case. It will terminate the `unfoldr` process.
    f [ ] = Nothing

    -- | This is the recursive case. It is divided into 2 cases:
    --   * One of the delimiters may match. We will then replace it with a `Nothing`.
    --   * Otherwise, we will `Just` return the current element.
    --
    --   Notice that, if there are several patterns that match at this point, we will use
    --   the first one. You may sort the patterns by length to always match the longest or
    --   the shortest. If you desire more complicated behaviour, you must plug a more
    --   involved logic here. In any way, the index should point to one of the patterns
    --   that matched.
    --
    --                                       vvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    f body@(x:xs) = case elemIndex True $ (`isPrefixOf` body) <$> delims of
        Just index -> return (Nothing, drop (length $ delims !! index) body)
        Nothing    -> return (Just x, xs)
It might happen that you will not find this code straightforward. Specifically, the unfoldr part is somewhat dense, so I will add a few words about it.
unfoldr f is an embodiment of a recursion scheme. f is a function that may chip a part from the body: f :: (body -> Maybe (chip, body)).
As long as it keeps chipping, unfoldr keeps applying it to the body. This is called the recursive case.
Once it fails (returning Nothing), unfoldr stops and hands you all the chips it has collected so far. This is called the base case.
In our case, f takes symbols from the string, and fails once the string is empty.
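If unfoldr itself is unfamiliar, a tiny standalone example (mine, unrelated to the splitting code) may make the chip/body picture concrete:
import Data.List (unfoldr)

-- Chip numbers off a countdown until the seed reaches zero.
countdown :: Int -> [Int]
countdown = unfoldr (\n -> if n <= 0 then Nothing else Just (n, n - 1))
Here countdown 5 evaluates to [5,4,3,2,1]: each step yields a chip n and a smaller body n - 1, and the Nothing at zero is the base case.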
That's it. I hope you send me a postcard when you receive a Turing award for a fast splitting algorithm.

Lemma as a type in a record

Beginner here!
How do I interpret a record that looks something like this?
Record test A B :=
{
CA: forall m, A m;
CB: forall a b m, CA m ==> B(a,b);
}
I am trying to get a sense of what an instance of this record would look like and moreover, what it means to have a quantified lemma as a type.
What you are writing cannot make sense, because the notation _ ==> _ is supposed to link two boolean values. But CA m has type A m, which is itself a type, not a boolean value.
One possibility to go forward would be to make CA a boolean function that could represent the A predicate.
Another difficulty with your hypothetical record is that we don't know what the input types of A and B are, so I will assume we live in an ambient type T over which the quantifications range. So here is a variant:
Record test (T : Type) (A : T -> Prop) (B : T * T -> bool) :=
{
CA : T -> bool;
CA_A : forall m, CA m = true -> A m;
CB : forall a b m, (CA m ==> B(a, b)) = true
}.
This example forces you to understand that there are two distinct concepts in this logical language: bool values and Prop values. They represent different things, which can sometimes be amalgamated but you need to make the distinction clear in your head to leave the category of beginner.
For your last sentence, "what it means to have a quantified lemma as a type", here is another explanation.
When programming in a conventional programming language, you can return arrays of integers. However, you cannot be explicit and say that you want to return an array of integers of a specific length. In Gallina (the basic programming language of Coq), you can define a type of arrays of a given length n; let us assume that such a type is written array n. So array 10 and array 11 would be two different types. A function that takes as input a number n and returns as output an array of length n would have the following type:
forall n, array n
So an object that has a quantified formula as a type simply is a function.
From a logical point of view, the statement forall n, array n is usually read as for every n there exists an array of size n. This statement is probably no surprise to you.
So the type of an array depends on an index. Now we can think of another type, for example, the type of proofs that n is prime. Let's assume this type is written prime n. Surely, there are numbers that are not prime, so for example the type prime 4 should not contain any proof at all. Now I may write a function called test_prime : nat -> bool with the property that when it returns true I have the guarantee that the input is prime. This would be written as follows:
forall n, test_prime n = true -> prime n
Now, if I want to define a collection of all correct prime testing functions, I would want to associate in one piece of data the function and the proof that it is correct, so I would define the following data type.
Record certified_prime_test :=
{
test_prime : nat -> bool;
certificate : forall n, test_prime n = true -> prime n
}.
So records that contain universally quantified formulas can be in one of two categories: either one component is a function whose output lives in several types of the same family (as in the array example), or one component brings more logical information about another component, which is a function. In the certified_prime_test example, the certificate component brings more information about the test_prime function.
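If it helps to see the same idea outside Coq, here is a rough Lean 4 transcription of the certified_prime_test record. The IsPrime definition is a simplified stand-in I made up so the example is self-contained; it is not Coq's or Mathlib's notion of primality:
-- A toy primality predicate, only so that the example is self-contained.
def IsPrime (n : Nat) : Prop :=
  2 ≤ n ∧ ∀ m k, m * k = n → m = 1 ∨ m = n

-- A record bundling a boolean test with a proof that a `true` answer
-- really guarantees primality, mirroring `certified_prime_test` above.
structure CertifiedPrimeTest where
  testPrime   : Nat → Bool
  certificate : ∀ n, testPrime n = true → IsPrime n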

How to efficiently find identical substrings of a specified length in a collection of strings?

I have a collection S, typically containing 10-50 long strings. For illustrative purposes, suppose the length of each string ranges between 1000 and 10000 characters.
I would like to find strings of specified length k (typically in the range of 5 to 20) that are substrings of every string in S. This can obviously be done using a naive approach: enumerate every k-length substring of S[0] and check whether it exists in every other element of S.
Are there more efficient ways of approaching the problem? As far as I can tell, there are some similarities between this and the longest common subsequence problem, but my understanding of LCS is limited and I'm not sure how it could be adapted to the situation where we bound the desired common substring length to k, or if subsequence techniques can be applied to finding substrings.
Here's one fairly simple algorithm, which should be reasonably fast.
1. Using a rolling hash as in the Rabin-Karp string search algorithm, construct a hash table H0 of all the |S0|-k+1 length-k substrings of S0. That's roughly O(|S0|), since each hash is computed in O(1) from the previous hash, but it will take longer if there are collisions or duplicate substrings. Using a better hash will help you with collisions, but if there are a lot of duplicate length-k substrings in S0 then you could end up using O(k|S0|).
2. Now use the same rolling hash on S1. This time, look each substring up in H0 and if you find it, remove it from H0 and insert it into a new table H1. Again, this should be around O(|S1|) unless you have some pathological case, like both S0 and S1 being long repetitions of the same character. (It's also going to be suboptimal if S0 and S1 are the same string, or have lots of overlapping pieces.)
3. Repeat step 2 for each Si, each time creating a new hash table. (At the end of each iteration of step 2, you can delete the hash table from the previous step.)
4. At the end, the last hash table will contain all the common length-k substrings.
The total run time should be about O(Σ|Si|) but in the worst case it could be O(kΣ|Si|). Even so, with the problem size as described, it should run in acceptable time.
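If you just want something quick to experiment with before optimizing, here is a rough Haskell sketch of mine of the same shrinking-candidate-set idea. It uses ordinary Data.Set sets of substrings instead of rolling-hash tables, so each lookup costs O(k log n) rather than the O(1) expected cost of the hashing scheme above, but it is short and easy to check:
import qualified Data.Set as Set
import Data.List (foldl', tails)

-- All substrings of length k of a string.
kSubstrings :: Int -> String -> Set.Set String
kSubstrings k s = Set.fromList [take k t | t <- tails s, length (take k t) == k]

-- Length-k substrings common to every string in the collection.
commonK :: Int -> [String] -> Set.Set String
commonK _ []       = Set.empty
commonK k (s:rest) = foldl' Set.intersection (kSubstrings k s) (map (kSubstrings k) rest)
For example, commonK 2 ["xabcx", "yabcy", "zabz"] evaluates to fromList ["ab"].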
Some thoughts (N is number of strings, M is average length, K is needed substring size):
Approach 1:
Walk through all strings, computing a rolling hash for each k-length substring and storing these hashes in a map (store tuples {key: hash; string_num; position}).
Time O(N x M), space O(N x M).
Extract the groups with equal hash, then check, step by step:
1) that the size of the group is >= the number of strings
2) that all strings are represented in the group
3) that the real substrings are actually equal (hashes of distinct substrings might sometimes coincide)
Approach 2:
Build a suffix array for every string.
Time O(N x M log M), space O(N x M).
Find the intersection of the suffix arrays for the first pair of strings, using a merge-like approach (the suffixes are sorted), comparing suffixes only on their first k characters; then continue with the next string, and so on.
I would treat each long string as a collection of overlapping short strings, so ABCDEFGHI becomes ABCDE, BCDEF, CDEFG, DEFGH, EFGHI. You can represent each short string as a pair of indexes, one specifying the long string and one the starting offset in that string (if this strikes you as naive, skip to the end).
I would then sort each collection into ascending order.
Now you can find the short strings common to the first two collections by merging the sorted lists of indexes, keeping only those from the first collection which are also present in the second collection. Check the survivors of this against the third collection, and so on; the survivors at the end correspond to those short strings which are present in all long strings.
(Alternatively, you could maintain a set of pointers into each sorted list, repeatedly check whether every pointer points at short strings with the same text, and advance the pointer which points at the smallest short string.)
Time is O(n log n) for the initial sort, which dominates. In the worst case - e.g. when every string is AAAAAAAA..AA - there is a factor of k on top of this, because all string compares check all characters and take time k. Hopefully, there is a clever way round this with https://en.wikipedia.org/wiki/Suffix_array which allows you to sort in time O(n) rather than O(nk log n) and the https://en.wikipedia.org/wiki/LCP_array, which should allow you to skip some characters when comparing substrings from different suffix arrays.
Thinking about this again, I think the usual suffix array trick of concatenating all of the strings in question, separated by a character not found in any of them, works here. If you look at the LCP array of the resulting suffix array you can split it into sections, splitting at points where the difference between adjacent suffixes occurs less than k characters in. Now every offset in a particular section starts with the same k characters. Look at the offsets in each section and check whether there is at least one offset from every possible starting string. If so, this k-character sequence occurs in all starting strings, but not otherwise. (There are suffix array constructions which work with arbitrarily large alphabets, so you can always expand your alphabet to produce a character not in any string, if necessary.)
I would try a simple method using HashSets:
1. Build a HashSet for each long string in S, containing all of its k-length substrings.
2. Sort the sets by number of elements.
3. Scan the first set.
4. Look up each of its elements in the other sets.
The first step takes care of repetitions in each long string.
The second ensures the minimum number of comparisons.
let getHashSet k (lstr:string) =
    let strs = System.Collections.Generic.HashSet<string>()
    for i in 0..lstr.Length - k do
        strs.Add lstr.[i..i + k - 1] |> ignore
    strs

let getCommons k lstrs =
    let strss = lstrs |> Seq.map (getHashSet k) |> Seq.sortBy (fun strs -> strs.Count)
    match strss |> Seq.tryHead with
    | None -> [||]
    | Some h ->
        let rest = Seq.tail strss |> Seq.toArray
        [| for s in h do
               if rest |> Array.forall (fun strs -> strs.Contains s) then yield s
        |]
Test:
let random = System.Random System.DateTime.Now.Millisecond

let generateString n =
    [| for i in 1..n do
           yield random.Next 20 |> (+) 65 |> System.Convert.ToByte
    |] |> System.Text.Encoding.ASCII.GetString

[ for i in 1..3 do yield generateString 10000 ]
|> getCommons 4
|> fun l -> printfn "found %d\n %A" l.Length l
result:
found 40
[|"PPTD"; "KLNN"; "FTSR"; "CNBM"; "SSHG"; "SHGO"; "LEHS"; "BBPD"; "LKQP"; "PFPH";
"AMMS"; "BEPC"; "HIPL"; "PGBJ"; "DDMJ"; "MQNO"; "SOBJ"; "GLAG"; "GBOC"; "NSDI";
"JDDL"; "OOJO"; "NETT"; "TAQN"; "DHME"; "AHDR"; "QHTS"; "TRQO"; "DHPM"; "HIMD";
"NHGH"; "EARK"; "ELNF"; "ADKE"; "DQCC"; "GKJA"; "ASME"; "KFGM"; "AMKE"; "JJLJ"|]
Here it is in fiddle: https://dotnetfiddle.net/ZK8DCT

Concatenation of an infinite language and a finite language

Why is it that the concatenation of an infinite language and a finite language is always finite iff the finite language is not the empty set? I thought concatenating an infinite language with the empty set would just be the infinite language.
This statement is false. Try concatenating Σ* and Σ. This gives back Σ+, which is infinite.
I think that you meant
The concatenation of an infinite language I and a finite language F is infinite iff F ≠ ∅.
This statement is true. If F = ∅ then IF is the empty set because the concatenation of any language and the empty language is the empty language. Specifically, IF = { wx | w in I and x in F }, so if F is empty, there are no x's that satisfy the condition x in F.
On the other hand, if F ≠ ∅ we can prove that IF is infinite. Pick any string x ∈ F and consider the set Ix = { wx | w ∈ I }. Since I is infinite and distinct strings w give distinct strings wx, the set Ix is infinite. Since Ix ⊆ IF, this means that IF is infinite.
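As a tiny illustration of that argument (my own sketch, not part of the proof): take I to be the infinite language a* and F = {"b"}. Enumerating IF in Haskell visibly never runs out of distinct strings:
-- I = { "", "a", "aa", ... } is infinite; F = {"b"} is finite and non-empty.
langI :: [String]
langI = iterate ('a' :) ""

langF :: [String]
langF = ["b"]

-- Concatenation: { w ++ x | w in I, x in F }.  Distinct w's give distinct w ++ x,
-- and because F is finite this enumeration is productive.
concatLang :: [String] -> [String] -> [String]
concatLang i f = [w ++ x | w <- i, x <- f]
Evaluating take 4 (concatLang langI langF) gives ["b","ab","aab","aaab"], and so on without end.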
Hope this helps!

Convert CHAR to ASCII in LISP [closed]

I want to write a program in LISP to get a string from the user and return the string formed by adding 1 to each char-code of the string. For example:
input: "hello123"
output: "ifmmp234"
I thought maybe I should convert the characters one by one to ASCII and then do what I want to do.
Any help with this will be so much appreciated..
Thanks
This is the code I developed. It gives me NIL in the output however. Can you help me with this:
(defun esi (n)
  (setf m 0)
  (loop (when (< m (length n))
          (return))
        (code-char (+ 1 (char-code (char n m))))
        (+ 1 m)))
Look at the functions char-code and code-char.
EDIT: Regarding your code sample:
It seems that the input to your function should be a string. Name it accordingly, e.g. string.
That (setf m 0) is setting a free variable. In this context, I must assume that m is never defined anywhere, so the behaviour is undefined. You should use, for example, let instead to establish a local binding. Hint: most looping constructs also give ways to establish local bindings.
The only exit out of your loop is that (return). Since it does not get any parameters, it will always return nil. You need to accumulate the new string somewhere and finally return it.
Functions in Lisp mostly do not modify their arguments. (+ 1 m) does not modify m. It just returns a value that is one greater than m. Likewise, code-char does not modify its argument, but returns a new value that is the character corresponding to the argument. You need to bind or assign these values.
That finishing condition is wrong. It will either terminate directly or, if the input string is empty, never terminate.
There are quite a few ways of doing what you want. Let's start with a function that returns a character one code-point later (there are some boundary issues here, let's ignore that for now).
(defun next-codepoint (char) (code-char (1+ (char-code char))))
Now, this operates on characters. Happily, a string is, essentially, a sequence of characters. Sequence operations should, in general, send you in the direction of the MAP family.
So, we have:
(defun nextify-string (string) (map 'string #'next-codepoint string))
Taking what's happening step by step:
For each character in the input string, we do the following:
We convert the character to its character code.
We increment that code.
We convert the result back to a character.
Then we assemble all of these characters into the return value.
