Optaplanner Team Rankings - optaplanner

I am trying to rank teams Team1,...,TeamN with Rank1,...,RankN. I am approaching this similarly to the Cloud Balancing example, where processes are assigned to computers, but with N teams being assigned to N ranks. I am using a hard score to say each rank can only have 1 team (similar to the computers having no more than 4 processes), and a soft score related to the rank of other teams each team has won a game against.
1) In my phrasing of the problem, the search space is N^N, as each of N teams can be assigned to each of N ranks. However, I am using a hard constraint to say that the team<->rank relationship should be 1-to-1. Is there a different way I should be structuring my approach to reflect this, so the search space becomes N! instead?
2) At a high level, how is Optaplanner reducing the search space without having domain knowledge of the question? My soft score is being calculated in the Team class and is only being retrieved in the .drl, so how does Optaplanner have the domain knowledge to know when to prune a possible solution? I might know that no possible combination will work with Team1->Rank1, but how can Optaplanner know if it doesn't understand exactly how the score is calculated?
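To put numbers on the gap between the two formulations in question 1, here is a minimal sketch that only does the arithmetic (the function name is made up for illustration and has nothing to do with the OptaPlanner API):

```python
from math import factorial


def search_space_sizes(n: int) -> tuple[int, int]:
    """Return (n**n, n!): free team->rank assignments vs. one-to-one assignments."""
    return n ** n, factorial(n)


# Compare both counts for a few team counts.
for n in (4, 8, 16):
    unconstrained, one_to_one = search_space_sizes(n)
    print(f"N={n}: N^N={unconstrained}, N!={one_to_one}")
```

Even for small N, most of the N^N assignments break the 1-to-1 hard constraint.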


Scrum and Project estimated time [closed]

If the client asks me for an estimated time of completion for the whole project, can this be given using Scrum?
Using, for instance, the (dreaded) waterfall methodology, I would have a technical spec to use to give a half-decent estimate.
Yes, you can (and do) estimate Scrum projects. [Note "Scrum" is an English word, not an acronym or abbreviation. It should not be in all-capitals.]
You estimate a Scrum project this way.
Write down the backlog.
Break the backlog into sprints.
Prioritize the sprints from most valuable to least valuable.
Provide this prioritized list of sprints to the customer.
They are free to stop work at any time, since each sprint is useful and the first sprint is the most useful and important part of the system.
That's the estimate. It's theirs to manage.
For a given budget you know how many iterations can be done. The Product Owner should then prioritize the work to get the most value out of the product backlog. This is how Agile works: fixed time and team size with variable scope (Agile is about scope management). And once the team velocity is known, you can forecast how much work (in points) can be done (# of sprints x velocity = size of the work that can be achieved).
Often, customers don't get it and want "everything they think they need at a given time" (i.e. fixed scope). In this situation, you end up doing some kind of upfront analysis to break everything down into items small enough to estimate. Once this work has been done, you can forecast how many sprints you'll need by guessing the velocity (# of sprints = total size / velocity). This is a very common situation for people with a waterfall background and will often lead to an inaccurate end date (fixed scope and team size with variable time), because you can't really guess the velocity and the start of a project is the worst moment to do estimates.
In both cases, you'll need the velocity. The problem is that the velocity 1) is unknown before the team starts to work and 2) will vary over time.
To solve 1), you could guess the velocity as discussed in the second situation, but this isn't very "Agile". Ideally, you should instead get the team to start working and measure the actual velocity (which will likely be inaccurate during early iterations). An intermediate scenario is to give a first, very rough estimate and to come back to the customer after a few iterations with a more precise one, once you've gathered more knowledge about the project and reduced the uncertainty.
To solve 2), I track measured velocity over time and use the highest and lowest velocity and the average velocity of the last 3 sprints as work hypothesis. This allows me to do optimistic, pessimistic and realistic forecasts (respectively).
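As a rough sketch of that forecast (the function name, the 200-point backlog and the velocity figures are all made up for illustration), applying # of sprints = total size / velocity to each hypothesis could look like this:

```python
from math import ceil


def forecast_sprints(backlog_points: float, velocities: list[float]) -> dict[str, int]:
    """Forecast the sprints needed under three velocity hypotheses."""
    if not velocities or min(velocities) <= 0:
        raise ValueError("need at least one positive measured velocity")
    recent = velocities[-3:]  # last 3 sprints (or fewer if the history is short)
    hypotheses = {
        "optimistic": max(velocities),             # highest velocity seen
        "pessimistic": min(velocities),            # lowest velocity seen
        "realistic": sum(recent) / len(recent),    # average of the last 3 sprints
    }
    # A partially used sprint still counts as a whole sprint, so round up.
    return {name: ceil(backlog_points / v) for name, v in hypotheses.items()}


measured = [18, 25, 22, 20, 24]  # hypothetical points completed per sprint
print(forecast_sprints(200, measured))
```

Rerunning it after every sprint keeps the three numbers in line with the latest measured velocity.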
Absolutely, in scrum you fix cost and time. Then you let the features vary. So you can tell the customer it will be done on XX/XX/XXXX at a cost of $YYY.YY. It is up to them to then prioritize the features they want to ensure the most important ones get done under those constraints.
Yes and No. I believe that Scrum is a great approach in getting the owner involved in the planning of the sprints.
So in the case of estimation, it would be difficult to say "we'll get the project done in 30 days". Instead, the owner will have an expectation of what will get done in, say, the first and second weeks, and a perception of what will get done in 30 days.
In my opinion, this is more valuable than giving an estimate of 30 days and then being lucky, or way off the mark.
Also, you'll have a WAY better estimation of what will be done in the near future. Another great thing about Scrum is that you can tailor your deployment, possibly altering or removing items, to have a more usable product after 30 days, rather than something potentially completely unusable with waterfall.
It depends on what you want to predict.
You can promise that an n-week sprint takes exactly n weeks and m sprints take n*m weeks. Therefore schedule estimation is easy. Cost/effort estimation is also easy, given a certain team size and project duration. What you cannot reliably promise is which features eventually get delivered and which don't.
There are four main control variables for a project: scope, cost/effort, schedule, quality. You can choose which variables are the drivers (i.e. fixed) for your project and which are not. You cannot have them all as drivers at the same time; at least one variable needs to be kept free to allow for balancing the project.
With traditional waterfall you have fixed scope (the spec) and usually a fixed schedule (target date). You can balance the project by increasing cost/effort (e.g. adding more people, working overtime) or taking shortcuts in quality. These balancing factors do not behave linearly and you get problems in other areas if you push them too far.
With agile including Scrum you have fixed schedule (iterations or sprints) and fixed quality level (definition of done). Cost/effort is proportional to how many people you have in the team. Scope is the main balancing factor. It has the nice property of behaving quite linearly: increases or decreases in scope do not cause hugely non-linear changes in the other drivers. The key to success is feature prioritization to get the maximum value out of the scope you can deliver.
As Steve McConnell says in Software Estimation, whenever you need to provide an estimate:
If you can't count, then compute
If you can't compute, then judge
"Expert judging" or reckoning is a last resort. Try counting real things and/or referring to historical data to compute a meaningful figure.
This is applicable regardless of the method you use, be it Scrum or whatever else.

Predict customer’s payments. [Data science interview case]

I had an interview for a junior data science position for one European bank and I got this case:
We want to develop a model that will be able to predict customers' future
expenses. Assume we have data about all transactions that were made
by clients (time, amount, recipient, etc.) for several years.
I suppose firstly we should try to predict monthly payments, such as insurance, water or the internet, but I'm completely clueless as to which algorithm to use.
Could you kindly help me where to start or what to read?
The term "future expenses" is ambiguous; in particular, it could mean NPV, which is the sum of all future discount-weighted cash flows. That would be a bit more complicated, because the discount factor is another random variable. I suppose we should talk about a payment estimate over some period, e.g. a month.
I think you were on the right track initially: regular payments (mortgage, internet, utilities, etc.) are easier to predict, and some of them may even have a very strict schedule. In addition, there are random payments that certainly have a different distribution.
I would approach it this way (not 100% sure it's the best way, but at least seems reasonable): fit regular and irregular payments into two different distributions, i.e. different models.
Irregular payments can be modeled via a Poisson distribution, whose parameter can be inferred from the history. This distribution has high variance, but assuming the distribution won't change drastically in the future, in the long run the computed expectation will be close to the true mean (law of large numbers).
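A minimal sketch of the Poisson part, under a few assumptions of mine: the Poisson variable is the number of irregular payments per month, its rate is estimated by the sample mean (the maximum-likelihood estimate), and payment amounts are independent of the count, so the expected monthly spend is rate x mean amount. The function names and figures are hypothetical.

```python
from statistics import fmean


def poisson_rate(monthly_counts: list[int]) -> float:
    """Maximum-likelihood estimate of a Poisson rate: the sample mean of the counts."""
    return fmean(monthly_counts)


def expected_irregular_spend(monthly_counts: list[int], amounts: list[float]) -> float:
    """Expected monthly spend, assuming amounts are independent of the count."""
    return poisson_rate(monthly_counts) * fmean(amounts)


counts = [2, 0, 3, 1, 4, 2]          # hypothetical irregular payments per month
amounts = [40.0, 15.0, 60.0, 25.0]   # hypothetical amounts of past irregular payments
print(expected_irregular_spend(counts, amounts))
```

With only a few months of history the estimated rate is noisy, which is the high-variance caveat mentioned above.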
Regular payments over a period can be viewed as a time series. My first idea would be to predict it with machine learning (a regression problem, e.g. x(n) -> x(n+1) or [x(n-1), x(n)] -> x(n+1)), because in general there seems to be a pattern. But in reality the choice of algorithm greatly depends on the data. Seasonality may be present in the time series, which of course would affect the choice.
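To make the x(n) -> x(n+1) idea concrete, here is a small sketch that fits a straight line from each month's total to the next by ordinary least squares. It is the simplest possible stand-in for "machine learning" here, not a recommendation; the function names and data are hypothetical, and it ignores seasonality entirely.

```python
def fit_lag1(series: list[float]) -> tuple[float, float]:
    """Least-squares fit of x(n+1) = a * x(n) + b over consecutive pairs."""
    xs, ys = series[:-1], series[1:]
    if len(xs) < 2:
        raise ValueError("need at least three observations")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # constant history: just predict the mean
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def predict_next(series: list[float]) -> float:
    """Predict the next value from the last observed one."""
    a, b = fit_lag1(series)
    return a * series[-1] + b


monthly_regular = [100.0, 102.0, 104.0, 106.0, 108.0]  # hypothetical monthly totals
print(predict_next(monthly_regular))
```

Using [x(n-1), x(n)] as input instead would need a two-feature regression, which is where a proper library starts to pay off.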
Having the model for both sources of payments, we can estimate total expenses. By the way, if the employer had any comments about this question, please share them.

OptaPlanner for large data sets

I have been asked by a customer to work on a project using Drools. Looking at the Drools documentation I think they are talking about OptaPlanner.
The company takes in transport orders from many customers and links these to bookings on multiple carriers. Orders last year exceeded 100,000. The "optimisation" that currently takes place is based on service, allocation and rate and is linear (each order is assigned to a carrier using the constraints but without any consideration of surrounding orders). The requirement is to hold non-critical orders in a pool for a number of days and optimize the orders in the pool for lowest cost using the same constraints.
Initially they want to run "what if's" over last year's orders to fine-tune the constraints. If this exercise is successful they want to use it in their live system.
My question is whether OptaPlanner is the correct tool for this task, and if so, if there is an example that I can use to get me started.
Take a look at the vehicle routing videos, as it sounds like you have a vehicle routing problem.
If you use just Drools to assign orders, you basically build a Construction Heuristic (= a greedy algorithm). If you use OptaPlanner to assign the orders (and Drools to calculate the quality (= score) of a solution), then you get a better solution. See false assumptions on vehicle routing to understand why.
To scale to 100k orders (= planning entities), use Nearby Selection (which is good up to 10k) and Partitioned Search (which is a sign of weakness but needed above 10k).

OptaPlanner deploy multiple vehicles to same location

I have taken the OptaPlanner VRP web example and customized it to my needs. It works fine except in the scenario below:
Number of vehicles available : 2.
Each vehicle capacity is 6.
And customer demand is 7.
In the above scenario, OptaPlanner is not able to solve the problem. I think it should deploy 2 vehicles to the same customer location, but it is not working as expected.
I am not able to figure out how to configure OptaPlanner rules to make it work.
One way to fix it is to split up the customer with demand 7:
into 2 customers of demand 3 and 4 (all at the same location).
or into 3 customers of demand 3, 2 and 2 (all at the same location).
You'll see that, whenever possible, the same vehicle visits all customers at the same location. For a nicer design, you might even want to refactor Customer into Customer (only 1 per location) and CustomerPart (1 per separate demand of a customer).
Notice that in the original requirements, a demand cannot be split up over 2 vehicles (not because of the constraint rules, but because of the domain design). So using the original implementation to solve your requirements, naturally excludes a number of feasible and potentially more optimal solutions.
The more you split up, the more you open up new feasible and potentially new optimal solutions. Of course, the more you split up the demand per customer, the more the search space increases. And it increases heavily. Replacing that customer with 7 customers of demand 1 (and doing that for all customers) is going to be perfect but suffer from major scalability issues.
To be practical, I'd split up every demand that is higher than half the smallest vehicle's capacity (or even a third of that capacity), but no more. Use the OptaPlanner Benchmarker to measure (instead of guessing) how the result quality and dataset scalability change when the split-up limit parameter changes, so you can tweak it. (Oh, and if you end up doing those benchmarks, do share your best parameter value here.)
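The splitting itself can be done as a preprocessing step before the data reaches the solver. A minimal sketch, assuming integer demands; the function name and the max_part parameter (the split-up limit) are hypothetical and not part of the OptaPlanner example:

```python
def split_demand(demand: int, max_part: int) -> list[int]:
    """Split a demand into the fewest near-equal integer parts, none above max_part."""
    if demand <= 0 or max_part <= 0:
        raise ValueError("demand and max_part must be positive")
    parts = -(-demand // max_part)        # ceiling division: number of parts
    base, extra = divmod(demand, parts)   # 'extra' parts get one unit more
    return [base + 1] * extra + [base] * (parts - extra)


capacity = 6                              # smallest vehicle capacity from the question
print(split_demand(7, capacity // 2))     # limit = half the capacity
```

Each returned part would become its own customer (or CustomerPart) at the same location, and max_part is the value to vary in the benchmarks.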

Machine Learning challenge: technique for collect the coins

Suppose that there is a company that owns a couple of vending machines that collect coins. When the coin safe is full, the machine cannot sell any new item. To prevent this, the company must collect the coins before that happens. But if the company sends the technician too early, it loses money because he made an unnecessary trip. The challenge is to predict the right time to collect the coins to minimize the cost of operation.
At each visit (to collect or other operations), a reading of the level of coins in the safe is performed. This data contains historical information regarding the safe filling for each machine.
What is the best ML technique or approach to this problem, computationally?
These are the two parts of the problem as I see it:
1) vending machine model
I would probably build a model for each machine using the historic data. Since you said a linear approach is probably not good, you need to think about things that influence the filling of a machine, i.e. time-related things like weekday dependency, holiday dependency, etc., and other influences, like the weather maybe? So you need to attach these factors to the historic data to make a good predictive model. Many machine learning techniques can help create a model and find real correlations in the data. Maybe you should create descriptors from your historical data and try to correlate these to the filling state of a machine. PLS can help reduce the descriptor space and find the relevant ones. Neural networks are great if you really have no clue about the underlying math of a correlation. Play around with it. But pretty much any machine learning technique should be able to come up with a decent model.
2) money collection
Model the cost of a random trip of the technician to a machine. Take into account the filling grade of the machines and the cost of the trip. You can send the technician on virtual collecting tours and calculate the total cost of collecting the money and the revenues from the machine. Again, you could use a neural network with some evolutionary strategy to find an optimal set of trips and times. You can use the model of the filling grade of the machines during the virtual optimization, since you probably need to estimate the filling grade of the machines in these virtual collection rounds.
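A toy version of such a virtual collecting tour for one machine, to show the cost trade-off. This is only a sketch: instead of a neural network or an evolutionary strategy it brute-forces a fixed collection interval, and the daily fill figures, capacity and costs are invented. In practice the daily fill would come from the per-machine model in part 1.

```python
def simulate_cost(daily_fill: list[float], capacity: float, interval_days: int,
                  trip_cost: float, lost_per_coin: float) -> float:
    """Total cost of emptying the safe every `interval_days` days."""
    level = 0.0
    cost = 0.0
    for day, coins in enumerate(daily_fill, start=1):
        accepted = min(coins, capacity - level)
        cost += (coins - accepted) * lost_per_coin  # sales lost while the safe is full
        level += accepted
        if day % interval_days == 0:                # technician visit at end of day
            cost += trip_cost
            level = 0.0
    return cost


def best_interval(daily_fill: list[float], capacity: float, trip_cost: float,
                  lost_per_coin: float, max_interval: int) -> int:
    """Try every interval from 1 to max_interval and return the cheapest."""
    return min(range(1, max_interval + 1),
               key=lambda k: simulate_cost(daily_fill, capacity, k,
                                           trip_cost, lost_per_coin))


predicted_fill = [10.0] * 30  # hypothetical: 10 coins per day for 30 days
print(best_interval(predicted_fill, capacity=50.0, trip_cost=20.0,
                    lost_per_coin=1.0, max_interval=10))
```

Collecting too often pays for unnecessary trips, collecting too rarely loses sales, and the cheapest interval sits in between.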
Interesting problems you have...