OptaPlanner VRP incremental score calculation - tour length

I want to create an incremental score calculation for a VRP to minimize cost, where cost is the vehicle's variable cost (USD/km) plus the driver cost (USD/h). I also have a hard constraint on total tour length (8 h per driver), for which I added a driving-speed parameter to every vehicle along with the driver's USD/h rate.
I've already implemented it as an easy score calculation, but with a big dataset it runs far slower than I'd like, so I tried the incremental way.
I tried simply inserting additional lines into the original incremental score calculation, but it seems to fail.
From reading other sources, I think the problem may come from the update order of previousStandstill, vehicle and nextCustomer: when previousStandstill is updated, the customer sometimes doesn't have a vehicle assigned yet, so I can't pair the driving time (distance from previousStandstill divided by driving speed) with a given vehicle.
I have already checked that the Customer domain class has the @PlanningEntity annotation and that the getVehicle() getter has the @AnchorShadowVariable annotation.
I really got stuck here. Any help would be appreciated.
Update:
I've added some code to the incremental score calculator's insertPreviousStandstill() to get the vehicle of the previousStandstill:
// walk back along the previousStandstill chain until a vehicle is found
Vehicle vehicle = null;
Standstill previousStandstill2 = customer.getPreviousStandstill();
while (vehicle == null) {
    if (previousStandstill2 instanceof Vehicle) {
        vehicle = previousStandstill2.getVehicle();
    } else {
        Customer customer2 = (Customer) previousStandstill2;
        Vehicle vehicle2 = customer2.getVehicle();
        if (vehicle2 == null) {
            // the shadow variable has not been set yet; keep walking back
            previousStandstill2 = customer2.getPreviousStandstill();
        } else {
            vehicle = vehicle2;
        }
    }
}
Long vehicleVarCost = vehicle.getVarCost();
softScore -= customer.getDistanceFromPreviousStandstill() * vehicleVarCost / 1000;
(I also added the vehicle variable-cost handling to all of the corresponding before/after change methods.)
It works fine through the Construction Heuristic (I get the same result as with EasyScore), but it fails afterwards.

Related

What are good design practices when working with Entity Framework

This will apply mostly to an ASP.NET application where the data is not accessed via SOA, meaning that you get access to the objects loaded from the framework, not transfer objects, although some recommendations still apply.
This is a community post, so please add to it as you see fit.
Applies to: Entity Framework 1.0 shipped with Visual Studio 2008 sp1.
Why pick EF in the first place?
Considering it is a young technology with plenty of problems (see below), it may be a hard sell to get on the EF bandwagon for your project. However, it is the technology Microsoft is pushing (at the expense of Linq2Sql, which is a subset of EF). In addition, you may not be satisfied with NHibernate or other solutions out there. Whatever the reasons, there are people out there (including me) working with EF, and life is not bad.
EF and inheritance
The first big subject is inheritance. EF does support mapping for inherited classes that are persisted in 2 ways: table per class and table per hierarchy. The modeling is easy and there are no programming issues with that part.
(The following applies to the table-per-class model, as I don't have experience with table per hierarchy, which is, anyway, limited.) The real problem comes when you try to run queries that include one or more objects that are part of an inheritance tree: the generated SQL is incredibly awful, takes a long time to be parsed by EF, and takes a long time to execute as well. This is a real showstopper. Enough so that EF should probably not be used with inheritance, or as little as possible.
Here is an example of how bad it was. My EF model had ~30 classes, ~10 of which were part of an inheritance tree. On running a query to get one item from the Base class, something as simple as Base.Get(id), the generated SQL was over 50,000 characters. Then, when you try to return some associations, it degenerates even more, going as far as throwing SQL exceptions about not being able to query more than 256 tables at once.
OK, this is bad. The EF concept is to let you create your object structure with little or no consideration of the actual database implementation of your tables. It completely fails at this.
So, recommendations? Avoid inheritance if you can; the performance will be so much better. Use it sparingly where you have to. In my opinion, this makes EF a glorified SQL-generation tool for querying, but there are still advantages to using it. And there are ways to implement mechanisms that are similar to inheritance.
Bypassing inheritance with Interfaces
The first thing to know when trying to get some kind of inheritance going with EF is that you cannot give an EF-modeled class a non-EF base class. Don't even try it; it will get overwritten by the modeler. So what to do?
You can use interfaces to enforce that classes implement some functionality. For example, here is an IEntity interface that allows you to define associations between EF entities when you don't know at design time what the type of the entity will be.
public enum EntityTypes { Unknown = -1, Dog = 0, Cat }

public interface IEntity
{
    int EntityID { get; }
    string Name { get; }
    EntityTypes EntityType { get; }
}

public partial class Dog : IEntity
{
    // implement EntityID and Name, which could actually be fields
    // from your EF model
    public EntityTypes EntityType { get { return EntityTypes.Dog; } }
}
Using this IEntity, you can then work with undefined associations in other classes:
// let's take a class that you defined in your model.
// that class has a mapping to the columns: PetID, PetType
public partial class Person
{
    public IEntity GetPet()
    {
        return IEntityController.Get(PetID, (EntityTypes)PetType);
    }
}
which makes use of a static helper class:
public class IEntityController
{
    static public IEntity Get(int id, EntityTypes type)
    {
        switch (type)
        {
            case EntityTypes.Dog: return Dog.Get(id);
            case EntityTypes.Cat: return Cat.Get(id);
            default: throw new Exception("Invalid EntityType");
        }
    }
}
Not as neat as having plain inheritance, particularly considering you have to store the PetType in an extra database field, but considering the performance gains, I would not look back.
It also cannot model one-to-many or many-to-many relationships, but with creative use of 'Union' it could be made to work. Finally, it has the side effect of loading data in a property/function of the object, which you need to be careful about. Using a clear naming convention like GetXYZ() helps in that regard.
Compiled Queries
Entity Framework performance is not as good as direct database access with ADO (obviously) or Linq2SQL. There are ways to improve it however, one of which is compiling your queries. The performance of a compiled query is similar to Linq2Sql.
What is a compiled query? It is simply a query for which you tell the framework to keep the parsed tree in memory so it doesn't need to be regenerated the next time you run it. So the next run, you will save the time it takes to parse the tree. Do not discount that as it is a very costly operation that gets even worse with more complex queries.
There are 2 ways to compile a query: creating an ObjectQuery with EntitySQL and using CompiledQuery.Compile() function. (Note that by using an EntityDataSource in your page, you will in fact be using ObjectQuery with EntitySQL, so that gets compiled and cached).
An aside here in case you don't know what EntitySQL is. It is a string-based way of writing queries against the EF. Here is an example: "select value dog from Entities.DogSet as dog where dog.ID = @ID". The syntax is pretty similar to SQL syntax. You can also do pretty complex object manipulation with it.
Ok, so here is how to do it using ObjectQuery<>
string query = "select value dog " +
               "from Entities.DogSet as dog " +
               "where dog.ID = @ID";
ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance);
oQuery.Parameters.Add(new ObjectParameter("ID", id));
oQuery.EnablePlanCaching = true;
return oQuery.FirstOrDefault();
The first time you run this query, the framework will generate the expression tree and keep it in memory. So the next time it gets executed, you will save on that costly step. In that example EnablePlanCaching = true, which is unnecessary since that is the default option.
The other way to compile a query for later use is the CompiledQuery.Compile method. This uses a delegate:
static readonly Func<Entities, int, Dog> query_GetDog =
CompiledQuery.Compile<Entities, int, Dog>((ctx, id) =>
ctx.DogSet.FirstOrDefault(it => it.ID == id));
or using linq
static readonly Func<Entities, int, Dog> query_GetDog =
CompiledQuery.Compile<Entities, int, Dog>((ctx, id) =>
(from dog in ctx.DogSet where dog.ID == id select dog).FirstOrDefault());
to call the query:
query_GetDog.Invoke( YourContext, id );
The advantage of CompiledQuery is that the syntax of your query is checked at compile time, whereas EntitySQL is not. However, there are other considerations...
Includes
Lets say you want to have the data for the dog owner to be returned by the query to avoid making 2 calls to the database. Easy to do, right?
EntitySQL
string query = "select value dog " +
               "from Entities.DogSet as dog " +
               "where dog.ID = @ID";
ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance).Include("Owner");
oQuery.Parameters.Add(new ObjectParameter("ID", id));
oQuery.EnablePlanCaching = true;
return oQuery.FirstOrDefault();
CompiledQuery
static readonly Func<Entities, int, Dog> query_GetDog =
CompiledQuery.Compile<Entities, int, Dog>((ctx, id) =>
(from dog in ctx.DogSet.Include("Owner") where dog.ID == id select dog).FirstOrDefault());
Now, what if you want to have the Include parametrized? What I mean is that you want a single Get() function that is called from different pages that care about different relationships for the dog. One cares about the Owner, another about his FavoriteFood, another about his FavoriteToy, and so on. Basically, you want to tell the query which associations to load.
It is easy to do with EntitySQL
public Dog Get(int id, string include)
{
    string query = "select value dog " +
                   "from Entities.DogSet as dog " +
                   "where dog.ID = @ID";
    ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance)
        .IncludeMany(include);
    oQuery.Parameters.Add(new ObjectParameter("ID", id));
    oQuery.EnablePlanCaching = true;
    return oQuery.FirstOrDefault();
}
The include simply uses the passed string. Easy enough. Note that it is possible to improve on the Include(string) function (which accepts only a single path) with an IncludeMany(string) that lets you pass a comma-separated string of associations to load. Look further in the extension section for this function.
If we try to do it with CompiledQuery however, we run into numerous problems:
The obvious
static readonly Func<Entities, int, string, Dog> query_GetDog =
CompiledQuery.Compile<Entities, int, string, Dog>((ctx, id, include) =>
(from dog in ctx.DogSet.Include(include) where dog.ID == id select dog).FirstOrDefault());
will choke when called with:
query_GetDog.Invoke( YourContext, id, "Owner,FavoriteFood" );
Because, as mentioned above, Include() only wants to see a single path in the string, and here we are giving it 2: "Owner" and "FavoriteFood" (which is not to be confused with "Owner.FavoriteFood"!).
Then, let's use IncludeMany(), which is an extension function
static readonly Func<Entities, int, string, Dog> query_GetDog =
CompiledQuery.Compile<Entities, int, string, Dog>((ctx, id, include) =>
(from dog in ctx.DogSet.IncludeMany(include) where dog.ID == id select dog).FirstOrDefault());
Wrong again; this time it is because EF cannot parse IncludeMany, because it is not among the functions it recognizes: it is an extension.
OK, so you want to pass an arbitrary number of paths to your function and Include() only takes a single one. What to do? You could decide that you will never ever need more than, say, 20 Includes, and pass each one as a separate string in a struct to CompiledQuery. But now the query looks like this:
from dog in ctx.DogSet.Include(include1).Include(include2).Include(include3)
.Include(include4).Include(include5).Include(include6)
.[...].Include(include19).Include(include20) where dog.ID == id select dog
which is awful as well. OK then, but wait a minute. Can't we return an ObjectQuery<> with CompiledQuery, then set the includes on that? Well, that's what I would have thought as well:
static readonly Func<Entities, int, ObjectQuery<Dog>> query_GetDog =
    CompiledQuery.Compile<Entities, int, ObjectQuery<Dog>>((ctx, id) =>
        (ObjectQuery<Dog>)(from dog in ctx.DogSet where dog.ID == id select dog));

public Dog GetDog(int id, string include)
{
    ObjectQuery<Dog> oQuery = query_GetDog(YourContext, id);
    oQuery = oQuery.IncludeMany(include);
    return oQuery.FirstOrDefault();
}
That should have worked, except that when you call IncludeMany (or Include, Where, OrderBy...) you invalidate the cached compiled query, because it is an entirely new one now! The expression tree needs to be re-parsed and you take that performance hit again.
So what is the solution? You simply cannot use CompiledQuery with parametrized Includes. Use EntitySQL instead. This doesn't mean that there aren't uses for CompiledQuery. It is great for localized queries that will always be called in the same context. Ideally CompiledQuery would always be used, because the syntax is checked at compile time, but due to its limitations, that's not possible.
An example of use would be: you may want to have a page that queries which two dogs have the same favorite food, which is a bit narrow for a BusinessLayer function, so you put it in your page and know exactly what type of includes are required.
Passing more than 3 parameters to a CompiledQuery
Func is limited to 5 parameters, of which the last one is the return type and the first one is your Entities object from the model. That leaves you with 3 parameters. A pittance, but it can be improved on very easily.
public struct MyParams
{
    public string param1;
    public int param2;
    public DateTime param3;
}

static readonly Func<Entities, MyParams, IEnumerable<Dog>> query_GetDog =
    CompiledQuery.Compile<Entities, MyParams, IEnumerable<Dog>>((ctx, myParams) =>
        from dog in ctx.DogSet
        where dog.Age == myParams.param2 && dog.Name == myParams.param1
            && dog.BirthDate > myParams.param3
        select dog);

public List<Dog> GetSomeDogs(int age, string name, DateTime birthDate)
{
    MyParams myParams = new MyParams();
    myParams.param1 = name;
    myParams.param2 = age;
    myParams.param3 = birthDate;
    return query_GetDog(YourContext, myParams).ToList();
}
Return Types (this does not apply to EntitySQL queries, as they are not compiled ahead of execution the way CompiledQuery queries are)
Working with Linq, you usually don't force the execution of the query until the very last moment, in case some other functions downstream wants to change the query in some way:
static readonly Func<Entities, int, string, IEnumerable<Dog>> query_GetDog =
CompiledQuery.Compile<Entities, int, string, IEnumerable<Dog>>((ctx, age, name) =>
from dog in ctx.DogSet where dog.Age == age && dog.Name == name select dog);
public IEnumerable<Dog> GetSomeDogs( int age, string name )
{
return query_GetDog(YourContext,age,name);
}
public void DataBindStuff()
{
IEnumerable<Dog> dogs = GetSomeDogs(4,"Bud");
// but I want the dogs ordered by BirthDate
gridView.DataSource = dogs.OrderBy( it => it.BirthDate );
}
What is going to happen here? Because we are still playing with the original ObjectQuery (the actual return type of the Linq statement, which implements IEnumerable), it will invalidate the compiled query and be forced to re-parse. So the rule of thumb is to return a List<> of objects instead.
static readonly Func<Entities, int, string, IEnumerable<Dog>> query_GetDog =
CompiledQuery.Compile<Entities, int, string, IEnumerable<Dog>>((ctx, age, name) =>
from dog in ctx.DogSet where dog.Age == age && dog.Name == name select dog);
public List<Dog> GetSomeDogs( int age, string name )
{
return query_GetDog(YourContext,age,name).ToList(); //<== change here
}
public void DataBindStuff()
{
List<Dog> dogs = GetSomeDogs(4,"Bud");
// but I want the dogs ordered by BirthDate
gridView.DataSource = dogs.OrderBy( it => it.BirthDate );
}
When you call ToList(), the query gets executed as per the compiled query and then, later, the OrderBy is executed against the objects in memory. It may be a little bit slower, but I'm not even sure. One sure thing is that you have no worries about mis-handling the ObjectQuery and invalidating the compiled query plan.
Once again, that is not a blanket statement. ToList() is a defensive programming trick, but if you have a valid reason not to use ToList(), go ahead. There are many cases in which you would want to refine the query before executing it.
Performance
What is the performance impact of compiling a query? It can actually be fairly large. A rule of thumb is that compiling and caching the query for reuse takes at least double the time of simply executing it without caching. For complex queries (read: inheritance), I have seen it take upwards of 10 seconds.
So, the first time a pre-compiled query gets called, you take a performance hit. After that first hit, performance is noticeably better than for the same non-pre-compiled query. Practically the same as Linq2Sql.
When you load a page with pre-compiled queries for the first time, you will take a hit. It might load in 5-15 seconds (obviously more than one pre-compiled query will end up being called), while subsequent loads take less than 300 ms. A dramatic difference, and it is up to you to decide whether it is OK for your first user to take the hit, or whether you want a script to call your pages to force compilation of the queries.
Can this query be cached?
Dog dog = (from d in YourContext.DogSet where d.ID == id select d).FirstOrDefault();
No, ad-hoc Linq queries are not cached and you will incur the cost of generating the tree every single time you call it.
Parametrized Queries
Most search capabilities involve heavily parametrized queries. There are even libraries available that will let you build a parametrized query out of lambda expressions. The problem is that you cannot use pre-compiled queries with those. One way around that is to map out all the possible criteria in the query and flag which ones you want to use:
public struct MyParams
{
public string name;
public bool checkName;
public int age;
public bool checkAge;
}
static readonly Func<Entities, MyParams, IEnumerable<Dog>> query_GetDog =
    CompiledQuery.Compile<Entities, MyParams, IEnumerable<Dog>>((ctx, myParams) =>
        from dog in ctx.DogSet
        // a criterion is only applied when its check flag is set
        where (!myParams.checkAge || dog.Age == myParams.age)
            && (!myParams.checkName || dog.Name == myParams.name)
        select dog);
protected List<Dog> GetSomeDogs()
{
MyParams myParams = new MyParams();
myParams.name = "Bud";
myParams.checkName = true;
myParams.age = 0;
myParams.checkAge = false;
return query_GetDog(YourContext,myParams).ToList();
}
The advantage here is that you get all the benefits of a pre-compiled query. The disadvantages are that you will most likely end up with a where clause that is pretty difficult to maintain, that you will incur a bigger penalty for pre-compiling the query, and that each query you run is not as efficient as it could be (particularly with joins thrown in).
Another way is to build an EntitySQL query piece by piece, like we all did with SQL.
protected List<Dog> GetSomeDogs(string name, int age)
{
    string query = "select value dog from Entities.DogSet as dog where 1 = 1 ";
    if (!String.IsNullOrEmpty(name))
        query = query + " and dog.Name = @Name ";
    if (age > 0)
        query = query + " and dog.Age = @Age ";
    ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, YourContext);
    if (!String.IsNullOrEmpty(name))
        oQuery.Parameters.Add(new ObjectParameter("Name", name));
    if (age > 0)
        oQuery.Parameters.Add(new ObjectParameter("Age", age));
    return oQuery.ToList();
}
Here the problems are:
- there is no syntax checking at compile time
- each different combination of parameters generates a different query, which will need to be pre-compiled when it is first run. In this case there are only 4 possible queries (no params, age-only, name-only and both params), but you can see that there can be many more with a real-world search.
- no one likes to concatenate strings!
Another option is to query a large subset of the data and then narrow it down in memory. This is particularly useful if you are working with a definite subset of the data, like all the dogs in a city. You know there are a lot but you also know there aren't that many... so your CityDog search page can load all the dogs for the city in memory, which is a single pre-compiled query and then refine the results
protected List<Dog> GetSomeDogs(string name, int age, string city)
{
    string query = "select value dog from Entities.DogSet as dog " +
                   "where dog.Owner.Address.City = @City ";
    ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, YourContext);
    oQuery.Parameters.Add(new ObjectParameter("City", city));
    List<Dog> dogs = oQuery.ToList();
    if (!String.IsNullOrEmpty(name))
        dogs = dogs.Where(it => it.Name == name).ToList();
    if (age > 0)
        dogs = dogs.Where(it => it.Age == age).ToList();
    return dogs;
}
It is particularly useful when you start displaying all the data then allow for filtering.
Problems:
- Could lead to serious data transfer if you are not careful about your subset.
- You can only filter on the data that you returned. It means that if you don't return the Dog.Owner association, you will not be able to filter on the Dog.Owner.Name
So what is the best solution? There isn't any. You need to pick the solution that works best for you and your problem:
- Use lambda-based query building when you don't care about pre-compiling your queries.
- Use fully-defined pre-compiled Linq query when your object structure is not too complex.
- Use EntitySQL/string concatenation when the structure could be complex and when the possible number of different resulting queries is small (which means fewer pre-compilation hits).
- Use in-memory filtering when you are working with a smallish subset of the data or when you had to fetch all of the data at first anyway (if the performance is fine with all the data, then filtering in memory will not cause any extra time to be spent in the db).
Singleton access
The best way to deal with your context and entities across all your pages is to use the singleton pattern:
public sealed class YourContext
{
    private const string instanceKey = "On3GoModelKey";

    YourContext() { }

    public static YourEntities Instance
    {
        get
        {
            HttpContext context = HttpContext.Current;
            if (context == null)
                return Nested.instance;
            if (context.Items[instanceKey] == null)
            {
                YourEntities entity = new YourEntities();
                context.Items[instanceKey] = entity;
            }
            return (YourEntities)context.Items[instanceKey];
        }
    }

    class Nested
    {
        // Explicit static constructor to tell the C# compiler
        // not to mark the type as beforefieldinit
        static Nested()
        {
        }

        internal static readonly YourEntities instance = new YourEntities();
    }
}
NoTracking, is it worth it?
When executing a query, you can tell the framework whether or not to track the objects it will return. What does it mean? With tracking enabled (the default option), the framework will track what is going on with the object (has it been modified? created? deleted?) and will also link objects together when further queries are made to the database, which is what is of interest here.
For example, let's assume that the Dog with ID == 2 has an owner whose ID == 10.
Dog dog = (from d in YourContext.DogSet where d.ID == 2 select d).FirstOrDefault();
//dog.OwnerReference.IsLoaded == false;
Person owner = (from o in YourContext.PersonSet where o.ID == 10 select o).FirstOrDefault();
//dog.OwnerReference.IsLoaded == true;
If we were to do the same with no tracking, the result would be different.
ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>)
(from dog in YourContext.DogSet where dog.ID == 2 select dog);
oDogQuery.MergeOption = MergeOption.NoTracking;
Dog dog = oDogQuery.FirstOrDefault();
//dog.OwnerReference.IsLoaded == false;
ObjectQuery<Person> oPersonQuery = (ObjectQuery<Person>)
    (from o in YourContext.PersonSet where o.ID == 10 select o);
oPersonQuery.MergeOption = MergeOption.NoTracking;
Person owner = oPersonQuery.FirstOrDefault();
//dog.OwnerReference.IsLoaded == false;
Tracking is very useful and in a perfect world without performance issue, it would always be on. But in this world, there is a price for it, in terms of performance. So, should you use NoTracking to speed things up? It depends on what you are planning to use the data for.
Is there any chance that the data you query with NoTracking could be used to make updates/inserts/deletes in the database? If so, don't use NoTracking, because associations are not tracked and exceptions will be thrown.
In a page where there are absolutely no updates to the database, you can safely use NoTracking.
Mixing tracking and NoTracking is possible, but it requires you to be extra careful with updates/inserts/deletes. The problem is that if you mix them, you risk having the framework try to Attach() a NoTracking object to the context while another copy of the same object exists with tracking on. Basically, what I am saying is that
Dog dog1 = (from dog in YourContext.DogSet where dog.ID == 2 select dog).FirstOrDefault();
ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>)
(from dog in YourContext.DogSet where dog.ID == 2 select dog);
oDogQuery.MergeOption = MergeOption.NoTracking;
Dog dog2 = oDogQuery.FirstOrDefault();
dog1 and dog2 are 2 different objects, one tracked and one not. Using the detached object in an update/insert will force an Attach() that says "Wait a minute, I already have an object here with the same database key. Fail." And when you Attach() one object, all of its hierarchy gets attached as well, causing problems everywhere. Be extra careful.
How much faster is it with NoTracking?
It depends on the queries. Some are much more susceptible to tracking than others. I don't have a fast and easy rule for it, but it helps.
So I should use NoTracking everywhere then?
Not exactly. There are some advantages to tracking objects. The first one is that the object is cached, so subsequent calls for that object will not hit the database. That cache is only valid for the lifetime of the YourEntities object, which, if you use the singleton code above, is the same as the page lifetime. One page request == one YourEntities object. So for multiple calls for the same object, it will load only once per page request. (Other caching mechanisms could extend that.)
What happens when you are using NoTracking and try to load the same object multiple times? The database will be queried each time, so there is an impact there. How often do/should you call for the same object during a single page request? As little as possible, of course, but it does happen.
Also remember the piece above about having the associations connected automatically for you? You don't have that with NoTracking, so if you load your data in multiple batches, you will not have the links between them:
ObjectQuery<Dog> oDogQuery = (ObjectQuery<Dog>)(from dog in YourContext.DogSet select dog);
oDogQuery.MergeOption = MergeOption.NoTracking;
List<Dog> dogs = oDogQuery.ToList();
ObjectQuery<Person> oPersonQuery = (ObjectQuery<Person>)(from o in YourContext.PersonSet select o);
oPersonQuery.MergeOption = MergeOption.NoTracking;
List<Person> owners = oPersonQuery.ToList();
In this case, no dog will have its .Owner property set.
Some things to keep in mind when you are trying to optimize the performance.
No lazy loading, what am I to do?
This can be seen as a blessing in disguise. Of course it is annoying to load everything manually. However, it decreases the number of calls to the db and forces you to think about when you should load data. The more you can load in one database call the better. That was always true, but it is enforced now with this 'feature' of EF.
Of course, you can call
if( !ObjectReference.IsLoaded ) ObjectReference.Load();
if you want to, but a better practice is to force the framework to load the objects you know you will need in one shot. This is where the discussion about parametrized Includes begins to make sense.
Let's say you have your Dog object
public class Dog
{
    static public Dog Get(int id)
    {
        return YourContext.DogSet.FirstOrDefault(it => it.ID == id);
    }
}
This is the type of function you work with all the time. It gets called from all over the place, and once you have that Dog object, you will do very different things with it in different functions. First, it should be pre-compiled, because you will call it very often. Second, different pages will want access to different subsets of the Dog data. Some will want the Owner, some the FavoriteToy, etc.
Of course, you could call Load() for each reference you need, anytime you need one. But that would generate a call to the database each time. Bad idea. So instead, each page will ask for the data it wants to see when it first requests the Dog object:
static public Dog Get(int id) { return Get(id, ""); }

static public Dog Get(int id, string includePath)
{
    string query = "select value o " +
                   "from YourEntities.DogSet as o " +
                   "where o.ID = @ID";
    ObjectQuery<Dog> oQuery = new ObjectQuery<Dog>(query, EntityContext.Instance)
        .IncludeMany(includePath);
    oQuery.Parameters.Add(new ObjectParameter("ID", id));
    oQuery.EnablePlanCaching = true;
    return oQuery.FirstOrDefault();
}
Please do not use all of the above info, such as "Singleton access". You absolutely 100% should not be storing this context for reuse, as it is not thread safe.
While informative, I think it would be more helpful to show how all this fits into a complete solution architecture. For example: a solution showing where you use both EF inheritance and your alternative, so that it demonstrates their performance difference.

Firebase how to secure numeric data from manipulation by users, eg. game score

I am developing a multiplayer game with Firebase. The player's score is recorded in Firebase after each game, and a playerTotalScore field is also updated with the new total.
My question : Is it possible to secure playerTotalScore field against arbitrary manipulation by the user using only firebase security rules? If so, how?
I have perused the Firebase security information on the Firebase website at length. While I understand that it is possible to implement some complex logic in the security rules (increment a number by a given amount, such as in this gist, or make a field insert-only ( ".write": "!data.exists()" )), none of that information seems to help in this case. Increment-only rules are not sufficient, because the score could be manipulated by being incremented multiple times. Insert-only does not appear to be an option for totalScore either, because it is updated after each game.
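For concreteness, an increment-style rule of the kind that gist describes looks roughly like this (a sketch only, not from the question; the paths are placeholders matching the leaderboard structure, and as noted it does not solve the problem, since nothing stops a client from applying a valid increase repeatedly):

```json
{
  "rules": {
    "app": {
      "leaderboard": {
        "$userId": {
          "totalScore": {
            // accept only numeric writes that do not decrease the value;
            // rules alone cannot tell a legitimate game result from a
            // fabricated or replayed one
            ".write": "newData.isNumber() && (!data.exists() || newData.val() > data.val())"
          }
        }
      }
    }
  }
}
```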
Update
As requested by Kato, here is the specific use case.
The game I am developing is a quiz game in which players answer questions, and the players scores are displayed in real time.
During the course of the game, the score for that specific game is updated after each question by the following statement:
gameRef.child('players').child(UserId).child('score').set(gameScore)
After the game is over, the totalScore (over all games played) for the player is calculated as totalScore = totalScore + gameScore, and the player's total score is then updated in Firebase with the following statement:
leaderboardRef.child(userId).setWithPriority({userName:userName, totalScore:totalScore}, totalScore)
Update2: Data Structure as requested by Kato
Here is the specific structure I currently have in place. This is not set in stone, so I am open to changing it however needed per the recommended approach to secure the data.
The score for each game played by a user(player) is stored in the following structure
<firebase_root>/app/games/<gameId>/players/<userId>/score/
<gameId> is the firebase generated key as a result of calling the firebase push() method.
<UserId> is the firebase simplelogin uid.
The totalScore (the sum of the scores for all games played) for each user (player) is stored in the following data structure:
<firebase_root>/app/leaderboard/<userId>/totalScore/
The leaderboard data is set using the totalScore as the priority, for query purposes:
leaderboardRef.child(userId).setWithPriority({userName:userName, totalScore:totalScore}, totalScore)
Both score and totalScore are numeric integer values.
That is all the detail to the current data structure that I can think of.
Your question is technically how to accomplish this using security rules, but as it's a bit of an XY problem, and none of the other possibilities have been ruled out, I'll tackle some of them here as well.
I'll be making a great many assumptions, since answering this question fully would require a complete specification of the rules to be enforced and is really a matter of implementing an entire application (increasing a score is a result of the game logic, not a simple math problem).
Total the score at the client
Perhaps the simplest answer to this conundrum is to simply not have a total score. Just grab the list of players and total them manually.
When this might be useful:
the list of players is in the hundreds or fewer
the player data is appropriately small (not 500k each)
How to do it:
var ref = new Firebase(URL);

function getTotalScore(gameId, callback) {
   ref.child('app/games/' + gameId + '/players').once('value', function(playerListSnap) {
      var total = 0;
      playerListSnap.forEach(function(playerSnap) {
         // each child is <userId>/score per the structure above
         total += playerSnap.child('score').val() || 0;
      });
      callback(gameId, total);
   });
}
Use a privileged worker to update the score
A less sophisticated and also simple approach (because it only requires that the security rules be set to something like ".write": "auth.uid === 'SERVER_PROCESS'") would be to use a server process that simply monitors the games and accumulates the totals. This is probably the easiest solution to get right and to maintain, but has the downside of requiring another moving part.
When this might be useful:
you can spin up a Heroku service or deploy a .js file to webscript.io
an extra monthly subscription in the $5-$30 range is not a deal-breaker
How to do it:
Obviously, this involves a great deal of application design and there are various levels this has to be accomplished at. Let's focus simply on closing games and tallying the leaderboards, since this is a good example.
Begin by splitting the scoring code out to its own path, such as
/scores_entries/$gameid/$scoreid = < player: ..., score: ... >
/game_scores/$gameid/$playerid = <integer>
Now monitor the games to see when they close:
var rootRef = new Firebase(URL);
var gamesRef = rootRef.child('app/games');
var lbRef = rootRef.child('app/leaderboard');
gamesRef.on('child_added', watchGame);
gamesRef.on('child_removed', unwatchGame);
function watchGame(snap) {
snap.ref().child('status').on('value', gameStatusChanged);
}
function unwatchGame(snap) {
snap.ref().child('status').off('value', gameStatusChanged);
}
function gameStatusChanged(snap) {
   if( snap.val() === 'CLOSED' ) {
      // snap.ref() points at .../status here, so detach this listener directly
      snap.ref().off('value', gameStatusChanged);
      // the game id is the parent of the status node
      calculateScores(snap.ref().parent().name());
   }
}
function calculateScores(gameId) {
   gamesRef.child(gameId).child('players').once('value', function(snap) {
      var userScores = {};
      snap.forEach(function(ss) {
         // each child is <userId>/score per the structure above
         userScores[ss.name()] = ss.child('score').val() || 0;
      });
      updateLeaderboards(userScores);
   });
}
function updateLeaderboards(userScores) {
   // forEach gives each callback its own score binding, unlike a plain
   // for..in loop with var, whose closure the async retries would share
   Object.keys(userScores).forEach(function(userId) {
      var score = userScores[userId];
      // transact on the totalScore child so the userName field is untouched
      lbRef.child(userId).child('totalScore').transaction(function(currentValue) {
         return (currentValue||0) + score;
      });
   });
}
Use an audit path and security rules
This will, of course, be the most sophisticated and difficult of the available choices.
When this might be useful:
when we refuse to utilize any other strategy involving a server process
when dreadfully worried about players cheating
when we have lots of extra time to burn
Obviously, I'm biased against this approach. Primarily because it's very difficult to get right and requires a lot of energy that could be replaced with a small monetary investment.
Getting this right requires scrutiny at each individual write request. There are several obvious points to secure (probably more):
Writing any game event that includes a score increment
Writing the total for the game per user
Writing the game's total to the leaderboard
Writing each audit record
Ensuring superfluous games can't be created and modified on the fly just to boost scores
Here are some basic fundamentals to securing each of these points:
use an audit trail where users can only add (not update or remove) entries
validate that each audit entry has a priority equal to the current timestamp
validate that each audit entry contains valid data according to the current game state
utilize the audit entries when trying to increment running totals
Let's take, for an example, updating the leaderboard securely. We'll assume the following:
the users' score in the game is valid
the user has created an audit entry to, say, leaderboard_audit/$userid/$gameid, with a current timestamp as the priority and the score as the value
each user record exists in the leaderboard ahead of time
only the user may update their own score
So here's our assumed data structure:
/games/$gameid/users/$userid/score
/leaderboard_audit/$userid/$gameid = <score int>
/leaderboard/$userid = { last_game: $gameid, score: <int> }
Here's how our logic works:
game score is set at /games/$gameid/users/$userid/score
an audit record is created at /leaderboard_audit/$userid/$gameid
the value at /leaderboard_audit/$userid/last_game is updated to match $gameid
the leaderboard is updated by an amount exactly equal to last_game's audit record
And here are the actual rules:
{
"rules": {
"leaderboard_audit": {
"$userid": {
"$gameid": {
// newData.exists() ensures records cannot be deleted
".write": "auth.uid === $userid && newData.exists()",
".validate": "
// can only create new records
!data.exists()
// references a valid game
&& root.child('games/' + $gameid).exists()
// has the correct score as the value
&& newData.val() === root.child('games/' + $gameid + '/users/' + auth.uid + '/score').val()
// has a priority equal to the current timestamp
&& newData.getPriority() === now
// is created after the previous last_game or there isn't a last_game
&& (
!root.child('leaderboard/' + auth.uid + '/last_game').exists() ||
newData.getPriority() > data.parent().child(root.child('leaderboard/' + auth.uid + '/last_game').val()).getPriority()
)
"
}
}
},
"leaderboard": {
"$userid": {
".write": "auth.uid === $userid && newData.exists()",
".validate": "newData.hasChildren(['last_game', 'score'])",
"last_game": {
".validate": "
// must match the last_game entry
newData.val() === root.child('leaderboard_audit/' + auth.uid + '/last_game').val()
// must not be a duplicate
&& newData.val() !== data.val()
// must be a game created after the current last_game timestamp
&& (
!data.exists() ||
root.child('leaderboard_audit/' + auth.uid + '/' + data.val()).getPriority()
< root.child('leaderboard_audit/' + auth.uid + '/' + newData.val()).getPriority()
)
"
},
"score": {
".validate": "
// new score is equal to the old score plus the last_game's score
newData.val() === data.val() +
root.child('games/' + newData.parent().child('last_game').val() + '/users/' + auth.uid + '/score').val()
"
}
}
}
}
}
It will be tricky to guard against invalid values using rules. Since you're giving the user rights to write a value, they can also reverse-engineer your code and write values that you'd rather not see. You can do many things to make the hacker's job more difficult, but there'll always be someone who is able to work around it. That said: there are some easy things you can do to make things for hackers a bit less trivial.
Something you can easily do is record/store enough information about the gameplay so that you can later determine if it is legit.
So for example in a typing game I did, I not only stored the final score for the player, but also each key they pressed and when they pressed it.
https://<my>.firebaseio.com/highscores/game_1_time_15/puf
keystrokes: "[[747,'e'],[827,'i'],[971,'t'],[1036,'h']...[14880,'e']]"
score: 61
So at 747ms into the game, I typed an e then i, t, h and so on, until finally after 14.8s I pressed e.
Using these values I can check if the keys pressed indeed lead to a score of 61. I could also replay the game, or do some analysis on it to see if it seems like a real human playing pressing the keys. If the timestamps are 100, 200, 300, etc, you'd be quite suspicious (although I created some bots that type exactly at such intervals).
It's still no guarantee of course, but it's at least a first stumbling block for the ref.child('score').set(10000000) hackers.
I got this idea from John Resig's Deep Leap, but I can't find the page where he describes it.
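To make the replay/plausibility check concrete, here is a rough sketch of what a server-side validator might do. The scoring rule below (one point per recorded keystroke inside the time limit) is purely an assumption for illustration; a real check would have to mirror the game's actual scoring logic.

```java
public class ScoreAudit {

    /**
     * Hypothetical plausibility check: the claimed score must equal the
     * number of recorded keystrokes, and the timestamps must be strictly
     * increasing and fall inside the game's time window.
     */
    static boolean isPlausible(long[] keystrokeTimesMs, int claimedScore, long gameLengthMs) {
        if (claimedScore != keystrokeTimesMs.length) {
            return false;
        }
        long previous = -1;
        for (long t : keystrokeTimesMs) {
            if (t <= previous || t > gameLengthMs) {
                return false;
            }
            previous = t;
        }
        return true;
    }

    public static void main(String[] args) {
        long[] recorded = {747, 827, 971, 1036, 14880};
        System.out.println(isPlausible(recorded, 5, 15000));         // true
        System.out.println(isPlausible(recorded, 1_000_000, 15000)); // false
    }
}
```

A stricter validator could also flag suspiciously uniform intervals (the 100, 200, 300 ms pattern mentioned above), at the cost of more false positives against bots that add jitter.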
I have an idea: since this is a multiplayer game, you are going to have multiple players in one particular game. This means that after the game-over message, each of the players is going to update the partial and total score.
In the security rules you can check whether the opponent has written the partial value for the same game (that would be read-only access), or you can check whether all opponents' partial values add up to the required total, etc.
A hacker would have to come up with an elaborate plan involving control of multiple accounts and a synchronised attack.
edit:
...and I can see the further question: what about the first player to update? That could be handled via intents. First, all the players write an intent to write a score where the partial score will go, and once those values are present everywhere, they are clear to write the actual scores.
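The gating logic itself is simple; here is a plain-Java stand-in for it (names are hypothetical, and in practice this check would live in the security rules rather than in client code):

```java
import java.util.Map;
import java.util.Set;

public class IntentGate {

    /**
     * A player may write a final score only once every participant in the
     * game has published an "intent to write" marker.
     */
    static boolean mayWriteScore(Set<String> players, Map<String, Boolean> intents) {
        for (String player : players) {
            if (!intents.getOrDefault(player, Boolean.FALSE)) {
                return false;
            }
        }
        return true;
    }
}
```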

How do accumulate functions actually work?

Let's say we have the following example:
There are certain products that belong to certain product groups, and we want the total price summed up in a logical fact as the products in the product group change or as their prices change.
public class ProductGroup {
    private String name;
}
public class Product {
private ProductGroup productGroup;
private int price;
}
This is the class intended for the logical facts that the summation rule will insert in Drools.
public class ProductGroupTotalPrice {
    private ProductGroup productGroup;
    private int totalPrice;
    public ProductGroupTotalPrice(ProductGroup productGroup, int totalPrice) {
        this.productGroup = productGroup;
        this.totalPrice = totalPrice;
    }
}
There is a rule that sums up the total price for a given ProductGroup.
rule "total price for product group"
when
$productGroup : ProductGroup()
$totalPrice : Number() from accumulate(
Product(productGroup == $productGroup, $price : price),
sum($price)
)
then
insertLogical(new ProductGroupTotalPrice($productGroup, $totalPrice));
end
So my question is: what will the logic be when Products from a given ProductGroup are added to or deleted from the working memory, change their ProductGroup, or have their price changed?
Let's say that the summation is done at the beginning of the application based on the current state, and the logical fact with the total price is inserted into the working memory. Then at some point the price of one Product is changed, so the totalPrice needs to be updated.
Here are three cases of how the process could possibly be done:
Incrementally, with a constant-time calculation: only take into account the change that happened, subtracting the old price from the total and adding the new one for the one Product that changed. (Excellent)
The whole summation is done again, but the Product instances that meet the criteria (those from the given ProductGroup) are already known, so they are not searched for. (Good)
Besides the summation, a loop through all the Product instances in the working memory is done to find which ones meet the criteria (those from the given ProductGroup). (Bad)
Is the logic that is implemented one of these three cases or it is something else?
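For comparison, case 1 can be sketched outside of Drools in plain Java. This is only an illustration of the constant-time bookkeeping, not a claim about the engine's internals:

```java
import java.util.HashMap;
import java.util.Map;

public class IncrementalSum {

    private final Map<String, Integer> totals = new HashMap<>();

    // insert: add the new fact's price to the group's running total
    void insert(String group, int price) {
        totals.merge(group, price, Integer::sum);
    }

    // retract: subtract the removed fact's price (the "reverse" step)
    void retract(String group, int price) {
        totals.merge(group, -price, Integer::sum);
    }

    // update: retract the old value, then insert the new one
    void update(String group, int oldPrice, int newPrice) {
        retract(group, oldPrice);
        insert(group, newPrice);
    }

    int total(String group) {
        return totals.getOrDefault(group, 0);
    }
}
```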
You can look at the documentation of the other form of accumulate, i.e., the one where you can define the steps for initialization, processing (note that there can be several processing steps!), and returning an arbitrary function result. Some functions permit the reverse operation, so that removing a fact that has been used for computing the function result can be handled: e.g., 'sum'. (But compare 'max'.)
So I think that your accumulate pattern will be updated efficiently.
However, I think that this does not mean that your logically inserted ProductGroupTotalPrice will be updated. (Try it, I may be wrong.)
I would use a simple rule
rule "total price for product group"
when
$productGroup: ProductGroup()
Number( $totalPrice: intValue ) from accumulate(
Product(productGroup == $productGroup, $price : price),
sum($price)
)
$pgtp: ProductGroupTotalPrice( productGroup == $productGroup,
                               totalPrice != $totalPrice )
then
modify( $pgtp ){ setTotalPrice( $totalPrice ) }
end
and an additional rule to insert an initial ProductGroupTotalPrice for the product group with totalPrice 0.

Optaplanner VRP incremental score overconstrained planning

I want to create an incremental score calculation for VRP with overconstrained planning. I create one additional dummy vehicle, which holds all unplanned customers.
The problem is that when OptaPlanner moves a customer to another vehicle, it calls afterVariableChanged with the variable name previousStandstill while the vehicle for that customer has not been refreshed yet. At that point I don't know the vehicle for that customer, so I don't know whether I need to add the soft cost or not (for the dummy vehicle no cost may be added).
How can I solve this problem?
Example:
Optaplanner move Customer1 from Vehicle1 to Vehicle2:
beforeVariableChanged: previousStandstill(Customer1), customer.GetVehicle() = Vehicle1
beforeVariableChanged: nextCustomer(Customer0), customer.GetVehicle() = Vehicle1
afterVariableChanged: nextCustomer(Customer0), customer.GetVehicle() = Vehicle1
afterVariableChanged: previousStandstill(Customer1), customer.GetVehicle() = Vehicle1
beforeVariableChanged: vehicle(Customer1), customer.GetVehicle() = Vehicle1
afterVariableChanged: vehicle(Customer1), customer.GetVehicle() = Vehicle2
When I get afterVariableChanged: previousStandstill(Customer1), customer.getVehicle() still holds the old Vehicle value, so I don't know whether I need to add the soft cost (for the dummy vehicle, costs are ignored).
Is there any way to get the actual vehicle in afterVariableChanged for previousStandstill, rather than having to wait for afterVariableChanged for vehicle?
Check that you annotated the vehicle field with @AnchorShadowVariable:
@AnchorShadowVariable(sourceVariableName = "previousStandstill")
public VehicleNode getVehicleNode() {
return vehicleNode;
}
That annotation tells OptaPlanner to update the vehicle field automatically.
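If the shadow variable still lags behind during a move (as in the listener trace above), a fallback is to resolve the anchor yourself by walking up the previousStandstill chain. Here is a self-contained sketch with simplified stand-ins for the VRP example's domain classes:

```java
interface Standstill {
    Standstill getPreviousStandstill();
}

class Vehicle implements Standstill {
    // a vehicle is the anchor: it has no previous standstill
    public Standstill getPreviousStandstill() { return null; }
}

class Customer implements Standstill {
    private final Standstill previousStandstill;

    Customer(Standstill previousStandstill) {
        this.previousStandstill = previousStandstill;
    }

    public Standstill getPreviousStandstill() { return previousStandstill; }
}

class AnchorResolver {
    /** Walks up the chain until a Vehicle (the anchor) is reached;
     *  returns null if the chain is not rooted in a vehicle yet. */
    static Vehicle resolveVehicle(Standstill standstill) {
        while (standstill != null && !(standstill instanceof Vehicle)) {
            standstill = standstill.getPreviousStandstill();
        }
        return (Vehicle) standstill;
    }
}
```

Note that during a move the chain can be temporarily broken, so the null case has to be handled gracefully instead of looping forever.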

Optaplanner VehicleRouting First Pick, Last drop condition

I am new to OptaPlanner and am looking for a way to define a Customer's gender and enforce that, when the route is created, no female customer is picked up first or dropped off last. I am using the incremental solver & ROAD_DISTANCE xml, and I have tried decreasing the hardScore in insertCustomer and resetting it back in retractCustomer. It doesn't seem to work. Please help me get through this.
Thanks in advance for all the help.
Based on the VRP example, add this method in the Customer class:
public boolean isFemaleAndFirstOrLast() {
return gender == FEMALE
&& (previousStandstill instanceof Vehicle
|| nextStandstill == null);
}
And then add a score rule to penalize that:
when
Customer(femaleAndFirstOrLast == true)
then
scoreHolder.add...(...);
end
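Since the question mentions an incremental score calculator (where Drools score rules don't apply), the equivalent bookkeeping would go into the insert/retract callbacks instead. A minimal stand-in sketch (the class and method names are hypothetical; the crucial detail is that insert and retract must evaluate the same condition, which goes wrong if shadow variables change between the two calls):

```java
public class GenderFirstLastScoreCalculator {

    private int hardScore = 0;

    public int getHardScore() { return hardScore; }

    // called when a customer is (re)inserted into the score calculation;
    // the flag would come from customer.isFemaleAndFirstOrLast()
    public void insertCustomer(boolean femaleAndFirstOrLast) {
        if (femaleAndFirstOrLast) {
            hardScore -= 1;
        }
    }

    // called before a customer changes; must mirror insertCustomer exactly
    public void retractCustomer(boolean femaleAndFirstOrLast) {
        if (femaleAndFirstOrLast) {
            hardScore += 1;
        }
    }
}
```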
