A LINQ provider for RDF files - part 2
For simple RDF queries like
IQueryable<string> q = from x in rdf
                       from y in rdf
                       where rdf.A(germany, hasAdminDiv, x)
                          && rdf.A(x, isOfType, germanState)
                          && rdf.A(x, hasName, y)
                       select y.Val + " [" + x.Val + "]";
which are the kind we are going to support here, there is a “normal form” given by
- a set of variables, which denote resources or values in an RDF document – in the example above this is {x,y}.
- a set of constraint triples (subj, pred, obj) where subj, pred, obj are either variables or constants. This is the query condition – in the above example it is
{(germany,hasAdminDiv,x),(x,isOfType,germanState),(x,hasName,y) }
- a “projection function” using these variables which denotes the value which we associate with each “row” – in the above example this is
(x,y) => y.Val + " [" + x.Val + "]"
To execute such a query means finding all possible assignments of resources / values to the variables such that all resulting triples are in the axioms of the RDF file, and then applying the projection function to get a set of objects of a certain type (the return type of the projection function – in the above example this is string).
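To make this definition concrete, here is a minimal brute-force sketch of it. It is not the solver from the attached solution: the types Term, Triple, Constraint and NaiveSolver are hypothetical names invented for this illustration, plain strings stand in for RDF values, and the sample axioms are made up. It simply tries every assignment of values to the variables and keeps those under which all constraint triples are axioms, then applies the projection function.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration only; not the types of the attached solution.
public sealed record Term(string Name, bool IsVariable);
public sealed record Triple(string Subj, string Pred, string Obj);
public sealed record Constraint(Term Subj, Term Pred, Term Obj);

public static class NaiveSolver
{
    // Enumerates every assignment of values to variables under which
    // all instantiated constraint triples are contained in the axioms.
    public static IEnumerable<Dictionary<string, string>> Solve(
        IReadOnlyList<string> variables,
        IReadOnlyList<Constraint> constraints,
        IReadOnlyCollection<Triple> axioms)
    {
        var axiomSet = new HashSet<Triple>(axioms);
        // Candidate values: everything occurring in subject or object position.
        var values = axioms.SelectMany(t => new[] { t.Subj, t.Obj }).Distinct().ToList();
        return Extend(new Dictionary<string, string>(), 0);

        IEnumerable<Dictionary<string, string>> Extend(Dictionary<string, string> current, int index)
        {
            if (index == variables.Count)
            {
                if (constraints.All(c => axiomSet.Contains(Instantiate(c, current))))
                    yield return new Dictionary<string, string>(current);
                yield break;
            }
            foreach (var value in values)
            {
                current[variables[index]] = value;
                foreach (var solution in Extend(current, index + 1))
                    yield return solution;
            }
            current.Remove(variables[index]);
        }
    }

    private static Triple Instantiate(Constraint c, IReadOnlyDictionary<string, string> a) =>
        new Triple(Resolve(c.Subj, a), Resolve(c.Pred, a), Resolve(c.Obj, a));

    private static string Resolve(Term t, IReadOnlyDictionary<string, string> a) =>
        t.IsVariable ? a[t.Name] : t.Name;
}

public static class NormalFormDemo
{
    public static List<string> Run()
    {
        Term C(string v) => new Term(v, false); // constant
        Term V(string n) => new Term(n, true);  // variable
        var axioms = new List<Triple>           // made-up sample data
        {
            new Triple("germany", "hasAdminDiv", "bavaria"),
            new Triple("bavaria", "isOfType", "germanState"),
            new Triple("bavaria", "hasName", "Bayern"),
            new Triple("germany", "hasAdminDiv", "saxony"),
            new Triple("saxony", "isOfType", "germanState"),
            new Triple("saxony", "hasName", "Sachsen"),
        };
        var constraints = new List<Constraint>
        {
            new Constraint(C("germany"), C("hasAdminDiv"), V("x")),
            new Constraint(V("x"), C("isOfType"), C("germanState")),
            new Constraint(V("x"), C("hasName"), V("y")),
        };
        // Projection function (x, y) => y + " [" + x + "]"
        return NaiveSolver.Solve(new[] { "x", "y" }, constraints, axioms)
            .Select(a => a["y"] + " [" + a["x"] + "]")
            .ToList();
    }
}
```

Calling NormalFormDemo.Run() on this sample data should yield one row per German state. The cost grows exponentially with the number of variables, which is why a real solver would use the constraints to narrow down candidates instead of enumerating everything.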
The compiler will treat the above query expression as syntactic sugar for an expression like:
rdf.SelectMany(x => rdf.Where(y => Cond(x,y))
                       .Select(y => f(x,y))
              )
where Cond(x,y) is the condition involving rdf.A and f(x,y) is the function that assigns a string to each pair (x,y) of values in the Rdf document.
The same query could be written in different forms. For example, replacing an expression
rdf.Where(y => Cond1(x,y) && Cond2(x,y))
by
rdf.Where(y => Cond1(x,y)).Where(z => Cond2(x,z))
should lead to the same normal form.
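As a quick sanity check that the two forms really describe the same query, here is a small sketch that runs both over in-memory data with LINQ to Objects. Everything in it is an assumption made for illustration: the sample axioms, the list called Rdf, and the helpers A, Cond1, Cond2 and F. In particular, A here actually checks membership in the axiom set, whereas in the provider it is only translated and never called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DesugaringDemo
{
    // Hypothetical in-memory stand-in for the RDF document.
    static readonly HashSet<(string, string, string)> Axioms = new HashSet<(string, string, string)>
    {
        ("germany", "hasAdminDiv", "bavaria"),
        ("bavaria", "isOfType", "germanState"),
        ("bavaria", "hasName", "Bayern"),
        ("germany", "hasAdminDiv", "saxony"),
        ("saxony", "isOfType", "germanState"),
        ("saxony", "hasName", "Sachsen"),
    };

    // All values occurring in subject or object position, in a fixed order.
    static readonly List<string> Rdf = Axioms
        .SelectMany(t => new[] { t.Item1, t.Item3 })
        .Distinct()
        .OrderBy(v => v, StringComparer.Ordinal)
        .ToList();

    // Here A really checks membership; in the provider it is only translated.
    static bool A(string s, string p, string o) => Axioms.Contains((s, p, o));

    static bool Cond1(string x, string y) =>
        A("germany", "hasAdminDiv", x) && A(x, "isOfType", "germanState");
    static bool Cond2(string x, string y) => A(x, "hasName", y);
    static string F(string x, string y) => y + " [" + x + "]";

    // One Where with a conjunction, nested inside SelectMany.
    public static List<string> SingleWhere() =>
        Rdf.SelectMany(x => Rdf.Where(y => Cond1(x, y) && Cond2(x, y))
                               .Select(y => F(x, y)))
           .ToList();

    // The same condition split over two chained Where calls.
    public static List<string> SplitWhere() =>
        Rdf.SelectMany(x => Rdf.Where(y => Cond1(x, y)).Where(z => Cond2(x, z))
                               .Select(y => F(x, y)))
           .ToList();
}
```

Both methods should return the same rows in the same order, which is the behaviour the normal form has to preserve.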
So how do we get LINQ to translate these expressions to the above normal form?
To get LINQ started, our Rdf type has to implement an IQueryable<T> interface, like System.Data.DLinq.Table<T> does. When we query a database table without conditions, we get the set of all rows in the table. The analogous notion for an RDF file (or RDF files, or any set of Rdf triples) is the set of all “Values” in the RDF document, so we implement the interface IQueryable<Value> on Rdf.
“Value” is the common base type of Literal (meaning a string occurring in an object position in an axiom) and Resource (given by a URI occurring in any position in any axiom).
Since we usually do not want to retrieve all values occurring in a document, it does not matter much what exactly we get when we foreach over a document (all values, or only the resources?). What matters is the IQueryable part, since it means that the query operators Where, Select and SelectMany are now defined for Rdf.
The basic observation is that we can now give the normal form of the query corresponding to an Rdf object (variables: {x}, constraints: {}, projection: x => x), and we can recursively determine the normal form of any query which is constructed out of these with the operators Where, Select and SelectMany.
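The following sketch shows how implementing IQueryable makes the query operators available and lets them record an expression tree. It is written against the IQueryable<T> / IQueryProvider split of the released .NET Framework rather than the May 2006 CTP, whose interfaces may differ in detail. The names RdfQuery, RecordingProvider and QueryableDemo are hypothetical, and execution is deliberately left out.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for the common base type of Literal and Resource.
public class Value
{
    public Value(string val) { Val = val; }
    public string Val { get; }
}

// A queryable that only records the expression tree built by the query operators.
public sealed class RdfQuery<T> : IQueryable<T>
{
    public RdfQuery(IQueryProvider provider, Expression expression = null)
    {
        Provider = provider;
        // The root query is represented by a constant node referring to itself.
        Expression = expression ?? Expression.Constant(this);
    }

    public Type ElementType => typeof(T);
    public Expression Expression { get; }
    public IQueryProvider Provider { get; }

    public IEnumerator<T> GetEnumerator() =>
        Provider.Execute<IEnumerable<T>>(Expression).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public sealed class RecordingProvider : IQueryProvider
{
    // Called by Queryable.Where, Select, SelectMany, ... with the extended tree.
    public IQueryable<T> CreateQuery<T>(Expression expression) => new RdfQuery<T>(this, expression);
    public IQueryable CreateQuery(Expression expression) => throw new NotSupportedException();

    // A real provider would translate the tree to the normal form and run a solver here.
    public T Execute<T>(Expression expression) =>
        throw new NotSupportedException("Execution is outside the scope of this sketch.");
    public object Execute(Expression expression) => throw new NotSupportedException();
}

public static class QueryableDemo
{
    public static Expression Build()
    {
        IQueryable<Value> rdf = new RdfQuery<Value>(new RecordingProvider());
        // Nothing is evaluated here; the operators only build up a tree.
        return rdf.Where(v => v.Val.StartsWith("B")).Select(v => v.Val).Expression;
    }
}
```

The tree returned by QueryableDemo.Build() is a call to Select whose first argument is a call to Where whose first argument is the root query. The root corresponds to the base case of the normal form (one variable, no constraints, identity projection), and the surrounding calls are what the recursive translation has to peel off.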
There is some fine print:
1) Variables and variable names:
In rdf.Where(y => Cond1(x,y)).Where(z => Cond2(x,z)) the names y and z correspond to the same variable (which runs over the rdf at the beginning of this expression). We have to be careful to distinguish between variables (that the solver will assign to values) and named references to these variables (like “y” and “z” above).
2) Variables can be defined outside of a (sub)expression:
In rdf.Where(y => Cond1(x,y)) the variable x is defined in an enclosing scope. When we translate a (sub)expression, we always have to give the list of variables in the enclosing scope as a parameter.
3) Some restrictions apply:
- We only deal with Where, Select, SelectMany when applied to an Rdf query with identity projection function, i.e. the output is given by a variable and is a sequence of objects of type Value (e.g. not a sequence of strings).
- The conditions in a Where clause are only of the form Rdf.A(?,?,?), or conjunctions (&&) of such conditions. The predicate is always given as a constant, and at least one of the entries is a variable.
With these caveats, here is what this recursive algorithm does:
- Where:
Source.Where(v => Cond(v)):
Translate the query expression Source. Assume the output of Source is a variable. Make the name v point to the same variable, translate the condition and add the result to the list of constraints.
The output variable of the new query expression is the same as for Source.
- SelectMany:
Source.SelectMany(v => Seq(v)):
Translate the query expression Source. Assume the output of Source is given by a variable. Make the name v point to the same variable. Add the variables and constraints of Source and Seq together. The projection function of the result is the projection function of Seq.
- Select:
Source.Select(v=>f(v)):
Translate the query expression Source. Assume the output of Source is given by a variable. Make the name v point to the same variable. Determine all parameters occurring in f, build a Lambda expression (v1,v2,…,vn) => f(v1,v2,…,vn) and compile it. This is the projection function of the result. The variables and constraints of the result are the same as those of Source.
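The three cases above can be sketched as a recursive walk over an expression tree. This is a condensed illustration written against the released System.Linq.Expressions API, not the code from the attached solution. All names (Value, Q.A, NormalForm, Translator, TranslatorDemo) are hypothetical, the marker method is static for brevity, and the sketch assumes that any parameter of type IQueryable<Value> is the rdf source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// All names below are hypothetical; the attached solution may be structured differently.
public sealed class Value { public Value(string val) { Val = val; } public string Val { get; } }

public static class Q
{
    // Marker method: only inspected inside expression trees, never executed.
    public static bool A(Value subj, Value pred, Value obj) => throw new NotSupportedException();
}

public sealed class NormalForm
{
    public List<int> Variables = new List<int>(), ProjectionArgs = new List<int>();
    public List<object[]> Constraints = new List<object[]>(); // entries: int = variable id, Value = constant
    public int Output;                  // output variable while the projection is the identity
    public LambdaExpression Projection; // null means identity
}

public sealed class Translator : ExpressionVisitor
{
    private readonly List<ParameterExpression> _found = new List<ParameterExpression>();
    private int _nextId;

    // scope maps names (lambda parameters) to variables; several names may share one variable.
    public NormalForm Translate(Expression e, Dictionary<ParameterExpression, int> scope)
    {
        e = Strip(e);
        if (e is ParameterExpression && e.Type == typeof(IQueryable<Value>)) // rdf source: one fresh variable
            return new NormalForm { Variables = { _nextId }, Output = _nextId++ };
        var call = e as MethodCallExpression;
        if (call == null || call.Method.DeclaringType != typeof(Queryable))
            throw new NotSupportedException(e.ToString());
        var source = Translate(call.Arguments[0], scope);
        var lambda = (LambdaExpression)Strip(call.Arguments[1]);
        var inner = new Dictionary<ParameterExpression, int>(scope) { [lambda.Parameters[0]] = source.Output };
        switch (call.Method.Name)
        {
            case "Where":
                AddConditions(lambda.Body, inner, source.Constraints);
                return source;
            case "SelectMany":
                var seq = Translate(lambda.Body, inner);
                seq.Variables.InsertRange(0, source.Variables);
                seq.Constraints.InsertRange(0, source.Constraints);
                return seq;
            case "Select":
                _found.Clear();
                Visit(lambda.Body); // collects the parameters occurring in f
                source.Projection = Expression.Lambda(lambda.Body, _found.ToArray());
                source.ProjectionArgs = _found.Select(p => inner[p]).ToList();
                return source;
            default:
                throw new NotSupportedException(call.Method.Name);
        }
    }

    private static void AddConditions(Expression cond, Dictionary<ParameterExpression, int> scope, List<object[]> constraints)
    {
        if (cond is BinaryExpression b && b.NodeType == ExpressionType.AndAlso)
        {
            AddConditions(b.Left, scope, constraints);
            AddConditions(b.Right, scope, constraints);
        }
        else if (cond is MethodCallExpression m && m.Method.DeclaringType == typeof(Q))
            constraints.Add(m.Arguments.Select(a => ToTerm(a, scope)).ToArray());
        else
            throw new NotSupportedException(cond.ToString());
    }
    // A lambda parameter becomes a variable id; anything else is evaluated to a constant Value.
    private static object ToTerm(Expression arg, Dictionary<ParameterExpression, int> scope) =>
        arg is ParameterExpression p ? (object)scope[p] : Expression.Lambda(arg).Compile().DynamicInvoke();
    // Removes Quote and Convert wrappers that the compiler may put around lambdas and bodies.
    private static Expression Strip(Expression e) =>
        e.NodeType == ExpressionType.Quote || e.NodeType == ExpressionType.Convert ? Strip(((UnaryExpression)e).Operand) : e;
    protected override Expression VisitParameter(ParameterExpression node)
    {
        if (!_found.Contains(node)) _found.Add(node);
        return node;
    }
}

public static class TranslatorDemo
{
    static readonly Value Germany = new Value("germany"), HasAdminDiv = new Value("hasAdminDiv"),
        IsOfType = new Value("isOfType"), GermanState = new Value("germanState"), HasName = new Value("hasName");

    public static NormalForm Build()
    {
        Expression<Func<IQueryable<Value>, IQueryable<string>>> query = rdf => rdf.SelectMany(x => rdf
            .Where(y => Q.A(Germany, HasAdminDiv, x) && Q.A(x, IsOfType, GermanState))
            .Where(z => Q.A(x, HasName, z))
            .Select(y => y.Val + " [" + x.Val + "]"));
        return new Translator().Translate(query.Body, new Dictionary<ParameterExpression, int>());
    }
}
```

TranslatorDemo.Build() should produce two variables and three constraints, with the names y and z both mapped to the second variable. The query is written in method syntax on purpose: released C# compilers translate two from clauses through the SelectMany overload with a result selector and transparent identifiers, so a query in query syntax would not arrive in the nested shape that this sketch (and the description above) expects.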
I attach a VS2005 solution which implements this algorithm. It assumes the May LINQ CTP is installed.
It contains four projects:
- LinqToRdf is the main project which implements this algorithm
It uses an ITripleProvider object, which enumerates triples, and an ISolver object, which implements a solution algorithm: given “local information” about the possible ways to complete a triple when the predicate and possibly one of subject and object are given, it computes all solutions of a given query (given as a set of query triples).
- RdfXmlReader is an implementation of ITripleProvider which reads in an RdfXml file. It uses Drive (see the last blog entry); you have to modify the reference to Drive.dll in this project to point to your copy of Drive.dll.
- SimpleSolver implements a simple algorithm to solve an Rdf query in the above normal form.
- Demo uses these assemblies to read in the RDF files containing information about Germany and France and list all “administrative divisions” of Germany and France.
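To give an idea of how a triple provider and a solver could fit together, here is a small sketch. The member signatures of ITripleProvider and ISolver below are guesses made for illustration and will not match the attached solution; InMemoryTriples and JoinSolver are invented names, values are plain strings, and a leading '?' marks a variable. The solver processes the constraints one by one and extends each partial assignment with every triple that has the right predicate and is compatible with what is already bound.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical interface shapes; the ones in the attached solution may differ.
public interface ITripleProvider
{
    IEnumerable<(string Subj, string Pred, string Obj)> Triples { get; }
}

public interface ISolver
{
    // Terms starting with '?' are variables; all other terms are constants.
    List<Dictionary<string, string>> Solve(
        ITripleProvider provider, IEnumerable<(string S, string P, string O)> constraints);
}

public sealed class InMemoryTriples : ITripleProvider
{
    private readonly List<(string Subj, string Pred, string Obj)> _triples;

    public InMemoryTriples(params (string, string, string)[] triples)
    {
        _triples = triples.ToList();
    }

    public IEnumerable<(string Subj, string Pred, string Obj)> Triples => _triples;
}

public sealed class JoinSolver : ISolver
{
    public List<Dictionary<string, string>> Solve(
        ITripleProvider provider, IEnumerable<(string S, string P, string O)> constraints)
    {
        // Start with one empty assignment and refine it constraint by constraint.
        var solutions = new List<Dictionary<string, string>> { new Dictionary<string, string>() };
        foreach (var c in constraints)
        {
            var next = new List<Dictionary<string, string>>();
            foreach (var current in solutions)
            {
                // The predicate is always a constant, so it can be used to pre-filter.
                foreach (var t in provider.Triples.Where(candidate => candidate.Pred == c.P))
                {
                    var extended = new Dictionary<string, string>(current);
                    if (Bind(c.S, t.Subj, extended) && Bind(c.O, t.Obj, extended))
                        next.Add(extended);
                }
            }
            solutions = next;
        }
        return solutions;
    }

    // Checks a constant, checks an already bound variable, or binds a free variable.
    private static bool Bind(string term, string value, Dictionary<string, string> assignment)
    {
        if (!term.StartsWith("?", StringComparison.Ordinal)) return term == value;
        if (assignment.TryGetValue(term, out var bound)) return bound == value;
        assignment[term] = value;
        return true;
    }
}
```

For example, solving the constraints ("germany", "hasAdminDiv", "?x"), ("?x", "isOfType", "germanState"), ("?x", "hasName", "?y") against a provider that holds matching triples yields one dictionary per solution, keyed by "?x" and "?y". A real implementation would index the triples by predicate instead of scanning them for every partial assignment.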
As always, this sample code is the product of Weekend Evening Rapid Prototyping; it is provided as-is and does not come with any warranty.
You can copy, modify, and use the code for commercial and non-commercial purposes.
To build the RdfXmlReader project, you need to download Drive.dll from https://www.driverdf.org/; see there for legal restrictions which may apply to this DLL.