Replaced Microsoft.Hadoop.Avro with Apache Avro CSharp Library
flomader committed Oct 9, 2017
1 parent 23c6351 commit cc1124b
Showing 12 changed files with 117 additions and 95 deletions.
8 changes: 5 additions & 3 deletions Examples/AvroExamples/AvroExamples/2-RegisterAssemblies.usql
@@ -1,6 +1,8 @@
-DROP ASSEMBLY IF EXISTS [Microsoft.Hadoop.Avro];
-CREATE ASSEMBLY [Microsoft.Hadoop.Avro] FROM @"/Assemblies/Avro/Microsoft.Hadoop.Avro.dll";
+DROP ASSEMBLY IF EXISTS [Avro];
+CREATE ASSEMBLY [Avro] FROM @"/Assemblies/Avro/Avro.dll";
 DROP ASSEMBLY IF EXISTS [Microsoft.Analytics.Samples.Formats];
 CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats] FROM @"/Assemblies/Avro/Microsoft.Analytics.Samples.Formats.dll";
 DROP ASSEMBLY IF EXISTS [Newtonsoft.Json];
-CREATE ASSEMBLY [Newtonsoft.Json] FROM @"/Assemblies/Avro/Newtonsoft.Json.dll";
+CREATE ASSEMBLY [Newtonsoft.Json] FROM @"/Assemblies/Avro/Newtonsoft.Json.dll";
+DROP ASSEMBLY IF EXISTS [log4net];
+CREATE ASSEMBLY [log4net] FROM @"/Assemblies/Avro/log4net.dll";
7 changes: 4 additions & 3 deletions Examples/AvroExamples/AvroExamples/3-SimpleAvro.usql
@@ -1,6 +1,7 @@
 REFERENCE ASSEMBLY [Newtonsoft.Json];
-REFERENCE ASSEMBLY [Microsoft.Hadoop.Avro];
-REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
+REFERENCE ASSEMBLY [log4net];
+REFERENCE ASSEMBLY [Avro];
+REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

 DECLARE @input_file string = @"\TwitterStream\{*}\{*}\{*}.avro";
 DECLARE @output_file string = @"\output\twitter.csv";
@@ -14,7 +15,7 @@ DECLARE @output_file string = @"\output\twitter.csv";
 partitionid long,
 eventenqueuedutctime string
 FROM @input_file
-USING new Microsoft.Analytics.Samples.Formats.Avro.AvroExtractor(@"
+USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"
 {
 ""type"" : ""record"",
 ""name"" : ""GenericFromIRecord0"",
16 changes: 6 additions & 10 deletions Examples/AvroExamples/README.md
@@ -1,19 +1,15 @@
 # U-SQL Avro Example
 This example demonstrates how you can use U-SQL to analyze data stored in Avro files.

-## Deploying
-The Avro Extractor requires Microsoft.Analytics.Samples.Formats and an updated version of the Microsoft.Hadoop.Avro library which can be found [here](https://github.com/flomader/hadoopsdk).
-
-1. Download the latest version of Microsoft.Hadoop.Avro.zip from [here]( https://github.com/flomader/hadoopsdk/releases).
-2. Extract Microsoft.Hadoop.Avro.dll from Microsoft.Hadoop.Avro.zip
-3. Clone and open the Microsoft.Analytics.Samples.Formats solution in Visual Studio.
-4. Update the reference of the file Microsoft.Hadoop.Avro.dll
-5. Build the Microsoft.Analytics.Samples.Formats solution
+## Build
+1. Open Microsoft.Analytics.Samples.sln in Visual Studio 2017

DTU-STR Mar 11, 2019

This is not really clear: there is no project like this in VS2017. Please clarify this in the documentation.

flmader Mar 25, 2019

This has been fixed with pull request #152.

+2. Build the Microsoft.Analytics.Samples solution

 ### Register assemblies
-1. Copy the following files to a directory in Azure Data Lake Store (e.g. \Assemblies\Avro):
+1. Copy the following files from your build directory to a directory in Azure Data Lake Store (e.g. \Assemblies\Avro):
 * Microsoft.Analytics.Samples.Formats.dll
-* Microsoft.Hadoop.Avro.dll
+* Avro.dll
+* log4net.dll
 * Newtonsoft.Json.dll
 2. Create a database (e.g. run 1-CreateDB.usql.cs), switch to the new database
 3. Check file paths in 2-RegisterAssemblies.usql and update them if necessary
Binary file added Examples/DataFormats/Lib/Avro.dll
Binary file not shown.
Binary file added Examples/DataFormats/Lib/Newtonsoft.Json.dll
Binary file not shown.
Binary file added Examples/DataFormats/Lib/log4net.dll
Binary file not shown.

Large diffs are not rendered by default.

@@ -9,14 +9,15 @@
 <AppDesignerFolder>Properties</AppDesignerFolder>
 <RootNamespace>Microsoft.Analytics.Samples.Formats.Tests</RootNamespace>
 <AssemblyName>Microsoft.Analytics.Samples.Formats.Tests</AssemblyName>
-<TargetFrameworkVersion>v4.6.1</TargetFrameworkVersion>
+<TargetFrameworkVersion>v4.7</TargetFrameworkVersion>
 <FileAlignment>512</FileAlignment>
 <ProjectTypeGuids>{4D4E14FB-86F2-46A5-8BFB-41569A68D9E8};{3AC096D0-A1C2-E12C-1390-A8335801FDAB};{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}</ProjectTypeGuids>
 <VisualStudioVersion Condition="'$(VisualStudioVersion)' == ''">10.0</VisualStudioVersion>
 <VSToolsPath Condition="'$(VSToolsPath)' == ''">$(MSBuildExtensionsPath32)\Microsoft\VisualStudio\v$(VisualStudioVersion)</VSToolsPath>
 <ReferencePath>$(ProgramFiles)\Common Files\microsoft shared\VSTT\$(VisualStudioVersion)\UITestExtensionPackages</ReferencePath>
 <IsCodedUITest>False</IsCodedUITest>
 <TestProjectType>USQLUnitTest</TestProjectType>
+<TargetFrameworkProfile />
 </PropertyGroup>
 <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
 <DebugSymbols>true</DebugSymbols>
@@ -36,8 +37,15 @@
 <WarningLevel>4</WarningLevel>
 </PropertyGroup>
 <ItemGroup>
-<Reference Include="Microsoft.Hadoop.Avro">
-<HintPath>..\..\..\..\..\..\..\Temp\Avro\Assemblies\Microsoft.Hadoop.Avro.dll</HintPath>
+<Reference Include="Avro">
+<HintPath>..\Lib\Avro.dll</HintPath>
 </Reference>
+<Reference Include="log4net">
+<HintPath>..\Lib\log4net.dll</HintPath>
+</Reference>
+<Reference Include="Newtonsoft.Json, Version=6.0.0.0, Culture=neutral, PublicKeyToken=30ad4fe6b2a6aeed, processorArchitecture=MSIL">
+<SpecificVersion>False</SpecificVersion>
+<HintPath>..\Lib\Newtonsoft.Json.dll</HintPath>
+</Reference>
 <Reference Include="System" />
 <Reference Include="Microsoft.Analytics.Interfaces" />
@@ -4,15 +4,13 @@
 using System.Text;
 using System.Threading.Tasks;
 using System.Runtime.Serialization;
-using Microsoft.Hadoop.Avro;

 namespace Microsoft.Analytics.Samples.Formats.Tests
 {
-[DataContract(Name = "Foo", Namespace = "Microsoft.Analytics.Samples.Formats.Tests")]
+[DataContract(Name = "SingleColumnPoco", Namespace = "Microsoft.Analytics.Samples.Formats.Tests")]
 public class SingleColumnPoco<T>
 {
 [DataMember]
-[NullableSchema]
 public T Value { get; set; }

 }
@@ -15,10 +15,11 @@
 //
 using System.Collections.Generic;
 using Microsoft.Analytics.Interfaces;
-using Microsoft.Hadoop.Avro;
-using Microsoft.Hadoop.Avro.Container;
+using Avro.File;
+using Avro.Generic;
+using System.IO;

-namespace Microsoft.Analytics.Samples.Formats.Avro
+namespace Microsoft.Analytics.Samples.Formats.ApacheAvro
 {
 [SqlUserDefinedExtractor(AtomicFileProcessing = true)]
 public class AvroExtractor : IExtractor
@@ -32,29 +33,40 @@ public AvroExtractor(string avroSchema)

 public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
 {
-var serializer = AvroSerializer.CreateGeneric(avroSchema);
-using (var genericReader = AvroContainer.CreateGenericReader(input.BaseStream))
+var avschema = Avro.Schema.Parse(avroSchema);
+var reader = new GenericDatumReader<GenericRecord>(avschema, avschema);
+
+using (var ms = new MemoryStream())
 {
-using (var reader = new SequentialReader<dynamic>(genericReader))
+CreateSeekableStream(input, ms);
+ms.Position = 0;
+
+var fileReader = DataFileReader<GenericRecord>.OpenReader(ms, avschema);
+
+while (fileReader.HasNext())
 {
-foreach (var obj in reader.Objects)
+var avroRecord = fileReader.Next();
+
+foreach (var column in output.Schema)
 {
-foreach (var column in output.Schema)
+if (avroRecord[column.Name] != null)
 {
-if (obj[column.Name] != null)
-{
-output.Set(column.Name, obj[column.Name]);
-}
-else
-{
-output.Set<object>(column.Name, null);
-}
+output.Set(column.Name, avroRecord[column.Name]);
 }
+else
+{
+output.Set<object>(column.Name, null);
+}

 yield return output.AsReadOnly();
 }
 }
 }
 }
+
+private void CreateSeekableStream(IUnstructuredReader input, MemoryStream output)
+{
+input.BaseStream.CopyTo(output);
+}
 }
 }
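
The new `Extract` method copies `input.BaseStream` into a `MemoryStream` and rewinds it before calling `DataFileReader<GenericRecord>.OpenReader`, presumably because the reader expects a seekable stream and the U-SQL input stream may not be one. The sketch below isolates that buffering step using only `System.IO`. `ForwardOnlyStream` and `SeekableBuffer.ToSeekable` are hypothetical names introduced for illustration; they are not part of this commit or of the Avro library.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for a forward-only input such as IUnstructuredReader.BaseStream.
public sealed class ForwardOnlyStream : Stream
{
    private readonly Stream inner;

    public ForwardOnlyStream(Stream inner) { this.inner = inner; }

    public override bool CanRead => true;
    public override bool CanSeek => false;   // the property a seeking reader would trip over
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();

    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return inner.Read(buffer, offset, count);
    }

    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override void Flush() { }
}

// Hypothetical helper mirroring CreateSeekableStream plus the "ms.Position = 0" rewind.
public static class SeekableBuffer
{
    public static MemoryStream ToSeekable(Stream input)
    {
        var ms = new MemoryStream();
        input.CopyTo(ms);   // read the forward-only stream to its end
        ms.Position = 0;    // rewind so a consumer starts at the first byte
        return ms;
    }
}
```

Calling `SeekableBuffer.ToSeekable(new ForwardOnlyStream(source))` returns a stream whose `CanSeek` is `true` and whose `Position` is 0. This approach holds the whole file in memory, which is in line with `AtomicFileProcessing = true` but may be a concern for very large Avro files.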
@@ -1,5 +1,5 @@
 <?xml version="1.0" encoding="utf-8"?>
-<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
+<Project DefaultTargets="Build" ToolsVersion="12.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
 <Import Project="$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props" Condition="Exists('$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props')" />
 <PropertyGroup>
 <Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
@@ -13,7 +13,7 @@
 <AssemblyName>Microsoft.Analytics.Samples.Formats</AssemblyName>
 <AssemblyVersion>1.0.0.0</AssemblyVersion>
 <OutputType>Library</OutputType>
-<TargetFrameworkVersion>v4.5</TargetFrameworkVersion>
+<TargetFrameworkVersion>v4.6.1</TargetFrameworkVersion>
 <FileAlignment>512</FileAlignment>
 <TargetFrameworkProfile />
 </PropertyGroup>
@@ -35,13 +35,16 @@
 <WarningLevel>4</WarningLevel>
 </PropertyGroup>
 <ItemGroup>
-<Reference Include="Microsoft.CSharp" />
-<Reference Include="Microsoft.Hadoop.Avro">
-<HintPath>..\..\..\..\..\..\..\Temp\Avro\Assemblies\Microsoft.Hadoop.Avro.dll</HintPath>
+<Reference Include="Avro">
+<HintPath>..\Lib\Avro.dll</HintPath>
 </Reference>
+<Reference Include="log4net">
+<HintPath>..\Lib\log4net.dll</HintPath>
+</Reference>
+<Reference Include="Microsoft.CSharp" />
 <Reference Include="Newtonsoft.Json, Version=6.0.0.0, Culture=neutral, PublicKeyToken=30ad4fe6b2a6aeed, processorArchitecture=MSIL">
 <SpecificVersion>False</SpecificVersion>
-<HintPath>..\..\..\..\..\..\..\Temp\Avro\Assemblies\Newtonsoft.Json.dll</HintPath>
+<HintPath>..\Lib\Newtonsoft.Json.dll</HintPath>
 </Reference>
 <Reference Include="System" />
 <Reference Include="System.Core" />
@@ -63,5 +66,6 @@
 <Compile Include="Xml\XmlOutputter.cs" />
 <Compile Include="Xml\XPath.cs" />
 </ItemGroup>
+<ItemGroup />
 <Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
 </Project>
2 changes: 1 addition & 1 deletion Examples/DataFormats/Microsoft.Analytics.Samples.sln
@@ -1,7 +1,7 @@

 Microsoft Visual Studio Solution File, Format Version 12.00
 # Visual Studio 15
-VisualStudioVersion = 15.0.26730.12
+VisualStudioVersion = 15.0.26730.16
 MinimumVisualStudioVersion = 10.0.40219.1
 Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Microsoft.Analytics.Samples.Formats", "Microsoft.Analytics.Samples.Formats\Microsoft.Analytics.Samples.Formats.csproj", "{1B3E7106-6D16-4B96-87C5-F15E18FFC08F}"
 EndProject
