Release Notes - Spark - Version 2.0.1

Sub-task

  • [SPARK-15232] - Add subquery SQL building tests to LogicalPlanToSQLSuite
  • [SPARK-15698] - Ability to remove old metadata for structured streaming MetadataLog
  • [SPARK-15814] - Aggregator can return null result
  • [SPARK-16287] - Implement str_to_map SQL function
  • [SPARK-16312] - Docs for Kafka 0.10 consumer integration
  • [SPARK-16380] - Update SQL examples and programming guide for Python language binding
  • [SPARK-16391] - KeyValueGroupedDataset.reduceGroups should support partial aggregation
  • [SPARK-16508] - Fix documentation warnings found by R CMD check
  • [SPARK-16510] - Move SparkR test JAR into Spark, include its source code
  • [SPARK-16519] - Handle SparkR RDD generics that create warnings in R CMD check
  • [SPARK-16577] - Add check-cran script to Jenkins
  • [SPARK-16579] - Add a spark install function
  • [SPARK-16581] - Making JVM backend calling functions public
  • [SPARK-16621] - Generate stable SQLs in SQLBuilder
  • [SPARK-16734] - Make sure examples in all language bindings are consistent
  • [SPARK-16735] - Fail to create a map that contains decimal type with literals having different inferred precisions and scales
  • [SPARK-16774] - Fix use of deprecated TimeStamp constructor (also providing incorrect results)
  • [SPARK-16776] - Fix Kafka deprecation warnings
  • [SPARK-16778] - Fix use of deprecated SQLContext constructor
  • [SPARK-16800] - Fix Java Examples that throw exception
  • [SPARK-16866] - Basic infrastructure for file-based SQL end-to-end tests
  • [SPARK-17007] - Move test data files into a test-data folder
  • [SPARK-17008] - Normalize query results using sorting
  • [SPARK-17009] - Use a new SparkSession for each test case
  • [SPARK-17011] - Support testing exceptions in queries
  • [SPARK-17015] - group-by-ordinal and order-by-ordinal test cases
  • [SPARK-17018] - literals.sql for testing literal parsing
  • [SPARK-17042] - Repl-defined classes cannot be replicated
  • [SPARK-17096] - Fix StreamingQueryListener to return message and stacktrace of actual exception
  • [SPARK-17149] - array.sql for testing array related functions
  • [SPARK-17165] - FileStreamSource should not track the list of seen files indefinitely
  • [SPARK-17235] - MetadataLog should support purging old logs
  • [SPARK-17269] - Move finish analysis stage into its own file
  • [SPARK-17270] - Move object optimization rules into its own file
  • [SPARK-17274] - Move join optimizer rules into a separate file
  • [SPARK-17372] - Running a file stream on a directory with partitioned subdirs throws NotSerializableException/StackOverflowError
  • [SPARK-17513] - StreamExecution should discard unneeded metadata
  • [SPARK-17586] - Use Static member not via instance reference
  • [SPARK-18151] - CLONE - MetadataLog should support purging old logs
  • [SPARK-18152] - CLONE - FileStreamSource should not track the list of seen files indefinitely
  • [SPARK-18153] - CLONE - Ability to remove old metadata for structured streaming MetadataLog
  • [SPARK-18156] - CLONE - StreamExecution should discard unneeded metadata

Bug

  • [SPARK-10683] - Source code missing for SparkR test JAR
  • [SPARK-11227] - Spark 1.5+ HDFS HA mode throws java.net.UnknownHostException: nameservice1
  • [SPARK-12666] - spark-shell --packages cannot load artifacts which are publishLocal'd by SBT
  • [SPARK-14204] - [SQL] Failure to register URL-derived JDBC driver on executors in cluster mode
  • [SPARK-14209] - Application failure during preemption.
  • [SPARK-14818] - Move sketch and mllibLocal out from mima exclusion
  • [SPARK-15083] - History Server would OOM due to unlimited TaskUIData in some stages
  • [SPARK-15285] - Generated SpecificSafeProjection.apply method grows beyond 64 KB
  • [SPARK-15382] - monotonicallyIncreasingId doesn't work when data is upsampled
  • [SPARK-15390] - Memory management issue in complex DataFrame join and filter
  • [SPARK-15541] - SparkContext.stop throws error
  • [SPARK-15869] - HTTP 500 and NPE on streaming batch details page
  • [SPARK-15899] - file scheme should be used correctly
  • [SPARK-15989] - PySpark SQL python-only UDTs don't support nested types
  • [SPARK-16062] - PySpark SQL python-only UDTs don't work well
  • [SPARK-16321] - [Spark 2.0] Performance regression when reading parquet and using PPD and non-vectorized reader
  • [SPARK-16334] - SQL query on parquet table java.lang.ArrayIndexOutOfBoundsException
  • [SPARK-16409] - regexp_extract with optional groups causes NPE
  • [SPARK-16439] - Incorrect information in SQL Query details
  • [SPARK-16440] - Undeleted broadcast variables in Word2Vec causing OoM for long runs
  • [SPARK-16457] - Wrong messages when CTAS with a Partition By clause
  • [SPARK-16460] - Spark 2.0 CSV ignores NULL value in Date format
  • [SPARK-16462] - Spark 2.0 CSV does not cast null values to certain data types properly
  • [SPARK-16522] - [MESOS] Spark application throws exception on exit
  • [SPARK-16533] - Spark application not handling preemption messages
  • [SPARK-16550] - Caching data with replication doesn't replicate data
  • [SPARK-16558] - examples/mllib/LDAExample should use MLVector instead of MLlib Vector
  • [SPARK-16563] - Repeat calling Spark SQL thrift server fetchResults return empty for ExecuteStatement operation
  • [SPARK-16586] - spark-class crash with "[: too many arguments" instead of displaying the correct error message
  • [SPARK-16597] - DataFrame DateType is written as an int(Days since epoch) by csv writer
  • [SPARK-16610] - When writing ORC files, orc.compress should not be overridden if users do not set "compression" in the options
  • [SPARK-16613] - RDD.pipe returns values for empty partitions
  • [SPARK-16632] - Vectorized parquet reader fails to read certain fields from Hive tables
  • [SPARK-16633] - lag/lead using constant input values does not return the default value when the offset row does not exist
  • [SPARK-16634] - GenericArrayData can't be loaded in certain JVMs
  • [SPARK-16639] - query fails if having condition contains grouping column
  • [SPARK-16642] - ResolveWindowFrame should not be triggered on UnresolvedFunctions.
  • [SPARK-16644] - constraints propagation may fail the query
  • [SPARK-16646] - LEAST doesn't accept numeric arguments with different data types
  • [SPARK-16648] - LAST_VALUE(FALSE) OVER () throws IndexOutOfBoundsException
  • [SPARK-16656] - CreateTableAsSelectSuite is flaky
  • [SPARK-16664] - Spark 1.6.2 - Persist call on Data frames with more than 200 columns is wiping out the data.
  • [SPARK-16672] - SQLBuilder should not raise exceptions on EXISTS queries
  • [SPARK-16686] - Dataset.sample with seed: result seems to depend on downstream usage
  • [SPARK-16698] - json parsing regression - "." in keys
  • [SPARK-16699] - Fix performance bug in hash aggregate on long string keys
  • [SPARK-16700] - StructType doesn't accept Python dicts anymore
  • [SPARK-16703] - Extra space in WindowSpecDefinition SQL representation
  • [SPARK-16711] - YarnShuffleService doesn't re-init properly on YARN rolling upgrade
  • [SPARK-16714] - Fail to create decimal arrays with literals having different inferred precisions and scales
  • [SPARK-16715] - Fix a potential ExprId conflict for SubexpressionEliminationSuite."Semantic equals and hash"
  • [SPARK-16721] - Lead/lag needs to respect nulls
  • [SPARK-16724] - Expose DefinedByConstructorParams
  • [SPARK-16729] - Spark should throw analysis exception for invalid casts to date type
  • [SPARK-16730] - Spark 2.0 breaks various Hive cast functions
  • [SPARK-16740] - joins.LongToUnsafeRowMap crashes with NegativeArraySizeException
  • [SPARK-16748] - Errors thrown by UDFs cause TreeNodeException when the query has an ORDER BY clause
  • [SPARK-16750] - ML GaussianMixture training failed due to feature column type mistake
  • [SPARK-16751] - Upgrade derby to 10.12.1.1 from 10.11.1.1
  • [SPARK-16770] - Spark shell not usable with german keyboard due to JLine version
  • [SPARK-16781] - java launched by PySpark as gateway may not be the same java used in the spark environment
  • [SPARK-16785] - dapply doesn't return array or raw columns
  • [SPARK-16787] - SparkContext.addFile() should not fail if called twice with the same file
  • [SPARK-16791] - casting structs fails on Timestamp fields (interpreted mode only)
  • [SPARK-16802] - joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException
  • [SPARK-16818] - Exchange reuse incorrectly reuses scans over different sets of partitions
  • [SPARK-16831] - PySpark CrossValidator reports incorrect avgMetrics
  • [SPARK-16836] - Hive date/time function error
  • [SPARK-16837] - TimeWindow incorrectly drops slideDuration in constructors
  • [SPARK-16850] - Improve error message for greatest/least
  • [SPARK-16873] - force spill NPE
  • [SPARK-16880] - Improve ANN training, add training data persist if needed
  • [SPARK-16883] - SQL decimal type is not properly cast to number when collecting SparkDataFrame
  • [SPARK-16901] - Hive settings in hive-site.xml may be overridden by Hive's default values
  • [SPARK-16905] - Support SQL DDL: MSCK REPAIR TABLE
  • [SPARK-16907] - Parquet table reading performance regression when vectorized record reader is not used
  • [SPARK-16922] - Query with Broadcast Hash join fails due to executor OOM in Spark 2.0
  • [SPARK-16925] - Spark tasks which cause JVM to exit with a zero exit code may cause app to hang in Standalone mode
  • [SPARK-16926] - Partition columns are present in columns metadata for partition but not table
  • [SPARK-16936] - Case Sensitivity Support for Refresh Temp Table
  • [SPARK-16942] - CREATE TABLE LIKE generates External table when source table is an External Hive Serde table
  • [SPARK-16943] - CREATE TABLE LIKE generates a non-empty table when source is a data source table
  • [SPARK-16950] - fromOffsets parameter in Kafka's Direct Streams does not work in python3
  • [SPARK-16953] - Make requestTotalExecutors public to be consistent with requestExecutors/killExecutors
  • [SPARK-16955] - Using ordinals in ORDER BY causes an analysis error when the query has a GROUP BY clause using ordinals
  • [SPARK-16959] - Table Comment in the CatalogTable returned from HiveMetastore is Always Empty
  • [SPARK-16961] - Utils.randomizeInPlace does not shuffle arrays uniformly
  • [SPARK-16966] - App Name is a randomUUID even when "spark.app.name" exists
  • [SPARK-16975] - Spark-2.0.0 unable to infer schema for parquet data written by Spark-1.6.2
  • [SPARK-16991] - Full outer join followed by inner join produces wrong results
  • [SPARK-16994] - Filter and limit are illegally permuted.
  • [SPARK-16995] - TreeNodeException when flat mapping RelationalGroupedDataset created from DataFrame containing a column created with lit/expr
  • [SPARK-17010] - [MINOR]Wrong description in memory management document
  • [SPARK-17013] - negative numeric literal parsing
  • [SPARK-17016] - group-by/order-by ordinal should throw AnalysisException instead of UnresolvedException
  • [SPARK-17022] - Potential deadlock in driver handling message
  • [SPARK-17027] - PolynomialExpansion.choose is prone to integer overflow
  • [SPARK-17038] - StreamingSource reports metrics for lastCompletedBatch instead of lastReceivedBatch
  • [SPARK-17051] - we should use hadoopConf in InsertIntoHiveTable
  • [SPARK-17056] - Fix a wrong assert in MemoryStore
  • [SPARK-17061] - Incorrect results returned following a join of two datasets and a map step where total number of columns >100
  • [SPARK-17065] - Improve the error message when encountering an incompatible DataSourceRegister
  • [SPARK-17066] - dateFormat should be used when writing dataframes as csv files
  • [SPARK-17086] - QuantileDiscretizer throws InvalidArgumentException (parameter splits given invalid value) on valid data
  • [SPARK-17093] - Roundtrip encoding of array<struct<>> fields is wrong when whole-stage codegen is disabled
  • [SPARK-17098] - "SELECT COUNT(NULL) OVER ()" throws UnsupportedOperationException during analysis
  • [SPARK-17099] - Incorrect result when HAVING clause is added to group by query
  • [SPARK-17100] - pyspark filter on a udf column after join gives java.lang.UnsupportedOperationException
  • [SPARK-17104] - LogicalRelation.newInstance should follow the semantics of MultiInstanceRelation
  • [SPARK-17110] - PySpark with locality ANY throws java.io.StreamCorruptedException
  • [SPARK-17113] - Job failure due to Executor OOM in offheap mode
  • [SPARK-17114] - Adding a 'GROUP BY 1' where first column is literal results in wrong answer
  • [SPARK-17115] - Improve the performance of UnsafeProjection for wide table
  • [SPARK-17117] - `SELECT 1 / NULL` throws AnalysisException, while `SELECT 1 * NULL` works
  • [SPARK-17120] - Analyzer incorrectly optimizes plan to empty LocalRelation
  • [SPARK-17124] - RelationalGroupedDataset.agg should be order preserving and allow duplicate column names
  • [SPARK-17158] - Improve error message for numeric literal parsing
  • [SPARK-17160] - GetExternalRowField does not properly escape field names, causing generated code not to compile
  • [SPARK-17162] - Range does not support SQL generation
  • [SPARK-17167] - Issue Exceptions when Analyze Table on In-Memory Cataloged Tables
  • [SPARK-17180] - Unable to Alter the Temporary View Using ALTER VIEW command
  • [SPARK-17182] - CollectList and CollectSet should be marked as non-deterministic
  • [SPARK-17194] - When emitting SQL for string literals Spark should use single quotes, not double
  • [SPARK-17205] - Literal.sql does not properly convert NaN and Infinity literals
  • [SPARK-17210] - sparkr.zip is not distributed to executors when run sparkr in RStudio
  • [SPARK-17211] - Broadcast join produces incorrect results when compressed Oops differs between driver, executor
  • [SPARK-17216] - Event timeline for a stage doesn't cover 100% of the timeline bar in Chrome
  • [SPARK-17228] - Not infer/propagate non-deterministic constraints
  • [SPARK-17230] - Writing decimal to csv results in an empty string if the decimal exceeds (20, 18)
  • [SPARK-17243] - Spark 2.0 history server summary page gets stuck at "loading history summary" with 10K+ application history
  • [SPARK-17244] - Joins should not pushdown non-deterministic conditions
  • [SPARK-17252] - Performing arithmetic in VALUES can lead to ClassCastException / MatchErrors during query parsing
  • [SPARK-17253] - Left join where ON clause does not reference the right table produces analysis error
  • [SPARK-17261] - Using HiveContext after re-creating SparkContext in Spark 2.0 throws "java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext"
  • [SPARK-17264] - DataStreamWriter should document that it only supports Parquet for now
  • [SPARK-17296] - Spark SQL: cross join + two joins = BUG
  • [SPARK-17299] - TRIM/LTRIM/RTRIM strips characters other than spaces
  • [SPARK-17306] - QuantileSummaries doesn't compress
  • [SPARK-17309] - ALTER VIEW should throw exception if the view does not exist
  • [SPARK-17323] - ALTER VIEW AS should keep the previous table properties, comment, create_time, etc.
  • [SPARK-17335] - Creating Hive table from Spark data
  • [SPARK-17336] - Repeated calls to sbin/spark-config.sh cause duplicate ${PYTHONPATH} values
  • [SPARK-17339] - Fix SparkR tests on Windows
  • [SPARK-17342] - Style of event timeline is broken
  • [SPARK-17352] - Executor computing time can be negative-number because of calculation error
  • [SPARK-17353] - CREATE TABLE LIKE statements when Source is a VIEW
  • [SPARK-17354] - java.lang.ClassCastException: java.lang.Integer cannot be cast to java.sql.Date
  • [SPARK-17355] - Work around exception thrown by HiveResultSetMetaData.isSigned
  • [SPARK-17356] - A large Metadata field in Alias can cause OOM when calling TreeNode.toJSON
  • [SPARK-17358] - Cached table(parquet/orc) should be shared between beelines
  • [SPARK-17364] - Can not query hive table starting with number
  • [SPARK-17369] - MetastoreRelation toJSON throws exception
  • [SPARK-17370] - Shuffle service files not invalidated when a slave is lost
  • [SPARK-17376] - Spark version should be available in R
  • [SPARK-17391] - Fix Two Test Failures After Backport
  • [SPARK-17396] - Threads number keep increasing when query on external CSV partitioned table
  • [SPARK-17418] - Spark release must NOT distribute Kinesis related assembly artifact
  • [SPARK-17438] - Master UI should show the correct core limit when `ApplicationInfo.executorLimit` is set
  • [SPARK-17439] - QuantilesSummaries returns the wrong result after compression
  • [SPARK-17442] - Additional arguments in write.df are not passed to data source
  • [SPARK-17463] - Serialization of accumulators in heartbeats is not thread-safe
  • [SPARK-17465] - Inappropriate memory management in `org.apache.spark.storage.MemoryStore` may lead to memory leak
  • [SPARK-17474] - Python UDF does not work between Sort and Limit
  • [SPARK-17491] - MemoryStore.putIteratorAsBytes() may silently lose values when KryoSerializer is used
  • [SPARK-17494] - Floor/ceil of decimal returns wrong result if it's in compact format
  • [SPARK-17502] - Multiple Bugs in DDL Statements on Temporary Views
  • [SPARK-17503] - Memory leak in Memory store when unable to cache the whole RDD in memory
  • [SPARK-17511] - Dynamic allocation race condition: Containers getting marked failed while releasing
  • [SPARK-17512] - Specifying remote files for Python based Spark jobs in Yarn cluster mode not working
  • [SPARK-17514] - df.take(1) and df.limit(1).collect() perform differently in Python
  • [SPARK-17515] - CollectLimit.execute() should perform per-partition limits
  • [SPARK-17521] - Error when I use sparkContext.makeRDD(Seq())
  • [SPARK-17525] - SparkContext.clearFiles() still present in the PySpark bindings though the underlying Scala method was removed in Spark 2.0
  • [SPARK-17531] - Don't initialize Hive Listeners for the Execution Client
  • [SPARK-17541] - fix some DDL bugs about table management when same-name temp view exists
  • [SPARK-17545] - Spark SQL Catalyst doesn't handle ISO 8601 date without colon in offset
  • [SPARK-17546] - start-* scripts should use hostname -f
  • [SPARK-17547] - Temporary shuffle data files may be leaked following exception in write
  • [SPARK-17548] - Word2VecModel.findSynonyms can spuriously reject the best match when invoked with a vector
  • [SPARK-17567] - Broken link to Spark paper
  • [SPARK-17571] - AssertOnQuery.condition should be consistent in requiring Boolean return type
  • [SPARK-17599] - Folder deletion after globbing may fail StructuredStreaming jobs
  • [SPARK-17613] - PartitioningAwareFileCatalog.allFiles doesn't handle URI specified path at parent
  • [SPARK-17616] - Getting "java.lang.RuntimeException: Distinct columns cannot exist in Aggregate "
  • [SPARK-17617] - Remainder(%) expression.eval returns incorrect result
  • [SPARK-17618] - Dataframe except returns incorrect results when combined with coalesce
  • [SPARK-17627] - Streaming Providers should be labeled Experimental
  • [SPARK-17641] - collect_set should ignore null values
  • [SPARK-17644] - The failed stage never resubmitted due to abort stage in another thread
  • [SPARK-17650] - Adding a malformed URL to sc.addJar and/or sc.addFile bricks Executors
  • [SPARK-17652] - Fix confusing exception message while reserving capacity
  • [SPARK-17666] - take() or isEmpty() on dataset leaks s3a connections
  • [SPARK-17672] - Spark 2.0 history server web Ui takes too long for a single application
  • [SPARK-17673] - Reused Exchange Aggregations Produce Incorrect Results
  • [SPARK-17752] - Spark returns incorrect result when 'collect()'ing a cached Dataset with many columns
  • [SPARK-17809] - scala.MatchError: BooleanType when casting a struct

New Feature

  • [SPARK-16956] - Make ApplicationState.MAX_NUM_RETRY configurable
  • [SPARK-17069] - Expose spark.range() as table-valued function in SQL
  • [SPARK-17150] - Support SQL generation for inline tables
  • [SPARK-17456] - Utility for parsing Spark versions

Improvement

  • [SPARK-2424] - ApplicationState.MAX_NUM_RETRY should be configurable
  • [SPARK-10835] - Word2Vec should accept non-null string array, in addition to existing null string array
  • [SPARK-12370] - Documentation should link to examples from its own release version
  • [SPARK-13286] - JDBC driver doesn't report full exception
  • [SPARK-15639] - Try to push down filter at RowGroups level for parquet reader
  • [SPARK-15703] - Make ListenerBus event queue size configurable
  • [SPARK-15923] - Spark Application rest api returns "no such app: <appId>"
  • [SPARK-16216] - CSV data source does not write date and timestamp correctly
  • [SPARK-16240] - model loading backward compatibility for ml.clustering.LDA
  • [SPARK-16320] - Document G1 heap region's effect on spark 2.0 vs 1.6
  • [SPARK-16324] - regexp_extract should doc that it returns empty string when match fails
  • [SPARK-16568] - update sql programming guide refreshTable API
  • [SPARK-16650] - Improve documentation of spark.task.maxFailures
  • [SPARK-16651] - Document no exception using DataFrame.withColumnRenamed when existing column doesn't exist
  • [SPARK-16663] - desc table should be consistent between data source and hive serde tables
  • [SPARK-16764] - Recommend disabling vectorized parquet reader on OutOfMemoryError
  • [SPARK-16772] - Correct API doc references to PySpark classes + formatting fixes
  • [SPARK-16796] - Visible passwords on Spark environment page
  • [SPARK-16805] - Log timezone when query result does not match
  • [SPARK-16812] - Open up SparkILoop.getAddedJars
  • [SPARK-16813] - Remove private[sql] and private[spark] from catalyst package
  • [SPARK-16865] - A file-based end-to-end SQL query suite
  • [SPARK-16870] - add "spark.sql.broadcastTimeout" into docs/sql-programming-guide.md to help people fix this timeout error when it happens
  • [SPARK-16875] - Add args checking for DataSet randomSplit and sample
  • [SPARK-16877] - Add a rule for preventing use of Java's Override annotation
  • [SPARK-16932] - Programming-guide Accumulator section should be more clear w.r.t new API
  • [SPARK-16935] - Verification of Function-related ExternalCatalog APIs
  • [SPARK-16947] - Support type coercion and foldable expression for inline tables
  • [SPARK-16964] - Remove private[sql] and private[spark] from sql.execution package
  • [SPARK-17023] - Update Kafka connector to use Kafka 0.10.0.1
  • [SPARK-17063] - MSCK REPAIR TABLE is super slow with Hive metastore
  • [SPARK-17084] - Rename ParserUtils.assert to validate
  • [SPARK-17186] - remove catalog table type INDEX
  • [SPARK-17193] - HadoopRDD NPE at DEBUG log level when getLocationInfo == null
  • [SPARK-17231] - Avoid building debug or trace log messages unless the respective log level is enabled
  • [SPARK-17246] - Support BigDecimal literal parsing
  • [SPARK-17279] - better error message for exceptions during ScalaUDF execution
  • [SPARK-17297] - Clarify window/slide duration as absolute time, not relative to a calendar
  • [SPARK-17301] - Remove unused classTag field from AtomicType base class
  • [SPARK-17316] - Don't block StandaloneSchedulerBackend.executorRemoved
  • [SPARK-17347] - Encoder in Dataset example has incorrect type
  • [SPARK-17378] - Upgrade snappy-java to 1.1.2.6
  • [SPARK-17421] - Document warnings about "MaxPermSize" parameter when building with Maven and Java 8
  • [SPARK-17445] - Reference an ASF page as the main place to find third-party packages
  • [SPARK-17480] - CompressibleColumnBuilder inefficiently call gatherCompressibilityStats
  • [SPARK-17483] - Minor refactoring and cleanup in BlockManager block status reporting and block removal
  • [SPARK-17484] - Race condition when cancelling a job during a cache write can lead to block fetch failures
  • [SPARK-17485] - Failed remote cached block reads can lead to whole job failure
  • [SPARK-17486] - Remove unused TaskMetricsUIData.updatedBlockStatuses field
  • [SPARK-17558] - Bump Hadoop 2.7 version from 2.7.2 to 2.7.3
  • [SPARK-17569] - Don't recheck existence of files when generating File Relation resolution in StructuredStreaming
  • [SPARK-17577] - SparkR support add files to Spark job and get by executors
  • [SPARK-17609] - SessionCatalog.tableExists should not check temp view
  • [SPARK-17638] - Stop JVM StreamingContext when the Python process is dead
  • [SPARK-17640] - Avoid using -1 as the default batchId for FileStreamSource.FileEntry
  • [SPARK-17649] - Log how many Spark events got dropped in LiveListenerBus
  • [SPARK-17651] - Automate Spark version update for documentations
  • [SPARK-18391] - Openstack deployment scenarios

Test

  • [SPARK-16690] - rename SQLTestUtils.withTempTable to withTempView
  • [SPARK-16722] - Fix a StreamingContext leak in StreamingContextSuite when eventually fails
  • [SPARK-17102] - bypass UserDefinedGenerator for json format check
  • [SPARK-17318] - Fix flaky test: o.a.s.repl.ReplSuite replicating blocks of object with class defined in repl
  • [SPARK-17326] - Tests with HiveContext in SparkR being skipped always
  • [SPARK-17473] - jdbc docker tests are failing with java.lang.AbstractMethodError:
  • [SPARK-17589] - Fix test case `create external table`

Documentation

  • [SPARK-16295] - Extract SQL programming guide example snippets from source files instead of hard-coding them
  • [SPARK-16761] - Fix doc link in docs/ml-guide.md
  • [SPARK-16911] - Remove migrating to a Spark 1.x version in programming guide documentation
  • [SPARK-17085] - Documentation and actual code differs - Unsupported Operations
  • [SPARK-17089] - Remove link of api doc for mapReduceTriplets because it's removed from api.
  • [SPARK-17242] - Update links of external dstream projects
  • [SPARK-17561] - DataFrameWriter documentation formatting problems
  • [SPARK-17575] - Make correction in configuration documentation table tags
