-
Notifications
You must be signed in to change notification settings - Fork 611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: 3626 add option enable license content; disable by default #3631
Conversation
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
95cf36a
to
b04732d
Compare
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
licenseScanner := licenses.NewScanner(licensecheck.Scan, float64(75)) | ||
sc := &licenses.ScannerConfig{ | ||
CoverageThreshold: 75, | ||
Scanner: licensecheck.Scan, | ||
} | ||
licenseScanner := licenses.NewScanner(sc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the scanner (and ScannerConfig) held the configuration for whether to include licenses or not? IdentifyLicenseIDs could just return nil if configured to exclude contents. This would avoid storing this configuration in the context, and the Search
function just only includes contents if returned. I don't know if returning no license and nil contents would be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two functions where we can make a decision on content are Search
and IdentifyLicenseIDs
IdentifyLicenseIDs could just return nil if configured to exclude contents
I don't know if returning no license and nil contents would be a problem.
In short IdentifyLicenseIDs
which is the method on scanner
needs to return content in its current form:
syft/internal/licenses/search.go
Lines 45 to 69 in 578a3d4
// IdentifyLicenseIDs can only return a list of ID or content | |
// These return values are mutually exclusive. | |
// If the scanner threshold for matching scores < 75% then we return the license full content | |
if len(ids) > 0 { | |
for _, id := range ids { | |
lic := pkg.NewLicenseFromLocations(id, reader.Location) | |
lic.Type = license.Concluded | |
licenses = append(licenses, lic) | |
} | |
} else if len(content) > 0 { | |
// harmonize line endings to unix compatible first: | |
// 1. \r\n => \n (Windows => UNIX) | |
// 2. \r => \n (Macintosh => UNIX) | |
content = []byte(strings.ReplaceAll(strings.ReplaceAll(string(content), "\r\n", "\n"), "\r", "\n")) | |
lic := pkg.NewLicenseFromLocations(unknownLicenseType, reader.Location) | |
lic.SPDXExpression = UnknownLicensePrefix + getCustomLicenseContentHash(content) | |
if includeUnknownLicenseContent { | |
lic.Contents = string(content) | |
} | |
lic.Type = license.Declared | |
licenses = append(licenses, lic) | |
} |
content
is used to form the unknown unique ID when matches don't occur.
We're still creating a license with the location for the user to go investigate, we're just not including that files contents in the final SBOM that did not match the license match threshold.
There was an initial attempt to include the config on the scanner
struct and not use context, but we need to return content
to Search
from IdentifyLicenseIDs
no matter what.
This is because we're still creating a license for unknown licenses with locations and an ID that takes the sum of the content, just not including the entire text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only hold up for me is using context to pass configuration in this way, since we're not using this same pattern elsewhere; everything else is 👍.
Outside of tests, IdentifyLicenseIDs
is only used by license.Search
, as far as I can tell, and since both of these are internal
we can modify the functions/interfaces as much as we want. I don't think we need to change the Search
signature at all (though I'm confused why it takes Context and Scanner; I was under the impression it was supposed to get the scanner out of the context or use a default, maybe the scanner is that default, but it's referenced without being checked for nil).
Why don't we just change the Scanner
interface to instead of IdentifyLicenseIDs
has IdentifyLicenses
which returns the ([]license.License, error)
, matching Search
, and this code just gets moved to the scanner, this allows the scanner to have all the appropriate configuration for what gets returned and do whatever the right thing is there.
* main: Add file catalogers to selection configuration (#3505) chore: replace all shorthand tags of mapstruct -> mapstructure (#3633) chore(deps): update tools to latest versions (#3637) chore(deps): update CPE dictionary index (#3638) feat: syft 3435 - add file components to cyclonedx bom output when file metadata is available (#3539) chore(deps): update tools to latest versions (#3635) chore(deps): bump github/codeql-action from 3.28.7 to 3.28.8 (#3634)
Signed-off-by: Christopher Phillips <[email protected]>
b495e75
to
db41a2e
Compare
Signed-off-by: Christopher Phillips <[email protected]>
@kzantow I've updated this PR so we're now passing configuration to the scanner constructor at the initialization of the SBOM process and not mixing its config in context here. The only ctx value we're using at the moment is to access the scanner. The Cover functionality is thread safe: I'm open to maybe using another PR in the future to do a refactor that moves licenses.Scanner out of a context lookup and onto the config object of each cataloger that invokes it, but that would require a larger refactor of: The problem being that we only want one Scanner because of the cost of initialization for all of the default licenses it knows about: syft/internal/licenses/scanner.go Lines 26 to 36 in 684b6e3
We only want It's currently very difficult with the way the task package injects configuration into the different catalogers. Take the go module catalog constructor as an example: https://github.com/anchore/syft/blob/684b6e3f9809a95ba08cb83c8d0661ab22eed8cc/internal/task/package_tasks.go#L82C3-L87 We have access to CatalogingFactoryConfig which is built by DefaultCatalogingFactoryConfig , but I'm not a fan of initializing the scanner on this struct and passing it around the catalogers since it's currently part of the public syft API. It's also called in multiple places and currently results in multiple scanners being initalized. |
Signed-off-by: Christopher Phillips <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🎉
Signed-off-by: Christopher Phillips <[email protected]>
…suport-package-supplier * 'main' of https://github.com/anchore/syft: chore(deps): bump github/codeql-action from 3.28.8 to 3.28.9 (#3648) feat: 3626 add option enable license content; disable by default (#3631)
…hore#3631) --------- Signed-off-by: Christopher Phillips <[email protected]>
Description
With #3366 landing the other day we need a sensible way to toggle the new content option before the next syft release.
This PR adds a new top level syft configuration for licenses
The new syft configuration keeps the current behavior where license content is disabled by default.
Content for unknown licenses to be included in the SBOM can be enabled on request
When this lands I plan on updating the wiki under configuration with examples:
https://github.com/anchore/syft/wiki/configuration
How to Validate
I've added a new CLI image that has a jar with an unknown license.
Reviewers can scan this jar using syft on this branch to see how contents can be optionally toggled on:
Content Enabled
Type of change
Checklist: